Assignment #16: K-means Clustering

배우고 느낀 것들

낑깡H 2023. 1. 3. 16:02

#importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#importing the Iris dataset with pandas
dataset = pd.read_csv('/content/iris.csv')

x = dataset.iloc[:, [0, 1, 2, 3]].values
# iloc selects by integer position: take every row, but only the columns at positions 0, 1, 2 and 3
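A quick way to see what that selection returns is to try it on a tiny frame. This is only a sketch: the toy DataFrame and its column names are made up for illustration and are not the contents of iris.csv.

```python
import pandas as pd

# Hypothetical stand-in for iris.csv (column names and values are assumptions)
toy = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "sepal_width": [3.5, 3.2, 3.3],
    "petal_length": [1.4, 4.7, 6.0],
    "petal_width": [0.2, 1.4, 2.5],
    "species": ["setosa", "versicolor", "virginica"],
})

# All rows, columns at positions 0..3; .values turns the result into a NumPy array
features = toy.iloc[:, [0, 1, 2, 3]].values
print(features.shape)  # (3, 4)
```

The text column at position 4 is left out, so the array is purely numeric and can be passed straight to KMeans.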

# Finding the optimal number of clusters K for k-means

from sklearn.cluster import KMeans

wcss = []

for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, max_iter = 300, n_init = 10, random_state = 0 )
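    # n_init: how many times k-means is run with different initial centroids; higher values give more stable results
    # max_iter: maximum number of iterations per run; may need to be raised when k is large
    kmeans.fit(x)
    wcss.append(kmeans.inertia_)  # inertia_ is the within-cluster sum of squares (WCSS)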
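To make clear what inertia_ measures, here is a small check that recomputes WCSS by hand. It is a sketch on synthetic blobs (an assumption, used so it runs without the Iris file); the names data, model and manual_wcss are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in data: three 2-D blobs of 50 points each
data = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(50, 2))
    for center in ([0, 0], [3, 3], [0, 3])
])

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)

# WCSS: sum of squared distances from each point to its assigned centroid
assigned_centers = model.cluster_centers_[model.labels_]
manual_wcss = np.sum((data - assigned_centers) ** 2)

print(manual_wcss, model.inertia_)
```

The two printed numbers should agree up to floating-point error, which is why collecting inertia_ for each k gives the WCSS curve.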

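init = 'k-means++' (the default in scikit-learn's KMeans, so the loop above uses it too):

1. The first center C1 is picked at random from the data points.

2. Each later center C(t) is sampled with a probability weighted toward points far from the centers already chosen (proportional to the squared distance to the nearest one), not only from C(t-1).

3. Repeat step 2 until k centers have been chosen.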

(+) :

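(-) : There is no guarantee that similar points will not be selected.

If the pairwise distance distribution is close to uniform, the weighted sampling behaves almost like plain random sampling, so you pay the extra cost for little benefit.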
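The seeding steps above can be sketched in plain NumPy. This is a minimal illustration of distance-weighted sampling, not scikit-learn's actual implementation (which differs in details); kmeans_pp_init and the random test points are hypothetical names and data.

```python
import numpy as np

def kmeans_pp_init(points, k, rng):
    """Pick k initial centers by sampling proportionally to squared distance."""
    n = points.shape[0]
    # Step 1: first center chosen uniformly at random
    centers = [points[rng.integers(n)]]
    # Squared distance from every point to its nearest chosen center so far
    closest_sq = np.sum((points - centers[0]) ** 2, axis=1)
    for _ in range(1, k):
        # Step 2: far-away points get a higher chance of being picked
        probs = closest_sq / closest_sq.sum()
        idx = rng.choice(n, p=probs)
        centers.append(points[idx])
        # Update nearest-center distances with the newly added center
        new_sq = np.sum((points - points[idx]) ** 2, axis=1)
        closest_sq = np.minimum(closest_sq, new_sq)
    return np.array(centers)

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2))
init_centers = kmeans_pp_init(points, k=3, rng=rng)
print(init_centers.shape)  # (3, 2)
```

A point that is already a center has distance 0 and therefore probability 0, so it cannot be picked twice, but two nearby points can still both be chosen, which is the drawback noted above.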

plt.plot(range(1,11), wcss)
plt.title('The elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
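Besides reading the plot by eye, one rough option is to print how much WCSS falls at each step and look for where the drops become small. This is only a heuristic sketch on synthetic three-blob data (an assumption, not the Iris file); wcss_demo and drops are illustrative names.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in data: three well-separated 2-D blobs
data = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(50, 2))
    for center in ([0, 0], [3, 3], [0, 3])
])

ks = list(range(1, 7))
wcss_demo = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_
    for k in ks
]

# drops[i] is how much WCSS falls when k goes from ks[i] to ks[i + 1]
drops = -np.diff(wcss_demo)
for k, drop in zip(ks[1:], drops):
    print(f"k={k}: WCSS fell by {drop:.1f}")
```

On data like this the drop is large up to k = 3 and small afterwards, which is the numeric counterpart of the bend in the curve.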

K-means result at the elbow point, K = 3

kmeans = KMeans(n_clusters = 3, init = 'k-means++', max_iter = 300, n_init = 10, random_state = 0)
y_kmeans = kmeans.fit_predict(x)  # note: fit_predict, not plain fit, so the cluster labels are returned
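For comparison, plain fit trains the model and returns the estimator, with the labels stored in labels_, while fit_predict returns the labels directly. A small sketch on synthetic data (an assumption, used so it runs without the Iris file):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in data: three 2-D blobs of 50 points each
data = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(50, 2))
    for center in ([0, 0], [3, 3], [0, 3])
])

# fit_predict: fit the model and return one cluster index per row
labels_a = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)

# fit: returns the fitted estimator; the same labels live in .labels_
model_b = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
labels_b = model_b.labels_

print(np.array_equal(labels_a, labels_b))
```

With the same random_state both routes give the same labels, so fit_predict is mostly a convenience when the labels are needed right away for plotting.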

Visualizing in 2D

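#Visualising the clusters on the first two columns (cluster indices are arbitrary, so the species names in the labels are assumed)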
plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 1], s = 100, c = 'purple', label = 'Iris-setosa')
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 1], s = 100, c = 'orange', label = 'Iris-versicolour')
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 1], s = 100, c = 'green', label = 'Iris-virginica')

#Plotting the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:,1], s = 100, c = 'red', label = 'Centroids')

plt.legend()
plt.show()

# 3d scatterplot using matplotlib

fig = plt.figure(figsize = (15,15))
ax = fig.add_subplot(111, projection='3d')
# draw on the 3D axes with three coordinates; using column 2 as the z-axis is an assumption
ax.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 1], x[y_kmeans == 0, 2], s = 100, c = 'purple', label = 'Iris-setosa')
ax.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 1], x[y_kmeans == 1, 2], s = 100, c = 'orange', label = 'Iris-versicolour')
ax.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 1], x[y_kmeans == 2, 2], s = 100, c = 'green', label = 'Iris-virginica')

#Plotting the centroids of the clusters
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], kmeans.cluster_centers_[:, 2], s = 100, c = 'red', label = 'Centroids')
ax.legend()
plt.show()
