数据挖掘 - 具有一个集群的 k 均值 - 吾爱随笔录

具有一个集群的 k 均值

数据挖掘 k-均值

2022-02-24 14:41:52

K-means 可能会给出不同的结果，因为质心的初始选择是随机的。

但是，如果我选择 k=1，算法是否总是提供与我数据的“重心”相同的答案？

1个回答

是的。质心将收敛到所有数据的中心，这将在一次迭代中发生。这是由于所有数据点都属于一个质心，因此它将立即根据所有这些实例居中。

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Generate data
class1 = np.random.randn(1000, 2)

class2 = np.random.randn(1000, 2)
class2[:,0] = 4+class2[:,0]
class2[:,1] = 4+class2[:,1]

class3 = np.random.randn(1000, 2)
class3[:,0] = -4+class3[:,0]
class3[:,1] = 4+class3[:,1]

data = np.append( class1, class2, axis= 0)
data = np.append( data, class3, axis= 0)
print(data.shape)

# Plot the data
plt.scatter(data[:,0], data[:,1])
plt.show()

# Cluster
kmeans = KMeans(n_clusters=1, random_state=0, verbose = 1).fit(data)

# Plot clustered results
plt.scatter(data[:,0], data[:,1], c=kmeans.labels_)
plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], c = 'r')
plt.show()

# Show the cluster centers
print(kmeans.cluster_centers_)

初始化完成迭代 0，惯量 81470.055 迭代 1，惯量 48841.695 在迭代 1 处收敛：中心偏移 0.000000e+00 在公差 8.140283e-04 内

其它你可能感兴趣的问题

上一篇如何在 keras 中为 csv 使用 flow_from_directory 下一篇提高二元分类器的精度 - Python 中的决策树