Is there a way to draw a separating line between the clusters in k-Means Clustering?

data-mining machine-learning python scikit-learn matplotlib

2022-02-10 21:06:07

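k-Means Clustering is a vector quantization method, originally from signal processing, that is popular for cluster analysis in data mining.

Here is a piece of code that performs k-means clustering in two dimensions: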

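import matplotlib.pyplot as plt
# make_blobs is imported from sklearn.datasets directly; the old
# samples_generator submodule is not available in recent scikit-learn releases
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate 300 two-dimensional points scattered around 3 centers
X, y_true = make_blobs(n_samples=300, centers=3,
                       cluster_std=1.1, random_state=0)
plt.scatter(X[:, 0], X[:, 1], s=50)

# Fit k-means and assign every point to its cluster
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')

# Mark the cluster centers in black
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200)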

It outputs a figure (fig_1).

This figure (fig_2) is from Wikipedia.

Is there a way to put the separating lines shown in fig_2 onto fig_1?

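1 Answer

There are two answers to this question.

The first one is yes, you can do it with Python code. Following the scikit-learn tutorial, you can plot the decision boundaries using a meshgrid: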

import numpy as np

# Step size of the mesh. Decrease to increase the quality of the VQ.
h = .02

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max] x [y_min, y_max].
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Obtain labels for each point in mesh. Use last trained model.
Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1)
plt.clf()
plt.imshow(Z, interpolation='nearest',
           extent=(xx.min(), xx.max(), yy.min(), yy.max()),
           cmap=plt.cm.Paired,
           aspect='auto', origin='lower')

plt.plot(X[:, 0], X[:, 1], 'k.', markersize=2)
# Plot the centroids as a white X
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1],
            marker='x', s=169, linewidths=3,
            color='w', zorder=10)
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
plt.show()
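
The mesh approach paints whole regions. If only the separating lines are wanted, one option is to mark the grid cells where the predicted label changes. The sketch below is an illustration rather than part of the tutorial: `nearest_centroid_labels` and `boundary_mask` are hypothetical helper names, the centroids are made up, and the nearest-centroid helper merely stands in for `kmeans.predict` so the snippet runs without scikit-learn.

```python
import numpy as np

def nearest_centroid_labels(points, centroids):
    """Label each point with the index of its closest centroid.

    Stand-in for kmeans.predict (an assumption for this sketch).
    """
    # Squared distances, shape (n_points, n_centroids)
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def boundary_mask(Z):
    """True where a grid cell's label differs from its right or upper neighbour."""
    mask = np.zeros(Z.shape, dtype=bool)
    mask[:, :-1] |= Z[:, :-1] != Z[:, 1:]   # change along x
    mask[:-1, :] |= Z[:-1, :] != Z[1:, :]   # change along y
    return mask

# Made-up centroids, used only for illustration
centroids = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
xx, yy = np.meshgrid(np.arange(-2, 6, 0.05), np.arange(-2, 5, 0.05))
Z = nearest_centroid_labels(np.c_[xx.ravel(), yy.ravel()], centroids)
Z = Z.reshape(xx.shape)
mask = boundary_mask(Z)
```

With the fitted model, the `Z` computed from `kmeans.predict` on the mesh can be passed to `boundary_mask` directly, and `plt.plot(xx[mask], yy[mask], 'k.', markersize=1)` then draws the boundary cells as a thin dotted line on top of the scatter plot.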

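There is also a second answer, the analytical one:

The decision boundary is the set of points for which we cannot decide the label. For k-means, it is the set of points equidistant from two centroids. A short calculation shows that this is a line. To find its equation, you only need the coordinates of the two centroids.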
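
The short calculation can be written out as follows. The notation is an assumption made here for illustration: $c_1$ and $c_2$ are the two centroids and $x$ is a point on the boundary.

```latex
\begin{aligned}
\lVert x - c_1 \rVert^2 &= \lVert x - c_2 \rVert^2 \\
-2\, c_1 \cdot x + \lVert c_1 \rVert^2 &= -2\, c_2 \cdot x + \lVert c_2 \rVert^2 \\
(c_2 - c_1) \cdot \left( x - \frac{c_1 + c_2}{2} \right) &= 0
\end{aligned}
```

The last line is linear in $x$: it describes the line through the midpoint $(c_1 + c_2)/2$ that is orthogonal to $c_2 - c_1$, that is, the perpendicular bisector of the segment joining the two centroids.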

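After that, you find the midpoint of the segment between the two centroids, find a vector orthogonal to that segment, and pass it through the midpoint, which gives you the equation. You just need to plot it, and voilà! Coding this won't take long; if you want, I will try to update the answer.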
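
A minimal sketch of this recipe, assuming two distinct 2-D centroids. The function names `bisector` and `bisector_endpoints` and the example coordinates are hypothetical and not taken from the answer.

```python
import numpy as np

def bisector(c1, c2):
    """Return (midpoint, unit direction) of the perpendicular bisector of c1-c2.

    Assumes c1 and c2 are distinct 2-D points.
    """
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    midpoint = (c1 + c2) / 2.0
    d = c2 - c1
    # Rotating (dx, dy) by 90 degrees gives the orthogonal vector (-dy, dx)
    direction = np.array([-d[1], d[0]])
    direction /= np.linalg.norm(direction)
    return midpoint, direction

def bisector_endpoints(c1, c2, half_length=10.0):
    """Two points on the bisector, half_length away from the midpoint on each side."""
    midpoint, direction = bisector(c1, c2)
    return midpoint - half_length * direction, midpoint + half_length * direction

# Made-up centroids, used only for illustration
c1 = np.array([0.0, 0.0])
c2 = np.array([4.0, 2.0])
midpoint, direction = bisector(c1, c2)
p, q = bisector_endpoints(c1, c2, half_length=5.0)
```

With the fitted model, `c1` and `c2` would be two rows of `kmeans.cluster_centers_`, and `plt.plot([p[0], q[0]], [p[1], q[1]], 'k--')` draws the line. With three or more clusters, each pairwise bisector is part of the boundary only where those two centroids are the nearest ones, so the lines need to be clipped where a third centroid becomes closer.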
