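Can someone explain the isotropic Gaussian blobs generated by sklearn.datasets.make_blobs()? I don't understand what the term means, and all I found in the sklearn documentation is "Generate isotropic Gaussian blobs for clustering". I have also read through this question.
So, here are my doubts: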
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.datasets import make_blobs
# generate the data set
X, y = make_blobs(n_samples=100000, n_features=2, centers=2, random_state=2, cluster_std=1.5)
# scatter plot of the blobs, coloured by cluster label
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='RdBu')
# distribution of the first feature
sns.histplot(x=X[:, 0], kde=True)
# distribution of the second feature
sns.histplot(x=X[:, 1], kde=True, color="green", alpha=0.2)
# overall distribution of all values
sns.histplot(x=X.flatten(), color="red", kde=True, alpha=0.5)
This does not look normally distributed either!
# variance-covariance matrix of the features
np.cov(X[:, 0], X[:, 1])
Output
array([[ 3.55546911, 4.70526192],
[ 4.70526192, 19.00023664]])
What does Gaussian actually mean here? This may be a silly question, so apologies in advance.
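One check that may help narrow the question down is to look at each blob separately instead of the pooled data. The snippet below is only a minimal sketch. It assumes that "isotropic" means every cluster is drawn with the same standard deviation along each feature and no correlation between features, which is my reading of the documentation rather than something stated there in those words. The sample size is reduced to keep it quick, and `per_cluster_cov` and `pooled_cov` are names made up for this example.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same parameters as above, with fewer samples so it runs quickly.
X, y = make_blobs(n_samples=10000, n_features=2, centers=2,
                  random_state=2, cluster_std=1.5)

# Covariance of each blob on its own (hypothetical helper dict).
per_cluster_cov = {}
for label in np.unique(y):
    points = X[y == label]
    per_cluster_cov[int(label)] = np.cov(points, rowvar=False)
    print(f"cluster {label}: mean = {points.mean(axis=0)}")
    print(per_cluster_cov[int(label)])

# Covariance of the pooled data, i.e. both blobs mixed together.
pooled_cov = np.cov(X, rowvar=False)
print("pooled:")
print(pooled_cov)
```

If that reading is right, each per-cluster matrix should come out close to cluster_std**2 = 2.25 on the diagonal and close to 0 off the diagonal. The pooled matrix would then differ because it also contains the spread between the two cluster centres, so the pooled data is a mixture of two Gaussians rather than a single Gaussian.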




