
Predicting the cluster of a new object with kmeans in R

Cross Validated, clustering

2022-03-28 06:37:45

I fit clusters on my training data set with the kmeans function:

fit <- kmeans(ca.data, 2);

How can I use the fit object to predict cluster membership in a new data set?

Thanks

4 Answers

One of your options is to use cl_predict from the clue package (note: I found this by googling "kmeans R predict").
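A minimal sketch of that suggestion, assuming the clue package is installed. The names train and new_points are made up for illustration; the call itself follows cl_predict(object, newdata) as documented in clue.

```r
# Assumes the clue package is available: install.packages("clue")
library(clue)

set.seed(1)
# Illustrative training data: two blobs around (0, 0) and (1, 1)
train <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
               matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(train) <- c("x", "y")

# Illustrative new observations, one near each blob
new_points <- rbind(c(x = 0, y = 0), c(x = 1, y = 1))

fit <- kmeans(train, centers = 2)

# Predicted class ids for the new rows, converted to a plain integer vector
ids <- as.integer(cl_predict(fit, newdata = new_points))
ids
```

The cluster numbers themselves are arbitrary (they depend on the random start), so only the grouping is meaningful, not whether a blob is labelled 1 or 2.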

See this complete answer. The code you need is:

clusters <- function(x, centers) {
  # compute squared euclidean distance from each sample to each cluster center
  tmp <- sapply(seq_len(nrow(x)),
                function(i) apply(centers, 1,
                                  function(v) sum((x[i, ]-v)^2)))
  max.col(-t(tmp))  # find index of min distance
}

# create a simple data set with two clusters
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
x_new <- rbind(matrix(rnorm(10, sd = 0.3), ncol = 2),
               matrix(rnorm(10, mean = 1, sd = 0.3), ncol = 2))
colnames(x_new) <- c("x", "y")

cl <- kmeans(x, centers=2)

all.equal(cl[["cluster"]], clusters(x, cl[["centers"]]))
# [1] TRUE
clusters(x_new, cl[["centers"]])
# [1] 2 2 2 2 2 1 1 1 1 1
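For larger data sets, the same nearest-center assignment can be computed without the nested sapply/apply loops. The sketch below is not part of the answer above: it swaps in the identity ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, and the function name nearest_center is hypothetical.

```r
# Vectorised nearest-center lookup (hypothetical helper, not from the answer)
nearest_center <- function(x, centers) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  # n x k matrix of squared distances: ||x_i||^2 + ||c_j||^2 - 2 * x_i . c_j
  d2 <- outer(rowSums(x^2), rowSums(centers^2), "+") - 2 * x %*% t(centers)
  # column index of the smallest distance in each row; ties go to the first
  max.col(-d2, ties.method = "first")
}

set.seed(42)
pts  <- matrix(rnorm(40), ncol = 2)          # 20 illustrative points
ctrs <- rbind(c(-1, -1), c(1, 1), c(0, 2))   # 3 illustrative centers
nearest_center(pts, ctrs)
```

Note that max.col breaks ties at random by default, which is what the clusters function above does; ties.method = "first" is used here only to make the result deterministic.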

Another option is to use the predict method from the flexclust package after converting the stats::kmeans model to flexclust's kcca type.
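A hedged sketch of the flexclust route, assuming the package is installed. It relies on as.kcca(object, data) to do the conversion and on the predict method for kcca objects; train and new_points are illustrative names.

```r
# Assumes the flexclust package is available: install.packages("flexclust")
library(flexclust)

set.seed(1)
# Illustrative training data: two blobs around (0, 0) and (1, 1)
train <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
               matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(train) <- c("x", "y")
new_points <- rbind(c(x = 0, y = 0), c(x = 1, y = 1))

fit <- kmeans(train, centers = 2)

# Convert the stats::kmeans fit to a kcca object; the training data is needed
fit_kcca <- as.kcca(fit, data = train)

# Cluster assignment for the new rows
pred <- as.integer(predict(fit_kcca, newdata = new_points))
pred
```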

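You can write an S3 method to predict the cluster for a new data set. The version below assigns each observation to the center that minimizes the sum of squares. It is used like other predict functions: newdata should match the structure of the input to kmeans, and the method argument should work as it does for fitted.kmeans.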

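predict.kmeans <- function(object,
                           newdata,
                           method = c("centers", "classes")) {
  method <- match.arg(method)

  centers <- object$centers
  newdata <- as.matrix(newdata)
  # n x k matrix of squared distances from each row of newdata to each center;
  # matrix() keeps that shape even when newdata has a single row
  ss_by_center <- matrix(apply(centers, 1, function(x) {
    colSums((t(newdata) - x) ^ 2)
  }), nrow = nrow(newdata))
  # index of the closest center for each row
  best_clusters <- apply(ss_by_center, 1, which.min)

  if (method == "centers") {
    # drop = FALSE keeps a matrix when only one row is predicted
    centers[best_clusters, , drop = FALSE]
  } else {
    best_clusters
  }
}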

I wish there were a predict.kmeans in the existing stats namespace.
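To illustrate how such a method would be called, here is a small usage sketch. It repeats a compact version of the method so the snippet runs on its own; the data and the names x_new, new_classes and new_centers are illustrative.

```r
# Compact restatement of the S3 method so this snippet is self-contained
predict.kmeans <- function(object, newdata, method = c("centers", "classes")) {
  method <- match.arg(method)
  centers <- object$centers
  newdata <- as.matrix(newdata)
  # squared distance from every row of newdata to every center (n x k)
  ss <- matrix(apply(centers, 1, function(ctr) colSums((t(newdata) - ctr)^2)),
               nrow = nrow(newdata))
  best <- apply(ss, 1, which.min)
  if (method == "centers") centers[best, , drop = FALSE] else best
}

set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
cl <- kmeans(x, centers = 2)

# Two illustrative new observations, one near each blob
x_new <- rbind(c(x = 0, y = 0), c(x = 1, y = 1))

new_classes <- predict(cl, x_new, method = "classes")  # cluster labels
new_centers <- predict(cl, x_new, method = "centers")  # matching center rows
```

With method = "classes" the result is a vector of cluster labels; with method = "centers" it is one row of cl$centers per new observation, mirroring what fitted() returns for the training data.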
