
How do I test a logistic regression model developed on a training sample against held-out data in R?

Tags: r, regression, logistic, cross-validation

2022-03-14 23:36:37

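I have the summary output of a logistic regression in R. I used the training data to build the model.

How do I test the logistic regression model developed on the training data against the held-out data?

My naive guess is to write a function and run each test case through it (I don't even know how to pull that off), but I have to imagine there is a better way.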

2 Answers

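You can use predict() for that. You need the model fitted to the training data, plus the data from the test sample. With type="response" you get predicted probabilities; the default is predicted logits, i.e. values on the scale of the linear predictor.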

# generate some data for a logistic regression, all observations
x    <- rnorm(100, 175, 7)                     # predictor variable
y    <- 0.4*x + 10 + rnorm(100, 0, 3)          # continuous predicted variable
yFac <- cut(y, breaks=c(-Inf, median(y), Inf), labels=c("lo", "hi"))    # median split
d    <- data.frame(yFac, x)                    # data frame

# now set aside training sample and corresponding test sample
idxTrn <- 1:70                                 # training sample
idxTst <- !(1:nrow(d) %in% idxTrn)             # test sample -> all remaining obs
# if idxTrn were a logical index vector, this would just be idxTst <- !idxTrn

# fit logistic regression only to training sample
fitTrn <- glm(yFac ~ x, family=binomial(link="logit"), data=d, subset=idxTrn)

# apply fitted model to test sample (predicted probabilities)
predTst <- predict(fitTrn, d[idxTst, ], type="response")
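To make the difference between the two prediction scales concrete, here is a minimal sketch that rebuilds the same simulated data and compares both outputs of predict(). The fixed seed and the names newDat, logitTst and probTst are my own additions for illustration, not part of the code above.

```r
set.seed(1)                                    # fixed seed: an assumption, only for reproducibility
x    <- rnorm(100, 175, 7)                     # predictor variable
y    <- 0.4*x + 10 + rnorm(100, 0, 3)          # continuous predicted variable
yFac <- cut(y, breaks=c(-Inf, median(y), Inf), labels=c("lo", "hi"))
d    <- data.frame(yFac, x)

idxTrn <- 1:70                                 # training sample
fitTrn <- glm(yFac ~ x, family=binomial(link="logit"), data=d, subset=idxTrn)
newDat <- d[-idxTrn, ]                         # held-out rows (hypothetical helper name)

logitTst <- predict(fitTrn, newDat)                    # default type="link": log-odds
probTst  <- predict(fitTrn, newDat, type="response")   # probabilities

# the inverse logit maps the link scale onto the response scale
all.equal(plogis(logitTst), probTst)

# with a factor response, glm() models the probability of the second level
levels(d$yFac)
```

Since glm() treats the first factor level as "failure", the predicted probabilities here refer to membership in "hi". That matters as soon as you apply a threshold.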

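Now you can compare the predicted probabilities against the actual class values in whatever way you like. For example, you can set a threshold for categorizing the predicted probabilities and then compare actual against predicted class membership.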

> thresh  <- 0.5            # threshold for categorizing predicted probabilities
> predFac <- cut(predTst, breaks=c(-Inf, thresh, Inf), labels=c("lo", "hi"))
> cTab    <- table(yFac[idxTst], predFac, dnn=c("actual", "predicted"))
> addmargins(cTab)
      predicted
actual lo hi Sum
   lo  12  4  16
   hi   5  9  14
   Sum 17 13  30
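If you want summary numbers instead of the raw table, a short sketch like the following may help. It re-enters the counts from the table shown above by hand and treats "hi" as the positive class, which is my assumption. The names accuracy, sensitivity and specificity are likewise only illustrative.

```r
# counts copied from the confusion table above (rows: actual, columns: predicted)
cTab <- matrix(c(12, 5, 4, 9), nrow=2,
               dimnames=list(actual=c("lo", "hi"), predicted=c("lo", "hi")))

accuracy    <- sum(diag(cTab)) / sum(cTab)           # (12 + 9) / 30
sensitivity <- cTab["hi", "hi"] / sum(cTab["hi", ])  # 9 / 14, "hi" assumed positive
specificity <- cTab["lo", "lo"] / sum(cTab["lo", ])  # 12 / 16

round(c(accuracy=accuracy, sensitivity=sensitivity, specificity=specificity), 3)
```

For this particular table, that works out to an accuracy of 0.7, a sensitivity of about 0.643 and a specificity of 0.75. A different threshold would trade one error rate against the other.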

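Note that the data frame passed to predict() needs to have the same variable names as the data frame used in the call to glm(), and factors need to have the same levels in the same order. If you are interested in k-fold cross-validation, have a look at the function cv.glm() from package boot.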
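As a rough sketch of the k-fold route, the following uses cv.glm() on the same kind of simulated data. The seed, the choice K=10 and the cost function misclass (misclassification rate at a 0.5 threshold) are my assumptions. Note that the model is fitted to the full data set without subset=, because cv.glm() refits it on each fold itself.

```r
library(boot)                                  # recommended package, ships with R

set.seed(1)                                    # fixed seed: an assumption, only for reproducibility
x    <- rnorm(100, 175, 7)
y    <- 0.4*x + 10 + rnorm(100, 0, 3)
yFac <- cut(y, breaks=c(-Inf, median(y), Inf), labels=c("lo", "hi"))
d    <- data.frame(yFac, x)

# fit to all observations; cv.glm() handles the train/test splitting
fitAll <- glm(yFac ~ x, family=binomial(link="logit"), data=d)

# cost function: r = observed 0/1 response, p = predicted probability
misclass <- function(r, p) mean(abs(r - p) > 0.5)

cvRes <- cv.glm(d, fitAll, cost=misclass, K=10)
cvRes$delta[1]                                 # raw cross-validated misclassification rate
cvRes$delta[2]                                 # bias-adjusted version
```

The default cost of cv.glm() is the mean squared error, which is less natural for a binary outcome, hence the custom cost function.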

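You may want to take a closer look at the caret package, which has a lot of support for this type of analysis. Its four vignettes give a good overview of how it can help you.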
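For orientation, a minimal caret sketch might look like this. It assumes the caret package is installed and reuses the simulated data from the first answer. The seed, the 70/30 split and 10-fold resampling are arbitrary choices of mine, not recommendations from the vignettes.

```r
library(caret)                                 # assumes caret is installed

set.seed(1)                                    # fixed seed: an assumption, only for reproducibility
x    <- rnorm(100, 175, 7)
y    <- 0.4*x + 10 + rnorm(100, 0, 3)
yFac <- cut(y, breaks=c(-Inf, median(y), Inf), labels=c("lo", "hi"))
d    <- data.frame(yFac, x)

# stratified split: roughly 70% of each class goes into the training sample
inTrn <- createDataPartition(d$yFac, p=0.7, list=FALSE)

# logistic regression with 10-fold cross-validation inside the training sample
ctrl <- trainControl(method="cv", number=10)
fit  <- train(yFac ~ x, data=d[inTrn, ], method="glm",
              family="binomial", trControl=ctrl)
fit$results                                    # resampled Accuracy and Kappa

# predicted classes for the held-out rows, compared with the actual classes
predTst <- predict(fit, newdata=d[-inTrn, ])
table(actual=d$yFac[-inTrn], predicted=predTst)
```

Here predict() on the train object returns class labels directly, so the thresholding step is done for you at 0.5.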
