Cross Validated - What are the standard errors of predictions from predict.lm in R?

In R, ?predict says:

If the logical se.fit is TRUE, standard errors of the predictions are calculated.

An example:

> predict(lm(mpg ~ wt + cyl, data = mtcars), se.fit=TRUE)$se.fit
 [1] 0.6011667 0.4976294 0.7252444 0.4602669 0.7752706 0.5178496 0.7267482 1.0000172 0.9793969
[10] 0.5108741 0.5108741 0.6544576 0.6819424 0.6718159 1.1525645 1.2633704 1.2125441 0.7270859
[19] 0.8820281 0.7988791 0.7380797 0.7442464 0.7773252 0.6623197 0.6616629 0.7700721 0.7322422
[28] 0.9282190 0.9023791 0.5342815 0.7267482 0.8176667

How are these standard errors defined, and how are they computed?

I looked at the predict.lm source, which has many branches. An excerpt looks like this:

ip[, i] <- if (any(iipiv > 0L))
    as.matrix(X[, iipiv, drop = FALSE] %*% Rinv[ii, , drop = FALSE])^2 %*%
        rep.int(res.var, p)
# ... se later returned as "se.fit"
se <- sqrt(ip)

Wikipedia defines the standard error of beta (the slope for a one-dimensional predictor), but not the standard error of a prediction.

How is se.fit defined, in standard notation?

Related (or perhaps the same question): how is it computed?
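For reference, the usual textbook definition is sketched below; it appears to be what the excerpt computes. The symbols are notation introduced here, not taken from the R source: X is the n x p design matrix, x_0 is the predictor row being predicted at (including the intercept term), RSS is the residual sum of squares, and X = QR is the QR decomposition stored with the fit.

```latex
\hat{y}_0 = x_0^\top \hat\beta, \qquad
\operatorname{se}(\hat{y}_0) = \hat\sigma \sqrt{x_0^\top (X^\top X)^{-1} x_0}, \qquad
\hat\sigma^2 = \frac{\mathrm{RSS}}{n - p}

% With X = QR we have (X^\top X)^{-1} = R^{-1} R^{-\top}, so
x_0^\top (X^\top X)^{-1} x_0 = \lVert R^{-\top} x_0 \rVert^2
```

This would explain the squared `X %*% Rinv` term multiplied by `res.var` in the excerpt. Note that this is the standard error of the estimated mean response at x_0, not of a new observation; the latter has variance larger by \hat\sigma^2.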

# model
m = lm(mpg ~ wt, data = mtcars)

# model summary
summary(m)

# residual degrees of freedom
summary(m)$df[2]

# 95% confidence interval for the mean value of mpg for ALL cars
# represented by the ones in the mtcars dataset which have
# wt = 3.2
p <- predict(m, newdata = data.frame(wt = 3.2), se.fit=TRUE, interval="confidence")
p

# reported interval (lwr, upr) SHOULD be constructed as
# fit +/- t(alpha/2,n-2)*se.fit
# where t(alpha/2,n-2) is a critical value from the
# t distribution with n-2 degrees of freedom and
# alpha = 0.05

# compute the critical value
crit <- qt(p= 0.05/2, df = summary(m)$df[2], lower.tail = FALSE)
crit

# check that computing the half-length of the interval and
# dividing it by the critical value gives the same result
# as that reported by se.fit
(p$fit[,"upr"] - p$fit[,"fit"])/crit
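
The check above recovers se.fit from the interval. As a complementary sketch, se.fit can also be rebuilt by hand for the two-predictor model from the question, assuming the usual definition sqrt(sigma^2 * x' (X'X)^{-1} x) for each row x of the design matrix. The variable names (`se_direct`, `se_qr`, and so on) are illustrative, and the QR route ignores column pivoting, which should be harmless for a full-rank fit such as this one.

```r
# Fit the model from the question.
fit <- lm(mpg ~ wt + cyl, data = mtcars)

X      <- model.matrix(fit)                          # n x p design matrix
sigma2 <- sum(residuals(fit)^2) / df.residual(fit)   # residual variance estimate

# Direct route: se_i = sqrt(sigma2 * x_i' (X'X)^{-1} x_i) for each row x_i.
XtX_inv   <- solve(crossprod(X))
se_direct <- sqrt(sigma2 * rowSums((X %*% XtX_inv) * X))

# QR route, in the spirit of the predict.lm excerpt: with X = QR,
# x_i' (X'X)^{-1} x_i is the squared length of row i of X %*% R^{-1}.
# Assumption: no column pivoting occurred (true for full-rank fits).
R     <- qr.R(fit$qr)
Rinv  <- backsolve(R, diag(ncol(R)))
se_qr <- sqrt(sigma2 * rowSums((X %*% Rinv)^2))

# Compare with what predict.lm reports.
se_builtin <- predict(fit, se.fit = TRUE)$se.fit
all.equal(unname(se_direct), unname(se_builtin))
all.equal(unname(se_qr), unname(se_builtin))
```

Both comparisons should agree up to floating-point tolerance. The QR form avoids inverting X'X explicitly, which is presumably why predict.lm works with `Rinv` rather than with `solve(crossprod(X))`.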