R: How to interpret the QQ plot's outlier numbers?

Cross Validated  r  data-visualization  outliers  qq-plot

2022-03-23 04:48:08

How do I interpret the label attached to the outlier when plotting the following in R (a QQ plot)?

set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
plot(lm(y ~ x), which=2)   # which = 2 gives the plot

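It shows the number 61 at the top. What is it?

I guessed it might be the index of the outlying (x, y) pair. It seems to refer to the point at about y = 3 and x = 3. But when I run: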

cbind(y,x)[61,]

        y         x 
2.4016178 0.4251004 

How do I read these numbers in R's QQ plot?

1 Answer

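The numbers in the plot are the indices, in the original data, of the observations with the most extreme standardized residuals. By default, R labels the three most extreme residuals, even if they barely deviate from the QQ line. So the fact that a point is labelled does not by itself mean the fit is bad or anything else. This behaviour can be changed with the id.n option. Let me illustrate with your example: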

set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
lm.mod <- lm(y ~ x) # linear regression model
plot(lm.mod, which=2) # QQ-Plot
lm.resid <- residuals(lm.mod) # save the residuals
head(sort(abs(lm.resid), decreasing=TRUE), 3) # three largest absolute residuals
        14         61         24
2.32415869 2.29316200 2.09837122

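The three most extreme residuals belong to observations 14, 61 and 24. These are the numbers shown in the plot, and they are indices into the original data. So data points 14, 24 and 61 are the ones that produce the most extreme residuals. We can also mark them in a scatterplot (blue points). Note that because you generated y and x independently, the regression line is essentially flat and lies close to the mean of y: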

# The original data points corresponding to the 3 most extreme residuals

cbind(x,y)[c(14, 24, 61), ]
             x         y
[1,] -0.6506964 -2.214700
[2,] -0.1795565 -1.989352
[3,]  0.4251004  2.401618

# Make a scatterplot of the original data and mark the three points
# and add the residuals

par(bg="white", cex=1.6)
plot(y~x, pch=16, las=1)
abline(lm.mod, lwd=2) # add regression line
pre <- predict(lm.mod)

# Add the residual lines
segments(x[c(14, 24, 61)], y[c(14, 24, 61)], x[c(14, 24, 61)], 
         pre[c(14, 24, 61)], col="red", lwd=2)

# Add the points
points(x[c(14, 24, 61)], y[c(14, 24, 61)], pch=16, cex=1.1, col="steelblue", las=1)
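
As a small follow-up sketch of the points above (not part of the original answer): the labelled indices can be recovered directly from the standardized residuals via rstandard(), and the id.n argument of plot.lm() controls how many points get a label. The names std.resid and top3 are only illustrative.

```r
set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
lm.mod <- lm(y ~ x)

# The Q-Q panel labels points by absolute standardized residual
std.resid <- rstandard(lm.mod)
top3 <- order(abs(std.resid), decreasing = TRUE)[1:3]
top3  # row indices of the three most extreme standardized residuals

# Label the five most extreme points instead of the default three
plot(lm.mod, which = 2, id.n = 5)

# Suppress the labels entirely
plot(lm.mod, which = 2, id.n = 0)
```

With id.n = 0 the plot is drawn without any labels, which can be useful once you know the labels are only a convenience and not a formal outlier test.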
