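In addition to @gung's answer, I'll try to provide an example of what the `anova` function actually tests. I hope this enables you to decide which tests are appropriate for the hypotheses you are interested in testing.
Let's assume that you have an outcome `y` and 3 predictors: `x1`, `x2`, and `x3`. Now, if your logistic regression model is `my.mod <- glm(y~x1+x2+x3, family="binomial")`, then when you run `anova(my.mod, test="Chisq")`, the function compares the following models in sequential order. This type is also called Type I ANOVA or Type I sum of squares (see this post for a comparison of the different types):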
- `glm(y~1, family="binomial")` vs. `glm(y~x1, family="binomial")`
- `glm(y~x1, family="binomial")` vs. `glm(y~x1+x2, family="binomial")`
- `glm(y~x1+x2, family="binomial")` vs. `glm(y~x1+x2+x3, family="binomial")`
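So it sequentially compares each smaller model with the next more complex model, adding one variable in each step. Each of those comparisons is done via a likelihood ratio test (LR test; see the example below). To my knowledge, these hypotheses are rarely of interest, but this has to be decided by you.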
Here is an example in `R`:
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
my.mod <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
summary(my.mod)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.989979 1.139951 -3.500 0.000465 ***
gre 0.002264 0.001094 2.070 0.038465 *
gpa 0.804038 0.331819 2.423 0.015388 *
rank2 -0.675443 0.316490 -2.134 0.032829 *
rank3 -1.340204 0.345306 -3.881 0.000104 ***
rank4 -1.551464 0.417832 -3.713 0.000205 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# The sequential analysis
anova(my.mod, test="Chisq")
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 399 499.98
gre 1 13.9204 398 486.06 0.0001907 ***
gpa 1 5.7122 397 480.34 0.0168478 *
rank 3 21.8265 394 458.52 7.088e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# We can make the comparisons by hand (adding a variable in each step)
# model only the intercept
mod1 <- glm(admit ~ 1, data = mydata, family = "binomial")
# model with intercept + gre
mod2 <- glm(admit ~ gre, data = mydata, family = "binomial")
# model with intercept + gre + gpa
mod3 <- glm(admit ~ gre + gpa, data = mydata, family = "binomial")
# model containing all variables (full model)
mod4 <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
anova(mod1, mod2, test="LRT")
Model 1: admit ~ 1
Model 2: admit ~ gre
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 399 499.98
2 398 486.06 1 13.92 0.0001907 ***
anova(mod2, mod3, test="LRT")
Model 1: admit ~ gre
Model 2: admit ~ gre + gpa
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 398 486.06
2 397 480.34 1 5.7122 0.01685 *
anova(mod3, mod4, test="LRT")
Model 1: admit ~ gre + gpa
Model 2: admit ~ gre + gpa + rank
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 397 480.34
2 394 458.52 3 21.826 7.088e-05 ***
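Each row of the sequential table is a likelihood ratio test: the drop in residual deviance between two nested fits is referred to a chi-squared distribution whose degrees of freedom equal the number of added parameters. The following is a minimal sketch that redoes this arithmetic by hand and also illustrates that the sequential (Type I) tests depend on the order in which the terms enter the formula. It uses simulated data so that it runs without a download; the data frame `sim`, the variables `x1` and `x2`, and all object names are made up for illustration and are not part of the example above.

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)   # correlated with x1, so the order of terms matters
y  <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 + 0.5 * x2))
sim <- data.frame(y, x1, x2)

fit0  <- glm(y ~ 1,       data = sim, family = "binomial")
fit1  <- glm(y ~ x1,      data = sim, family = "binomial")
fit12 <- glm(y ~ x1 + x2, data = sim, family = "binomial")
fit21 <- glm(y ~ x2 + x1, data = sim, family = "binomial")  # same model, other order

# LR test for adding x1 to the intercept-only model, computed by hand
lr_stat <- deviance(fit0) - deviance(fit1)        # drop in residual deviance
lr_df   <- df.residual(fit0) - df.residual(fit1)  # number of added parameters
p_hand  <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# The same test as reported in the first row of the sequential table
tab12   <- anova(fit12, test = "Chisq")
p_anova <- tab12["x1", "Pr(>Chi)"]
all.equal(p_hand, p_anova, tolerance = 1e-6)

# Order dependence: the deviance attributed to x1 changes when x2 enters first
tab21 <- anova(fit21, test = "Chisq")
c(x1_first = tab12["x1", "Deviance"], x1_second = tab21["x1", "Deviance"])
```

When `x1` enters first, it is compared against the intercept-only model; when it enters second, it is compared against the model that already contains `x2`. With correlated predictors, these are different hypotheses, so the two deviances generally differ.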
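The p-values in the output of `summary(my.mod)` are Wald tests, which test the following hypotheses (note that they are interchangeable and the order of the tests does not matter):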
- For the coefficient of `x1`: `glm(y~x2+x3, family="binomial")` vs. `glm(y~x1+x2+x3, family="binomial")`
- For the coefficient of `x2`: `glm(y~x1+x3, family="binomial")` vs. `glm(y~x1+x2+x3, family="binomial")`
- For the coefficient of `x3`: `glm(y~x1+x2, family="binomial")` vs. `glm(y~x1+x2+x3, family="binomial")`
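So each coefficient is tested against the full model containing all coefficients. Wald tests are an approximation of the likelihood ratio test. We could also do the likelihood ratio tests (LR tests). Here is how: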
mod1.2 <- glm(admit ~ gre + gpa, data = mydata, family = "binomial")
mod2.2 <- glm(admit ~ gre + rank, data = mydata, family = "binomial")
mod3.2 <- glm(admit ~ gpa + rank, data = mydata, family = "binomial")
anova(mod1.2, my.mod, test="LRT") # joint LR test for rank
Model 1: admit ~ gre + gpa
Model 2: admit ~ gre + gpa + rank
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 397 480.34
2 394 458.52 3 21.826 7.088e-05 ***
anova(mod2.2, my.mod, test="LRT") # LR test for gpa
Model 1: admit ~ gre + rank
Model 2: admit ~ gre + gpa + rank
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 395 464.53
2 394 458.52 1 6.0143 0.01419 *
anova(mod3.2, my.mod, test="LRT") # LR test for gre
Model 1: admit ~ gpa + rank
Model 2: admit ~ gre + gpa + rank
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 395 462.88
2 394 458.52 1 4.3578 0.03684 *
The p-values from the likelihood ratio tests are very similar to those obtained by the Wald tests in `summary(my.mod)` above.
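To make the comparison between Wald and LR tests concrete, here is a minimal sketch, again on simulated data (the names `sim`, `x1`, `x2`, `g`, and `fit` are hypothetical). It recomputes the Wald p-values from the coefficient estimates and their standard errors, and uses `drop1()`, which refits the model without each term in turn, to obtain all single-term LR tests in one call instead of fitting each reduced model by hand.

```r
set.seed(2)
n <- 400
sim <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  g  = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
eta   <- -0.3 + 0.5 * sim$x1 + 0.3 * sim$x2 + 0.6 * (sim$g == "c")
sim$y <- rbinom(n, 1, plogis(eta))

fit <- glm(y ~ x1 + x2 + g, data = sim, family = "binomial")

# Wald test by hand: estimate / standard error, referred to a standard normal
est    <- coef(fit)
se     <- sqrt(diag(vcov(fit)))
z      <- est / se
p_wald <- 2 * pnorm(abs(z), lower.tail = FALSE)
p_summary <- coef(summary(fit))[, "Pr(>|z|)"]
all.equal(unname(p_wald), unname(p_summary))

# LR tests for dropping each term from the full model
# (one row per term; the factor g gets a single joint test with 2 df)
lr <- drop1(fit, test = "LRT")
lr

# The same LR test for x1, done by an explicit refit without x1
fit_no_x1 <- update(fit, . ~ . - x1)
p_lr_x1   <- anova(fit_no_x1, fit, test = "LRT")[2, "Pr(>Chi)"]
all.equal(p_lr_x1, lr["x1", "Pr(>Chi)"], tolerance = 1e-6)
```

Note that `summary()` reports one Wald test per coefficient, so a factor with several levels gets several rows, whereas `drop1()` tests the factor as a whole.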
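Note: The third model comparison of `anova(my.mod, test="Chisq")`, the one for `rank`, is the same as the comparison for `rank` in the LR test example above (`anova(mod1.2, my.mod, test="LRT")`; for `glm` fits, `test="Chisq"` and `test="LRT"` request the same likelihood ratio test). Each time, the p-value is the same, $7.088 \times 10^{-5}$. Each time, it is the comparison between the model without `rank` and the model containing it.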
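The coincidence described in the note holds for the last term of the formula, because only there does the sequential comparison use the full model as the larger model. A small sketch on simulated data (all names are hypothetical) checks this: the last row of the sequential table should agree with the corresponding `drop1()` row, while the first row generally should not when the predictors are correlated.

```r
set.seed(3)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)   # correlated with x1
x3 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.4 * x1 - 0.6 * x2 + 0.3 * x3))
sim <- data.frame(y, x1, x2, x3)

fit <- glm(y ~ x1 + x2 + x3, data = sim, family = "binomial")

seq_tab  <- anova(fit, test = "Chisq")  # Type I: terms added first to last
marg_tab <- drop1(fit, test = "LRT")    # each term dropped from the full model

# Last term: both tables compare y ~ x1 + x2 against the full model
last_same <- isTRUE(all.equal(seq_tab["x3", "Pr(>Chi)"],
                              marg_tab["x3", "Pr(>Chi)"], tolerance = 1e-6))

# First term: the sequential test ignores x2 and x3,
# whereas the drop1 test adjusts for them
first_same <- isTRUE(all.equal(seq_tab["x1", "Pr(>Chi)"],
                               marg_tab["x1", "Pr(>Chi)"], tolerance = 1e-6))

c(last_same = last_same, first_same = first_same)
```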