机器算法验证 - R：anova() 与 Anova() 用于测试来自 glmer 或 glm.nb 对象的分类预测器 - 吾爱随笔录

在 R 中，我想知道在使用（广义线性混合效应模型；包）和（负二项式；包）函数比较嵌套模型拟合时，函数anova()（stats包）和Anova()（包）有何不同。carglmer()lme4glm.nbMASS

我发现对于泊松混合模型或负二项式固定效应模型（无随机效应）中的固定效应测试，这两个 ANOVA 函数不会产生相同的结果。两者的结果如下所示。

我的目标：正确测试多级分类预测器（固定；物种）的整体意义。我正在寻找 III 型 SS 型p值。

第一：如果使用拟合固定效应广义线性模型（此处为泊松）glm()，则这两个函数在给定参数的情况下会产生与以下虚拟示例相同的结果：

mod01 <- glm(Count ~ Species + offset(log(Area)), data=data01, family=poisson)

####################
# Anova() function #
####################

library(car)
Anova(mod01, type=3)

# Analysis of Deviance Table (Type III Wald chisquare tests)

# Response: Count
#         LR Chisq Df Pr(>Chisq)    
# Species   255.44  8  < 2.2e-16 ***

####################
# anova() function #
####################

mod01x <- update(mod01, . ~ . - Species)
anova(mod01x, mod01, test="Chisq")

# Model 1: Count ~ offset(log(Area))
# Model 2: Count ~ Species + offset(log(Area))

#   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
# 1      1063     1456.4                          
# 2      1055     1201.0  8   255.44 < 2.2e-16 ***

# Test statistics are the SAME (255.44) for the fixed effects model

但是：对于广义线性混合效应模型（对Groupglmer()使用随机效应），类似代码在两个函数中给出了不同的测试统计量：

library(lme4)
mod02 <- glmer(Count ~ 1 + Species + (1 | Group) + offset(log(Area)), data=data01, 
               family=poisson(link="log"), nAGQ=100)

####################
# Anova() function #
####################

Anova(mod02, type=3)

# Analysis of Deviance Table (Type III Wald chisquare tests)

# Response: Count
#                Chisq Df Pr(>Chisq)    
# (Intercept)   4.0029  1    0.04542 *  
# Species     197.9012  8    < 2e-16 ***

####################
# anova() function #
####################

mod02x <- update(mod02, . ~ . - Species)
anova(mod02x, mod02, test="Chisq")

# mod02x: Count ~ (1 | Group) + offset(log(Area))
# mod02: Count ~ 1 + Species + (1 | Group) + offset(log(Area))

#        Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
# mod02x  2 1423.9 1433.8 -709.95   1419.9                             
# mod02  10 1191.7 1241.4 -585.85   1171.7 248.21      8  < 2.2e-16 ***

# Now the test statistics are DIFFERENT (197.9012 vs. 248.21)

#####################################################################

# Not a matter of type I vs. III SS since whether the fixed or random
# effect is fit first in the model does not affect results:

# List random effect (Group) before fixed (Species):

mod03 <- glmer(Count ~ 1 + (1 | Group) + Species + offset(log(Area)), data=data01, 
               family=poisson(link="log"), nAGQ=100)

####################
# Anova() function #
####################

Anova(mod03, type=3)

# Analysis of Deviance Table (Type III Wald chisquare tests)

# Response: Count
#                Chisq Df Pr(>Chisq)    
# (Intercept)   4.0029  1    0.04542 *  
# Species     197.9012  8    < 2e-16 ***

####################
# anova() function #
####################

mod03x <- update(mod03, . ~ . - Species)
anova(mod03x, mod03, test="Chisq")

# mod03x: Count ~ (1 | Group) + offset(log(Area))
# mod03: Count ~ 1 + (1 | Group) + Species + offset(log(Area))

#        Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
# mod03x  2 1423.9 1433.8 -709.95   1419.9                             
# mod03  10 1191.7 1241.4 -585.85   1171.7 248.21      8  < 2.2e-16 ***

# Respective test statistics are the same as above case where order of fixed
# and random effects was reversed

另一个不一致的测试统计示例：固定效应负二项式模型：

library(MASS)
mod04 <- glm.nb(Count ~ Species + offset(log(Area)), data=data01)

####################
# Anova() function #
####################

Anova(mod04, type=3)

# Analysis of Deviance Table (Type III tests)

# Response: Spiders_Tree
#         LR Chisq Df Pr(>Chisq)    
# Species   101.08  8  < 2.2e-16 ***

####################
# anova() function #
####################

mod04x <- update(mod04, . ~ . - Species)
anova(mod04x, mod04)

# Likelihood ratio tests of Negative Binomial Models

# Response: Count
#                            Model     theta Resid. df  2 x log-lik.   Test df LR stat.       Pr(Chi)
# 1           offset(log(Area_M2)) 0.2164382      1063     -1500.688                      
# 2 Species + offset(log(Area_M2)) 0.3488095      1055     -1413.651 1 vs 2  8 87.03677  1.887379e-15 

# Test statistics are also DIFFERENT here (101.08 vs. 87.03677)

总结：问题：

不限于仅混合或仅固定效应模型
不是 I 型或 III 型 SS 的问题，因为只有一个预测变量的示例（负二项式固定效应模型）显示了相同的问题，即使在多个预测变量的情况下（混合模型示例），测试也是仅用于删除一个预测变量（Species），所以我相信这两种类型的 SS 在这种情况下应该是等价的。

它可能与偏移量有关吗？也许函数被编写为与函数“表现良好” glm()，但处理其他函数（例如glmer()and glm.nb()）不一致？还有什么我没有想到的？

我没有为上面的示例代码提供数据，因为我假设有人可以在没有最小工作示例的情况下评论每个函数的不同理论。但是，如果您想验证结果确实不同（如上所示），我将添加一个虚拟数据集。