机器算法验证 - 在为二项式 GLMM 制作分箱残差图时，是否应将随机效应包含在拟合值中？ - 吾爱随笔录

R arm 包中的分箱残差图通常被推荐作为检查逻辑模型是否存在任何系统错误的方法。一般的想法是，一组具有相似拟合值的观测值的平均残差应该接近于零。或者等效地，对于具有平均拟合值 p 的一组观察，响应 = 1 的组的比例应该大致为 p。

在计算 GLMM 的拟合值时，lme4 包提供了包括或排除随机效应的选项。出于分箱残差图的目的，我认为应该包括随机效应。但是，这样做会产生一个显示清晰正弦模式的图，而排除随机效应的图看起来要好得多。我检查过的所有数据集都可以看到相同的模式；下面包括两个可重复的示例。

在下面的两个示例中，LRT 表明，相对于等效 GLM，添加随机效应显着改善了模型。鉴于这种情况，为什么在拟合值中包含随机效应表明模型存在系统误差？

例子：

请注意，拟合值和残差都在响应范围内。

使用 lme4 包中包含的言语攻击数据

library(lme4)
library(arm)

data(VerbAgg, package = 'lme4')

verb_mod <- glmer(r2 ~ (Anger + Gender + btype + situ)^2 + (1|id) + (1|item), family = binomial, data = VerbAgg)

par(mfcol=c(1, 2))
binnedplot(predict(verb_mod, type="response", re.form=NULL), resid(verb_mod, type="response"), nclass=40, main='With random effects')
binnedplot(predict(verb_mod, type="response", re.form=NA), resid(verb_mod, type="response"), nclass=40, main='Without random effects')

使用 nlme 包中包含的 bdf 语言分数数据

library(nlme)

data(bdf, package = "nlme")
bdf <- subset(bdf, select = c(schoolNR, Minority, ses, repeatgr))
bdf$repeatgr[bdf$repeatgr == 2] <- 1

bdf_mod <- glmer(repeatgr ~ Minority + ses + ses * Minority + (1 | schoolNR), data = bdf, family = binomial(link = "logit"))

par(mfcol=c(1, 2))
binnedplot(predict(bdf_mod, type="response", re.form=NULL), resid(bdf_mod, type="response"), main='With random effects', nclass=20)
binnedplot(predict(bdf_mod, type="response", re.form=NA), resid(bdf_mod, type="response"), main='Without random effects', nclass=20)

链接到 bdf 教程 http://ase.tufts.edu/gsc/gradresources/guidetomixedmodelsinr/mixed%20model%20guide.html

在为二项式 GLMM 制作分箱残差图时，是否应将随机效​​应包含在拟合值中？

例子：

在为二项式 GLMM 制作分箱残差图时，是否应将随机效应包含在拟合值中？