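I have simulated three normal distributions with different means and standard deviations. Suppose these are three groups of subjects. I want to see how an ordinary lm() regression compares with a brm() regression. The model is Score ~ Group + (1|Subject). I have deliberately not added random intercepts by group.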
set.seed(2)
# 3 groups x 3 subjects per group x 40 observations per subject = 360 rows
df = data.frame(
  Group   = as.factor(rep(c("A", "B", "C"), each = 120)),
  Subject = rep(paste("subject", seq(1, 9), sep = "_"), each = 40),
  Score   = c(rnorm(120, 5, 2), rnorm(120, 7, 4), rnorm(120, 9, 6))
)
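The fitting calls themselves are not shown. Judging from the output further down (an lmerMod object, and a brms summary with 4 chains, iter = 4000 and warmup = 2000), they may have looked roughly like the sketch below. The object names fit_freq and fit_bayes are made up for illustration, and leaving brm() at its default priors is an assumption.

```r
library(lme4)  # lmer(): frequentist linear mixed model
library(brms)  # brm(): Bayesian mixed model, sampled with Stan

# Re-create the simulated data so the sketch is self-contained
set.seed(2)
df <- data.frame(
  Group   = as.factor(rep(c("A", "B", "C"), each = 120)),
  Subject = rep(paste("subject", seq(1, 9), sep = "_"), each = 40),
  Score   = c(rnorm(120, 5, 2), rnorm(120, 7, 4), rnorm(120, 9, 6))
)

# Frequentist fit: random intercept per subject, REML by default
fit_freq <- lmer(Score ~ Group + (1 | Subject), data = df)
print(summary(fit_freq))

# Bayesian fit with the same formula; default priors (assumption)
fit_bayes <- brm(Score ~ Group + (1 | Subject), data = df,
                 family = gaussian(),
                 chains = 4, iter = 4000, warmup = 2000)
print(summary(fit_bayes))
```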
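It turns out that the results are quite different. Groups B and A differ significantly in lm(), but their 95% HDI often includes zero in the brm() output. (I know the results vary each time the model samples the posterior, but the bottom line is that the t value in the lm() model is 2.544, whereas the HDI in brm() would leave me in doubt.)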
My question is: why is this the case? In other words, how do the two models compare with regard to their treatment of heteroscedastic data?
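To make the heteroscedasticity concrete, the spread of Score within each group can be checked directly. The sketch below re-creates the simulated data and uses only base R; group_sd is just an illustrative name. Note that the formula Score ~ Group + (1|Subject) with a Gaussian family estimates a single residual standard deviation in both fits, so neither model as specified accounts for group-specific variances.

```r
# Re-create the simulated data so the sketch is self-contained
set.seed(2)
df <- data.frame(
  Group   = as.factor(rep(c("A", "B", "C"), each = 120)),
  Subject = rep(paste("subject", seq(1, 9), sep = "_"), each = 40),
  Score   = c(rnorm(120, 5, 2), rnorm(120, 7, 4), rnorm(120, 9, 6))
)

# Sample SD of Score within each group (simulated with SDs 2, 4 and 6)
group_sd <- tapply(df$Score, df$Group, sd)
print(group_sd)

# Bartlett's test of equal variances across the three groups
print(bartlett.test(Score ~ Group, data = df))
```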
Output of each model
Frequentist
Linear mixed model fit by REML ['lmerMod']
Formula: Score ~ Group + (1 | Subject)
   Data: df
REML criterion at convergence: 2057.9
Scaled residuals:
    Min      1Q  Median      3Q     Max
-4.1237 -0.6125 -0.0159  0.5968  3.8714
Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept)  0.4598  0.6781
 Residual             17.7157  4.2090
Number of obs: 360, groups:  Subject, 9
Fixed effects:
            Estimate Std. Error t value
(Intercept)   5.0640     0.5485   9.232
GroupB        1.9735     0.7757   2.544
GroupC        5.4684     0.7757   7.049
Correlation of Fixed Effects:
       (Intr) GroupB
GroupB -0.707
GroupC -0.707  0.500
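To put the t value on the same scale as an interval, the GroupB row can be turned into an approximate 95% interval by hand. The sketch below uses only the estimate and standard error printed above. The normal-based interval is the usual large-sample Wald approximation. The t-based one assumes 6 degrees of freedom (9 subjects minus 3 group means), which is an assumption, since lmer() does not report degrees of freedom.

```r
# GroupB row of the fixed-effects table above
est <- 1.9735
se  <- 0.7757

# The reported t value is simply estimate / standard error
t_value <- est / se

# Large-sample (normal) Wald 95% interval
wald_z <- est + c(-1, 1) * qnorm(0.975) * se

# t-based 95% interval; df = 9 subjects - 3 groups is an ASSUMPTION
df_assumed <- 9 - 3
wald_t <- est + c(-1, 1) * qt(0.975, df_assumed) * se

print(round(t_value, 3))
print(round(wald_z, 2))
print(round(wald_t, 2))
```

With only 6 assumed degrees of freedom the critical value is noticeably larger than 1.96, so the lower end of the t-based interval sits much closer to zero than the normal-based one.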
Bayesian
In this particular run the HDI does not include zero, but you can see that it comes very close to it (unlike the t value in the model above).
 Family: gaussian(identity)
Formula: Score ~ Group + (1 | Subject)
   Data: df (Number of observations: 360)
Samples: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup samples = 8000
    ICs: LOO = NA; WAIC = NA; R2 = NA
Group-Level Effects:
~Subject (Number of levels: 9)
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.84      0.53     0.07     2.06       1457    1
Population-Level Effects:
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept     5.07      0.67     3.76     6.48       2602    1
GroupB        1.97      0.94     0.01     3.90       2744    1
GroupC        5.44      0.95     3.46     7.27       2851    1
Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     4.23      0.16     3.92     4.56       6732    1
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).