下面我展示了两种在 R 和 SAS 中编写混合模型的等效方法。两个 R 模型以及两个 SAS 模型产生相同的随机效应和固定效应估计值以及相同的固定效应估计值的标准误差。但是两个 SAS 模型没有给出相同的固定效应置信区间。我的问题是:哪些是正确的置信区间?
以下是模拟数据:
library(mvtnorm)
I <- 3
J <- 6
K <- 5
n <- I*J*K
set.seed(666)
tube <- rep(1:J, each=I)
position <- rep(LETTERS[1:I], times=J)
Mu_i <- 3*(1:I)
Mu_ij <- c(t(rmvnorm(J, mean=Mu_i, sigma=diag(I)+2)) )
tube <- rep(tube, each=K)
position <- rep(position, each=K)
Mu_ij <- rep(Mu_ij, each=K)
dat <- data.frame(tube, position)
sigmaw <- 2
dat$y <- rnorm(n, Mu_ij, sigmaw)
dat$tube <- factor(dat$tube)
> str(dat)
'data.frame': 90 obs. of 3 variables:
$ tube : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ position: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 2 2 2 2 2 ...
$ y : num 2.76 5.5 2.54 1.56 6.46 ...
> head(dat)
tube position y
1 1 A 2.759443
2 1 A 5.496689
3 1 A 2.540150
4 1 A 1.558261
5 1 A 6.462050
6 1 B 4.239749
对应的nlme模型如下:
> # firstly set position C as the "intercept" for concordance with SAS
> dat$position <- relevel(dat$position, "C")
> # load nlme
> library(nlme)
> # fit the model
> ( fit1 <- lme(y ~ position, data=dat, random= list(tube = pdCompSymm(~ 0+position ))) )
Linear mixed-effects model fit by REML
Data: dat
Log-restricted-likelihood: -199.0196
Fixed: y ~ position
(Intercept) positionA positionB
8.526544 -4.800535 -3.322507
Random effects:
Formula: ~0 + position | tube
Structure: Compound Symmetry
StdDev Corr
positionC 1.892433
positionA 1.892433 0.082
positionB 1.892433 0.082 0.082
Residual 1.932750
Number of Observations: 90
Number of Groups: 6
该模型等效于具有混合效应的 2-way ANOVA(从某种意义上说,边际模型相同),更容易与 lme4 拟合:
> library(lme4)
> lmer(y ~ position + (1|tube) + (1|tube:position), data=dat)
Linear mixed model fit by REML
Formula: y ~ position + (1 | tube) + (1 | tube:position)
Data: dat
AIC BIC logLik deviance REMLdev
410 425 -199 402.5 398
Random effects:
Groups Name Variance Std.Dev.
tube:position (Intercept) 3.28587 1.81270
tube (Intercept) 0.29543 0.54354
Residual 3.73552 1.93275
Number of obs: 90, groups: tube:position, 18; tube, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 8.5265 0.8493 10.039
positionA -4.8005 1.1595 -4.140
positionB -3.3225 1.1595 -2.866
Correlation of Fixed Effects:
(Intr) postnA
positionA -0.683
positionB -0.683 0.500
下面我检查结果是否确实相同:
> # same standard erros
> sqrt(diag(fit1$varFix))
(Intercept) positionA positionB
0.8493533 1.1594505 1.1594505
> # the total variance in the second model is given in the first model:
> sqrt(1.81270^2+ 0.54354^2)
[1] 1.892437
出色地。现在这里是两个等效的 SAS 模型:
/* First model */
PROC MIXED DATA=dat ;
CLASS POSITION TUBE ;
MODEL y = POSITION ;
RANDOM POSITION / subject=TUBE type=CS ;
RUN; QUIT;
/* Second model */
PROC MIXED DATA=dat ;
CLASS POSITION TUBE ;
MODEL y = POSITION ;
RANDOM TUBE TUBE*POSITION ;
RUN; QUIT;
结果与 R 结果相同。但是 SAS 为固定效应分配了不同的自由度,因此给出了不同的置信区间,如下所示。
第一个模型给出了 5、10、10 的自由度:
Effect position Estimate StandardError DF Alpha Lower Upper
Intercept 8.5265 0.8494 5 0.05 6.3432 10.7099
position A -4.8005 1.1595 10 0.05 -7.384 -2.2171
position B -3.3225 1.1595 10 0.05 -5.9059 -0.7391
position C 0 . . . . .
而第二个模型给出的自由度为 15、15、15:
Effect position Estimate StandardError DF Alpha Lower Upper
Intercept 8.5265 0.8494 15 0.05 6.7162 10.3369
position A -4.8005 1.1595 15 0.05 -7.2718 -2.3292
position B -3.3225 1.1595 15 0.05 -5.7938 -0.8512
position C 0 . . . . .