How to compare two groups with multiple measurements for each individual with R?

Tags: r, statistical-significance, t-test, error-propagation
2022-03-20 06:24:55

I have the following situation:

1) Six measurements for each individual, with large within-subject variability

2) Two groups (treatment and control)

3) Each group consists of 5 individuals

4) I want to perform a significance test comparing the two groups, to find out whether the group means differ from each other.

The data look like this: http://s10.postimg.org/p9krg6f3t/examp.png

I have run some simulations using the code below, which performs a t-test to compare the group means. The group means are calculated by taking the mean of the individual (per-subject) means, which ignores the within-subject variability:

    n.simulations<-10000
    pvals=matrix(nrow=n.simulations,ncol=1)
    for(k in 1:n.simulations){
      subject=NULL
      for(i in 1:10){
        subject<-rbind(subject,as.matrix(rep(i,6)))
      }
      #set.seed(42)

      #Sample Subject Means
      subject.means<-rnorm(10,100,2)

      #Sample Individual Measurements
      values=NULL
      for(sm in subject.means){
        values<-rbind(values,as.matrix(rnorm(6,sm,20)))
      }

      out<-cbind(subject,values)

      #Split into GroupA and GroupB
      GroupA<-out[1:30,]
      GroupB<-out[31:60,]

      #Add effect size to GroupA
      GroupA[,2]<-GroupA[,2]+0

      colnames(GroupA)<-c("Subject", "Value")
      colnames(GroupB)<-c("Subject", "Value")

      #Calculate Individual Means and SDS
      GroupA.summary=matrix(nrow=length(unique(GroupA[,1])), ncol=2)
      for(i in 1:length(unique(GroupA[,1]))){
        GroupA.summary[i,1]<-mean(GroupA[which(GroupA[,1]==unique(GroupA[,1])[i]),2])
        GroupA.summary[i,2]<-sd(GroupA[which(GroupA[,1]==unique(GroupA[,1])[i]),2])
      }
      colnames(GroupA.summary)<-c("Mean","SD")


      GroupB.summary=matrix(nrow=length(unique(GroupB[,1])), ncol=2)
      for(i in 1:length(unique(GroupB[,1]))){
        GroupB.summary[i,1]<-mean(GroupB[which(GroupB[,1]==unique(GroupB[,1])[i]),2])
        GroupB.summary[i,2]<-sd(GroupB[which(GroupB[,1]==unique(GroupB[,1])[i]),2])
      }
      colnames(GroupB.summary)<-c("Mean","SD")

      Summary<-rbind(cbind(1,GroupA.summary),cbind(2,GroupB.summary))
      colnames(Summary)[1]<-"Group"

      pvals[k]<-t.test(GroupA.summary[,1],GroupB.summary[,1], var.equal=T)$p.value
    }

Here is the code for the plots:

#Plots
par(mfrow=c(2,2))
boxplot(GroupA[,2]~GroupA[,1], col="Red", main="Group A", 
        ylim=c(.9*min(out[,2]),1.1*max(out[,2])),
        xlab="Subject", ylab="Value")
stripchart(GroupA[,2]~GroupA[,1], vert=T, pch=16, add=T)
#abline(h=mean(GroupA[,2]), lty=2, lwd=3)

for(i in 1:length(unique(GroupA[,1]))){
  m<-mean(GroupA[which(GroupA[,1]==unique(GroupA[,1])[i]),2])
  ci<-t.test(GroupA[which(GroupA[,1]==unique(GroupA[,1])[i]),2])$conf.int[1:2]

  points(i-.2,m, pch=15,cex=1.5, col="Grey")
  segments(i-.2,
           ci[1],i-.2,
           ci[2], lwd=4, col="Grey"
  )
}
legend("topleft", legend=c("Individual Means +/- 95% CI"), bty="n", pch=15, lwd=3, col="Grey")


boxplot(GroupB[,2]~GroupB[,1], col="Light Blue", main="Group B", 
        ylim=c(.9*min(out[,2]),1.1*max(out[,2])),
        xlab="Subject", ylab="Value")
stripchart(GroupB[,2]~GroupB[,1], vert=T, pch=16, add=T)
#abline(h=mean(GroupB[,2]), lty=2, lwd=3)

for(i in 1:length(unique(GroupB[,1]))){
  m<-mean(GroupB[which(GroupB[,1]==unique(GroupB[,1])[i]),2])
  ci<-t.test(GroupB[which(GroupB[,1]==unique(GroupB[,1])[i]),2])$conf.int[1:2]

  points(i-.2,m, pch=15,cex=1.5, col="Grey")
  segments(i-.2,
           ci[1],i-.2,
           ci[2], lwd=4, col="Grey"
  )
}
legend("topleft", legend=c("Individual Means +/- 95% CI"), bty="n", pch=15, lwd=3, col="Grey")


boxplot(Summary[,2]~Summary[,1], col=c("Red","Light Blue"), xlab="Group", ylab="Average Value",
        ylim=c(.9*min(Summary[,2]),1.1*max(Summary[,2])),
        main="Individual Averages")
stripchart(Summary[,2]~Summary[,1], vert=T, pch=16, add=T)

points(.9, mean(GroupA.summary[,1]), pch=15,cex=1.5, col="Grey")
segments(.9,
         t.test(GroupA.summary[,1])$conf.int[1],.9,
         t.test(GroupA.summary[,1])$conf.int[2], lwd=4, col="Grey"
)

points(1.9, mean(GroupB.summary[,1]), pch=15,cex=1.5, col="Grey")
segments(1.9,
         t.test(GroupB.summary[,1])$conf.int[1],1.9,
         t.test(GroupB.summary[,1])$conf.int[2], lwd=4, col="Grey"
)
legend("topleft", legend=c("Group Means +/- 95% CI"), bty="n", pch=15, lwd=3, col="Grey")


hist(pvals, breaks=seq(0,1,by=.05), col="Grey",
     main=c(paste("# sims=", n.simulations),
            paste("% Sig p-values=",100*length(which(pvals<0.05))/length(pvals)))
)

Now, it seems to me that because each individual mean is itself an estimate, we should be less certain about the group means than is suggested by the 95% confidence intervals shown in the bottom-left panel of the figure above. Therefore the p-values calculated this way underestimate the true variability, and should lead to inflated false positives if we wish to extrapolate to future data.

So what is the correct way to analyze these data?

Bonus:

The example above is a simplification. For the actual data:

1) The within-subject variance is positively correlated with the mean.

2) Values can only be multiples of two.

3) The individual results are not roughly normally distributed: they suffer from a floor effect at zero and have a long tail on the positive end.

4) The number of subjects in each group is not necessarily equal.

Previous literature has used t-tests that ignore the within-subject variability and the other subtleties, just as was done for the simulation above. Are those results reliable? And if I can extract some means and standard errors from a published figure, how would I go about calculating the "correct" p-values?
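
For reference, a p-value can at least be computed mechanically from such summary statistics alone. Below is a minimal sketch of a Welch t-test based only on the two group means, the standard errors of the subject means, and the numbers of subjects per group; the helper name and the numbers plugged in are purely illustrative, and whether a p-value obtained this way is "correct" is exactly what is in question.

#Welch t-test from summary statistics:
#m = group mean, se = standard error of the subject means, n = number of subjects
t.from.summary <- function(m1, se1, n1, m2, se2, n2){
  tstat <- (m1 - m2)/sqrt(se1^2 + se2^2)
  df <- (se1^2 + se2^2)^2/(se1^4/(n1 - 1) + se2^4/(n2 - 1))  #Welch-Satterthwaite df
  2*pt(-abs(tstat), df)                                      #two-sided p-value
}

t.from.summary(m1=100, se1=4, n1=5, m2=110, se2=5, n2=5)  #made-up numbers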

Edit:

OK, here is what the actual data look like. There are also three groups rather than two:

[figure: the actual data for the three groups]

The data from dput():

structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 
3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 
6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 
10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 
12, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 15, 15, 15, 
15, 15, 15, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 18, 
18, 18, 18, 18, 18, 2, 0, 16, 2, 16, 2, 8, 10, 8, 6, 4, 4, 8, 
22, 12, 24, 16, 8, 24, 22, 6, 10, 10, 14, 8, 18, 8, 14, 8, 20, 
6, 16, 6, 6, 16, 4, 2, 14, 12, 10, 4, 10, 10, 8, 4, 10, 16, 16, 
2, 8, 4, 0, 0, 2, 16, 10, 16, 12, 14, 12, 8, 10, 12, 8, 14, 8, 
12, 20, 8, 14, 2, 4, 8, 16, 10, 14, 8, 14, 12, 8, 14, 4, 8, 8, 
10, 4, 8, 20, 8, 12, 12, 22, 14, 12, 26, 32, 22, 10, 16, 26, 
20, 12, 16, 20, 18, 8, 10, 26), .Dim = c(108L, 3L), .Dimnames = list(
    NULL, c("Group", "Subject", "Value")))

Edit 2:

In response to Henrik's answer: so if I instead perform an ANOVA on the individual means, followed by the TukeyHSD procedure as shown below, can I interpret this as saying that my p-values were underestimated by roughly a factor of 3-4?

My goal with this part of the question is to understand how I, as a reader of journal articles, can better interpret previous results in light of the analysis method the authors chose. For example, they have those "stars of authority" showing me 0.01 > p > 0.001. So if I accept 0.05 as a reasonable cutoff, should I accept their interpretation? The only additional information available is the means and SEMs.

#Get Individual Means
summary=NULL
for(i in unique(dat[,2])){
  sub<-which(dat[,2]==i)
  summary<-rbind(summary,cbind(
    dat[sub,1][3],
    dat[sub,2][4],
    mean(dat[sub,3]),
    sd(dat[sub,3])
  ))
}
colnames(summary)<-c("Group","Subject","Mean","SD")

TukeyHSD(aov(summary[,3]~as.factor(summary[,1])+ (1|summary[,2])))

#      Tukey multiple comparisons of means
#        95% family-wise confidence level
#    
#    Fit: aov(formula = summary[, 3] ~ as.factor(summary[, 1]) + (1 | summary[, 2]))
#    
#    $`as.factor(summary[, 1])`
#             diff       lwr       upr     p adj
#    2-1 -0.672619 -4.943205  3.597967 0.9124024
#    3-1  7.507937  1.813822 13.202051 0.0098935
#    3-2  8.180556  2.594226 13.766885 0.0046312

Edit 3: I think we are getting closer to my understanding. Here is the simulation described in @Stephane's comments:

#Get Subject Means
means<-aggregate(Value~Group+Subject, data=dat, FUN=mean)

#Initialize "dat2" dataframe
dat2<-dat

#Initialize within-Subject sd
s<-.001
pvals=matrix(nrow=10000,ncol=2)

for(j in 1:10000){
  #Sample individual measurements for each subject
  temp=NULL
  for(i in 1:nrow(means)){
    temp<-c(temp,rnorm(6,means[i,3], s))
  }

  #Set new values
  dat2[,3]<-temp

  #Take means of sampled values and fit to model
  dd2 <- aggregate(Value~Group+Subject, data=dat2, FUN=mean)
  fit2 <- lm(Value~Group, data=dd2)

  #Save sd and the p-value for the Group effect
  pvals[j,]<-cbind(s,anova(fit2)[[5]][1])

  #Update sd
  s<-s+.001
}

plot(pvals[,1],pvals[,2], xlab="Within-Subject SD", ylab="P-value")

[figure: p-values versus within-subject SD from the simulation above]

3 Answers

I take the liberty of answering the question in the title: how would I analyze these data.

Given that we have replicates within the samples, mixed models immediately come to mind, as they should estimate the variability for each individual and control for it.

Hence, I fit the model using lmer from lme4. However, as we are interested in p-values, I use mixed from afex, which obtains these via pbkrtest (i.e., the Kenward-Roger approximation for the degrees of freedom). (afex also already sets the contrasts to contr.sum, which I use in this case.)

To control for the zero floor effect (i.e., the positive skew), I also fit two alternative versions, transforming the dependent variable with sqrt for mild skew and with log for stronger skew.

require(afex)

# read the dput() in as dat <- ...    
dat <- as.data.frame(dat)
dat$Group <- factor(dat$Group)
dat$Subject <- factor(dat$Subject)

(model <- mixed(Value ~ Group + (1|Subject), dat))
##        Effect    stat ndf ddf F.scaling p.value
## 1 (Intercept) 237.730   1  15         1  0.0000
## 2       Group   7.749   2  15         1  0.0049

(model.s <- mixed(sqrt(Value) ~ Group + (1|Subject), dat))
##        Effect    stat ndf ddf F.scaling p.value
## 1 (Intercept) 418.293   1  15         1  0.0000
## 2       Group   4.121   2  15         1  0.0375

(model.l <- mixed(log1p(Value) ~ Group + (1|Subject), dat))
##        Effect    stat ndf ddf F.scaling p.value
## 1 (Intercept) 458.650   1  15         1  0.0000
## 2       Group   2.721   2  15         1  0.0981

The effect is significant for the untransformed and the sqrt-transformed dependent variable. But are these models reasonable? Let's plot the residuals.

png("qq.png", 800, 300, units = "px", pointsize = 12)
par(mfrow = c(1, 3))
par(cex = 1.1)
par(mar = c(2, 2, 2, 1)+0.1)
qqnorm(resid(model[[2]]), main = "original")
qqline(resid(model[[2]]))
qqnorm(resid(model.s[[2]]), main = "sqrt")
qqline(resid(model.s[[2]]))
qqnorm(resid(model.l[[2]]), main = "log")
qqline(resid(model.l[[2]]))
dev.off()

[figure: normal Q-Q plots of the residuals for the original, sqrt, and log models]

The model with the sqrt transformation seems to provide a reasonably good fit (there still appears to be one outlier, but I will ignore it). So let's inspect this model further, using multcomp to obtain the comparisons among the groups:

require(multcomp)

# using bonferroni-holm correction of multiple comparison
summary(glht(model.s[[2]], linfct = mcp(Group = "Tukey")), test = adjusted("holm"))
##          Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lmer(formula = sqrt(Value) ~ Group + (1 | Subject), data = data)
## 
## Linear Hypotheses:
##            Estimate Std. Error z value Pr(>|z|)  
## 2 - 1 == 0  -0.0754     0.3314   -0.23    0.820  
## 3 - 1 == 0   1.1189     0.4419    2.53    0.023 *
## 3 - 2 == 0   1.1943     0.4335    2.75    0.018 *
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## (Adjusted p values reported -- holm method)

# using default multiple comparison correction (which I don't understand)
summary(glht(model.s[[2]], linfct = mcp(Group = "Tukey")))
##          Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lmer(formula = sqrt(Value) ~ Group + (1 | Subject), data = data)
## 
## Linear Hypotheses:
##            Estimate Std. Error z value Pr(>|z|)  
## 2 - 1 == 0  -0.0754     0.3314   -0.23    0.972  
## 3 - 1 == 0   1.1189     0.4419    2.53    0.030 *
## 3 - 2 == 0   1.1943     0.4335    2.75    0.016 *
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## (Adjusted p values reported -- single-step method)

Punch line: group 3 differs from the other two groups, which do not differ from each other.

For information, the random-effects model given by @Henrik:

> f <- function(x) sqrt(x)
> library(lme4)
> ( fit1 <- lmer(f(Value) ~ Group + (1|Subject), data=dat) )
Linear mixed model fit by REML ['lmerMod']
Formula: f(Value) ~ Group + (1 | Subject) 
   Data: dat 
REML criterion at convergence: 296.3579 
Random effects:
 Groups   Name        Std.Dev.
 Subject  (Intercept) 0.5336  
 Residual             0.8673  
Number of obs: 108, groups: Subject, 18
Fixed Effects:
(Intercept)       Group2       Group3  
    3.03718     -0.07541      1.11886  

is equivalent to a generalized least squares model with an exchangeable (compound symmetry) correlation structure:

> library(nlme)
> fit2 <-  gls(f(Value) ~ Group, data=dat, na.action=na.omit, correlation=corCompSymm(form= ~  1 | Subject)) 

The fitted marginal variance-covariance matrix is:

> getVarCov(fit2)
Marginal variance covariance matrix
        [,1]    [,2]    [,3]    [,4]    [,5]    [,6]
[1,] 1.03690 0.28471 0.28471 0.28471 0.28471 0.28471
[2,] 0.28471 1.03690 0.28471 0.28471 0.28471 0.28471
[3,] 0.28471 0.28471 1.03690 0.28471 0.28471 0.28471
[4,] 0.28471 0.28471 0.28471 1.03690 0.28471 0.28471
[5,] 0.28471 0.28471 0.28471 0.28471 1.03690 0.28471
[6,] 0.28471 0.28471 0.28471 0.28471 0.28471 1.03690
  Standard Deviations: 1.0183 1.0183 1.0183 1.0183 1.0183 1.0183 

As you can see, the diagonal entries correspond to the total variance in the first model:

> VarCorr(fit1)
 Groups   Name        Std.Dev.
 Subject  (Intercept) 0.53358 
 Residual             0.86731 
> 0.53358^2+0.86731^2
[1] 1.036934

and the covariance corresponds to the between-subject variance:

> 0.53358^2
[1] 0.2847076

Actually the gls model is more general because it allows a negative covariance. The advantage of nlme is that you can, more generally, use other correlation structures for the repeated measurements, and you can specify different variances per group with the weights argument.
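
As a minimal sketch of that last point (reusing f, dat and fit2 from above; fit2b is just an illustrative name), a separate residual variance per group can be requested with varIdent, and the two variance structures then compared on the REML scale:

library(nlme)
fit2b <- gls(f(Value) ~ Group, data=dat,
             correlation=corCompSymm(form= ~ 1 | Subject),
             weights=varIdent(form= ~ 1 | Group))  # one residual variance per group
anova(fit2, fit2b)  # AIC/BIC and likelihood-ratio comparison of the two variance structures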

I think the residuals differ because they are constructed with the random effects of the first model. To get the multiple comparisons you can use the lsmeans or multcomp packages, but the p-values of the hypothesis tests are anti-conservative with the default (too high) degrees of freedom. Unfortunately, the pbkrtest package does not work with gls/lme models.
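
For example (a sketch only, subject to the degrees-of-freedom caveat just mentioned), the pairwise comparisons for the gls fit could be requested as:

library(lsmeans)
lsmeans(fit2, pairwise ~ Group)  # uses the default (too high) df, hence anti-conservative p-values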

Now, try to write down the model: $y_{ijk} = \ldots$, where $y_{ijk}$ is the $k$-th value of individual $j$ in group $i$. Then look at what happens to the means $\bar{y}_{ij}$: you get a classical Gaussian linear model with homogeneous variance, because there are $6$ repeated measurements for each subject:

> xtabs(~Group+Subject, data=dat)
     Subject
Group 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
    1 6 6 6 6 6 6 6 0 0  0  0  0  0  0  0  0  0  0
    2 0 0 0 0 0 0 0 6 6  6  6  6  6  6  6  0  0  0
    3 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  6  6  6
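
For instance, with a random subject effect the model could be written (my notation) as

$$y_{ijk} = \mu + \alpha_i + b_{ij} + \varepsilon_{ijk}, \qquad b_{ij} \sim N(0,\sigma_b^2), \quad \varepsilon_{ijk} \sim N(0,\sigma_w^2),$$

so that averaging the $6$ measurements of subject $j$ in group $i$ gives

$$\bar{y}_{ij} = \mu + \alpha_i + b_{ij} + \bar{\varepsilon}_{ij}, \qquad \operatorname{Var}(\bar{y}_{ij}) = \sigma_b^2 + \sigma_w^2/6,$$

which is the same for every subject because the design is balanced.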

Therefore, since you are only interested in comparing the means, you do not need to resort to a random-effects or generalized least squares model: just use a classical (fixed effects) linear model taking the means $\bar{y}_{ij}$ as the observations:

tdat <- transform(dat, tvalue=f(Value))
dd <- aggregate(tvalue~Group+Subject, data=tdat, FUN=mean)
fit3 <- lm(tvalue~Group, data=dd)

I claim this approach is always correct when we average the data over the levels of a random effect (I have shown on my blog how this fails for an example with a fixed effect).

The ANOVA gives the same answer as @Henrik's approach (which suggests the Kenward-Roger approximation is sound):

> anova(fit3)
Analysis of Variance Table

Response: tvalue
          Df Sum Sq Mean Sq F value  Pr(>F)  
Group      2 3.3799 1.68994   4.121 0.03747 *

Then you can use TukeyHSD() or the lsmeans package for the multiple comparisons:

> TukeyHSD(aov(fit3), "Group")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = fit3)

$Group
           diff         lwr       upr     p adj
2-1 -0.07541248 -0.93627828 0.7854533 0.9719148
3-1  1.11885667 -0.02896441 2.2666777 0.0565628
3-2  1.19426915  0.06817536 2.3203629 0.0370434

> library(lsmeans)
> lsmeans(fit3, pairwise~Group)

$`Group pairwise differences`
         estimate        SE df  t.ratio p.value
1 - 2  0.07541248 0.3314247 15  0.22754 0.97191
1 - 3 -1.11885667 0.4418996 15 -2.53193 0.05656
2 - 3 -1.19426915 0.4335348 15 -2.75472 0.03704
    p values are adjusted using the tukey method for 3 means