机器算法验证 - 来自线性混合模型的条件类内相关（ICC）作为重测信度的证据？ - 吾爱随笔录

来自线性混合模型的条件类内相关（ICC）作为重测信度的证据？

机器算法验证混合模式多层次分析可靠性层次聚类类内相关

2022-04-05 13:41:17

在我的两个条件下（受试者间设计）的实验中，参与者完成了 3 次单项量表：（1）实验操作前，（2）实验操作后，（3）研究结束时。我想通过三个测量来测试我的单项量表的重测信度。

这里的问题是，已知实验操作会影响测量中的响应。我假设我需要控制操作的任何影响，所以我的猜测是使用线性混合模型（分层线性模型），测量值嵌套在参与者中，并将实验条件作为一个因素。这样，我可以从这个条件模型中得到一个类内相关性（ICC）。

是否有可能将此条件 ICC 解释为“在控制了实验效果后对重测信度的测量”？是否有任何现有的研究使用类似的方法？

2个回答

是的，您可以这样做并按照您的想法进行解释。我在 Sophia Rabe-Hesketh 和 Anders Skrondal 的 Multilevel and Longitudinal Modeling using Stata一书（第 1 卷）的第二章中读到了这种解释。

更详细的解释如下。编辑：我还添加了一个模拟来演示发生了什么。向 Ariel Muldoon 致敬，感谢他写了一篇有用的博客文章，帮助我创建了这个模拟。

在没有预测变量的随机截距模型中，我们得到两个方差，一个用于，即，另一个用于，即。

y_{i j} = β_{0} + u_{0 j} + ϵ_{i j}

$y_{ij} = \beta_0 + u_{0j} + \epsilon_{ij}$

u_{0 j}

$u_{0j}$

ψ

$\psi$

ϵ_{i j}

$\epsilon_{ij}$

θ

$\theta$

从这些我们可以将主体之间的依赖或可靠性（）表示为： $\rho$

ρ = \frac{ψ}{ψ + θ}

$\rho = \frac{\psi}{\psi+\theta}$

在这个等式中，是受试者真实分数的方差，而是测量误差方差，或测量的平方标准误差。由于重复测量而成为重测信度。 $\psi$ $\beta_0 + u_{0j}$ $\theta$ $\rho$

与 Pearson 相关系数相比，受测量的任何线性变换的影响，这可能包括从时间 1 到时间 2 的实践效果或实验引起的增加。因此，如果您知道数据中的某些内容会导致线性变化，您必须在混合模型中考虑它。 $\rho$

在您的情况下，您有一个随时间变化的实验操作（称为）。在您的随机截距模型中包含作为预测变量， $x_1$ $x_1$

y_{i j} = β_{0} + β_{1} x_{1} + u_{0 j} + ϵ_{i j}

$y_{ij} = \beta_0 + \beta_1x_1 +u_{0j} + \epsilon_{ij}$

和都有影响。这样做，得到的和估计值不再受的影响，并且你有一个对实验效果稳健的重测信度估计值。 $\psi$ $\theta$ $\psi$ $\theta$ $x_1$

模拟

set.seed(807)

npart=1000 # number of particpants
ntime=3    # numer of observations (timepoints) per participant
mu=2.5     # mean value on the Likert item
sdp=1      # standard deviation of participant random effect (variance==1)
sd=.7071   # standard deviation of within participant (residual; variance = .5)

participant = rep(rep(1:npart, each = nobs),ntime)  # creating 1000 participants w/ 3 repeats
participant = participant[order(participant)]
time = rep(rep(1:ntime, each=1),1000)    # creating a time variable

parteff = rnorm(npart, 0, sdp)     # drawing from normal for participant deviation
parteff = rep(parteff, each=ntime) # ensuring participant effect is same for three observations

timeeff = rnorm(npart*ntime, 0, sd) # drawing from normal for within-participant residual

dat=data.frame(participant, time, parteff, timeeff) # create data frame

dat$resp = with(dat, mu + parteff + timeeff ) # creating response for each individual

#Variance components model
library(lme4)

m1 <- lmer(resp ~ 1 + (1|participant), dat)
summary(m1) # estimates close to simulated values


Linear mixed model fit by REML ['lmerMod']
Formula: resp ~ 1 + (1 | participant)
   Data: dat

REML criterion at convergence: 8523.8

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.13381 -0.57238  0.01722  0.57846  2.84918 

Random effects:
 Groups      Name        Variance Std.Dev.
 participant (Intercept) 1.0110   1.0055  
 Residual                0.5314   0.7289  
Number of obs: 3000, groups:  participant, 1000

Fixed effects:
            Estimate Std. Error t value
(Intercept)  2.54142    0.03447   73.73


#Add treatment variable x1 which turns on at time 3
dat$trtmt = rep(c(0,0,1),1000)
b1 = .4 #average amount by which particpant's score increases b/c of treatment
x1 = runif(npart, .05, 1.5)


library(dplyr)
dat <- dat %>% mutate(resp2=case_when
                      (time==3 ~ (mu+b1*x1+parteff+timeeff),
                        TRUE ~ resp))
glimpse(dat)

#run m1 without covariate for trtmt
m2 <- lmer(resp2 ~ 1 + (1|participant), dat)
summary(m2)

Linear mixed model fit by REML ['lmerMod']
Formula: resp2 ~ 1 + (1 | participant)
   Data: dat

REML criterion at convergence: 8659.9

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.72238 -0.56861  0.01894  0.57177  3.10610 

Random effects:
 Groups      Name        Variance Std.Dev.
 participant (Intercept) 1.0070   1.0035  
 Residual                0.5669   0.7529  
Number of obs: 3000, groups:  participant, 1000

Fixed effects:
            Estimate Std. Error t value
(Intercept)  2.64169    0.03458   76.39


#add trtmt as a fixed effect predictor
m3 <- lmer(resp2 ~ 1 + trtmt + (1|participant), dat)
summary(m3)

Linear mixed model fit by REML ['lmerMod']
Formula: resp2 ~ 1 + trtmt + (1 | participant)
   Data: dat

REML criterion at convergence: 8546.7

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.06878 -0.57650  0.02712  0.57887  2.89709 

Random effects:
 Groups      Name        Variance Std.Dev.
 participant (Intercept) 1.0178   1.0088  
 Residual                0.5346   0.7311  
Number of obs: 3000, groups:  participant, 1000

Fixed effects:
            Estimate Std. Error t value
(Intercept)  2.53746    0.03585   70.78
trtmt        0.31270    0.02832   11.04

Correlation of Fixed Effects:
      (Intr)
trtmt -0.263

> texreg::screenreg(c(m1, m2, m3))

======================================================================
                              Model 1       Model 2       Model 3     
----------------------------------------------------------------------
(Intercept)                       2.54 ***      2.64 ***      2.54 ***
                                 (0.03)        (0.03)        (0.04)   
trtmt                                                         0.31 ***
                                                             (0.03)   
----------------------------------------------------------------------
AIC                            8529.83       8665.86       8554.72    
BIC                            8547.85       8683.88       8578.75    
Log Likelihood                -4261.92      -4329.93      -4273.36    
Num. obs.                      3000          3000          3000       
Num. groups: participant       1000          1000          1000       
Var: participant (Intercept)      1.01          1.01          1.02    
Var: Residual                     0.53          0.57          0.53    
======================================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

这篇文章真的帮助了我，我想感谢你。如果其他用户遇到我遇到的相同问题 - 我将在上面的模拟中添加一些细微的变化。完全相同。没什么特别的 - 只是很高兴看到数字匹配 :) 此外，参与者向量中的轻微修正以使其工作。 $\rho$

干杯

日山

set.seed(807)

npart=1000 # number of particpants
ntime=2   # numer of observations (timepoints) per participant
mu=2.5     # mean value on the Likert item
sdp=1      # standard deviation of participant random effect (variance==1)
sd=.7071   # standard deviation of within participant (residual; variance = .5)

participant = rep(rep(1:npart, each = nobs),ntime)  # creating 1000 participants w/ 3 repeats
participant = participant[order(participant)]
time        = rep(rep(1:ntime, each=1),1000)        # creating a time variable

parteff = rnorm(npart, 0, sdp)     # drawing from normal for participant deviation
parteff = rep(parteff, each=ntime) # ensuring participant effect is same for three observations

timeeff = rnorm(npart*ntime, 0, sd) # drawing from normal for within-participant residual

dat=data.frame(participant, time, parteff, timeeff) # create data frame

dat$resp = with(dat, mu + parteff + timeeff ) # creating response for each individual

#Variance components model
library(lme4)

m1 <- lmer(resp ~ 1 + (1|participant), dat)
summary(m1) # estimates close to simulated values

#calculate pearson corr
library(reshape2)
df.wide   <-dcast(dat,participant~time,mean,value.var='resp')[,-1]
cor(df.wide)

#get the same from the HLM fit
print(VarCorr(m1))
.95478^2/(.95478^2+0.74685^2)
 ```

其它你可能感兴趣的问题

上一篇当数据生成过程是确定性的时不可能过拟合？下一篇显着的交互作用是否必然意味着至少一个简单的影响是显着的？