混合效应模型中的重尾误差

机器算法验证混合模式残差正态假设重尾

2022-02-05 12:25:47

我对统计建模和“R”比较陌生，所以如果我应该提供任何进一步的信息/图表，请告诉我。我最初确实在这里发布了这个问题，但不幸的是还没有收到任何回复。

我正在使用来自in的lme()函数来测试重复测量设计的固定效应的重要性。我的实验涉及受试者听一对声音并调整声音级别（以分贝为单位），直到两者都同样响亮。这是针对 40 对不同的刺激完成的，并测试了两个顺序（A/B 和 B/A）。每个受试者总共有 160 次观察（这包括每个条件 1 次重复），总共有 14 名参与者。nlmeR

该模型：

LevelDifference ~ pairOfSounds*order, random = (1|Subject/pairOfSounds/order)

我已经通过 AIC/BIC 和似然比检验（方法 =“ML”）建立了模型。组内误差的残差图如下所示：

混合效应模型的残差图

左上图显示标准化残差与拟合值。我没有在残差中看到任何系统模式，因此我假设恒定变化假设是有效的，尽管对各个主题的残差进行进一步检查确实显示出一些不均匀性。结合右上角的情节，我没有理由怀疑非线性。

我主要关心的是左下方的 qqplot，它显示残差是重尾的。我不知道从这里去哪里。通过阅读 Pinheiro 和 Bates (2000, p. 180)，当尾部对称分布时，固定效应检验往往更加保守。所以，如果 p 值非常低，也许我没问题？

二级和三级随机效应显示出类似的偏离常态。

基本上：

这种重尾残差如何影响固定效应的推断？
稳健回归会是检查的合适替代方案吗？

2个回答

我接受了自己的建议，并用一些模拟数据进行了尝试。这不是一个完整的答案，但它是我硬盘上的内容。

我模拟数据...

...带有重尾残差（ $\frac{\epsilon}{10} \sim \text{Student}(\nu=1)$ ）。

如果使用拟合两个模型brms，一个带有高斯残差，一个带有学生残差。

m_gauss = brm(y ~ x + (x|subject), data=data, file='fits/m_gauss')
m_student = brm(y ~ x + (x|subject), data=data, family=student, 
                file='fits/m_student')

在高斯模型中，固定效应估计是合理的，但有噪声，（见下面模拟代码中的真实参数），但是sigma，残差的标准差估计为 60 左右。

summary(m_gauss)
# Family: gaussian 
# Links: mu = identity; sigma = identity 
# Formula: y ~ x + (x | subject) 
# Data: data (Number of observations: 100) 
# Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
# total post-warmup samples = 4000
# 
# Group-Level Effects: 
#   ~subject (Number of levels: 5) 
#                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)       67.72     21.63    38.27   123.54 1.00     1445     2154
# sd(x)               21.72      7.33    11.58    40.11 1.00     1477     2117
# cor(Intercept,x)    -0.18      0.32    -0.73     0.48 1.00     1608     1368
# 
# Population-Level Effects: 
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept     2.90     19.65   -35.38    43.25 1.00     1959     2204
# x            -2.63      8.91   -19.36    16.19 1.00     1659     1678
# 
# Family Specific Parameters: 
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma    67.02      5.08    57.84    77.94 1.00     3790     3177

Student 模型给出了一致的参数估计，因为残差或多或少是对称的，但标准误差降低了。不过，我对标准误差的实际变化如此之小感到惊讶。它正确估计 $\sigma$ (10) 和 $\nu$ （自由度：1）为残差。

summary(m_student)
# Family: student
# Links: mu = identity; sigma = identity; nu = identity
# Formula: y ~ x + (x | subject)
# Data: data (Number of observations: 100)
# Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
# total post-warmup samples = 4000
# 
# Group-Level Effects:
#   ~subject (Number of levels: 5)
#                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)       57.54     18.26    33.76   103.21 1.00     1069     1677
# sd(x)               22.99      8.29    12.19    43.89 1.00     1292     1302
# cor(Intercept,x)    -0.20      0.31    -0.73     0.45 1.00     2532     2419
# 
# Population-Level Effects:
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept     2.26     17.61   -32.62    39.11 1.00     1733     1873
# x            -3.42      9.12   -20.50    15.38 1.00     1641     1263
# 
# Family Specific Parameters:
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma    10.46      1.75     7.34    14.23 1.00     2473     2692
# nu        1.24      0.18     1.01     1.68 1.00     2213     1412

有趣的是，固定效应和随机效应的标准误差的减少导致后验预测均值精度的显着提高（即回归线位置的不确定性，如下所示）。

因此，当您的数据具有重尾残差时，使用假设高斯残差的模型会夸大您的固定效应以及其他参数的标准误差。这主要是因为如果假设残差为高斯分布，则它们必须来自具有巨大标准偏差的高斯分布，因此数据被视为非常嘈杂。

使用正确指定高斯重尾性质的模型（这也是非贝叶斯稳健回归所做的）在很大程度上解决了这个问题，即使单个参数的标准误差变化不大，累积效应在估计的回归线上是相当可观的！

方差齐性

值得注意的是，即使所有残差都来自同一个分布，重尾意味着一些组会有很多异常值（例如第 4 组），而另一些则不会（例如第 2 组）。这里的两个模型都假设残差在每组中具有相同的方差。这给高斯模型带来了额外的问题，因为它被迫得出结论，即使是数据接近回归线的第 2 组，也具有较高的残差方差，因此具有更大的不确定性。换句话说，如果没有使用稳健的重尾残差分布正确建模某些组中存在的异常值，即使没有异常值的组也会增加不确定性。

代码

library(tidyverse)
library(brms)
dir.create('fits')
theme_set(theme_classic(base_size = 18))

# Simulate some data
n_subj = 5
n_trials = 20
subj_intercepts = rnorm(n_subj, 0, 50) # Varying intercepts
subj_slopes     = rnorm(n_subj, 0, 20) # Varying slopes

data = data.frame(subject   =   rep(1:n_subj, each=n_trials),
                  intercept = rep(subj_intercepts, each=n_trials),
                  slope     = rep(subj_slopes, each=n_trials)) %>%
  mutate(
    x = rnorm(n(), 0, 10),
    yhat = intercept + x*slope)

residuals = rt(nrow(data), df=1) * 10
hist(residuals, breaks = 50)
data$y = data$yhat + residuals

ggplot(data, aes(x, y, color=factor(subject))) +
  geom_point() +
  stat_smooth(method='lm', se=T) +
  labs(x='x', y='y', color='Group') +
  geom_hline(linetype='dashed', yintercept=0)


m_gauss = brm(y ~ x + (x|subject), data=data, file='fits/m_gauss')
m_student = brm(y ~ x + (x|subject), data=data,
                family=student, file='fits/m_student')
summary(m_gauss)
summary(m_student)

fixef(m_gauss)
fixef(m_student)

pred_gauss = data.frame(fitted(m_gauss))
names(pred_gauss) = paste0('gauss_', c('b', 'se', 'low', 'high'))

pred_student = data.frame(fitted(m_student))
names(pred_student) = paste0('student_', c('b', 'se', 'low', 'high'))

pred_df = cbind(data, pred_gauss, pred_student) %>%
  arrange(subject, x)

ggplot(pred_df, aes(x, gauss_b, 
                    ymin=gauss_low, ymax=gauss_high,
                    color=factor(subject),
                    fill=factor(subject))) +
  geom_path() + geom_ribbon(alpha=.2) +
  labs(title='Gaussian Model', color='Subject', fill='Subject', 
       y='Estimates')


ggplot(pred_df, aes(x, student_b, 
                    ymin=student_low, ymax=student_high,
                    color=factor(subject),
                    fill=factor(subject))) +
  geom_path() + geom_ribbon(alpha=.2) +
  labs(title='Heavy-tailed (Student) Model', color='Subject', fill='Subject', 
       y='Estimates')

正如其他人所写，查看基于 t 分布的模型可能会有所帮助。然而，使用高斯假设的一个原因是高斯分布最小化了给定方差的 Fisher 信息。这意味着在已知分布的情况下，不能像其他分布的参数那样精确地估计高斯参数。从某种意义上说，这是一件好事，因为使用高斯分布通常是一种保守的选择，正如 Eoin 的回答中所显示的那样。考虑到我们不知道真实分布是什么，高斯模型表明的更大的不确定性实际上可能是现实的。

请注意，此论点在技术上不适用于 $t_1$ - 和 $t_2$ -分布，因为这些没有现有的差异。但是，您的残差看起来并不像这些残差看起来那样邪恶。

如果你运行回归，将高斯拟合到重尾分布（甚至 $t_1$ 或者 $t_2$ ) 通常会使置信区间和 p 值更大，因此如果 p 值较小，这些应该仍然是可靠的。在解释变量（“杠杆点”）中有异常值/重尾要危险得多，但据我了解你的实验，你没有这些。

其它你可能感兴趣的问题

上一篇BIC 是否试图找到一个真正的模型？下一篇如何准确计算深度 Q 学习损失函数？