Mathematical representation of nested random effects terms

Cross Validated r lme4-nlme random-effects-model

2022-03-28 21:38:25

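Suppose that a dependent variable $y$ is measured at the unit level (level $1$) within units of type $A$ (level $2$), and that units of type $A$ are nested within units of type $B$ (level $3$).

Suppose I fit the following formula: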

y ~ "FIXED EFFECTS [my syntax]" + (1 + x | B/A)

where $x$ is some predictor at level $1$.
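For readers unfamiliar with the `B/A` shorthand, the short sketch below shows how R expands the nesting operator. It relies only on base R, and the toy design (`design`, with three `A` labels reused inside two `B` units) is hypothetical data made up for illustration. The lme4 documentation describes `(1 | g1/g2)` as equivalent to `(1 | g1) + (1 | g1:g2)`, so the term `(1 + x | B/A)` can be read as `(1 + x | B) + (1 + x | B:A)`.

```r
# Base R expands the nesting operator `/` in any formula:
# B/A is shorthand for B + B:A.
nested_terms <- attr(terms(~ B/A), "term.labels")
print(nested_terms)

# Toy design (hypothetical data): the labels of A are reused inside each B,
# so A alone does not identify a level-2 unit, but the B:A combination does.
design <- expand.grid(A = factor(1:3), B = factor(c("b1", "b2")))
n_A_labels <- nlevels(design$A)
n_A_units <- nlevels(interaction(design$B, design$A, drop = TRUE))
print(c(A_labels = n_A_labels, A_within_B_units = n_A_units))
```

This is why the random effects for `A` are indexed by both $b$ and $a$: each combination of a `B` unit and an `A` label gets its own effect.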

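My understanding is that the mathematical representation of such a formula is as follows. Is this correct?

In what follows, $y_{b,a,i}$ is the output of the $i$-th data point in unit $a$ of $A$, which is nested in unit $b$ of $B$. This data point has a corresponding predictor $x_{b,a,i}$.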

$y_{b,a,i} = \text{fixed effects} + u_b + u_{b,1,a} + (\beta_b + \beta_{b,1,a})\,x_{b,a,i}$

where

$u_b \sim N(0, \sigma_B)$

$u_{b,1,a} \sim N(0, \sigma)$

$\beta_b \sim N(0, \rho_B)$

$\beta_{b,1,a} \sim N(0, \rho)$

That is, $\sigma_B$ is a standard deviation term that varies across level $3$. On the other hand, given any $b$, a unit in level $3$, and any $a$, a level $2$ unit contained in $b$, the standard deviation term for $a$ is $\sigma$. That is, $\sigma$ is the same for all level $2$ units.

Is this correct? (I based this reasoning on a related presentation on page 136 of Linear Mixed Models: A Practical Guide Using Statistical Software.) If it is correct, is there any way to make $\sigma$ depend on which unit of $A$ the data point belongs to?

1 Answer

I think you are missing a random effect in your formula. Response $y_{iab}$ depends on the fixed effects + an error term with 5 components.

$\varepsilon_{iab} + \varepsilon_{a|b} + \varepsilon_{b} + x\beta_{a\mid b} + x\beta_b$

In order, from left to right, these components have the following interpretations:

1. $\varepsilon_{iab}$: the pure error (specific to each observation)
2. $\varepsilon_{a|b}$: variation due to different levels of A within a common B level
3. $\varepsilon_{b}$: variation due to different levels of B
4. $x\beta_{a\mid b}$: how A affects the slope of the $x$ relationship given a common level of B
5. $x\beta_b$: how level B affects the slope of $x$
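Putting the five components together with the fixed effects, the full model can be sketched as below. The symbols $\Sigma_B$, $\Sigma_{A|B}$ and $\sigma_\varepsilon^2$ do not appear in the original post. They are assumed names for the $2 \times 2$ covariance matrices of the intercept and slope effects at each level and for the residual variance. The predictor is written as $x_{iab}$ to make its index explicit. A term of the form `(1 + x | g)` in lme4 also estimates a correlation between the intercept and the slope, which is why each pair is given a full covariance matrix here rather than two independent variances.

$y_{iab} = \text{fixed effects} + \varepsilon_{iab} + \varepsilon_{a|b} + \varepsilon_{b} + x_{iab}\,\beta_{a\mid b} + x_{iab}\,\beta_b$

$(\varepsilon_{b}, \beta_b)^\top \sim N(0, \Sigma_B), \quad (\varepsilon_{a|b}, \beta_{a\mid b})^\top \sim N(0, \Sigma_{A|B}), \quad \varepsilon_{iab} \sim N(0, \sigma_\varepsilon^2)$

Under this reading, one $\Sigma_{A|B}$ is shared by every level of A, whichever level of B it sits in.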

You can't allow $\sigma$ to vary with the level of A, because the model would no longer be identifiable (too many parameters all doing the same job). The exception is when the variation depends on known weights (like group counts); in that case, you would still have the same number of parameters. Remember that we don't know the values of the levels of A (or B), but we estimate them under the assumption of a fixed variance. We need to assume some kind of regularity here.

Edit: @Amoeba questions this and I may have been mistaken about the possibility of different values of the variance of the observations. I misread the OP's question, actually. I was thinking of the variance of the $\alpha$ hidden effects, and not the pure error of the individual observations. Since the A and B levels are random, presumably, the variances should be considered random effects also, which means that some sort of regularization should be applied in estimating them, as is the case with the random effects of the A and B levels themselves.

It gets worse. The value of the mixed effects model is that it allows you to form confidence intervals for untested situations (levels of A and B not included in the model), so you would definitely need to place a distribution on the variances and adjust your confidence intervals accordingly. It sounds pretty ugly.

And for sure, you are going to need a lot of data for this to work well, since we are talking about estimating variances as well as means.
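One way to get a feel for the data requirement is to simulate from the five-component model and see how well the variance components are recovered as the design grows. The sketch below is only an illustration. The design sizes, fixed-effect values and standard deviations are all hypothetical, and the intercept and slope effects are drawn independently for simplicity. The data are generated in base R, and the model is fitted only if lme4 happens to be installed.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical design and standard deviations, chosen only for illustration.
n_B <- 8; n_A_per_B <- 5; n_obs <- 10
sd_int_B <- 1.0;   sd_int_A <- 0.7    # intercept SDs: B, and A within B
sd_slope_B <- 0.5; sd_slope_A <- 0.3  # slope SDs: B, and A within B
sd_resid <- 1.0                       # SD of the pure error

d <- expand.grid(i = seq_len(n_obs), A = seq_len(n_A_per_B), B = seq_len(n_B))
d$BA <- interaction(d$B, d$A, drop = TRUE)  # unique id of each A within B
d$x <- rnorm(nrow(d))

# One draw per B unit and one draw per A-within-B unit.
eps_B  <- rnorm(n_B, 0, sd_int_B)
beta_B <- rnorm(n_B, 0, sd_slope_B)
eps_A  <- rnorm(nlevels(d$BA), 0, sd_int_A)
beta_A <- rnorm(nlevels(d$BA), 0, sd_slope_A)

ba <- as.integer(d$BA)
d$y <- 2 + 1.5 * d$x +                  # fixed effects (assumed values)
  eps_B[d$B] + eps_A[ba] +              # random intercepts
  d$x * (beta_B[d$B] + beta_A[ba]) +    # random slopes
  rnorm(nrow(d), 0, sd_resid)           # pure error

str(d)

# Optional fit, run only if lme4 is installed.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::lmer(y ~ x + (1 + x | B/A), data = d)
  print(lme4::VarCorr(fit))
}
```

With only eight levels of B, the B-level variance estimates rest on eight draws, so they are usually far noisier than the A-level ones. Increasing `n_B` in the sketch is a simple way to see this.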

As for the Welch test, that's basically a kludge applied to what used to be called the Behrens-Fisher problem - the problem of testing for the difference of two means when the variances are unequal. If memory serves, the problem is that you don't have a sufficient statistic of fixed dimension on that one.

To me, the question is why that problem should even admit a meaningful solution. What does it actually mean to compare means when the variances are unequal? Imagine two models of car. Cars from model A typically have a limited and predictable number of repairs each year. Cars from model B are sometimes lemons and sometimes superb. What does it mean to compare the average costs of ownership in this case? But that's what we're talking about when the variances of the levels are allowed to change. How much sense does it really make to compare means when the variances are allowed to vary? It suggests you are comparing apples and oranges.

Reference. Since you seem to be using R for this, you might want to read Pinheiro and Bates's book Mixed-Effects Models in S and S-PLUS, since they wrote R's nlme package and Bates is also a lead author of lme4. That book goes into all the details you could possibly need. They do allow for correlations among the observations with a common level.
