
How severe does heteroscedasticity have to be before it causes problems?

Tags: regression, heteroscedasticity, assumptions, type I and II errors

2022-03-20 05:28:44

I have two questions about heteroscedasticity in multiple regression.

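According to my trusted textbook (Using Multivariate Statistics, 2007, p. 127), departures from homoscedasticity only reduce the statistical power of a test; they do not inflate the Type I error rate. Is this true?
I would like to know whether there are guidelines for judging the size of a heteroscedasticity effect, and how large that effect has to be before it does harm (N = 187). Because I used two categorical variables, my residual vs. predicted plot conveniently falls into two distinct blocks that I can analyze (see below):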

Multiple regression, three predictors, two of them categorical. N = 187

1 Answer

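It is true that heteroscedasticity reduces your power (see: Efficiency of beta estimates with heteroscedasticity), but it can also inflate the Type I error rate. Consider the following simulation (coded in R):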

set.seed(1044)                          # this makes the example exactly reproducible
b0 = 10                                 # these are the true values of the intercept
b1 = 0                                  #  & the slope
x  = rep(c(0, 2, 4), each=10)           # these are the X values
hetero.p.vector = vector(length=10000)  # these vectors are to store the results
homo.p.vector   = vector(length=10000)  #  of the simulation

for(i in 1:10000){                      # I simulate this 10k times
  y.homo   = b0 + b1*x + rnorm(30, mean=0, sd=1)  # these are the homoscedastic y's

  y.x0     = b0 + b1*0 + rnorm(10, mean=0, sd=1)  # these are the heteroscedastic y's
  y.x2     = b0 + b1*2 + rnorm(10, mean=0, sd=2)  #  (notice the SDs of the error
  y.x4     = b0 + b1*4 + rnorm(10, mean=0, sd=4)  #   term goes from 1 to 4)
  y.hetero = c(y.x0, y.x2, y.x4)

  homo.model         = lm(y.homo~x)               # here I fit 2 models & get the
  hetero.model       = lm(y.hetero~x)             #  p-values
  homo.p.vector[i]   = summary(homo.model)$coefficients[2,4]
  hetero.p.vector[i] = summary(hetero.model)$coefficients[2,4]
}
mean(homo.p.vector<.05)    # there are ~5% type I errors in the homoscedastic case
# 0.049                    #  (as there should be)
mean(hetero.p.vector<.05)  # but there are ~8% type I errors w/ heteroscedasticity
# 0.0804

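Linear models (such as multiple regression) tend to be fairly robust, however. A common rule of thumb is that you are OK as long as the maximum variance is no more than four times the minimum variance. It is only a rule of thumb, so take it for what it is worth. Note, however, that in the simulation above the highest variance in the heteroscedastic model is $16\times$ the lowest variance ($4^2 = 16$ vs. $1^2 = 1$), and the resulting Type I error rate is $8\%$ instead of $5\%$.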
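The rule of thumb above can be checked directly when the residuals fall into distinct groups, as they do in the question. The following is a minimal sketch, not part of the original answer: `variance_ratio` is a hypothetical helper name, and the data are simulated with the same error SDs (1, 2, 4) as the heteroscedastic case above. With only 10 observations per group the estimated ratio is noisy, so treat it as a rough indication rather than a test.

```r
# Hypothetical helper (an assumption for illustration, not from the answer):
# ratio of the largest to the smallest residual variance across groups.
variance_ratio <- function(resid, group) {
  group.vars <- tapply(resid, group, var)  # residual variance within each group
  max(group.vars) / min(group.vars)
}

set.seed(1044)                             # makes the example reproducible
x   <- rep(c(0, 2, 4), each = 10)          # same X values as the simulation
y   <- 10 + 0 * x + rnorm(30, mean = 0, sd = rep(c(1, 2, 4), each = 10))
fit <- lm(y ~ x)

ratio <- variance_ratio(residuals(fit), x) # sample estimate; the true ratio is 16
ratio
ratio > 4                                  # TRUE means the rule of thumb is exceeded
```

For the data in the question, the grouping variable would presumably be the combination of the two categorical predictors that produces the two blocks in the residual plot.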
