机器算法验证 - 如何设置和解释方差分析与 R 中的汽车包的对比？ - 吾爱随笔录

如何设置和解释方差分析与 R 中的汽车包的对比？

机器算法验证 r 方差分析对比

2022-01-19 11:30:52

假设我有一个简单的 2x2 阶乘实验，我想对其进行 ANOVA。像这样，例如：

d   <- data.frame(a=factor(sample(c('a1','a2'), 100, rep=T)),
                  b=factor(sample(c('b1','b2'), 100, rep=T)));
d$y <- as.numeric(d$a)*rnorm(100, mean=.75, sd=1) +
       as.numeric(d$b)*rnorm(100, mean=1.2, sd=1) +
       as.numeric(d$a)*as.numeric(d$b)*rnorm(100, mean=.5, sd=1) +
       rnorm(100);

在没有显着交互的情况下，默认情况下（即contr.treatment）的输出是的所有级别和所有级别Anova()的整体显着性，对吗？abba
我应该如何指定一个对比，使我能够测试在水平 b1 上保持不变的效果a、在水平 b2 上保持不变的效果以及交互作用的显着性？baba:b

1个回答

您的示例导致单元格大小不相等，这意味着不同的“平方和类型”很重要，并且主效应的测试并不像您所说的那么简单。Anova()使用类型 II 平方和。请参阅此问题作为开始。

有不同的方法来测试对比。请注意，SS 类型并不重要，因为我们最终会在相关的单因素设计中进行测试。我建议使用以下步骤：

# turn your 2x2 design into the corresponding 4x1 design using interaction()
> d$ab <- interaction(d$a, d$b)       # creates new factor coding the 2*2 conditions
> levels(d$ab)                        # this is the order of the 4 conditions
[1] "a1.b1" "a2.b1" "a1.b2" "a2.b2"

> aovRes <- aov(y ~ ab, data=d)       # oneway ANOVA using aov() with new factor

# specify the contrasts you want to test as a matrix (see above for order of cells)
> cntrMat <- rbind("contr 01"=c(1, -1,  0,  0),  # coefficients for testing a within b1
+                  "contr 02"=c(0,  0,  1, -1),  # coefficients for testing a within b2
+                  "contr 03"=c(1, -1, -1,  1))  # coefficients for interaction

# test contrasts without adjusting alpha, two-sided hypotheses
> library(multcomp)                   # for glht()
> summary(glht(aovRes, linfct=mcp(ab=cntrMat), alternative="two.sided"),
+         test=adjusted("none"))
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: User-defined Contrasts
Fit: aov(formula = y ~ ab, data = d)

Linear Hypotheses:
              Estimate Std. Error t value Pr(>|t|)
contr 01 == 0  -0.7704     0.7875  -0.978    0.330
contr 02 == 0  -1.0463     0.9067  -1.154    0.251
contr 03 == 0   0.2759     1.2009   0.230    0.819
(Adjusted p values reported -- none method)

现在手动检查第一个对比的结果。

> P       <- 2                             # number of levels factor a
> Q       <- 2                             # number of levels factor b
> Njk     <- table(d$ab)                   # cell sizes
> Mjk     <- tapply(d$y, d$ab, mean)       # cell means
> dfSSE   <- sum(Njk) - P*Q                # degrees of freedom error SS
> SSE     <- sum((d$y - ave(d$y, d$ab, FUN=mean))^2)    # error SS
> MSE     <- SSE / dfSSE                   # mean error SS
> (psiHat <- sum(cntrMat[1, ] * Mjk))      # contrast estimate
[1] -0.7703638

> lenSq <- sum(cntrMat[1, ]^2 / Njk)       # squared length of contrast
> (SE   <- sqrt(lenSq*MSE))                # standard error
[1] 0.7874602

> (tStat <- psiHat / SE)                   # t-statistic
[1] -0.9782893

> (pVal <- 2 * (1-pt(abs(tStat), dfSSE)))  # p-value
[1] 0.3303902

其它你可能感兴趣的问题

上一篇贝叶斯和频率论的 EDA 方法是否存在差异？下一篇在序数或名义数据中合并/减少类别的方法？