- 差异与级联模型的类型成对比较有关。
- 此外,aov() 函数在如何选择自由度方面存在问题。它似乎混合了两个概念:1)逐步比较的平方和,2)整体情况的自由度。
问题再现
> data <- list(value = c (8.788269,7.964719,8.204051,9.041368,8.181555,8.0414149,7.992336,7.948658,8.090211,8.031459,8.118308,7.699051,7.537120,7.268570), time = c(1,6,12,24,48,96,144,1,6,12,24,48,96,144), treat = c(0,0,0,0,0,0,0,1,1,1,1,1,1,1) )
> summary( lm(value ~ treat*time, data=data) )
> summary( aov(value ~ 1 + treat + time + I(treat*time),data=data) )
解释中使用的一些模型
#all linear models used in the explanation below
> model_0 <- lm(value ~ 1, data)
> model_time <- lm(value ~ 1 + time, data)
> model_treat <- lm(value ~ 1 + treat, data)
> model_interaction <- lm(value ~ 1 + I(treat*time), data)
> model_treat_time <- lm(value ~ 1 + treat + time, data)
> model_treat_interaction <- lm(value ~ 1 + treat + I(treat*time), data)
> model_time_interaction <- lm(value ~ 1 + time + I(treat*time), data)
> model_treat_time_interaction <- lm(value ~ 1 + time + treat + I(treat*time), data)
LM T_TEST 如何工作以及与 F-TEST 相关
# the t-test with the estimator and it's variance, mean square error, is
# related to the F test of pairwise comparison of models by dropping 1
# model parameter
> anova(model_treat_time_interaction, model_time_interaction)
Analysis of Variance Table
Model 1: value ~ 1 + time + treat + I(treat * time)
Model 2: value ~ 1 + time + I(treat * time)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 10 0.89985
2 11 1.21118 -1 -0.31133 3.4598 0.09251 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> anova(model_treat_time_interaction, model_treat_interaction)
Analysis of Variance Table
Model 1: value ~ 1 + time + treat + I(treat * time)
Model 2: value ~ 1 + treat + I(treat * time)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 10 0.89985
2 11 1.14374 -1 -0.2439 2.7104 0.1307
> anova(model_treat_time_interaction, model_treat_time)
Analysis of Variance Table
Model 1: value ~ 1 + time + treat + I(treat * time)
Model 2: value ~ 1 + treat + time
Res.Df RSS Df Sum of Sq F Pr(>F)
1 10 0.89985
2 11 0.93245 -1 -0.032599 0.3623 0.5606
> # which is the same as
> drop1(model_treat_time_interaction, scope = ~time+treat+I(treat*time), test="F")
Single term deletions
Model:
value ~ 1 + time + treat + I(treat * time)
Df Sum of Sq RSS AIC F value Pr(>F)
<none> 0.89985 -30.424
time 1 0.243896 1.14374 -29.067 2.7104 0.13072
treat 1 0.311333 1.21118 -28.264 3.4598 0.09251 .
I(treat * time) 1 0.032599 0.93245 -31.926 0.3623 0.56064
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
AOV 如何工作并在 F 测试中选择 DF
> #the aov function makes stepwise additions/drops
>
> #first the time, then treat, then the interaction
> anova(model_0, model_time)
Analysis of Variance Table
Model 1: value ~ 1
Model 2: value ~ 1 + time
Res.Df RSS Df Sum of Sq F Pr(>F)
1 13 2.5902
2 12 1.8176 1 0.7726 5.1006 0.04333 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> anova(model_time, model_treat_time)
Analysis of Variance Table
Model 1: value ~ 1 + time
Model 2: value ~ 1 + treat + time
Res.Df RSS Df Sum of Sq F Pr(>F)
1 12 1.81764
2 11 0.93245 1 0.8852 10.443 0.007994 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> anova(model_treat_time, model_treat_time_interaction)
Analysis of Variance Table
Model 1: value ~ 1 + treat + time
Model 2: value ~ 1 + time + treat + I(treat * time)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 11 0.93245
2 10 0.89985 1 0.032599 0.3623 0.5606
>
> # note that the sum of squares for within model variation is the same
> # but the F values and p-values are not the same because the aov
> # function somehow chooses to use the degrees of freedom in the
> # complete model in all stepwise changes
>
重要的提示
> # Although the p and F values do not exactly match, it is this effect
> # of order and selection of cascading or not in model comparisons.
> # An important note to make is that the comparisons are made by
> # stepwise additions and changing the order of variables has an
> # influence on the outcome!
>
> # Additional note changing the order of 'treat' and 'time' has no
> # effect because they are not correlated
> summary( aov(value ~ 1 + treat + time +I(treat*time), data=data) )
Df Sum Sq Mean Sq F value Pr(>F)
treat 1 0.8852 0.8852 9.837 0.0106 *
time 1 0.7726 0.7726 8.586 0.0150 *
I(treat * time) 1 0.0326 0.0326 0.362 0.5606
Residuals 10 0.8998 0.0900
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> summary( aov(value ~ 1 + I(treat*time) + treat + time, data=data) )
Df Sum Sq Mean Sq F value Pr(>F)
I(treat * time) 1 1.3144 1.3144 14.606 0.00336 **
treat 1 0.1321 0.1321 1.469 0.25343
time 1 0.2439 0.2439 2.710 0.13072
Residuals 10 0.8998 0.0900
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> # This is an often forgotten quirck
> # best is to use manual comparisons such that you know
> # and understand your hypotheses
> # (which is often forgotten in the click and
> # point anova modelling tools)
> #
> # anova(model1, model2)
> # or use
> # stepAIC from the MASS library