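This question is based on Example 11.1 in Applied Linear Statistical Models by Kutner, Nachtsheim, Neter, and Li. You can find the data here.
First they fit a simple linear model: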
Call:
lm(formula = Bloodpressure ~ Age, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-16.4786  -5.7877  -0.0784   5.6117  19.7813

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 56.15693    3.99367  14.061  < 2e-16 ***
Age          0.58003    0.09695   5.983 2.05e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.146 on 52 degrees of freedom
Multiple R-squared:  0.4077,  Adjusted R-squared:  0.3963
F-statistic: 35.79 on 1 and 52 DF,  p-value: 2.05e-07
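However, the residual plot suggests that the variance of the error term increases with the predictor Age. Long story short, they use weights to account for this heteroscedasticity and obtain: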
Call:
lm(formula = Bloodpressure ~ Age, data = data, weights = weights)

Weighted Residuals:
    Min      1Q  Median      3Q     Max
-2.0230 -0.9939 -0.0327  0.9250  2.2008

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 55.56577    2.52092  22.042  < 2e-16 ***
Age          0.59634    0.07924   7.526 7.19e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.213 on 52 degrees of freedom
Multiple R-squared:  0.5214,  Adjusted R-squared:  0.5122
F-statistic: 56.64 on 1 and 52 DF,  p-value: 7.187e-10
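The post does not show how the weights were obtained. As a rough sketch only, one common approach is to regress the absolute OLS residuals on Age to get an estimated standard deviation function, then use the inverse squared fitted values as weights. The data below are simulated stand-ins (the textbook data are not reproduced here), and the helper name `sd_fit` is hypothetical, so the numbers will not match the output above.

```r
# Simulated stand-in data (NOT the textbook data): error SD grows with Age
set.seed(1)
n <- 54
data <- data.frame(Age = runif(n, 20, 60))
data$Bloodpressure <- 56 + 0.58 * data$Age + rnorm(n, sd = 0.2 * data$Age)

# Step 1: ordinary least squares fit
ols <- lm(Bloodpressure ~ Age, data = data)

# Step 2: estimate the error SD as a function of Age by regressing
# the absolute residuals on Age (an assumed choice of variance model)
sd_fit <- lm(abs(residuals(ols)) ~ Age, data = data)

# Step 3: weights are the inverse of the squared estimated SD
data$weights <- 1 / fitted(sd_fit)^2

# Step 4: weighted least squares fit, same call shape as in the post
wls <- lm(Bloodpressure ~ Age, data = data, weights = weights)
summary(wls)
```

With weights built this way, the weighted residuals are roughly on a unit scale, which would be consistent with a residual standard error near 1 like the 1.213 reported above.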
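Now my question is: if I get a new patient with age=25, how do I compute the prediction error? Is it correct that the prediction interval will be narrower for lower ages?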
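To make the question concrete, here is a sketch of the usual weighted least squares prediction interval, assuming the error variance at the new point is $\sigma^2 / w_{\text{new}}$. The symbols $w_{\text{new}}$ (the weight for a patient aged 25) and $\mathrm{MSE}_w$ are notation introduced here, not taken from the post. $\mathrm{MSE}_w$ is the square of the weighted fit's residual standard error ($1.213^2$), and $w_{\text{new}}$ is not given in the post: it would have to come from whatever variance function produced the weights.

```latex
\begin{align*}
\hat{Y}_{\text{new}} &= b_0 + b_1 x_{\text{new}}
  = 55.56577 + 0.59634 \times 25 \approx 70.47 \\
s^2\{\hat{Y}_{\text{new}}\} &= \mathrm{MSE}_w \,
  \mathbf{x}_{\text{new}}^{\top} (\mathbf{X}^{\top}\mathbf{W}\mathbf{X})^{-1} \mathbf{x}_{\text{new}} \\
s^2\{\text{pred}\} &= \frac{\mathrm{MSE}_w}{w_{\text{new}}} + s^2\{\hat{Y}_{\text{new}}\} \\
\text{interval:}\quad & \hat{Y}_{\text{new}} \pm t(1 - \alpha/2;\, n - 2)\; s\{\text{pred}\}
\end{align*}
```

Under this setup, if the estimated error standard deviation increases with Age, then $w_{\text{new}}$ is larger at lower ages and the term $\mathrm{MSE}_w / w_{\text{new}}$ is smaller, which tends to narrow the interval. The term $s^2\{\hat{Y}_{\text{new}}\}$ grows as $x_{\text{new}}$ moves away from the weighted center of the Age values, so the net effect depends on both pieces. In R, `predict()` on an `lm` fit accepts a `weights` argument for prediction intervals, so something like `predict(fit, newdata, interval = "prediction", weights = w_new)` should implement the same calculation.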