我将通过 R 调用给出我的示例。首先是一个简单的线性回归示例,它具有一个因变量“寿命”和两个连续的解释变量。
data.frame(height=runif(4000,160,200))->human.life
human.life$weight=runif(4000,50,120)
human.life$lifespan=sample(45:90,4000,replace=TRUE)
summary(lm(lifespan~1+height+weight,data=human.life))
Call:
lm(formula = lifespan ~ 1 + height + weight, data = human.life)
Residuals:
Min 1Q Median 3Q Max
-23.0257 -11.9124 -0.0565 11.3755 23.8591
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 63.635709 3.486426 18.252 <2e-16 ***
height 0.007485 0.018665 0.401 0.6884
weight 0.024544 0.010428 2.354 0.0186 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 13.41 on 3997 degrees of freedom
Multiple R-squared: 0.001425, Adjusted R-squared: 0.0009257
F-statistic: 2.853 on 2 and 3997 DF, p-value: 0.05781
为了在“重量”的值为 1 时找到“寿命”的估计值,我添加 (Intercept)+height=63.64319
现在,如果我有一个类似的数据框,但其中一个解释变量是分类变量,该怎么办?
data.frame(animal=rep(c("dog","fox","pig","wolf"),1000))->animal.life
animal.life$weight=runif(4000,8,50)
animal.life$lifespan=sample(1:10,replace=TRUE)
summary(lm(lifespan~1+animal+weight,data=animal.life))
Call:
lm(formula = lifespan ~ 1 + animal + weight, data = animal.life)
Residuals:
Min 1Q Median 3Q Max
-4.7677 -2.7796 -0.1025 3.1972 4.3691
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.565556 0.145851 38.159 < 2e-16 ***
animalfox 0.806634 0.131198 6.148 8.6e-10 ***
animalpig 0.010635 0.131259 0.081 0.9354
animalwolf 0.806650 0.131198 6.148 8.6e-10 ***
weight 0.007946 0.003815 2.083 0.0373 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.933 on 3995 degrees of freedom
Multiple R-squared: 0.01933, Adjusted R-squared: 0.01835
F-statistic: 19.69 on 4 and 3995 DF, p-value: 4.625e-16
在这种情况下,要在“重量”的值为 1 时找到“寿命”的估计值,我应该将“动物”的每个系数添加到截距:(Intercept)+animalfox+animalpig+animalwolf?或者这样做的正确方法是什么?
谢谢斯韦雷