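Why not create the data according to the causal model? If the model is X ← Z → Y, just generate a random Z, then construct X as some function of Z plus noise (and likewise for Y). (I left the error SD at 1; set it higher to get more realistic significance levels.)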
> set.seed(100)
> z = rnorm(500)
> x = z + rnorm(500, 0, 1)
> y = z + rnorm(500, 0, 1)
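The same recipe can be wrapped in a small function that takes the noise SD as a parameter, which makes it easy to try a higher error SD. This is a sketch only: `simulate_fork` and `noise_sd` are hypothetical names, not part of the original answer. With standard-normal Z and noise SD σ, the population correlation between X and Y is 1 / (1 + σ²), so raising the SD weakens the spurious association.

```r
# Hypothetical helper (not from the original answer): simulate the fork X <- Z -> Y.
# noise_sd is the SD of the independent noise added to X and to Y.
simulate_fork <- function(n, noise_sd = 1) {
  z <- rnorm(n)                    # common cause
  x <- z + rnorm(n, 0, noise_sd)   # X depends only on Z plus noise
  y <- z + rnorm(n, 0, noise_sd)   # Y depends only on Z plus noise
  data.frame(x = x, y = y, z = z)
}

set.seed(100)
d_low  <- simulate_fork(500, noise_sd = 1)
d_high <- simulate_fork(500, noise_sd = 3)

# Population correlation is 1 / (1 + noise_sd^2): 0.5 for SD 1, 0.1 for SD 3.
cor(d_low$x, d_low$y)
cor(d_high$x, d_high$y)
```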
In the uncontrolled model, you get a spurious association between X and Y, as requested:
> summary(lm(y ~ x))
Call:
lm(formula = y ~ x)
Residuals:
    Min      1Q  Median      3Q     Max
-3.1534 -0.8674  0.0423  0.8893  4.0521
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.06574    0.05513  -1.192    0.234
x            0.49819    0.03809  13.081   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.233 on 498 degrees of freedom
Multiple R-squared: 0.2557, Adjusted R-squared: 0.2542
F-statistic: 171.1 on 1 and 498 DF, p-value: < 2.2e-16
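The slope of about 0.5, the R² of about 0.25, and the residual SD of about 1.23 are what the generating model predicts. Assuming, as in the code, that Z and both noise terms are independent standard normals:

```latex
\begin{aligned}
\operatorname{Cov}(X, Y) &= \operatorname{Var}(Z) = 1,
  \qquad \operatorname{Var}(X) = \operatorname{Var}(Y) = 1 + 1 = 2,\\
\beta_{Y \mid X} &= \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} = \frac{1}{2},\\
R^2 &= \frac{\operatorname{Cov}(X, Y)^2}{\operatorname{Var}(X)\,\operatorname{Var}(Y)} = \frac{1}{4},\\
\sigma_{\text{resid}} &= \sqrt{\operatorname{Var}(Y)\,(1 - R^2)} = \sqrt{1.5} \approx 1.22.
\end{aligned}
```

Once Z is in the model, Y = Z + noise exactly, so the population coefficients become 0 for X and 1 for Z, with residual SD 1 and R² = Var(Z)/Var(Y) = 0.5, which matches the controlled fit closely.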
The association disappears once you properly control for Z:
> summary(lm(y ~ x + z))
Call:
lm(formula = y ~ x + z)
Residuals:
     Min       1Q   Median       3Q      Max
-3.03113 -0.70603  0.04694  0.62110  2.89843
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.01238    0.04464  -0.277    0.782
x            0.02431    0.04228   0.575    0.566
z            0.99553    0.06095  16.334   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9952 on 497 degrees of freedom
Multiple R-squared: 0.5157, Adjusted R-squared: 0.5138
F-statistic: 264.6 on 2 and 497 DF, p-value: < 2.2e-16
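A single seed can be lucky, so one way to check that this is not a fluke is to repeat the simulation many times and look at the distribution of the coefficient on x with and without z in the model. A minimal sketch, assuming the same data-generating process as above; the seed and the 200 replications are arbitrary choices:

```r
# Repeat the fork simulation and record the coefficient on x
# from the uncontrolled (y ~ x) and controlled (y ~ x + z) models.
set.seed(1)
n_reps <- 200
coefs <- t(replicate(n_reps, {
  z <- rnorm(500)                 # common cause
  x <- z + rnorm(500)             # X = Z + noise
  y <- z + rnorm(500)             # Y = Z + noise
  c(uncontrolled = coef(lm(y ~ x))[["x"]],
    controlled   = coef(lm(y ~ x + z))[["x"]])
}))

# Uncontrolled slope should center near 0.5 (= Var(Z) / Var(X));
# controlled slope should center near 0.
colMeans(coefs)
```

`replicate` returns a 2 × `n_reps` matrix here, so `t()` turns it into one row per replication with named columns.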