Removing the constant from a regression model

data-mining r regression predictive-modeling linear-algebra

2022-02-14 04:25:35

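I am trying to calibrate two variables $(X,Y)$ measured by two instruments that use different measurement techniques. The results of the linear regression analysis are shown in the figure.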

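The results show that the regression constant is not statistically significant, although the model as a whole is. I tried removing the regression constant (a very small value close to zero), and the $R^2$ of the new model rose to 90%. Is it correct to remove the regression constant?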

1 Answer

When you estimate a linear model without a constant, you essentially "force" the estimated function to pass through the point $(0,0)$.

With an intercept, you estimate a linear function such as:

$y = \beta_0 + \beta_1 x.$

Without an intercept, you estimate a linear function such as:

$y = 0 + \beta_1 x.$

So when $x=0$, $y$ will be $0$ as well.
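To make the constraint concrete, here is a minimal sketch on the built-in `mtcars` data (chosen only for illustration; it is not the data from the question). For the no-intercept model, least squares has the closed-form slope $\hat\beta_1 = \sum x_i y_i / \sum x_i^2$, and the prediction at $x=0$ is exactly zero:

```r
# Illustration on the built-in mtcars data (an assumption, not the asker's data)
x <- mtcars$hp
y <- mtcars$mpg

fit0 <- lm(y ~ x + 0)   # regression through the origin

# Closed-form least-squares slope for the no-intercept model
slope_manual <- sum(x * y) / sum(x^2)

# The fitted coefficient matches the closed form
slope_matches <- isTRUE(all.equal(unname(coef(fit0)), slope_manual))
print(slope_matches)

# The fitted line is forced through the origin
pred_at_zero <- unname(predict(fit0, newdata = data.frame(x = 0)))
print(pred_at_zero)
```

Whatever the data look like, the only free parameter is the slope, so the fit has no way to shift up or down.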

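You should not look only at $R^2$, because $R^2$ usually goes up when you drop the intercept. Consider the structure of the model, what the data look like, and what you want to achieve.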
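One reason for this: for a model without an intercept, R's `summary.lm` measures $R^2$ against the uncentered total sum of squares $\sum y_i^2$ rather than $\sum (y_i - \bar{y})^2$, so the two $R^2$ values are not on the same scale. A minimal sketch on the built-in `mtcars` data (an illustrative stand-in, not the data from the question):

```r
y <- mtcars$mpg
fit_int  <- lm(mpg ~ hp, data = mtcars)      # with intercept
fit_zero <- lm(mpg ~ hp + 0, data = mtcars)  # without intercept

rss_int  <- sum(resid(fit_int)^2)
rss_zero <- sum(resid(fit_zero)^2)

tss_centered   <- sum((y - mean(y))^2)  # baseline: always predict the mean of y
tss_uncentered <- sum(y^2)              # baseline: always predict zero

# These reproduce what summary() reports for each model
r2_int  <- 1 - rss_int  / tss_centered
r2_zero <- 1 - rss_zero / tss_uncentered
print(all.equal(r2_int,  summary(fit_int)$r.squared))
print(all.equal(r2_zero, summary(fit_zero)$r.squared))

# Same yardstick for both: judge the no-intercept fit against the centered baseline
r2_zero_centered <- 1 - rss_zero / tss_centered
print(c(with_intercept = r2_int, no_intercept_centered = r2_zero_centered))
```

Measured against the same centered baseline, the no-intercept fit can never score higher than the fit with an intercept, because its residual sum of squares cannot be smaller; the value can even be negative.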

Example in R:

library(ISLR)  # provides the Auto data set
auto = ISLR::Auto

# Model with an intercept
ols1 = lm(mpg~horsepower,data=auto)
summary(ols1)
plot(auto$horsepower, auto$mpg)
lines(auto$horsepower, predict(ols1, newdata=auto), type="l", col="red")

# Model without an intercept ("+0" drops the constant)
ols2 = lm(mpg~horsepower+0,data=auto)
summary(ols2)
plot(auto$horsepower, auto$mpg)
lines(auto$horsepower, predict(ols2, newdata=auto), type="l", col="red")

Results:

Model with intercept:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 39.935861   0.717499   55.66   <2e-16 ***
horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.906 on 390 degrees of freedom
Multiple R-squared:  0.6059,    Adjusted R-squared:  0.6049 
F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16

Model without intercept:

Coefficients:
           Estimate Std. Error t value Pr(>|t|)    
horsepower 0.178840   0.006648    26.9   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14.65 on 391 degrees of freedom
Multiple R-squared:  0.6492,    Adjusted R-squared:  0.6483 
F-statistic: 723.7 on 1 and 391 DF,  p-value: < 2.2e-16

Summary:

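In this example, excluding the intercept improves $R^2$ (from 0.6059 to 0.6492), but forcing the estimated function through $(0,0)$ produces a completely different model: the slope on horsepower flips sign, from about -0.158 to +0.179, and the residual standard error rises from 4.906 to 14.65. In essence, the model without an intercept produces nonsense in this case. So be very careful about excluding the intercept term.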
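If you want a check that does not lean on $R^2$, one option is to compare the two nested fits directly. The sketch below uses the built-in `mtcars` data as a stand-in (an assumption; substitute your own calibration data) and looks at the residual standard error of each fit and at an F-test of the restriction "intercept = 0" via `anova`:

```r
fit_int  <- lm(mpg ~ hp, data = mtcars)      # with intercept
fit_zero <- lm(mpg ~ hp + 0, data = mtcars)  # without intercept

# Residual standard error is on the scale of y, so it is comparable across fits
rse <- c(with_intercept = sigma(fit_int), no_intercept = sigma(fit_zero))
print(rse)

# F-test of the restriction: the no-intercept model is nested in the other one
cmp <- anova(fit_zero, fit_int)
print(cmp)

# With a single restriction, this F statistic is the squared t value of the intercept
t_intercept <- coef(summary(fit_int))["(Intercept)", "t value"]
print(all.equal(cmp$F[2], t_intercept^2))
```

A small p-value says the data reject a zero intercept. A large one only says the data do not contradict it, and that is where knowledge of the instruments comes in: should a zero reading on one really imply a zero reading on the other?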
