
Why is 1 set as the intercept in a linear regression model in Python?

Cross Validated: regression, python, least-squares, statsmodels

2022-04-15 01:06:15

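I am a beginner studying linear regression on Udacity. I understand that statsmodels.regression.linear_model.OLS() needs an intercept, but why do you set 1 as the intercept?

Even though we set that value, the fitted result reports a different value as the intercept.

So what does setting it to 1 mean? And is 1 what we normally use for this setting?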

2 answers

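I don't know the Python function/method you are referring to, but I think I see the source of the confusion: the 1 you add is a variable/feature, so it gets multiplied by the intercept parameter in the parameter vector. In other words, the 1 is added to your features; it is not the value of your intercept.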
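To make this concrete, here is a minimal Python sketch using plain NumPy rather than statsmodels (in statsmodels the same column of ones is typically added with `sm.add_constant`, since `OLS` does not include an intercept by default). The toy data, generated from y = 3 + 2x, are an assumption made up for illustration.

```python
import numpy as np

# Hypothetical toy data generated from y = 3 + 2*x (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 + 2.0 * x

# Design matrix: a column of ones next to the feature column.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares. beta[0] multiplies the column of ones,
# so it is the estimated intercept; beta[1] is the slope.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
print(intercept, slope)
```

The fitted intercept comes out at about 3, not 1: the 1 is only the value of the regressor that the intercept coefficient multiplies.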

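I don't know Python, but as is easy to illustrate in R, using 1 as the value of the intercept regressor is really just a convention (a useful one, of course, since it lets us interpret the intercept as the expected value of y when $x=0$).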

n <- 10
y <- rnorm(n)             # some random data
x <- rnorm(n)
intercept <- rep(1,n)     # a "hand-made" intercept

lm(y~x)                   # the default in R which includes an intercept
lm(y~intercept+x-1)       # removing the default intercept with -1 and re-adding it manually as another regressor
lm(y~I(2*intercept)+x-1)  # removing the default intercept with -1 and re-adding 2 as a constant term

Output:

> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
   -0.07813      0.55086  


> lm(y~intercept+x-1)

Call:
lm(formula = y ~ intercept + x - 1)

Coefficients:
intercept          x  
 -0.07813    0.55086  


> lm(y~I(2*intercept)+x-1)

Call:
lm(formula = y ~ I(2 * intercept) + x - 1)

Coefficients:
I(2 * intercept)                 x  
        -0.03907           0.55086

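As you can see, the first two regressions are identical (exactly as expected). The third has the same coefficient on x, while the coefficient on the constant term is exactly half as large, which compensates for the constant regressor having been multiplied by 2.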
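For readers coming from the Python side of the question, the same experiment can be sketched with NumPy. This is only an illustrative analogue of the R code: the seed and the helper name `ols` are assumptions, and the random numbers will not match the R output.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility
n = 10
y = rng.normal(size=n)          # some random data
x = rng.normal(size=n)

def ols(const_value):
    """Fit y on a constant column filled with const_value, plus x."""
    X = np.column_stack([np.full(n, const_value), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_one = ols(1.0)  # the usual column of ones
b_two = ols(2.0)  # a column of twos instead

print(b_one)      # [intercept, slope]
print(b_two)      # [intercept / 2, same slope]
```

The slope is unchanged and the constant-term coefficient is halved, so the fitted values are the same in both cases.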
