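Suppose I have built a linear regression model to identify linear dependencies between the variables in my data. Some of those variables are categorical.
If I want to assess the contribution of a given predictor, how should I do it? Can I compare the coefficients directly? I read in an answer that the |t| value gives a feel for the strength of a predictor; how exactly does that work?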
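I know that for a given category with K values, only K-1 dummy variables are created, which is the standard way to avoid the obvious multicollinearity, but how can I still determine the contribution associated with the dropped value (predictor)?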
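To make the K-1 coding concrete, here is a minimal sketch of my understanding on made-up data, using plain NumPy rather than statsmodels. The level names (`a`, `o`, `v`), the choice of `a` as the dropped level, and all the numbers are hypothetical and are not taken from my data set. With a single categorical predictor, the intercept seems to absorb the dropped level, and each dummy coefficient is a difference from it:

```python
import numpy as np

# Hypothetical toy data: one categorical predictor with K = 3 levels.
levels = np.array(["a", "o", "v", "a", "o", "v", "a", "o", "v"])
y = np.array([10.0, 9.0, 1.0, 12.0, 11.0, 3.0, 11.0, 7.0, 2.0])

# K-1 coding: an intercept column plus dummies for "o" and "v".
# Level "a" gets no column of its own; it is the dropped (reference) level.
X = np.column_stack([
    np.ones(len(y)),
    (levels == "o").astype(float),
    (levels == "v").astype(float),
])

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, coef_o, coef_v = beta

# Plain group means, for comparison with the fitted coefficients.
mean_a = y[levels == "a"].mean()
mean_o = y[levels == "o"].mean()
mean_v = y[levels == "v"].mean()

print(f"intercept = {intercept:.3f}, mean of 'a'        = {mean_a:.3f}")
print(f"coef[o]   = {coef_o:.3f}, mean 'o' - mean 'a' = {mean_o - mean_a:.3f}")
print(f"coef[v]   = {coef_v:.3f}, mean 'v' - mean 'a' = {mean_v - mean_a:.3f}")
```

If that reading carries over to the full model, the dropped level has no coefficient of its own because it is the baseline against which the other levels are measured, holding the remaining predictors fixed.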
Here is the model:
import statsmodels.formula.api as smf
mod = smf.ols('dependent ~ first_category + second_category + budget + object_price + hour', data=df).fit()
And the output:
mod.summary()
                            OLS Regression Results
==============================================================================
Dep. Variable:                dependent   R-squared:                     0.227
Model:                              OLS   Adj. R-squared:                0.226
Method:                   Least Squares   F-statistic:                   261.7
Date:                  Thu, 04 Sep 2014   Prob (F-statistic):             0.00
Time:                          14:59:24   Log-Likelihood:              -86099.
No. Observations:                 17866   AIC:                       1.722e+05
Df Residuals:                     17845   BIC:                       1.724e+05
Df Model:                            20
===========================================================================================
                                coef    std err          t      P>|t|    [95.0% Conf. Int.]
-------------------------------------------------------------------------------------------
Intercept                    27.6888      1.017     27.235      0.000      25.696    29.682
first_category[T.o]          -1.3250      0.848     -1.562      0.118      -2.987     0.337
first_category[T.v]         -10.4557      1.125     -9.294      0.000     -12.661    -8.251
second_category[T.SL0004]    21.9987      0.808     27.213      0.000      20.414    23.583
second_category[T.SL0005]    -2.3710      2.458     -0.965      0.335      -7.188     2.446
second_category[T.SL0006]     7.2716      3.609      2.015      0.044       0.197    14.346
second_category[T.SL0007]    20.1545      1.495     13.482      0.000      17.224    23.085
second_category[T.SL0008]    13.3333      0.794     16.788      0.000      11.777    14.890
second_category[T.SL0009]    18.5605      2.189      8.478      0.000      14.270    22.851
second_category[T.SL0010]     6.7351      1.158      5.817      0.000       4.465     9.005
second_category[T.SL0011]     2.6791      0.689      3.888      0.000       1.329     4.030
second_category[T.SL0012]    -0.8159      3.811     -0.214      0.830      -8.285     6.654
second_category[T.SL0014]     8.2550     11.359      0.727      0.467     -14.010    30.520
second_category[T.SL0016]     1.6220      1.229      1.320      0.187      -0.787     4.031
second_category[T.SL0017]   -14.3253      2.642     -5.422      0.000     -19.504    -9.147
second_category[T.SL0018]     1.4823      3.193      0.464      0.643      -4.777     7.741
second_category[T.SL0019]    20.0228      2.850      7.024      0.000      14.436    25.610
second_category[T.SL0020]   -11.7478      8.691     -1.352      0.176     -28.782     5.287
budget                       -0.5682      0.014    -40.828      0.000      -0.595    -0.541
object_price                  0.0037      0.000     33.192      0.000       0.003     0.004
hour                         -0.9244      0.040    -23.244      0.000      -1.002    -0.846
==============================================================================
Omnibus:                       2997.054   Durbin-Watson:                 1.001
Prob(Omnibus):                    0.000   Jarque-Bera (JB):           4758.803
Skew:                             1.183   Prob(JB):                       0.00
Kurtosis:                         3.892   Cond. No.                   1.59e+05
==============================================================================
Warnings:
[1] The condition number is large, 1.59e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
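On the |t| question: as far as I can tell, the `t` column is simply `coef` divided by `std err`. Here is a quick check with a few rows copied from the summary above. The printed values are rounded, so the match is only approximate; the `rows` dictionary and the `tolerance` value are my own scaffolding, not anything produced by statsmodels:

```python
# Rows copied from the summary: name -> (coef, std err, reported t)
rows = {
    "first_category[T.o]": (-1.3250, 0.848, -1.562),
    "first_category[T.v]": (-10.4557, 1.125, -9.294),
    "second_category[T.SL0004]": (21.9987, 0.808, 27.213),
    "second_category[T.SL0017]": (-14.3253, 2.642, -5.422),
}

tolerance = 0.05  # allowance for the rounding in the printed table

recomputed = {}
for name, (coef, std_err, reported_t) in rows.items():
    # t statistic: how many standard errors the estimate lies from zero
    t = coef / std_err
    recomputed[name] = t
    print(f"{name:28s} coef/se = {t:8.3f}   reported t = {reported_t:8.3f}")
```

So a large |t| seems to say that the coefficient is estimated precisely relative to its size, which is not the same thing as a large effect: `object_price` has a tiny coefficient (0.0037) but a |t| of 33.192, because the coefficient depends on the units of the predictor.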