ValueError: shapes (1,10) and (2,) not aligned: 10 (dim 1) != 2 (dim 0)

Data Mining, Machine Learning, Python, Regression, Linear Regression
2022-03-01 12:50:01

I am running a multiple linear regression using backward elimination. Here is the code:

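import numpy as np
import statsmodels.api as sm  # sm.OLS is exposed by statsmodels.api
# Prepend a column of ones (the intercept term) to the 50-row feature matrix X.
X = np.append(arr=np.ones((50, 1)).astype(int), values=X, axis=1)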
X_opt = X[:,[0,1,2,3,4,5]]
regressor_OLS = sm.OLS(endog= y, exog = X_opt).fit()
regressor_OLS.summary()
X_opt = X[:,[0,1,3,4,5]]
regressor_OLS = sm.OLS(endog= y, exog = X_opt).fit()
regressor_OLS.summary()
X_opt = X[:,[0,3,4,5]]
regressor_OLS = sm.OLS(endog= y, exog = X_opt).fit()
regressor_OLS.summary()
X_opt = X[:,[0,3,5]]
regressor_OLS = sm.OLS(endog= y, exog = X_opt).fit()
regressor_OLS.summary()
X_opt = X[:,[0,3]]
regressor_OLS = sm.OLS(endog= y, exog = X_opt).fit()
regressor_OLS.summary()

But when I use the regressor_OLS model above to make predictions:

X_new = X[:, 3]
y_pred2 = regressor_OLS.predict(X_new)

I get the following error:

    y_pred2 = regressor_OLS.predict(X_new)
    Traceback (most recent call last):

  File "<ipython-input-18-263dee38fc26>", line 1, in <module>
    y_pred2 = regressor_OLS.predict(X_new)

  File "/Users/ritesh.satapathy/anaconda/lib/python3.6/site-packages/statsmodels/base/model.py", line 749, in predict
    return self.model.predict(self.params, exog, *args, **kwargs)

  File "/Users/ritesh.satapathy/anaconda/lib/python3.6/site-packages/statsmodels/regression/linear_model.py", line 359, in predict
    return np.dot(exog, params)

ValueError: shapes (1,50) and (2,) not aligned: 50 (dim 1) != 2 (dim 0)

I also tried X_new = X_test[:,3], but I still get the same error.

Traceback (most recent call last):

  File "<ipython-input-19-5020d55a4448>", line 1, in <module>
    y_pred2 = regressor_OLS.predict(X_ne1)

  File "/Users/ritesh.satapathy/anaconda/lib/python3.6/site-packages/statsmodels/base/model.py", line 749, in predict
    return self.model.predict(self.params, exog, *args, **kwargs)

  File "/Users/ritesh.satapathy/anaconda/lib/python3.6/site-packages/statsmodels/regression/linear_model.py", line 359, in predict
    return np.dot(exog, params)

ValueError: shapes (1,10) and (2,) not aligned: 10 (dim 1) != 2 (dim 0)

4 Answers

X_ne1 = X_test[:,3]
y_pred2 = regressor_OLS.predict(X_ne1)
# The confusion occurs due to the two different forms of statsmodels predict() method.
# This is just a consequence of the way the statsmodels folks designed the api.
# Both forms of the predict() method demonstrated and explained below.

import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt

# random normal x, y with y = x + 10
x = np.random.randn(100)
y = x + np.random.randn(100) + 10

fig, ax = plt.subplots(figsize=(8, 4))
ax.scatter(x, y, alpha=0.6, color='blue')
# will add the regression line to the plot and display below.

# Plot a linear regression line through the points in the scatter plot, above.
# Using statsmodels.api.OLS(Y, X).fit().
# To include a regression constant, one must use sm.add_constant() to add a column of '1s'
# to the X matrix. Basically, this tells statsmodels to calculate a constant for the regression line.
#
# FYI, the sklearn.linear_model.LinearRegression model includes a fit_intercept parameter
# and does not require the X matrix to have a column of ones.
x_matrix = sm.add_constant(x)
model = sm.OLS(y, x_matrix)
# regression_results is an object: statsmodels.regression.linear_model.RegressionResults.
regression_results = model.fit()
# nice summary
# print(regression_results.summary())

#
# There are two forms of the predict() method:
# There is sm.OLS(y, x).predict(). This predict() method has no knowledge of the fitted coefficients
# produced by model.fit(), above. This is just the way the statsmodels developers decided to implement
# the linear model.
# The fitted coefficients for the linear model are in RegressionResults and RegressionResults
# has its own predict().
# If you use model.predict(), you need to pass the coefficients. This form of the predict() method
# basically calculates y = ax + b where you pass the coefficients, a and b. In both forms, exog
# must have one column per coefficient; otherwise the dot product fails with the "not aligned" error.
#
# Use RegressionResults.predict() which acts as you expect, except that you still must add
# a column of ones to the x-values used to predict y. This is because the original model
# was fit with a regression constant (intercept). Generally linear models are fit with an intercept
# unless there is some problem-specific reason not to.
#
x_pred = np.linspace(x.min(), x.max(), 50)
# put the X matrix in 'standard' form, i.e. with a column of ones.
x_matrix = sm.add_constant(x_pred)
y_pred = regression_results.predict(x_matrix)

# Line from RegressionResults.predict() in 'black'.
ax.plot(x_pred, y_pred, color='black')
# one more graphic to add before display below.

# So why is model.predict() provided? The calculation for a linear model is a trivial
# linear numpy calculation. For more complex models, this will not be the case
# and model.predict() can be useful.
# Here is how to use it.

coef = regression_results.params[1]     # get the fitted model coefficients
const = regression_results.params[0]

# These two lines of code produce the same results, as expected.
# y_pred = coef * x_pred + const
y_pred = model.predict(params=[const, coef], exog=x_matrix)

# Line from model.predict() in 'purple' overlays the 'black' line from above.
ax.plot(x_pred, y_pred, color='purple')


plt.show()
plt.close()
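
As a rough illustration of why the column of ones matters, the sketch below rebuilds the same prediction with NumPy only. It assumes that `np.column_stack` with a leading column of ones mirrors the layout `sm.add_constant()` produces for a 1-D array, and it uses `np.linalg.lstsq` as a stand-in for `sm.OLS(...).fit()`; the variable names are illustrative and not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = x + rng.standard_normal(100) + 10

# Design matrix with a leading column of ones (assumed to match sm.add_constant(x)).
x_matrix = np.column_stack([np.ones_like(x), x])

# Least-squares fit; a stand-in for sm.OLS(y, x_matrix).fit().params.
params, *_ = np.linalg.lstsq(x_matrix, y, rcond=None)
const, coef = params

# The prediction input needs the same two-column layout as the training input.
x_pred = np.linspace(x.min(), x.max(), 50)
x_pred_matrix = np.column_stack([np.ones_like(x_pred), x_pred])

y_from_dot = x_pred_matrix @ params        # what predict() computes internally
y_from_formula = coef * x_pred + const     # the same line written out by hand
same = np.allclose(y_from_dot, y_from_formula)
print(x_pred_matrix.shape, params.shape, same)
```

The dot product only works because `x_pred_matrix` has two columns, one for each fitted parameter.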


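When you built the more optimized regressor you dropped columns, so you also need to drop the corresponding columns from the data you predict on.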

X_new = X_test[:, [0,3]] 
y2_pred = regressor_OLS.predict(X_new)

Also, you will need to call predict on the test set, and it is not clear from your question whether you are doing that.
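
The shape mismatch itself can be reproduced with NumPy alone. The sketch below is a minimal illustration under stated assumptions: `X` is made-up data with a column of ones in position 0, `params` is a placeholder for the two fitted coefficients, and `np.atleast_2d` is used to mimic the (1, 50) shape reported in the traceback.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 50 rows, column 0 is the constant, columns 1-5 are features.
X = np.column_stack([np.ones(50), rng.standard_normal((50, 5))])
# Placeholder for the 2 fitted coefficients (intercept and column 3).
params = np.array([1.0, 2.0])

bad = np.atleast_2d(X[:, 3])   # a single column, treated as one row of 50 values
good = X[:, [0, 3]]            # same two columns the final model was fit on

try:
    np.dot(bad, params)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

y_pred = np.dot(good, params)
print(bad.shape, good.shape, y_pred.shape)
print(error_message)
```

Selecting columns with a list, `[0, 3]`, keeps the input two-dimensional with one column per coefficient, which is what the dot product needs.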

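You don't need to take columns from X, because you have already defined X_opt. Also, you shouldn't use index 3, because X_opt only has 2 columns. First, split the dataset into X_opt_train and X_opt_test, and y_train and y_test. Then fit the model on X_opt_train and y_train. Then predict: y_pred = regressor_OLS.predict(X_opt_test)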
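
The workflow described here might look roughly like the sketch below. It is only an illustration: the data is synthetic, the 40/10 split is an arbitrary choice, and `np.linalg.lstsq` stands in for `sm.OLS(...).fit()` so that the example depends on NumPy alone. With statsmodels, the last step would be the `regressor_OLS.predict(X_opt_test)` call from the answer.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
# Synthetic data: column 0 is the constant, columns 1-5 are features.
X = np.column_stack([np.ones(n), rng.standard_normal((n, 5))])
y = 3.0 + 2.0 * X[:, 3] + 0.1 * rng.standard_normal(n)

# Keep only the columns that survived backward elimination.
X_opt = X[:, [0, 3]]

# Shuffle the row indices and split them 40 / 10 into train and test.
idx = rng.permutation(n)
train_idx, test_idx = idx[:40], idx[40:]
X_opt_train, X_opt_test = X_opt[train_idx], X_opt[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Fit on the training rows only (stand-in for sm.OLS(y_train, X_opt_train).fit()).
params, *_ = np.linalg.lstsq(X_opt_train, y_train, rcond=None)

# Predict on test rows that have the same 2-column layout as the training rows.
y_pred = X_opt_test @ params
print(X_opt_test.shape, y_pred.shape)
```

Because the train and test matrices are both sliced from X_opt, they always have the same columns, so the shape error cannot occur.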

At least this worked for me. I had the same error.