How do I import the statsmodels module to use the OLS class?

data-mining machine-learning regression linear-regression
2022-02-16 10:31:40

I am trying to run a multiple linear regression:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Importing Dataset
dataset = pd.read_csv(
    'C:/Users/Rupali Singh/Desktop/ML A-Z/Machine Learning A-Z Template Folder/Part 2 - Regression/Section 5 - Multiple Linear Regression/50_Startups.csv')
print(dataset)
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, 4].values

# Categorical Data

from sklearn.preprocessing import LabelEncoder, OneHotEncoder

labelencoder = LabelEncoder()
X[:, 3] = labelencoder.fit_transform(X[:, 3])
onehotencoder = OneHotEncoder(categorical_features=[3])
X = onehotencoder.fit_transform(X).toarray()

# Splitting the dataset into training set and test set

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
print(Y_train)

# Fitting Multiple Linear Regression

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, Y_train)

# predicting the test result
Y_pred = regressor.predict(X_test)

This is the part where the error occurs:

# Building the optimal model with Backward Elimination
import statsmodels.formula.api as sm

X = np.append(arr=np.ones((50, 1)).astype(int), values=X, axis=1)
print(X)
X_opt = X[:, [0, 1, 2, 3, 4, 5]]
regressor_ols = sm.OLS(endog=Y, exog=X_opt).fit()
print(regressor_ols.summary())

This is the error message:

Traceback (most recent call last):
  File "C:/Users/Rupali Singh/PycharmProjects/Machine_Learning/Muliple_Linear_Regression.py", line 39, in <module>
    import statsmodels.formula.api as sm
  File "C:\Users\Rupali Singh\PycharmProjects\Machine_Learning\venv\lib\site-packages\statsmodels\formula\api.py", line 15, in <module>
    from statsmodels.discrete.discrete_model import MNLogit
  File "C:\Users\Rupali Singh\PycharmProjects\Machine_Learning\venv\lib\site-packages\statsmodels\discrete\discrete_model.py", line 45, in <module>
    from statsmodels.distributions import genpoisson_p
  File "C:\Users\Rupali Singh\PycharmProjects\Machine_Learning\venv\lib\site-packages\statsmodels\distributions\__init__.py", line 2, in <module>
    from .edgeworth import ExpandedNormal
  File "C:\Users\Rupali Singh\PycharmProjects\Machine_Learning\venv\lib\site-packages\statsmodels\distributions\edgeworth.py", line 7, in <module>
    from scipy.misc import factorial
ImportError: cannot import name 'factorial'

Process finished with exit code 1
2 Answers

https://stackoverflow.com/a/56284155/9524424

You need a matching scipy version (1.2 rather than 1.3).
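To see whether an environment has the mismatched pair before running the script, a check along these lines might help. It is only a sketch: the helper names (`version_tuple`, `is_known_bad_pair`, `installed_version`) are made up for illustration, it flags only the one combination reported in this thread (statsmodels 0.9.x with scipy 1.3 or later), and it assumes Python 3.8+ for `importlib.metadata`.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn a version string such as '1.3.0' into (1, 3); only major.minor is kept."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_known_bad_pair(statsmodels_version, scipy_version):
    """True only for the pair reported here: statsmodels 0.9.x with scipy >= 1.3."""
    return (version_tuple(statsmodels_version) == (0, 9)
            and version_tuple(scipy_version) >= (1, 3))


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


sm_version = installed_version("statsmodels")
sp_version = installed_version("scipy")
if sm_version and sp_version and is_known_bad_pair(sm_version, sp_version):
    print(f"statsmodels {sm_version} with scipy {sp_version}: "
          "expect the 'cannot import name factorial' error")
else:
    print(f"statsmodels={sm_version}, scipy={sp_version}: "
          "not the pair described in this answer")
```

If the check reports the bad pair, either of the fixes mentioned in this thread (a newer statsmodels or scipy 1.2) should change the outcome.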

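This is essentially an incompatibility between statsmodels and the scipy version it depends on: statsmodels 0.9 is not compatible with scipy 1.3.0. I would call it a bug, and it has already been reported. If you upgrade to the latest development version of statsmodels, the problem goes away: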

pip install --upgrade Cython
pip install --upgrade git+https://github.com/statsmodels/statsmodels

For me, this solved the problem. The alternative is to downgrade scipy to version 1.2.