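I built a logistic regression model in Python (Anaconda) and was surprised to find that the number of model coefficients appears to be proportional to the training sample size.
My training data is: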
print('Training data is type %s and shape %s' % (type(os_X_train), os_X_train.shape))
which outputs:
Training data is type <class 'pandas.core.frame.DataFrame'> and shape (174146, 11)
The model is then:
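from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# `preprocess` is the preprocessing transformer defined elsewhere (not shown here)
logreg = LogisticRegression(penalty='l2', solver='lbfgs', max_iter=1000)
model = make_pipeline(preprocess, logreg)
model.fit(os_X_train, os_y_train)
print(logreg.coef_.shape)
print("Model coefficients: ", logreg.intercept_, logreg.coef_)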
This outputs:
(1, 153024)
Model coefficients: [12.02830778] [[ 0.42926969 0.14192505 -1.89354062 ... 0.008847 0.00884372 -8.15123962]]
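As I understand it, the number of model coefficients should equal the number of predictor (feature) columns, plus one intercept. Or am I missing something?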
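One thing that might be worth checking is how many columns the data has after `preprocess` runs, since `coef_` has one entry per transformed column and not per raw column. The sketch below is only an illustration under an assumption the post does not confirm: that `preprocess` one-hot encodes at least one column with nearly unique values (an ID or timestamp, say). The helper `one_hot_width`, the toy rows from `make_rows`, and the ID-like column are all hypothetical.

```python
def one_hot_width(rows, categorical_columns):
    """Count the columns produced when the given column indices are
    one-hot encoded and every other column is passed through unchanged."""
    n_columns = len(rows[0])
    width = 0
    for j in range(n_columns):
        if j in categorical_columns:
            # One output column per distinct value seen in column j.
            width += len({row[j] for row in rows})
        else:
            # A non-categorical column stays a single column.
            width += 1
    return width


def make_rows(n):
    # Hypothetical data: column 0 is numeric, column 1 is an ID-like
    # string that is unique to each row.
    return [(float(i), "id_%d" % i) for i in range(n)]


# The encoded width grows with the number of rows, because every new
# row brings a new distinct value in the ID-like column.
width_small = one_hot_width(make_rows(10), categorical_columns={1})
width_large = one_hot_width(make_rows(1000), categorical_columns={1})
print(width_small, width_large)
```

If something like this is happening, the encoded width (and therefore the coefficient count) would track the number of rows. On the fitted pipeline itself, comparing `model[:-1].transform(os_X_train).shape[1]` with `logreg.coef_.shape[1]` should show whether the two numbers agree.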