I am running a multinomial logistic regression for a classification problem with six classes and four features.
Here is the code:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# hold out 20% of the data for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20)

# multinomial logistic regression fitted with the newton-cg solver
logreg = LogisticRegression(multi_class='multinomial', solver='newton-cg')
logreg = logreg.fit(X_train, Y_train)
output2 = logreg.predict(X_test)

logreg.intercept_
logreg.coef_
logreg.classes_
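For reference, the shapes of the fitted attributes (these print calls are illustrative additions, not part of my original script):

print(logreg.coef_.shape)       # (6, 4): one row per class, one column per feature
print(logreg.intercept_.shape)  # (6,)
print(logreg.classes_.shape)    # (6,)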
I get the following output:
Intercept:
array([-1.33803785, -1.55807614, -1.63809549, -0.05199907, 3.72777888, 0.85842968])
Coefficients:
array([[ 3.59830486, 5.1370334 , 1.32336325, 4.89734568],
[ 3.5507364 , 5.2459697 , 1.48523684, 4.81653704],
[ 3.35193267, 5.40124363, 2.04869296, 3.885547 ],
[ -5.4930705 , 5.49483357, 1.96479926, -6.7624365 ],
[ -8.61513183, -3.77761893, -7.79363153, -11.72171457],
[ 3.6072284 , -17.50146139, 0.97153921, 4.88472135]])
Classes:
array([u'Dropper', u'Flat', u'Grower', u'New User', u'Non User', u'Stopper'], dtype=object)
I am unable to interpret the model. As I understand multinomial logistic regression, for K possible outcomes one runs K-1 independent binary logistic regression models: one outcome is chosen as the "pivot", and each of the other K-1 outcomes is then regressed against that pivot outcome.
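Written out, the pivot (baseline) formulation I have in mind, with class $K$ as the pivot, is

$$\log\frac{P(Y = k \mid x)}{P(Y = K \mid x)} = \beta_{k0} + \beta_k^{\top} x, \qquad k = 1, \dots, K-1.$$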
According to this, six classes should give only five equations. But here there are six. How can that be?
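To check whether all six coefficient rows really enter the predictions, I would compare predict_proba against a softmax applied to the decision scores (an illustrative check, not part of my original script; the softmax form is my assumption here):

import numpy as np
from scipy.special import softmax

scores = np.asarray(X_test) @ logreg.coef_.T + logreg.intercept_  # shape (n_samples, 6)
probs = softmax(scores, axis=1)
print(np.allclose(probs, logreg.predict_proba(X_test)))  # True would mean all six rows are used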
