我正在处理带有 5 个不平衡标签的肾癌患者数据。这些代码包含在特征工程部分的归一化、过采样。提供了用于分类任务的 9 种普通机器学习方法的列表。然后,我利用了硬投票和加权投票两种集成方法。利用 10 倍 CV 来验证结果。
methods = ['Support Vector Machine', 'Logistic Regression', 'K Neighbors Classifier', 'Random Forest', \
'Gaussian Naive Bayes', 'Linear Discriminant Analysis', 'Decision Tree', 'Gradient Boosting',\
'soft_VotingClassifier','hard_VotingClassifier']
我想知道如何调整软投票方法的权重?这是我现在的代码和结果:
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import VotingClassifier
if method == 'soft_VotingClassifier':
cl1 = LogisticRegression()
cl2 = KNeighborsClassifier(n_neighbors=10)
cl3 = RandomForestClassifier(max_depth=35)
cl4 = GaussianNB()
cl5 = LinearDiscriminantAnalysis()
cl6 = DecisionTreeClassifier()
cl7 = SVC(C=0.1, gamma=0.0001, kernel='poly')
cl8 = GradientBoostingClassifier()
estimator = [(method[0],cl1), (method[1],cl2), (method[2],cl3), (method[3],cl4),\
(method[4],cl5), (method[5],cl6), (method[6],cl7), (method[7],cl8)]
eclf = VotingClassifier(estimators=estimator,
voting='soft', weights=[5, 5, 10, 5, 6, 8, 4, 10])
if method == 'hard_VotingClassifier':
cl1 = LogisticRegression()
cl2 = KNeighborsClassifier(n_neighbors=10)
cl3 = RandomForestClassifier(max_depth=35)
cl4 = GaussianNB()
cl5 = LinearDiscriminantAnalysis()
cl6 = DecisionTreeClassifier()
cl7 = SVC(kernel='linear',gamma='scale')
cl8 = GradientBoostingClassifier()
estimator = [(method[0],cl1), (method[1],cl2), (method[2],cl3), (method[3],cl4),\
(method[4],cl5), (method[5],cl6), (method[6],cl7), (method[7],cl8)]
eclf = VotingClassifier(estimators=estimator, voting='hard')
测试数据的混淆矩阵结果:
[50 5 1 4 2]
[ 0 13 1 0 3]
[ 0 1 2 1 1]
[ 4 0 0 2 0]
[ 0 3 0 0 2]
测试数据的准确度结果:
对于那些想要检查我的代码甚至运行它们的好心人,我会放上存储库链接。