数据挖掘 - 如何调整加权投票集成方法？ - 吾爱随笔录

我正在处理带有 5 个不平衡标签的肾癌患者数据。这些代码包含在特征工程部分的归一化、过采样。提供了用于分类任务的 9 种普通机器学习方法的列表。然后，我利用了硬投票和加权投票两种集成方法。利用 10 倍 CV 来验证结果。

methods = ['Support Vector Machine', 'Logistic Regression', 'K Neighbors Classifier', 'Random Forest', \
          'Gaussian Naive Bayes', 'Linear Discriminant Analysis', 'Decision Tree', 'Gradient Boosting',\
          'soft_VotingClassifier','hard_VotingClassifier']

我想知道如何调整软投票方法的权重？这是我现在的代码和结果：

from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import VotingClassifier

if method == 'soft_VotingClassifier':
    cl1 = LogisticRegression()
    cl2 = KNeighborsClassifier(n_neighbors=10)
    cl3 = RandomForestClassifier(max_depth=35)
    cl4 = GaussianNB()
    cl5 = LinearDiscriminantAnalysis()
    cl6 = DecisionTreeClassifier()
    cl7 = SVC(C=0.1, gamma=0.0001, kernel='poly')
    cl8 = GradientBoostingClassifier()
    estimator = [(method[0],cl1), (method[1],cl2), (method[2],cl3), (method[3],cl4),\
                 (method[4],cl5), (method[5],cl6), (method[6],cl7), (method[7],cl8)]
    eclf = VotingClassifier(estimators=estimator,
    voting='soft', weights=[5, 5, 10, 5, 6, 8, 4, 10])

if method == 'hard_VotingClassifier':
    cl1 = LogisticRegression()
    cl2 = KNeighborsClassifier(n_neighbors=10)
    cl3 = RandomForestClassifier(max_depth=35)
    cl4 = GaussianNB()
    cl5 = LinearDiscriminantAnalysis()
    cl6 = DecisionTreeClassifier()
    cl7 = SVC(kernel='linear',gamma='scale')
    cl8 = GradientBoostingClassifier()
    estimator = [(method[0],cl1), (method[1],cl2), (method[2],cl3), (method[3],cl4),\
                 (method[4],cl5), (method[5],cl6), (method[6],cl7), (method[7],cl8)]
    eclf = VotingClassifier(estimators=estimator, voting='hard')

测试数据的混淆矩阵结果：

[50  5  1  4  2]
[ 0 13  1  0  3]
[ 0  1  2  1  1]
[ 4  0  0  2  0]
[ 0  3  0  0  2]

测试数据的准确度结果：

对于那些想要检查我的代码甚至运行它们的好心人，我会放上存储库链接。