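I'm currently working on a project. The dataset is balanced at roughly 50:50. I built a decision tree classifier and get decent accuracy (~75%) on the validation data, but the accuracy is skewed across the target classes: for class=0 it is approximately 98%, while for class=1 it is only 17%.
I tried scaling the data with MinMaxScaler, but still no luck.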
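from sklearn import tree, preprocessing, metrics
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# X (features) and Y (labels) are assumed to be defined earlier
model = tree.DecisionTreeClassifier(class_weight={1:30}, min_samples_leaf=160, max_depth=10)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=10)
min_max_scaler = preprocessing.MinMaxScaler()
# Fit the scaler on the training split only, then reuse it on the test split
X_train_scaled = min_max_scaler.fit_transform(X_train)
X_test_scaled = min_max_scaler.transform(X_test)
model = model.fit(X_train_scaled, y_train)
prediction = model.predict(X_test_scaled)
print(metrics.accuracy_score(y_test, prediction))
print(classification_report(y_test, prediction))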
Size of X_train_scaled = 12600 and X_test_scaled = 5400
Accuracy: 75%
Precision: {0: 100%, 1: 17%}
Recall: {0: 74%, 1: 100%}
F1-Score: {0: 85%, 1: 29%}
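As a rough sanity check on these figures, the sketch below recomputes the F1 scores from the reported precision and recall and estimates how many class=1 samples the test split would have to contain. It uses only the rounded percentages above, and it assumes that a recall of 100% for class=1 means there are no false negatives at all. The variable names are illustrative and not part of my original code.

```python
# Figures reported above (rounded to whole percentages in the post).
n_test = 5400
accuracy = 0.75
precision = {0: 1.00, 1: 0.17}
recall = {0: 0.74, 1: 1.00}


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


f1 = {c: f1_score(precision[c], recall[c]) for c in (0, 1)}

# Assumption: recall of 100% for class 1 means there are no false negatives,
# so every misclassified test sample is a false positive for class 1.
false_pos = (1 - accuracy) * n_test

# precision_1 = TP / (TP + FP)  =>  TP = precision_1 * FP / (1 - precision_1)
true_pos = precision[1] * false_pos / (1 - precision[1])

# With no false negatives, TP equals the number of class-1 test samples.
class1_share = true_pos / n_test

print({c: round(v, 2) for c, v in f1.items()})
print(round(false_pos), round(true_pos), round(class1_share, 3))
```

Under these assumptions the F1 values come out at about 0.85 and 0.29, which matches the report. The implied share of class=1 in the test split is only about 5%, so it may be worth checking that against the 50:50 balance mentioned above.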
How can I improve the precision for class=1 while maintaining the overall precision and accuracy?