Difference between using cv=5 and cv=KFold(n_splits=5) in cross_val_score()?

Cross Validated  machine-learning  python  scikit-learn
2022-03-20 09:58:59

What is the difference between using cv=5 and cv=KFold(n_splits=5) in cross_val_score()?

cross_val_score(model, X, y, cv=5)

array([0.96666667, 0.96666667, 0.93333333, 0.93333333, 1. ])

cross_val_score(model, X, y, cv=KFold(n_splits=5))

array([1. , 1. , 0.86666667, 0.93333333, 0.83333333])

1 Answer

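When an integer is passed to the cv parameter of cross_val_score():

  • StratifiedKFold is used if the estimator is a classifier and y is binary or multiclass.
  • KFold is used in all other cases.

So, assuming the model in the question is a classifier, cv=5 is equivalent to cv=StratifiedKFold(n_splits=5), not to cv=KFold(n_splits=5), which is why the two calls return different scores.
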
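The rule above can be checked directly with check_cv from sklearn.model_selection, which turns a cv argument into a splitter object. The following is a minimal sketch under stated assumptions: the toy targets y_class and y_reg are hypothetical, and classifier=True stands in for passing a classifier as the estimator.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

# Hypothetical toy targets, used only for illustration.
y_class = np.array([0, 1] * 10)      # binary class labels
y_reg = np.linspace(0.0, 1.0, 20)    # continuous (regression-style) target

# classifier=True mimics the case where the estimator is a classifier.
cv_for_classifier = check_cv(5, y_class, classifier=True)

# classifier=False mimics any other estimator, e.g. a regressor.
cv_for_regression = check_cv(5, y_reg, classifier=False)

print(type(cv_for_classifier).__name__)   # StratifiedKFold
print(type(cv_for_regression).__name__)   # KFold
```

Both splitters default to shuffle=False, so either one produces the same folds on every run for the same data.
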
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score, KFold, StratifiedKFold


data = datasets.load_breast_cancer()
x, y = data.data, data.target

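# cv=5 with a classifier: StratifiedKFold(n_splits=5) is used internally
print(cross_val_score(DecisionTreeClassifier(random_state=1), x, y, cv=5))
# Explicit KFold: contiguous, unshuffled folds that ignore the class labels
print(cross_val_score(DecisionTreeClassifier(random_state=1), x, y, cv=KFold(n_splits=5)))
# Explicit StratifiedKFold: gives the same scores as cv=5
print(cross_val_score(DecisionTreeClassifier(random_state=1), x, y, cv=StratifiedKFold(n_splits=5)))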

[0.90434783 0.90434783 0.92035398 0.94690265 0.91150442]
[0.89473684 0.92982456 0.94736842 0.95614035 0.82300885]
[0.90434783 0.90434783 0.92035398 0.94690265 0.91150442]
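
One plausible reason the KFold scores differ from the other two is the class balance inside each fold. The sketch below is an illustration, not the data used above: y is a hypothetical label vector with three classes of 50 samples each, sorted by class, and fold_class_counts is a helper introduced here only for the example. It counts how many samples of each class land in every test fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical labels: 3 classes, 50 samples each, sorted by class.
y = np.repeat([0, 1, 2], 50)
X = np.zeros((len(y), 1))  # feature values do not affect how folds are built


def fold_class_counts(splitter):
    """Return the per-class sample counts of each test fold."""
    return [
        np.bincount(y[test_idx], minlength=3).tolist()
        for _, test_idx in splitter.split(X, y)
    ]


kfold_counts = fold_class_counts(KFold(n_splits=5))
stratified_counts = fold_class_counts(StratifiedKFold(n_splits=5))

print(kfold_counts)
# [[30, 0, 0], [20, 10, 0], [0, 30, 0], [0, 10, 20], [0, 0, 30]]
print(stratified_counts)
# [[10, 10, 10], [10, 10, 10], [10, 10, 10], [10, 10, 10], [10, 10, 10]]
```

With unshuffled KFold on sorted labels, some test folds contain a single class, while StratifiedKFold keeps the class proportions the same in every fold. If plain KFold is wanted on ordered data, passing shuffle=True (with a random_state for reproducibility) avoids the contiguous folds.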