我正在尝试为 kNN 模型调整“n_neighbors”并且我有以下问题:
基于平均交叉验证分数,最佳 kNN 模型应该是具有 10 个邻居的模型。
另一方面,当我绘制“分数与邻居”图时,我看到有些模型的训练数据和测试数据之间的分数距离要小得多(例如具有 20 个邻居的模型)。
我是 ML 新手,这对我来说仍然很困惑。但是我应该坚持 10 个邻居模型,还是 20 个邻居模型更好?我该如何决定?任何帮助深表感谢。
这是我的代码和图表:
best_score = 0
neighbors = np.arange(1,31)
all_train_scores = []
all_test_scores = []
for n_neighbors in neighbors :
reg = KNeighborsRegressor(n_neighbors = n_neighbors, metric = 'manhattan')
score = cross_val_score(reg, X_train, y_train, cv = 5)
score = np.mean(score)
if score > best_score :
best_score = score
optimal_choice = {'n_neighbors' : n_neighbors}
reg.fit(X_train, y_train)
train_score = reg.score(X_train, y_train)
test_score = reg.score(X_test, y_test)
all_train_scores = np.append(all_train_scores, train_score)
all_test_scores = np.append(all_test_scores, test_score)
