What does the standard deviation band around my learning curve indicate?

data-mining machine-learning classification scikit-learn training
2022-02-17 14:54:35

I have plotted a learning curve below. There is a thick red band around the top of my training score. Why is it so wide at the beginning?

[Image: learning curve plot]

Here is the code snippet I used:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import learning_curve

    # estimator, X, y, cv, n_jobs and train_sizes are defined elsewhere.
    train_sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes,
        scoring="neg_brier_score")
    # Mean and standard deviation across the CV folds for each training size.
    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
    test_scores_std = np.std(test_scores, axis=1)
    plt.grid()

    # Shaded bands: mean plus or minus one standard deviation.
    plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
                     train_scores_mean + train_scores_std, alpha=0.1, color="r")
    plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
                     test_scores_mean + test_scores_std, alpha=0.1, color="g")
    plt.plot(train_sizes, train_scores_mean, 'o-', color="r",
             label="Training brier score")
    plt.plot(train_sizes, test_scores_mean, 'o-', color="g",
             label="Cross-validation brier score")

1 Answer

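This is only a guess, but I suspect it may simply be due to random initialization. With very few training samples, the models fitted on the different folds still differ a lot from one another. After about 400k training samples, all of the models converge to the same learning path. Of course, I could be wrong!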
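
One way to probe this guess is to rerun the curve with an estimator that has no random initialization and see whether the band is still there. The following is only a minimal sketch under stated assumptions: the data come from `make_classification` (a synthetic stand-in, not the asker's dataset), and `LogisticRegression` with its default solver is a stand-in estimator chosen because it is fitted deterministically. The 5-fold `StratifiedKFold` and the five training sizes are also illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, learning_curve

# Synthetic stand-in data (assumption; not the asker's dataset).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Stand-in estimator (assumption): fitted deterministically, so any spread
# across folds here comes from the data splits, not from initialization.
estimator = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

train_sizes, train_scores, test_scores = learning_curve(
    estimator, X, y, cv=cv,
    train_sizes=np.linspace(0.1, 1.0, 5),
    scoring="neg_brier_score",
)

# One row per training size, one column per CV fold. The band drawn by
# fill_between is the standard deviation across the columns of each row.
train_scores_std = np.std(train_scores, axis=1)
test_scores_std = np.std(test_scores, axis=1)

for n, s_train, s_test in zip(train_sizes, train_scores_std, test_scores_std):
    print(f"n={int(n):5d}  train std={s_train:.4f}  test std={s_test:.4f}")
```

If the printed spread is still largest at the smallest training sizes, then at least part of the band comes from differences between the fold subsets rather than from initialization.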