我迈出了机器学习的第一步。我使用 python sklearn 模块创建了一个简单的模型,但我无法弄清楚一个基本的东西。我希望准确率和召回率应该是预测概率阈值的单调函数。这些是我预测的概率值和标签:
p = [ 0.689, 0.816, 0.846, 0.19 , 0.527, 0.846, 0.846, 0.73 ,
0.846, 0.762, 0.22 , 0.958, 0.223, 0.658, 0.481, 0.846,
0.134, 0.77 , 0.57 , 0.846, 0.482, 0.846, 0.846, 0.846,
0.846, 0.846, 0.757, 0.457, 0.934, 0.902, 0.846, 0.326,
0.846, 0.205, 0.396, 0.143, 0.553, 0.683, 0.846, 0.706,
0.91 , 0.18 , 0.591, 0.769, 0. , 0.112, 0.546, 0.449,
0.195, 0.2 , 0.689, 0.883, 0.692, 0.812, 0.213, 0.843,
0.846, 0.155, 0.514, 0.59 , 0.495, 0.846, 0.717, 0.74 ,
0.121, 0.866, 0.266, 0.925, 0.915, 0.151, 0.846, 0.531,
0.846, 0.176, 0.846, 0.849, 0.813, 0.846, 0.543, 0.19 ,
0.875, 0.846, 0.846, 0.466, 0.846, 0.197, 0.583, 0.646,
0.186, 0.683, 0.841, 0.205, 0.725, 0.846, 0.302, 0.134,
0.846, 0.846, 0.993, 0.437, 0.663, 0.559, 0.421]
v = [False, False, False, True, False, False, True, False, False,
False, True, False, True, False, False, False, True, False,
True, False, True, False, False, True, False, False, True,
True, True, False, False, True, False, True, True, True,
True, True, False, False, False, True, True, True, True,
True, False, False, True, True, False, False, False, False,
True, False, False, True, True, True, False, False, False,
False, True, False, True, False, False, True, True, False,
False, True, False, False, True, False, True, True, False,
False, True, True, False, True, False, False, True, True,
False, True, True, False, True, True, False, False, False,
True, False, False, True]
我现在计算精度和召回值:
pre, rec, thr = metrics.precision_recall_curve(v, p)
...并绘制它们:
plt.plot(thr, pre[:-1], '-', label='precition')
plt.plot(thr, rec[:-1], '-', label='recall')
plt.legend()
这就是我得到的:
到底是怎么回事?为什么这张图不像我们在这里看到的更相似?是我的模型有问题还是我如何使用该precision_recall_curve
功能?