The point of sample_weights is to give weights to specific samples (e.g., according to their importance or certainty), not to specific classes.
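As it turns out, "balanced accuracy" is (from the user guide):
the macro-average of recall scores per class
So, because the score is averaged across classes, only the weights within a class matter, not the weights between classes: multiplying every weight in a class by the same constant cancels out in that class's recall. Your weights are the same within each class and differ only between classes, so they have no effect.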
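A minimal sketch of this, assuming NumPy and scikit-learn are installed and reusing the toy labels from the example in this answer. It computes each class's weighted recall by hand, averages the two, and compares the result with `balanced_accuracy_score` and with `recall_score(..., average="macro")`, once with uniform weights and once with weights that are constant within each class. The helper `per_class_recall` is hypothetical, written only for this illustration.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, recall_score

y_true = np.array([0, 1, 0, 0, 1, 0, 1, 1, 1, 1])
y_pred = np.array([0, 1, 0, 0, 0, 1, 1, 1, 1, 1])


def per_class_recall(y_true, y_pred, weights, cls):
    """Hypothetical helper: weighted recall of a single class."""
    in_class = y_true == cls
    correct = in_class & (y_pred == cls)
    # Weight of correctly classified samples of this class / total weight of this class
    return weights[correct].sum() / weights[in_class].sum()


uniform = np.ones(len(y_true))
# Constant within each class, different across classes (like weights_by_class)
class_const = np.where(y_true == 1, 1.0, 1000.0)

results = []
for weights in (uniform, class_const):
    # Macro-average of the per-class weighted recalls
    manual = np.mean([per_class_recall(y_true, y_pred, weights, c) for c in (0, 1)])
    balanced = balanced_accuracy_score(y_true, y_pred, sample_weight=weights)
    macro_recall = recall_score(y_true, y_pred, average="macro", sample_weight=weights)
    results.append((manual, balanced, macro_recall))
    print(f"{manual:.4f} {balanced:.4f} {macro_recall:.4f}")
```

Both iterations print the same value, 19/24 ≈ 0.7917: scaling all of class 0 by 1000 multiplies the numerator and the denominator of that class's recall alike, so the per-class recalls (3/4 and 5/6) do not move.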
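Explicitly (again from the user guide):
$$\hat{w}_i = \frac{w_i}{\sum_j 1(y_j = y_i)\, w_j}$$
That is, the i-th sample is re-weighted by dividing its weight by the total weight of all samples with the same label.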
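The re-weighting can be checked directly. Below is a minimal sketch, assuming NumPy and scikit-learn, that builds $\hat{w}_i$ from the `some_sample_weights` values used in the example, computes the plain weighted accuracy under $\hat{w}$ (which is how the user guide defines balanced accuracy), and compares it with `balanced_accuracy_score` called on the original weights.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

y_true = np.array([0, 1, 0, 0, 1, 0, 1, 1, 1, 1])
y_pred = np.array([0, 1, 0, 0, 0, 1, 1, 1, 1, 1])
w = np.array([10, 1, 1, 1, 10, 1, 0.5, 0.5, 0.5, 0.5])

# Denominator of the formula: sum_j 1(y_j = y_i) * w_j for every sample i
same_label_total = np.array([w[y_true == y].sum() for y in y_true])
w_hat = w / same_label_total  # each class's re-weighted samples now sum to 1

# Weighted accuracy under w_hat (sum(w_hat) equals the number of classes)
manual = np.sum((y_pred == y_true) * w_hat) / np.sum(w_hat)
library = balanced_accuracy_score(y_true, y_pred, sample_weight=w)
print(f"{manual:.4f} {library:.4f}")
```

Both values come out to 15/26 ≈ 0.5769: class 0 has weighted recall 12/13 and class 1 has 3/13, and the average of the two is the 0.58 reported as "with some weights".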
Now, if you want, you can use the plain accuracy score and plug in whatever weights you need.
Take the following example:
from sklearn.metrics import balanced_accuracy_score, accuracy_score
y_true = [0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 0, 0, 1, 1, 1, 1, 1]
# Arbitrary per-sample weights (they vary within each class)
some_sample_weights = [10, 1, 1, 1, 10, 1, 0.5, 0.5, 0.5, 0.5]
# Class weights expressed per sample: constant within a class (1000 for class 0, 1 for class 1)
weights_by_class = [1 if y == 1 else 1000 for y in y_true]
print('with some weights: {:.2f}'.format(balanced_accuracy_score(y_true, y_pred, sample_weight=some_sample_weights)))
print('without weights: {:.2f}'.format(balanced_accuracy_score(y_true, y_pred)))
print('with class weights in balanced accuracy score: {:.2f}'.format(balanced_accuracy_score(y_true, y_pred, sample_weight=weights_by_class)))
print('with class weights in accuracy score: {:.5f}'.format(accuracy_score(y_true, y_pred, sample_weight=weights_by_class)))
# Relative size of each class (0.4 for class 0, 0.6 for class 1)
class_sizes = [sum(1 for y in y_true if y == x) / len(y_true) for x in (0, 1)]
# Divide each sample's class weight by the relative size of its class
weights_by_class_manually_balanced = [w / class_sizes[y] for w, y in zip(weights_by_class, y_true)]
print('with class weights in accuracy score (manually balanced): {:.5f}'.format(accuracy_score(y_true, y_pred, sample_weight=weights_by_class_manually_balanced)))
This gives:
with some weights: 0.58
without weights: 0.79
with class weights in balanced accuracy score: 0.79
with class weights in accuracy score: 0.75012
with class weights in accuracy score (manually balanced): 0.75008
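As you can see:
- Using class weights in the balanced accuracy score makes no difference; they are simply normalized back to class size.
- Using class weights in the plain accuracy score gives a value very close to 75% (3 of the 4 samples labeled 0 are classified correctly), and rescaling the weights by class size barely matters. The rescaled score is slightly lower because class 0 is the smaller class (4 of 10 samples): dividing by class size boosts its weight more, pulling the score even closer to its recall of 0.75.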
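The two plain-accuracy figures can be checked by hand. Class 0 has 4 samples (3 correct) and class 1 has 6 samples (5 correct). With weight 1000 on class 0 and 1 on class 1, and then with those weights divided by the class sizes 0.4 and 0.6 (giving 2500 and 5/3):

```latex
\begin{aligned}
\mathrm{acc}_{w} &= \frac{3 \cdot 1000 + 5 \cdot 1}{4 \cdot 1000 + 6 \cdot 1}
  = \frac{3005}{4006} \approx 0.75012 \\
\mathrm{acc}_{\text{balanced}} &= \frac{3 \cdot 2500 + 5 \cdot \tfrac{5}{3}}{4 \cdot 2500 + 6 \cdot \tfrac{5}{3}}
  = \frac{7508.\overline{3}}{10010} \approx 0.75008
\end{aligned}
```

Both are weighted averages of the per-class recalls 0.75 and 5/6. Class 0's share of the total weight rises from 4000/4006 to 10000/10010 after rescaling, which is why the second value sits marginally closer to 0.75.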