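I am trying to solve a pedestrian detection task, and I am training a binary classifier on two classes: positive (people) and negative (background).
I have a dataset:
- number of positives = 3752
- number of negatives = 3800
I use an 80/20 % train/test split and RandomForestClassifier from scikit-learn with these parameters: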
RandomForestClassifier(n_estimators=100, max_depth=50, n_jobs= -1)
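For readers who want to reproduce this setup, here is a minimal sketch of what the pipeline might look like. The feature extraction is not shown in the post, so `make_classification` stands in as placeholder data; the helper name `train_and_evaluate` and the fixed `random_state` are assumptions added for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def train_and_evaluate(X, y, seed=0):
    """Hypothetical helper: 80/20 split, fit the forest, count test errors."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Same hyperparameters as in the post; random_state is an added assumption.
    clf = RandomForestClassifier(
        n_estimators=100, max_depth=50, n_jobs=-1, random_state=seed
    )
    clf.fit(X_train, y_train)
    # With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(
        y_test, clf.predict(X_test), labels=[0, 1]
    ).ravel()
    counts = {"tp": int(tp), "fp": int(fp), "fn": int(fn), "tn": int(tn)}
    return clf, counts, clf.score(X_test, y_test)


# Placeholder data (assumption): label 1 = person, label 0 = background.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
clf, counts, score = train_and_evaluate(X, y)
for name, value in counts.items():
    print(f"{name}: {value}")
print(f"score: {100 * score:.6f} %")
```

The numbers printed here come from the synthetic data and are not comparable to the results in this post.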
I get a score of 95.896757 %
Testing on the training data (fits perfectly):
true positive: 3005
false positive: 0
false negative: 0
true negative: 3036
Testing on the test data:
true positive: 742
false positive: 57
false negative: 5
true negative: 707
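To make the asymmetry between the two error types explicit, the test counts above can be turned into rates. This is only arithmetic on the four reported numbers; the function name `rates` is a hypothetical helper, not part of scikit-learn.

```python
def rates(tp, fp, fn, tn):
    """Derive standard rates from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),            # predicted people that are people
        "recall": tp / (tp + fn),               # people that were found
        "false_positive_rate": fp / (fp + tn),  # background flagged as people
        "false_negative_rate": fn / (fn + tp),  # people missed
    }


# Test-set counts copied from the post.
test_metrics = rates(tp=742, fp=57, fn=5, tn=707)
for name, value in test_metrics.items():
    print(f"{name}: {100 * value:.6f} %")
```

The accuracy computed this way reproduces the reported score of 95.896757 %, which suggests the score is plain accuracy on the test split.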
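My question is: how can I reduce the number of false positives (background classified as a person)? Also, why do I get more false positives than false negatives?
I tried using the class_weight parameter, but at some point performance degrades (as you can see at class_weight= {0:1,1:4}). For each setting below, the first four counts are on the training data and the next four are on the test data.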
class_weight= {0:1,1:1}
true positive: 3005
false positive: 0
false negative: 0
true negative: 3036
true positive: 742
false positive: 55
false negative: 5
true negative: 709
score: 96.029120 %
class_weight= {0:1,1:2}
true positive: 3005
false positive: 0
false negative: 0
true negative: 3036
true positive: 741
false positive: 45
false negative: 6
true negative: 719
score: 96.624752 %
class_weight= {0:1,1:3}
true positive: 3005
false positive: 0
false negative: 0
true negative: 3036
true positive: 738
false positive: 44
false negative: 9
true negative: 720
score: 96.492389 %
class_weight= {0:1,1:4}
true positive: 3005
false positive: 13
false negative: 0
true negative: 3023
true positive: 735
false positive: 46
false negative: 12
true negative: 718
score: 96.161482 %
class_weight= {0:1,1:5}
true positive: 3005
false positive: 31
false negative: 0
true negative: 3005
true positive: 737
false positive: 48
false negative: 10
true negative: 716
score: 96.161482 %
class_weight= {0:1,1:6}
true positive: 3005
false positive: 56
false negative: 0
true negative: 2980
true positive: 736
false positive: 51
false negative: 11
true negative: 713
score: 95.896757 %
class_weight= {0:1,1:7}
true positive: 3005
false positive: 87
false negative: 0
true negative: 2949
true positive: 734
false positive: 59
false negative: 13
true negative: 705
score: 95.234944 %
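A sweep like the one above could be scripted roughly as follows. This is a sketch under assumptions: the data is synthetic placeholder data from `make_classification`, `sweep_class_weight` is a hypothetical helper name, and `random_state` is fixed only so that runs are comparable across weights.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def sweep_class_weight(X_train, y_train, X_test, y_test,
                       positive_weights=range(1, 8), seed=0):
    """Fit one forest per positive-class weight and collect test counts."""
    rows = []
    for w in positive_weights:
        clf = RandomForestClassifier(
            n_estimators=100, max_depth=50, n_jobs=-1,
            class_weight={0: 1, 1: w}, random_state=seed,
        )
        clf.fit(X_train, y_train)
        # With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order.
        tn, fp, fn, tp = confusion_matrix(
            y_test, clf.predict(X_test), labels=[0, 1]
        ).ravel()
        rows.append({
            "weight": w,
            "tp": int(tp), "fp": int(fp), "fn": int(fn), "tn": int(tn),
            "score": clf.score(X_test, y_test),
        })
    return rows


# Placeholder data (assumption): label 1 = person, label 0 = background.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
results = sweep_class_weight(X_train, y_train, X_test, y_test)
for row in results:
    print(row)
```

Note that without a fixed seed, each forest is built from different random draws, so differences of a few test samples between neighbouring weights may partly reflect run-to-run variation rather than the weight itself.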
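It is also worth noting that RandomForest does not seem to suffer from an imbalanced dataset:
positives = 3752, negatives = 10100
class_weight= {0:1,1:1}
true positive: 3007
false positive: 0
false negative: 0
true negative: 8074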
true positive: 729
false positive: 71
false negative: 16
true negative: 1955
score: 96.860339 %
class_weight= {0:1,1:2}
true positive: 3007
false positive: 0
false negative: 0
true negative: 8074
true positive: 728
false positive: 59
false negative: 17
true negative: 1967
score: 97.257308 %
class_weight= {0:1,1:3}
true positive: 3007
false positive: 0
false negative: 0
true negative: 8074
true positive: 727
false positive: 58
false negative: 18
true negative: 1968
score: 97.257308 %