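Three questions:
"How do we define confidence in machine learning, and how do we generate it (if scikit-learn doesn't generate it automatically)?" This is a good summary of the different types of confidence measures in machine learning. The specific metric to use depends on the algorithm/model you are building.
"How do I classify new instances and then select only the ones with the highest confidence?" and,
"What should I change in this approach if I have more than 2 potential classes?"
Here is a quick script that extends what you started with, so you can see how to handle an arbitrary number of classes and find the likely class for each predicted example. I like numpy and pandas (which you are probably using anyway if you use sklearn).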
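On the first question, which confidence measure you get depends on the model. As a rough sketch (the toy data and the choice of `LogisticRegression` and `SVC` are my own assumptions for illustration, not something from your code): some estimators expose class probabilities through `predict_proba`, while others expose a signed distance from the decision boundary through `decision_function`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy two-class data, made up for illustration only
X = np.array([[10, 12], [15, 8], [12, 20], [20, 15],
              [80, 85], [90, 78], [85, 95], [95, 88]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
new = np.array([[14, 14], [50, 52], [88, 90]], dtype=float)

# Probabilistic confidence: one column per class, each row sums to 1
proba = LogisticRegression().fit(X, y).predict_proba(new)

# Margin-based confidence: signed distance to the separating hyperplane
# (binary case: one value per example; larger magnitude means further from the boundary)
margin = SVC(kernel="linear").fit(X, y).decision_function(new)

print("class probabilities:\n", proba)
print("signed margins:", margin)
```

The two scores are not on the same scale: probabilities are directly comparable across examples, while a margin only tells you how far an example sits from the boundary.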
from sklearn import neighbors
import pandas as pd
import numpy as np
number_of_classes = 3 # number of possible classes
number_of_features = 2 # number of features for each example
train_size = 20 # number of training examples
predict_size = 5 # number of examples to predict
# Generate a random 2-variable training set with random classes assigned
X = np.random.randint(100, size=(train_size, number_of_features))
y = np.random.randint(number_of_classes, size=train_size)
# initialize NearestNeighbor classifier
knn = neighbors.KNeighborsClassifier(n_neighbors=3)
# train model
knn.fit(X, y)
# values to predict classes for
predict = np.random.randint(100, size=(predict_size, number_of_features))
print("generated examples to predict:\n", predict, "\n")
# predict class probabilities for each class for each value and convert to DataFrame
# (columns are labelled with the classes the model actually saw during training)
probs = pd.DataFrame(knn.predict_proba(predict), columns=knn.classes_)
print("all probabilities:\n", probs, "\n")
# for each class, keep only the examples whose probability for that class exceeds 0.5
for c in knn.classes_:
    likely = probs[probs[c] > 0.5]
    print("class " + str(c) + " probability > 0.5:\n", likely)
    print("indexes of likely class " + str(c) + ":", likely.index.tolist(), "\n")
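If you want only the most confident predictions rather than everything above a fixed threshold, one option is to rank the examples by their highest class probability and keep the top few. This is a minimal sketch under my own assumptions: the random data, the fixed seed, and the `top_k` value are hypothetical and only there to make the example self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)  # fixed seed so the run is repeatable
X = rng.randint(100, size=(20, 2))
y = rng.randint(3, size=20)
new = rng.randint(100, size=(5, 2))

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
proba = knn.predict_proba(new)

# For each example: the most probable class and the probability assigned to it
summary = pd.DataFrame({
    "predicted_class": knn.classes_[proba.argmax(axis=1)],
    "confidence": proba.max(axis=1),
})

top_k = 2  # hypothetical: how many of the most confident predictions to keep
most_confident = summary.sort_values("confidence", ascending=False).head(top_k)
print(most_confident)
```

With `n_neighbors=3` and uniform weights, the confidence can only be 1/3, 2/3, or 1, so ties are common; a larger `n_neighbors` gives a finer-grained ranking.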