我有带有 8000 个观察值和许多不同列的 pandas 数据框 - 其中一些是日期和时间,它们被转换为虚拟变量(get_dummies),有些是数字,而 y 标签是我想要预测的,它有两个类:0 或1:
>>>y x1 x2 x3 x4 10 11 12 13
0 1 0.532 0.431 0.214 0.11 0 0 1 0
1 0 0.512 0.410 0.340 0.09 0 1 0 0
...
我在 scikit learn 中使用了随机森林模型来预测 y 列(处理)是 1 还是 0。
结果让我感到惊讶,因为当我打印出准确度时,我得到了 -inf 并且我没有预料到这一点。
这就是我计算和运行射频的方式:
X = df.drop(['y',],axis=1)
y = df['y'].astype(int)
#split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=123)
###Random forest
#function for evaluate:
def evaluate(model, test_features, test_labels):
predictions = model.predict(test_features)
errors = abs(predictions - test_labels)
mape = 100 * np.mean(errors / test_labels)
accuracy = 100 - mape
print('Model Performance')
print('Average Error: {:0.4f} degrees.'.format(np.mean(errors)))
print('Accuracy = {:0.2f}%.'.format(accuracy))
return accuracy
#random forest
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier()
rfc.fit(X_train,y_train)
predictions = rfc.predict(X_test)
base_model = RandomForestClassifier()
base_model.fit(X_train, y_train)
base_accuracy = evaluate(base_model, X_test, y_test)
>>>Model Performance
Average Error: 0.3290 degrees.
Accuracy = -inf%.
所以我的问题是 -inf% 准确度是什么意思?模型的问题在哪里?
