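I'm new to machine learning and am currently working with the iris dataset. I went through a quick online tutorial on predicting stock prices and thought I'd try doing the same with iris on my own.
The problem is that I use preprocessing to scale the data before training my classifier, but when I make a prediction, the answer comes out scaled as well. When I comment out all of the preprocessing, I get accurate results. Is there a way to un-scale the prediction?
The output is rounded to 0, 1, or 2, each number representing one of the three species.
You can see my code below: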
import pandas as pd
import numpy as np
from sklearn import preprocessing, model_selection
from sklearn.linear_model import LinearRegression
df = pd.read_csv("iris.csv")
# setosa - 0
# versicolor - 1
# virginica - 2
df = df.replace("setosa", 0)
df = df.replace("versicolor", 1)
df = df.replace("virginica", 2)
X = np.array(df.drop(['species'], axis=1))
y = np.array(df['species'])
# Scale features
# X = preprocessing.scale(X)
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)
clf = LinearRegression(n_jobs=1) # Linear regression clf
clf.fit(X_train, y_train)
confidence = clf.score(X_test, y_test)
print("Confidence: " + str(confidence))
# Inputs
sepal_length = float(input("Enter sepal length: "))
sepal_width = float(input("Enter sepal width: "))
petal_length = float(input("Enter petal length: "))
petal_width = float(input("Enter petal width: "))
# Create a pandas DataFrame from the entered data
index = [0]
d = {'sepal_length': sepal_length, 'sepal_width': sepal_width, 'petal_length': petal_length, 'petal_width': petal_width}
predict_df = pd.DataFrame(data=d, index=index)
# Create np array of features
predict_X = np.array(predict_df)
# Need to scale new X feature values
# predict_X = preprocessing.scale(predict_X, axis=1)
# Make a prediction against prediction features
prediction = clf.predict(predict_X)
print(predict_X, prediction)
rounded_prediction = int(round(prediction[0]))
if rounded_prediction == 0:
    print("== Predicted as Setosa ==")
elif rounded_prediction == 1:
    print("== Predicted as Versicolor ==")
elif rounded_prediction == 2:
    print("== Predicted as Virginica ==")
else:
    print("== Unable to make a prediction ==")
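Here is an example of my output with preprocessing enabled. I'll use one row from the CSV as the example (sepal length 6.4, sepal width 3.2, petal length 4.5, petal width 1.5), which should come out as versicolor (1):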
Confidence: 0.9449475378336242
Enter sepal length: 6.4
Enter sepal width: 3.2
Enter petal length: 4.5
Enter petal width: 1.5
[[ 1.39427847 -0.39039797 0.33462683 -1.33850733]] [0.41069281]
== Predicted as Setosa ==
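A quick check on where the scaled input in this output seems to come from. With `axis=1`, `preprocessing.scale` standardizes each row across its own features, so a single sample is centered on the mean of its own four measurements rather than on the per-column statistics of the training data. The sketch below redoes that arithmetic by hand with NumPy only. The variable names are illustrative, and `posted` is simply the array copied from the output above.

```python
import numpy as np

# The single sample entered at the prompt
sample = np.array([[6.4, 3.2, 4.5, 1.5]])

# Row-wise standardization: mean and std taken over the sample's own 4 features
row_mean = sample.mean(axis=1, keepdims=True)
row_std = sample.std(axis=1, keepdims=True)
row_scaled = (sample - row_mean) / row_std

# Scaled input copied from the output shown above
posted = np.array([[1.39427847, -0.39039797, 0.33462683, -1.33850733]])

print(row_scaled)
print(np.allclose(row_scaled, posted))
```

If the two arrays agree, the new sample was not scaled the same way as the training columns, which would explain the wrong prediction.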
Now with the preprocessing commented out:
Confidence: 0.9132522144785978
Enter sepal length: 6.4
Enter sepal width: 3.2
Enter petal length: 4.5
Enter petal width: 1.5
[[6.4 3.2 4.5 1.5]] [1.29119283]
== Predicted as Versicolor ==
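It seems I'm either doing the preprocessing wrong or missing an extra step. Apologies if I've gotten any terminology wrong, and thanks in advance for your answers.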
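One possible approach, offered as a minimal sketch rather than a definitive fix: fit a `StandardScaler` on the training features only, then reuse that same fitted scaler for the test set and for any new sample. This assumes `sklearn.datasets.load_iris` as a stand-in for `iris.csv` (it uses the same 0/1/2 species encoding). The fixed `random_state` and the `np.clip` guard on the rounded value are additions for illustration, not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: load_iris stands in for the iris.csv file used in the post
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Learn the per-column mean and std from the training data only
scaler = StandardScaler().fit(X_train)

clf = LinearRegression()
clf.fit(scaler.transform(X_train), y_train)

# Apply the same fitted scaler to the test data before scoring
print("R^2 on test data:", clf.score(scaler.transform(X_test), y_test))

# Apply the same fitted scaler to the new sample before predicting
new_sample = np.array([[6.4, 3.2, 4.5, 1.5]])
prediction = clf.predict(scaler.transform(new_sample))

# Round to the nearest label and keep it within the valid range 0..2
rounded_prediction = int(np.clip(round(prediction[0]), 0, 2))
print(prediction, rounded_prediction)
```

Only the features are scaled here, so the prediction stays on the original 0/1/2 label scale and nothing needs to be un-scaled afterwards. For ordinary least squares with an intercept, standardizing the features should not change the predictions. Scaling matters more for models that are sensitive to feature magnitudes.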