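I have built one model with a decision tree and another with a random forest. However, when I test the two models on the same DataFrame, the outputs are different. How is that possible?
The data file is in my repository: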
# Helper function that prepares the DataFrame
def process_df_for_ml(df):
    """
    Process a dataframe for model training/prediction use.
    Returns X (feature DataFrame) and y (target Series).
    """
    df = df.copy()
    # Map salary to 0, 1, 2
    df.salary = df.salary.map({"low": 0, "medium": 1, "high": 2})
    # Drop "left" and "sales" to form the features X; "left" is the target y
    X = df.drop(["left", "sales"], axis=1)
    y = df["left"]
    return (X, y)
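To make clear what this helper returns, here is a minimal sketch that runs it on a tiny made-up frame. Only the column names `salary`, `sales`, and `left` come from my code above; the `satisfaction_level` column and all the values are hypothetical and stand in for the real data file.

```python
import pandas as pd


def process_df_for_ml(df):
    """Return (X, y): features as a DataFrame, target as a Series."""
    df = df.copy()
    # Map salary to 0, 1, 2
    df.salary = df.salary.map({"low": 0, "medium": 1, "high": 2})
    # Features are everything except "left" and "sales"; the target is "left"
    X = df.drop(["left", "sales"], axis=1)
    y = df["left"]
    return (X, y)


# Hypothetical toy rows, not the real data
toy = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72],
    "salary": ["low", "medium", "high", "low"],
    "sales": ["sales", "hr", "technical", "sales"],
    "left": [1, 0, 1, 0],
})

X, y = process_df_for_ml(toy)
print(X.columns.tolist())   # feature columns that remain
print(X["salary"].tolist()) # salary encoded as integers
print(y.tolist())           # target values
```

Because the function works on a copy, the original frame keeps its string `salary` values.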
First I used a decision tree:
from sklearn import tree
from sklearn.model_selection import train_test_split
# Train a decision tree.
X, y = process_df_for_ml(df)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)
clftree = tree.DecisionTreeClassifier(max_depth=3)
clftree.fit(X_train, y_train)
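This gives a test score of 0.96. After that, I applied this decision tree to the same df, and the output I got was [424 rows x 11 columns].
Then I tried the random forest algorithm: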
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
X, y = process_df_for_ml(df)
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0, stratify=y)
# Create and fit the random forest model
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
# Predictions on the held-out test set
rfc_predict = rfc.predict(X_test)
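This gives a test score of 0.99. After that, I applied this random forest to the same df, and the output I got was [11 rows x 11 columns].
How is this possible? Here is the link to my work: DecisionTree and RandomForest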
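For reference, here is a rough sketch of the comparison I am describing, on synthetic data from `make_classification` instead of my real file. It assumes the two tables above are the rows where a model's prediction differs from the actual `left` value; my post does not show that filtering step, so treat it as one possible reading. The `random_state=0` on the two models is added here only for reproducibility and is not in my code above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption)
X, y = make_classification(n_samples=2000, n_features=8, n_informative=5, random_state=0)

# Same split settings as above: default test_size (0.25) for the tree, 0.33 for the forest
Xtr_t, Xte_t, ytr_t, yte_t = train_test_split(X, y, random_state=0, stratify=y)
Xtr_f, Xte_f, ytr_f, yte_f = train_test_split(X, y, test_size=0.33, random_state=0, stratify=y)

tree_clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xtr_t, ytr_t)
forest_clf = RandomForestClassifier(random_state=0).fit(Xtr_f, ytr_f)

print("tree test score:", tree_clf.score(Xte_t, yte_t))
print("forest test score:", forest_clf.score(Xte_f, yte_f))

# Apply both fitted models to the same full dataset
tree_pred = tree_clf.predict(X)
forest_pred = forest_clf.predict(X)

# Rows where each model disagrees with the true label, and where the models disagree with each other
tree_mismatch = int((tree_pred != y).sum())
forest_mismatch = int((forest_pred != y).sum())
disagree = int((tree_pred != forest_pred).sum())
print("tree mismatches:", tree_mismatch)
print("forest mismatches:", forest_mismatch)
print("rows where the two models disagree:", disagree)
```

The two classifiers are different models trained on different splits, so their predictions on the same rows do not have to match, and any table filtered on those predictions can end up with a different number of rows.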