Data mining - Decision tree accuracy differs between MATLAB and scikit-learn

Is it possible for the same dataset to give different accuracies in MATLAB and in a Jupyter notebook running Python code?

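Using the same dataset, I first ran a decision tree in MATLAB and got 96% accuracy. I then ran the same dataset in a Jupyter notebook with Python code and got 53% accuracy with C4.5 (a decision tree), using k-fold cross-validation.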

I don't understand why the same dataset and the same method produce different accuracies.

My procedure in the Python code is as follows:

import pandas as pd
import numpy as np
from sklearn import tree
from sklearn.model_selection import KFold, cross_val_score

train=pd.read_csv('E://New.csv')
train.head()

# define X and y
feature_cols = ['Past', 'Family_History', 'Current',
                'current or previous workplace',
                'diagnosed with a mental health condition by a medical professional?',
                'do you feel that it interferes with your work when being treated effectively?',
                'Gender']
X = train[feature_cols]

# y is the target vector, selected by its column name
y = train['Diagonised condition']

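# 10-fold cross-validation; without shuffle=True the folds follow the row order of the file
kfold = KFold(n_splits=10, random_state=None)
# Note: scikit-learn's DecisionTreeClassifier implements CART, not C4.5
model = tree.DecisionTreeClassifier(criterion='gini')

results = cross_val_score(model, X, y, cv=kfold, scoring='accuracy')
result = results.mean() * 100

std = results.std() * 100
print(result)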
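
The MATLAB code is not shown, so the following is only a sketch of one possible source of such a gap, not a diagnosis. It assumes the 96% might have been measured on the training data itself, and that the rows of the CSV might be grouped by class. It uses synthetic stand-in data from `make_classification` (not New.csv), and the names `X_demo`, `y_demo`, and `model_demo` are hypothetical. It compares three ways of scoring the same tree: on its own training data, with unshuffled `KFold` as in the code above, and with shuffled `StratifiedKFold`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: this is NOT the poster's dataset).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=7, n_informative=4, random_state=0
)

# Sort rows by label to mimic a file whose rows are grouped by class.
order = np.argsort(y_demo, kind="stable")
X_demo, y_demo = X_demo[order], y_demo[order]

model_demo = DecisionTreeClassifier(criterion="gini", random_state=0)

# 1) Accuracy on the same data the tree was fitted on (optimistic).
train_acc = model_demo.fit(X_demo, y_demo).score(X_demo, y_demo)

# 2) 10-fold CV without shuffling: each test fold is a contiguous block of rows.
unshuffled = cross_val_score(
    model_demo, X_demo, y_demo, cv=KFold(n_splits=10), scoring="accuracy"
)

# 3) 10-fold CV with shuffling and class-balanced folds.
shuffled = cross_val_score(
    model_demo, X_demo, y_demo,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="accuracy",
)

print("training accuracy:      %.1f%%" % (train_acc * 100))
print("KFold, no shuffle:      %.1f%%" % (unshuffled.mean() * 100))
print("StratifiedKFold, shuf.: %.1f%%" % (shuffled.mean() * 100))
```

If the three numbers differ widely on the real data as well, the two tools are probably being compared under different evaluation protocols rather than on the strength of the tree itself.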