我正在阅读 Udacity 教程,其中给出了一些数据点,练习是测试以下哪些模型最适合数据:线性回归、决策树或 SVM。使用sklearn,我能够确定 SVM 是最适合的,其次是决策树。当应用这两种算法时,我得到了一个非常明显的决策边界:
上述形状是否有任何具体原因,还是仅取决于数据集?
代码非常简单;只需读取 CSV,分离特征,然后应用如下所示的算法:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC
import pandas
import numpy
# Read the data
data = pandas.read_csv('data.csv')
# Split the data into X and y
X = numpy.array(data[['x1', 'x2']])
y = numpy.array(data['y'])
# import statements for the classification algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC
# Logistic Regression Classifier
classifier = LogisticRegression()
classifier.fit(X,y)
# Decision Tree Classifier
classifier = GradientBoostingClassifier()
classifier.fit(X,y)
# Support Vector Machine Classifier
classifier = SVC()
classifier.fit(X,y)

