I am trying to normalize my training data, a sample of 629,145 rows and 24 features:
from sklearn import datasets
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
df = pd.read_csv('mydata.csv', dtype='object')
# manually choosing 24 features (note: 'TotLen Fwd Pkts' and 'Fwd IAT Min' each appear twice below, so only 22 are distinct)
X=df.loc[:, ['Bwd Pkt Len Min','Subflow Fwd Byts','TotLen Fwd Pkts','TotLen Fwd Pkts','Bwd Pkt Len Std','Flow IAT Min',
'Fwd IAT Min','Flow IAT Mean','Flow Duration','Flow IAT Std','Active Min','Active Mean','Fwd IAT Min',
'Bwd IAT Mean','Fwd IAT Mean','Init Fwd Win Byts','ACK Flag Cnt','Fwd PSH Flags','SYN Flag Cnt','Fwd Pkts/s',
'Bwd Pkts/s','Init Bwd Win Byts','PSH Flag Cnt','Pkt Size Avg']]
Y= df['Label']
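# 60% training and 40% test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.4, random_state=42)
# `scaler` was never defined in the original snippet; StandardScaler is assumed here
# (MinMaxScaler is imported as well and could be used instead)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)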
# Create an SVM classifier
clf = svm.SVC(kernel='rbf')  # RBF (non-linear) kernel
clf.fit(X_train, y_train)
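It has been 6 hours and the SVM has not converged. The same data converges very quickly with other algorithms such as RF. I know this is to be expected, since SVM is considered computationally expensive compared to KNN and RF. I have read a lot of questions/answers and articles.
- I would like to know how to track and analyze the problem visually (maybe by plotting something like the cost function, or something else?).
- Would parameter tuning (the C parameter) help and speed things up?
- What would you suggest?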
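For the first point, one thing that might help is to time the fit on growing random subsamples before committing to the full training set. This is only a rough sketch: the helper names `time_svc_on_subsamples` and `plot_timings` are made up for illustration, the data is a synthetic stand-in from `make_classification` rather than `mydata.csv`, and the subsample sizes are arbitrary. The scikit-learn documentation notes that `SVC` fit time scales at least quadratically with the number of samples, so a curve like this gives a feel for what hundreds of thousands of rows would cost.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def time_svc_on_subsamples(X, y, sizes, C=1.0, seed=42):
    """Fit an RBF SVC on random subsamples of increasing size and time each fit.

    X and y are expected to be NumPy arrays (use .to_numpy() on a DataFrame).
    Returns a list of (n_rows, seconds) tuples.
    """
    rng = np.random.default_rng(seed)
    timings = []
    for n in sizes:
        # Draw n distinct row indices and scale only that subsample
        idx = rng.choice(len(X), size=n, replace=False)
        X_sub = StandardScaler().fit_transform(X[idx])
        start = time.perf_counter()
        SVC(kernel="rbf", C=C).fit(X_sub, y[idx])
        timings.append((n, time.perf_counter() - start))
    return timings


def plot_timings(timings):
    """Plot fit time against subsample size on log-log axes (needs matplotlib)."""
    import matplotlib.pyplot as plt

    sizes, seconds = zip(*timings)
    plt.loglog(sizes, seconds, marker="o")
    plt.xlabel("training rows")
    plt.ylabel("fit time (s)")
    plt.title("SVC (RBF) fit time vs. sample size")
    plt.show()


# Synthetic stand-in for the real data: 24 features, as in the question
X_demo, y_demo = make_classification(
    n_samples=4000, n_features=24, n_informative=10, random_state=0
)
timings = time_svc_on_subsamples(X_demo, y_demo, sizes=[500, 1000, 2000, 4000])
for n, seconds in timings:
    print(f"{n:>6} rows: {seconds:.2f} s")
```

Calling `plot_timings(timings)` draws the curve. The same helper can be run with different `C` values to see how that parameter affects fit time on this kind of data. `SVC` also accepts `verbose=True` to print solver progress and `max_iter` to cap the number of iterations, which may help to see whether a long run is still making progress.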
Thank you very much.