我有一个二元分类问题(良性/恶意),我已经应用了带有一个隐藏层的简单神经网络来解决这个问题。我的数据集中有 46 个特征,对于隐藏层,我使用的是 46/2。我的数据也没有缩放,所以我使用的是标准缩放器。我的代码准确率约为 99.79%。然而准确率出乎意料,我预计在 93-94% 左右,我担心我泄露了一些数据或犯了一些愚蠢的错误。
def create_baseline():
model = Sequential()
model.add(Dense(23, input_dim=46, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_baseline, epochs=1000, batch_size=len(X), validation_split=0.15, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
scores = cross_val_score(pipeline, X, y, cv=kfold)
print("%.2f%%" % (scores[1]*100))
我使用了大约 5871 个恶意样本和 3488 个良性样本。
只是为了交叉检查,我在没有交叉验证的情况下实现了,这是一个非常简单的实现,而且我也得到了大约 99% 的准确度。
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
scale = StandardScaler(with_mean=0, with_std=1)
new_X_train = scale.fit_transform(X_train)
new_X_test = scale.transform(X_test)
model = Sequential()
model.add(Dense(23, input_dim=46, kernel_initializer="normal"))
model.add(PReLU(alpha_initializer='zero', weights=None))
model.add(Dense(1, kernel_initializer='normal'))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(new_X_train, y_train, epochs=1000, batch_size=len(X_train), validation_split=0.15)
scores = model.evaluate(new_X_test, y_test)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
print(history.history.keys())
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()


