During training, the neural network settles into a state where it always predicts 1 of the 5 classes. My train and test sets are distributed as follows:
Train Set
Samples: 269,501. Features: 157
Data distribution
16.24% 'a'
39.93% 'b'
9.31% 'c'
20.86% 'd'
13.67% 'e'
Test Set
Samples: 33,967. Features: 157
Data distribution
10.83% 'a'
35.39% 'b'
19.86% 'c'
16.25% 'd'
17.66% 'e'
Note the percentage of class b!

I'm training an MLP with dropout, and both training and test (a.k.a. validation) accuracy plateau at values that exactly match the train/test distribution of 1 of my 5 classes, i.e. it is learning to always predict 1 of the 5 classes! I've verified that the classifier is always predicting b.
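For reference, this is roughly how I checked (a minimal sketch, assuming the string labels a-e are integer-encoded 0-4 so that b is index 1, and using the model/x_test/y_test objects from the code further down):

import numpy as np
from collections import Counter

# distribution of predicted class indices on the test set
preds = np.argmax(model.predict(x_test), axis=1)
print(Counter(preds))          # collapses to a single key, the index of 'b'

# baseline accuracy of constantly predicting 'b'
print(np.mean(y_train == 1))   # ~0.3993, the stuck training accuracy
print(np.mean(y_test == 1))    # ~0.3539, the stuck validation accuracy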
I've tried batch_size of 0.25 and 1.0 (as a fraction of the data) and made sure the data gets reshuffled. I've tried the SGD and Adam optimizers, with and without decay and with different learning rates, yet the result stays the same. Tried dropout of 0.2 and 0.5. EarlyStopping with a patience of 300 epochs. The optimizer variants are sketched right after this paragraph.
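For concreteness, the variants I rotated through look like this (a sketch; the learning rates shown are just examples of the values I tried):

from keras.optimizers import SGD, Adam

sgd_plain  = SGD(lr=0.01)                                         # no decay
sgd_decay  = SGD(lr=0.01, momentum=0.9, decay=0.0001, nesterov=True)
adam_plain = Adam(lr=0.001)                                       # no decay
adam_decay = Adam(lr=0.001, decay=0.0001)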
Every once in a while I get a run where, during training, it jumps out of the stuck training and validation accuracy, but then validation accuracy always drops while training accuracy rises, in other words it overfits.
Output, cut off after 6 epochs. This run uses the particular SGD optimizer shown below; it doesn't always converge this fast:
Epoch 1/2000
Epoch 00000: val_acc improved from -inf to 0.35387, saving model to /home/user/src/thing/models/weights.hdf
269501/269501 [==============================] - 0s - loss: 1.6094 - acc: 0.1792 - val_loss: 1.6073 - val_acc: 0.3539
Epoch 2/2000
Epoch 00001: val_acc did not improve
269501/269501 [==============================] - 0s - loss: 1.6060 - acc: 0.3993 - val_loss: 1.6042 - val_acc: 0.3539
Epoch 3/2000
Epoch 00002: val_acc did not improve
269501/269501 [==============================] - 0s - loss: 1.6002 - acc: 0.3993 - val_loss: 1.6005 - val_acc: 0.3539
Epoch 4/2000
Epoch 00003: val_acc did not improve
269501/269501 [==============================] - 0s - loss: 1.5930 - acc: 0.3993 - val_loss: 1.5967 - val_acc: 0.3539
Epoch 5/2000
Epoch 00004: val_acc did not improve
269501/269501 [==============================] - 0s - loss: 1.5851 - acc: 0.3993 - val_loss: 1.5930 - val_acc: 0.3539
Epoch 6/2000
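One thing that stands out in that log: the initial loss of 1.6094 is ln(5), the categorical cross-entropy of a uniform prediction over 5 classes, and it barely moves afterwards, which suggests the output probabilities stay near-uniform while the argmax lands on the majority class. Quick check of that constant:

import numpy as np
print(np.log(5))   # 1.6094..., the loss of predicting p=0.2 for every class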
The code. Model creation:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD, Adam
from keras.constraints import maxnorm

def create_mlp(input_dim, output_dim, dropout=0.5, arch=None):
    """Setup neural network model (keras.models.Sequential)"""
    # default mlp architecture
    arch = arch if arch else [64, 32, 32, 16]
    # setup densely connected NN architecture (MLP)
    model = Sequential()
    model.add(Dropout(dropout, input_shape=(input_dim,)))
    for output in arch:
        model.add(Dense(output, activation='relu', W_constraint=maxnorm(3)))
        model.add(Dropout(dropout))
    model.add(Dense(output_dim, activation='sigmoid'))
    # compile model and save architecture to disk
    sgd = SGD(lr=0.01, momentum=0.9, decay=0.0001, nesterov=True)
    # adam = Adam(lr=0.001, decay=0.0001)
    model.compile(loss='categorical_crossentropy',
                  optimizer=sgd,
                  metrics=['accuracy'])
    return model
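With the dimensions listed above, a direct call would look like this (a hypothetical example; main() below passes train_dim / n_classes instead):

model = create_mlp(input_dim=157, output_dim=5)
model.summary()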
Inside main, after some preprocessing:
import numpy as np
from keras.utils.np_utils import to_categorical
from keras.callbacks import TensorBoard, ModelCheckpoint, EarlyStopping
from sklearn.utils import shuffle

# labels must be one-hot encoded for loss='categorical_crossentropy'
# meaning, of possible labels 0,1,2: 0->[1,0,0]; 1->[0,1,0]; 2->[0,0,1]
y_train_onehot = to_categorical(y_train, n_classes)
y_test_onehot = to_categorical(y_test, n_classes)
# get neural network architecture and save to disk
model = create_mlp(input_dim=train_dim, output_dim=n_classes)
with open(clf_file(typ='arch'), 'w') as f:
    f.write(model.to_yaml())
# output logs to tensorflow TensorBoard
# NOTE: don't use param histogram_freqs until keras issue fixed
# https://github.com/fchollet/keras/pull/5175
tensorboard = TensorBoard(log_dir=opts.tf_dir)
# only save model weights for best performing model
checkpoint = ModelCheckpoint(clf_file(typ='weights'),
                             monitor='val_acc',
                             verbose=1,
                             save_best_only=True)
# stop training early if validation accuracy doesn't improve for long enough
early_stopping = EarlyStopping(monitor='val_acc', patience=300)
# shuffle data for good measure before fitting
x_train, y_train_onehot = shuffle(x_train, y_train_onehot)
np.random.seed(seed)
model.fit(x_train, y_train_onehot,
          nb_epoch=opts.epochs,
          batch_size=train_batch_size,
          shuffle=True,
          callbacks=[tensorboard, checkpoint, early_stopping],
          validation_data=(x_test, y_test_onehot))
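Afterwards, the best checkpoint can be reloaded to confirm the plateau, using the standard Keras counterparts of the saving calls above (a sketch reusing the clf_file helper from my code):

from keras.models import model_from_yaml

with open(clf_file(typ='arch')) as f:
    best = model_from_yaml(f.read())
best.load_weights(clf_file(typ='weights'))
best.compile(loss='categorical_crossentropy', optimizer='sgd',
             metrics=['accuracy'])
loss, acc = best.evaluate(x_test, y_test_onehot, verbose=0)
print(acc)   # stays at ~0.3539, exactly the share of class 'b' in the test set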