In the Lasagne tutorial (source code here and here), a simple multi-layer perceptron (MLP) is trained on the MNIST dataset. The data is split into a training set and a validation set, and at the end of each epoch the training loop computes the validation error, reported as the mean cross-entropy error per batch.
However, the validation error is consistently lower than the training error. Why is that? Shouldn't the training error be lower, since it is computed on the very data the network is trained on? Could this be an effect of the dropout layers (enabled during training, but disabled when the validation error is computed)?
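For reference, here is a minimal sketch of how I understand the two loss expressions are set up in Lasagne/Theano (the layer sizes and hyperparameters below are illustrative placeholders, not necessarily the tutorial's exact values). The relevant difference is the deterministic=True flag passed to lasagne.layers.get_output, which disables the dropout layers in the validation expression:

    import theano
    import theano.tensor as T
    import lasagne

    input_var = T.matrix('inputs')
    target_var = T.ivector('targets')

    # Illustrative MLP with dropout (sizes/rates are placeholders).
    network = lasagne.layers.InputLayer(shape=(None, 784), input_var=input_var)
    network = lasagne.layers.DropoutLayer(network, p=0.2)
    network = lasagne.layers.DenseLayer(
        network, num_units=800, nonlinearity=lasagne.nonlinearities.rectify)
    network = lasagne.layers.DropoutLayer(network, p=0.5)
    network = lasagne.layers.DenseLayer(
        network, num_units=10, nonlinearity=lasagne.nonlinearities.softmax)

    # Training loss: dropout is active, so predictions come from a
    # randomly thinned (noisier) network.
    train_prediction = lasagne.layers.get_output(network)
    train_loss = lasagne.objectives.categorical_crossentropy(
        train_prediction, target_var).mean()

    # Validation loss: deterministic=True disables dropout, so the full
    # network (with appropriately scaled activations) is used.
    val_prediction = lasagne.layers.get_output(network, deterministic=True)
    val_loss = lasagne.objectives.categorical_crossentropy(
        val_prediction, target_var).mean()

    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.nesterov_momentum(
        train_loss, params, learning_rate=0.01, momentum=0.9)

    train_fn = theano.function([input_var, target_var], train_loss,
                               updates=updates)
    val_fn = theano.function([input_var, target_var], val_loss)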
Output from the first few epochs:
Epoch 1 of 500 took 1.858s
training loss: 1.233348
validation loss: 0.405868
validation accuracy: 88.78 %
Epoch 2 of 500 took 1.845s
training loss: 0.571644
validation loss: 0.310221
validation accuracy: 91.24 %
Epoch 3 of 500 took 1.845s
training loss: 0.471582
validation loss: 0.265931
validation accuracy: 92.35 %
Epoch 4 of 500 took 1.847s
training loss: 0.412204
validation loss: 0.238558
validation accuracy: 93.05 %