Accuracy and loss in a CNN do not change. Is it overfitting?

data-mining machine-learning deep-learning cnn word2vec overfitting
2021-10-08 01:48:08

My task is to classify news articles as Interesting [1] or Uninteresting [0]. My training set has 4053 articles, of which 179 are Interesting; the validation set has 664 articles, of which 17 are Interesting. I have preprocessed the articles and converted them to vectors using word2vec.
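For completeness, the conversion step looks roughly like this (an illustrative sketch, not the exact code used; `w2v` stands for a trained gensim Word2Vec model with vector size 100 and `tokens` for a tokenized article):

import numpy as np

sentence_length, vector_length = 500, 100  # must match the network input below

def article_to_matrix(tokens, w2v):
    # Map a tokenized article to a (500, 100, 1) tensor: truncate to
    # sentence_length words, pad with zeros, skip out-of-vocabulary words
    mat = np.zeros((sentence_length, vector_length), dtype=np.float32)
    for i, word in enumerate(tokens[:sentence_length]):
        if word in w2v.wv:
            mat[i] = w2v.wv[word]
    return mat[..., np.newaxis]  # trailing channel dimension expected by Conv2D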

The CNN architecture is as follows:

import keras
from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense, concatenate
from keras.models import Model
from keras.utils import plot_model

sentence_length, vector_length = 500, 100

def create_convnet(img_path='../new_out/cnn_model_word2vec.png'):
    # Each article is a (500, 100, 1) tensor: 500 words x 100 word2vec dimensions x 1 channel
    input_shape = Input(shape=(sentence_length, vector_length, 1))

    # Three parallel convolution towers with different kernel sizes
    tower_1 = Conv2D(8, (vector_length, 3), padding='same', activation='relu')(input_shape)
    tower_1 = MaxPooling2D((1, vector_length-3+1), strides=(1, 1), padding='same')(tower_1)
    tower_1 = Dropout(0.25)(tower_1)

    tower_2 = Conv2D(8, (vector_length, 4), padding='same', activation='relu')(input_shape)
    tower_2 = MaxPooling2D((1, vector_length-4+1), strides=(1, 1), padding='same')(tower_2)
    tower_2 = Dropout(0.25)(tower_2)

    tower_3 = Conv2D(8, (vector_length, 5), padding='same', activation='relu')(input_shape)
    tower_3 = MaxPooling2D((1, vector_length-5+1), strides=(1, 1), padding='same')(tower_3)
    tower_3 = Dropout(0.25)(tower_3)

    # Merge the towers and reduce to a single sigmoid output for the binary label
    merged = concatenate([tower_1, tower_2, tower_3], axis=1)
    merged = Flatten()(merged)
    dropout1 = Dropout(0.5)(merged)
    out = Dense(1, activation='sigmoid')(dropout1)

    model = Model(input_shape, out)
    plot_model(model, to_file=img_path)
    return model

some_model = create_convnet()
some_model.compile(loss=keras.losses.binary_crossentropy,
                   optimizer='adam',
                   metrics=['accuracy'])

The model predicts every article in the validation set as Uninteresting [0]. The accuracy is 97.44%, which is exactly the proportion of uninteresting articles in the validation set. I have tried variations of this architecture, but the problem persists.
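One way to see this directly is to inspect the raw sigmoid outputs (a sketch; `X_val` stands in for the stacked validation tensors, which are not shown here):

import numpy as np

probs = some_model.predict(X_val)               # sigmoid scores in [0, 1]
preds = (probs > 0.5).astype(int)               # threshold at 0.5
print(np.bincount(preds.ravel(), minlength=2))  # e.g. [664   0] when everything is predicted as 0
print(probs.min(), probs.max())                 # check whether all scores sit below 0.5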

As an experiment, I also ran predictions on the training data itself, and there too it predicted everything as Uninteresting [0]. Below is the log for 10 epochs:

some_model.fit_generator(train_gen, train_steps, epochs=num_epoch, verbose=1, callbacks=callbacks_list, validation_data=val_gen, validation_steps=val_steps)
Epoch 1/10
254/253 [==============================] - 447s 2s/step - loss: 0.7119 - acc: 0.9555 - val_loss: 0.4127 - val_acc: 0.9744

Epoch 00001: val_loss improved from inf to 0.41266, saving model to ../new_out/cnn_model_word2vec
Epoch 2/10
254/253 [==============================] - 440s 2s/step - loss: 0.7099 - acc: 0.9560 - val_loss: 0.4127 - val_acc: 0.9744

Epoch 00002: val_loss did not improve
Epoch 3/10
254/253 [==============================] - 440s 2s/step - loss: 0.7099 - acc: 0.9560 - val_loss: 0.4127 - val_acc: 0.9744

Epoch 00003: val_loss did not improve

Epoch 00003: ReduceLROnPlateau reducing learning rate to 0.00010000000474974513.
Epoch 4/10
254/253 [==============================] - 448s 2s/step - loss: 0.7099 - acc: 0.9560 - val_loss: 0.4127 - val_acc: 0.9744

Epoch 00004: val_loss did not improve

Epoch 00004: ReduceLROnPlateau reducing learning rate to 1.0000000474974514e-05.
Epoch 5/10
254/253 [==============================] - 444s 2s/step - loss: 0.7099 - acc: 0.9560 - val_loss: 0.4127 - val_acc: 0.9744

Epoch 00005: val_loss did not improve

Epoch 00005: ReduceLROnPlateau reducing learning rate to 1.0000000656873453e-06.
Epoch 6/10
254/253 [==============================] - 443s 2s/step - loss: 0.7099 - acc: 0.9560 - val_loss: 0.4127 - val_acc: 0.9744

Epoch 00006: val_loss did not improve

Epoch 00006: ReduceLROnPlateau reducing learning rate to 1.0000001111620805e-07.
Epoch 7/10
254/253 [==============================] - 443s 2s/step - loss: 0.7099 - acc: 0.9560 - val_loss: 0.4127 - val_acc: 0.9744

Epoch 00007: val_loss did not improve

Epoch 00007: ReduceLROnPlateau reducing learning rate to 1e-07.
Epoch 8/10
254/253 [==============================] - 443s 2s/step - loss: 0.7099 - acc: 0.9560 - val_loss: 0.4127 - val_acc: 0.9744

Epoch 00008: val_loss did not improve

Epoch 00008: ReduceLROnPlateau reducing learning rate to 1e-07.
Epoch 9/10
254/253 [==============================] - 444s 2s/step - loss: 0.7099 - acc: 0.9560 - val_loss: 0.4127 - val_acc: 0.9744

Epoch 00009: val_loss did not improve

Epoch 00009: ReduceLROnPlateau reducing learning rate to 1e-07.
Epoch 10/10
254/253 [==============================] - 440s 2s/step - loss: 0.7099 - acc: 0.9560 - val_loss: 0.4127 - val_acc: 0.9744

Epoch 00010: val_loss did not improve

Epoch 00010: ReduceLROnPlateau reducing learning rate to 1e-07.
Out[3]: <keras.callbacks.History at 0x7f19898b90f0>
1 Answer

Your dataset is highly imbalanced. The optimization process simply minimizes the loss function, and with a training set this imbalanced it cannot do much better than a model that always predicts Uninteresting. Also, you are not overfitting: your training accuracy is actually lower than your validation accuracy.
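As a quick sanity check, the logged accuracies are just the majority-class baselines computed from the counts in the question:

train_total, train_interesting = 4053, 179
val_total, val_interesting = 664, 17

print((train_total - train_interesting) / train_total)  # ~0.9558, close to the logged acc of 0.9560
print((val_total - val_interesting) / val_total)         # ~0.9744, exactly the logged val_acc of 0.9744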

To get the model to learn something beyond this dummy baseline (you may have to pay the price of lower overall accuracy), I would do the following: when feeding a mini-batch to your optimizer, build a more balanced mini-batch, i.e. deliberately bias the sampling towards interesting articles. For example, if your batch size is 64, make sure it contains 32 interesting and 32 uninteresting examples. With that, your network may start learning features about the words the articles contain, and in principle it should help you get a less trivial predictor.
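A minimal sketch of such a balanced-batch generator (`X_train` and `y_train` are placeholders for the in-memory word2vec tensors and the 0/1 labels; adapt it to however your data is actually stored):

import numpy as np

def balanced_batch_generator(X, y, batch_size=64):
    pos_idx = np.where(y == 1)[0]  # interesting articles
    neg_idx = np.where(y == 0)[0]  # uninteresting articles
    half = batch_size // 2
    while True:
        # Oversample the minority class so each batch is half interesting, half uninteresting
        batch_pos = np.random.choice(pos_idx, half, replace=True)
        batch_neg = np.random.choice(neg_idx, half, replace=False)
        idx = np.concatenate([batch_pos, batch_neg])
        np.random.shuffle(idx)
        yield X[idx], y[idx]

It can then be passed to fit_generator in place of train_gen, e.g. some_model.fit_generator(balanced_batch_generator(X_train, y_train), steps_per_epoch=len(y_train) // 64, ...).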