Keras 1D CNN always predicts the same result even though training accuracy is high

data-mining machine-learning deep-learning classification keras cnn
2022-02-18 04:24:02

The validation accuracy of my 1D CNN is stuck at 0.5, and that's because I always get the same prediction out of a balanced dataset. Meanwhile, my training accuracy keeps improving and the loss decreases as expected.

Strangely, if I run model.evaluate() on my training set (where the accuracy was close to 1 in the last epoch), the accuracy is also 0.5. How can the accuracy here differ so much from the training accuracy of the last epoch? I also tried training and evaluating with a batch size of 1, but the problem persists.
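
For reference, this is roughly how I checked that the predictions collapse to a single class (minimal sketch):

import numpy as np

# Predict class probabilities on the test set and count how often each
# class is chosen; in my case one class gets (almost) all predictions.
probs = model.predict(X_test)
preds = np.argmax(probs, axis=1)
print(np.unique(preds, return_counts=True))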

Well, I've been looking for different solutions for a while, but still no luck. Possible issues I have already investigated:

  1. My dataset is properly balanced and shuffled;
  2. My labels are correct;
  3. Tried adding fully connected layers;
  4. Tried adding/removing dropout from the fully connected layers;
  5. Tried the same architecture, but with 1 neuron and sigmoid activation in the last layer;
  6. Tried changing the learning rate (went down to 0.0001, but still the same problem).

Here's my code:

import pathlib
import numpy as np
import ipynb.fs.defs.preprocessDataset as preprocessDataset
import pickle
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras import Input
from tensorflow.keras.layers import Conv1D, BatchNormalization, Activation, MaxPooling1D, Flatten, Dropout, Dense
from tensorflow.keras.optimizers import SGD

main_folder = pathlib.Path.cwd().parent
datasetsFolder=f'{main_folder}\\datasets'
trainDataset = preprocessDataset.loadDataset('DatasetTime_Sg12p5_Ov75_Train',datasetsFolder)
testDataset = preprocessDataset.loadDataset('DatasetTime_Sg12p5_Ov75_Test',datasetsFolder)

X_train,Y_train,Names_train=trainDataset[0],trainDataset[1],trainDataset[2]
X_test,Y_test,Names_test=testDataset[0],testDataset[1],testDataset[2]

model = Sequential()

model.add(Input(shape=X_train.shape[1:]))

model.add(Conv1D(16, 61, strides=1, padding="same"))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling1D(2, strides=2, padding="valid"))

model.add(Conv1D(32, 3, strides=1, padding="same"))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling1D(2, strides=2, padding="valid"))

model.add(Conv1D(64, 3, strides=1, padding="same"))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling1D(2, strides=2, padding="valid"))

model.add(Conv1D(64, 3, strides=1, padding="same"))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling1D(2, strides=2, padding="valid"))

model.add(Conv1D(64, 3, strides=1, padding="same"))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dropout(0.5))

model.add(Dense(200))
model.add(Activation('relu'))

model.add(Dense(2))
model.add(Activation('softmax'))

opt = SGD(learning_rate=0.01)

model.compile(loss='binary_crossentropy',optimizer=opt,metrics=['accuracy'])

model.summary()

model.fit(X_train,Y_train,epochs=10,shuffle=False,validation_data=(X_test, Y_test))

model.evaluate(X_train,Y_train)

Here's model.fit():

model.fit(X_train,Y_train,epochs=10,shuffle=False,validation_data=(X_test, Y_test))

Epoch 1/10
914/914 [==============================] - 277s 300ms/step - loss: 0.6405 - accuracy: 0.6543 - val_loss: 7.9835 - val_accuracy: 0.5000
Epoch 2/10
914/914 [==============================] - 270s 295ms/step - loss: 0.3997 - accuracy: 0.8204 - val_loss: 19.8981 - val_accuracy: 0.5000
Epoch 3/10
914/914 [==============================] - 273s 298ms/step - loss: 0.2976 - accuracy: 0.8730 - val_loss: 1.9558 - val_accuracy: 0.5002
Epoch 4/10
914/914 [==============================] - 278s 304ms/step - loss: 0.2897 - accuracy: 0.8776 - val_loss: 20.2678 - val_accuracy: 0.5000
Epoch 5/10
914/914 [==============================] - 277s 303ms/step - loss: 0.2459 - accuracy: 0.8991 - val_loss: 5.4945 - val_accuracy: 0.5000
Epoch 6/10
914/914 [==============================] - 268s 294ms/step - loss: 0.2008 - accuracy: 0.9181 - val_loss: 32.4579 - val_accuracy: 0.5000
Epoch 7/10
914/914 [==============================] - 271s 297ms/step - loss: 0.1695 - accuracy: 0.9317 - val_loss: 14.9538 - val_accuracy: 0.5000
Epoch 8/10
914/914 [==============================] - 276s 302ms/step - loss: 0.1423 - accuracy: 0.9452 - val_loss: 1.4420 - val_accuracy: 0.4988
Epoch 9/10
914/914 [==============================] - 266s 291ms/step - loss: 0.1261 - accuracy: 0.9497 - val_loss: 4.3830 - val_accuracy: 0.5005
Epoch 10/10
914/914 [==============================] - 272s 297ms/step - loss: 0.1142 - accuracy: 0.9548 - val_loss: 1.6054 - val_accuracy: 0.5009

Here's model.evaluate():

model.evaluate(X_train,Y_train)

914/914 [==============================] - 35s 37ms/step - loss: 1.7588 - accuracy: 0.5009

Here's model.summary():

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d (Conv1D)              (None, 4096, 16)          992       
_________________________________________________________________
batch_normalization (BatchNo (None, 4096, 16)          64        
_________________________________________________________________
activation (Activation)      (None, 4096, 16)          0         
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 2048, 16)          0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 2048, 32)          1568      
_________________________________________________________________
batch_normalization_1 (Batch (None, 2048, 32)          128       
_________________________________________________________________
activation_1 (Activation)    (None, 2048, 32)          0         
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 1024, 32)          0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 1024, 64)          6208      
_________________________________________________________________
batch_normalization_2 (Batch (None, 1024, 64)          256       
_________________________________________________________________
activation_2 (Activation)    (None, 1024, 64)          0         
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 512, 64)           0         
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 512, 64)           12352     
_________________________________________________________________
batch_normalization_3 (Batch (None, 512, 64)           256       
_________________________________________________________________
activation_3 (Activation)    (None, 512, 64)           0         
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 256, 64)           0         
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 256, 64)           12352     
_________________________________________________________________
batch_normalization_4 (Batch (None, 256, 64)           256       
_________________________________________________________________
activation_4 (Activation)    (None, 256, 64)           0         
_________________________________________________________________
flatten (Flatten)            (None, 16384)             0         
_________________________________________________________________
dropout (Dropout)            (None, 16384)             0         
_________________________________________________________________
dense (Dense)                (None, 200)               3277000   
_________________________________________________________________
activation_5 (Activation)    (None, 200)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 402       
_________________________________________________________________
activation_6 (Activation)    (None, 2)                 0         
=================================================================
Total params: 3,311,834
Trainable params: 3,311,354
Non-trainable params: 480
_________________________________________________________________

Here are the first 5 rows of X_train and Y_train:

[[ 3.602187e-04]
 [ 8.075248e-04]
 [ 4.319834e-04]
 ...
 [ 3.011377e-05]
 [-1.693150e-04]
 [-8.542318e-05]] [0. 1.]

[[ 2.884359e-04]
 [-6.340756e-05]
 [-5.905452e-06]
 ...
 [-9.305983e-05]
 [ 1.345304e-04]
 [-1.366256e-04]] [0. 1.]

[[ 7.720405e-04]
 [ 6.031118e-05]
 [ 6.691568e-04]
 ...
 [-6.443140e-05]
 [ 1.998355e-04]
 [ 5.839724e-05]] [1. 0.]

[[-3.294961e-04]
 [ 6.234528e-05]
 [-2.861797e-04]
 ...
 [-4.983012e-04]
 [ 3.897884e-04]
 [-1.014846e-05]] [0. 1.]

[[-0.0001479 ]
 [ 0.00037975]
 [-0.00024007]
 ...
 [ 0.00018743]
 [ 0.00044564]
 [-0.00025613]] [0. 1.]

2 Answers
  • The binary cross-entropy loss function is based on the assumption of a single output node whose value lies between 0 and 1.

  • If you have more than two outputs and use a softmax activation function, you should use the categorical cross-entropy loss function for the multi-class case.

  • But in your scenario it is a binary classification problem, so you only need 1 output node, and the last activation function should be sigmoid, which squashes the output between 0 and 1.

      y_train = y_train[:,0]
      y_test = y_test[:,0]
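
Slicing out the first column turns the one-hot labels into a single 0/1 label per sample ([1. 0.] becomes 1 and [0. 1.] becomes 0), which stays consistent as long as the same slice is applied to both the training and test labels.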
    

The model should look like this:

    model.add(Dense(200))
    model.add(Activation('relu'))    
    model.add(Dense(20))
    model.add(Activation('relu')) 
    model.add(Dense(1))
    model.add(Activation('sigmoid'))    
    opt = SGD(learning_rate=0.01)    
    model.compile(loss='binary_crossentropy',optimizer=opt,metrics=['accuracy'])
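
With this head, model.fit expects the 1-D labels produced by the slicing above; the two-node softmax head from the question would instead pair with categorical_crossentropy.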

The solution to my problem was to implement batch renormalization: BatchNormalization(renorm=True). In addition, normalizing the inputs helped greatly improve the overall performance of the neural network.
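
The renorm option keeps the statistics used to normalize activations during training closer to the moving statistics used at inference, which likely explains why model.evaluate() on the training set disagreed so sharply with the last-epoch training accuracy. A minimal sketch of the two changes (the standardization below is illustrative, not my exact preprocessing):

    # Standardize inputs with statistics computed on the training set only,
    # then apply the same transform to the test set.
    mean, std = X_train.mean(), X_train.std()
    X_train = (X_train - mean) / std
    X_test = (X_test - mean) / std

    # In the model definition, every BatchNormalization() becomes
    # BatchNormalization(renorm=True), e.g. for the first block:
    model.add(Conv1D(16, 61, strides=1, padding="same"))
    model.add(BatchNormalization(renorm=True))
    model.add(Activation('relu'))
    model.add(MaxPooling1D(2, strides=2, padding="valid"))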