How do I fix my high validation loss and inaccuracy?

data-mining  tensorflow  machine-learning-model  overfitting  gpu
2022-03-10 02:36:51

I am new to machine learning. I tried to train my bird-recognition model and found that the validation loss and inaccuracy are very high.

I am using this dataset: http://www.vision.caltech.edu/visipedia/CUB-200-2011.html

Is my model overfitting? What can I do to fix it?

Here are the graphs:

[training/validation accuracy plot]
[training/validation loss plot]

Here is my code:

import os
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def train_CNN(train_directory, target_size=(200, 200), classes=None,
              batch_size=128, num_epochs=20, num_classes=5, verbose=0, show_graph=False):

    CHECKPOINT_DIRECTORY = './checkpoints'
    SAVE_CHECKPOINT_PATH = CHECKPOINT_DIRECTORY + \
        '/{epoch:02d}_{val_accuracy:.4f}.h5'  # TF 2.x logs 'val_accuracy', not 'val_acc'
    if not os.path.exists(CHECKPOINT_DIRECTORY):
        os.makedirs(CHECKPOINT_DIRECTORY)

    train_datagen = ImageDataGenerator(rescale=1. / 255, validation_split=0.1)

    train_generator = train_datagen.flow_from_directory(
        train_directory,  # This is the source directory for training images
        target_size=target_size,  # All images will be resized to 200 x 200
        batch_size=batch_size,
        classes=classes,
        subset='training',
        class_mode='categorical')

    val_generator = train_datagen.flow_from_directory(
        train_directory,  
        target_size=target_size,  # All images will be resized to 200 x 200
        batch_size=batch_size,
        classes=classes,
        subset='validation',
        class_mode='categorical')

    input_shape = tuple(list(target_size)+[3])

    # Model architecture
    model = tf.keras.models.Sequential([
        # Note: the input shape is the desired image size, 200x200 with 3 color channels
        # The first convolution
        tf.keras.layers.Conv2D(
            16, (3, 3), activation='relu', input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(2, 2),
        # The second convolution
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        # The third convolution
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        # The fourth convolution
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        # The fifth convolution
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        # Flatten the results to feed into a dense layer
        tf.keras.layers.Flatten(),
        # 512 neuron in the fully-connected layer
        tf.keras.layers.Dense(512, activation='relu'),

        tf.keras.layers.Dense(num_classes, activation='softmax')
    ])

    # Optimizer and compilation
    model.compile(loss='categorical_crossentropy',
                  optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),  # 'lr' is deprecated
                  metrics=['accuracy'])

    # Create a callback that saves the model's weights
    model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=SAVE_CHECKPOINT_PATH,
        save_weights_only=False,
        save_best_only=True,
        monitor='val_accuracy',  # must match the metric key Keras records
        mode='max',  # save when the monitored value is at its maximum
        verbose=1)

    tensorboard_callback = tf.keras.callbacks.TensorBoard(
        log_dir='./logs/',
        histogram_freq=1)  # the batch_size argument was removed in TF 2.x

    reduce_lr_callback = tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=3,
        min_lr=1e-6)


    INITIAL_EPOCH = 0

    # Training (Model.fit accepts generators directly; fit_generator is deprecated)
    history = model.fit(
            train_generator,
            steps_per_epoch=train_generator.samples // batch_size,
            validation_data=val_generator,
            validation_steps=val_generator.samples // batch_size,
            epochs=num_epochs,
            verbose=verbose,
            initial_epoch=INITIAL_EPOCH,
            callbacks=[model_checkpoint_callback, tensorboard_callback, reduce_lr_callback])

    if show_graph:
        visualizeTraining(history)

    return model


def visualizeTraining(history):
  graphFolder = 'graph'
  graphViz = graphFolder + '/graph.jpeg'
  graphVizLoss = graphFolder + '/loss.jpeg'
  if not os.path.exists(graphFolder):
    os.makedirs(graphFolder)
  plt.figure()
  plt.plot(history.history['accuracy'])  # TF 2.x key is 'accuracy', not 'acc'
  plt.plot(history.history['val_accuracy'])
  plt.title('Model accuracy')
  plt.ylabel('Accuracy')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Validation'], loc='upper left')
  plt.savefig(graphViz)

  # Plot training & validation loss values
  plt.figure()
  plt.plot(history.history['loss'])
  plt.plot(history.history['val_loss'])
  plt.title('Model loss')
  plt.ylabel('Loss')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Validation'], loc='upper left')
  plt.savefig(graphVizLoss)

This is how I call it:

target_size = (200, 200)
CLASSES = getClassLines(CLASSES_FILE)
model = train_CNN(IMAGES_DIR, target_size, CLASSES, 128, 30, 200, 1, True)
3 Answers

A training loss of zero while the validation loss keeps increasing is a clear sign of overfitting; the accuracy curves point to overfitting as well.

I would try simplifying the model a bit: just two Conv-MaxPool pairs would be a good starting point, perhaps with 128 filters each, followed by a dense layer with 64 or 128 neurons (see the sketch below). Your model seems to have too many parameters.
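A minimal sketch of that suggestion (the filter and neuron counts are the ones proposed above, not tested values; num_classes stands in for the question's class count):

model = tf.keras.models.Sequential([
    # Two Conv-MaxPool pairs with 128 filters each, as suggested
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu',
                           input_shape=(200, 200, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    # A single dense layer with 128 neurons before the classifier
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])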

Finally, if your goal is image classification, you might want to look into transfer learning, for example along the lines sketched below.
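A minimal transfer-learning sketch, assuming a frozen ImageNet backbone. MobileNetV2 is just one possible choice of base model, and note it expects inputs scaled to [-1, 1] rather than the rescale=1./255 used in the question:

base = tf.keras.applications.MobileNetV2(
    input_shape=(200, 200, 3), include_top=False, weights='imagenet')
base.trainable = False  # freeze the pretrained weights

model = tf.keras.models.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),  # pool features instead of Flatten
    tf.keras.layers.Dense(num_classes, activation='softmax')
])
model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
              metrics=['accuracy'])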

You could also try adding dropout as part of an additional strategy against overfitting, for example as sketched below.
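A minimal sketch of that, inserting Dropout layers after the pooling stages and before the classifier; the 0.25 and 0.5 rates are common defaults, not tuned values:

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(200, 200, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Dropout(0.25),  # randomly zero 25% of activations
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.5),  # heavier dropout before the final layer
    tf.keras.layers.Dense(num_classes, activation='softmax')
])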

The answer probably has to do with the fact that your training and validation accuracy start at 0.0, which is not normal. You have 5 classes, so accuracy should start around 0.2. Your model does not seem to be the problem; you have made a mistake somewhere. A quick check is sketched below.
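As a hedged sanity check (train_generator, model, and num_classes are the names from the question's own train_CNN function), you can compare the number of classes the generator discovered with the size of the model's output layer:

# Hypothetical check: a mismatch here (e.g. 200 class folders on disk but
# num_classes left at its default of 5) would explain accuracy starting
# near 0.0 instead of near 1/num_classes.
print('classes found by generator:', train_generator.num_classes)
print('model output size:', model.output_shape[-1])
assert train_generator.num_classes == model.output_shape[-1]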