I'm trying to train a semantic segmentation model based on this architecture, building on top of it. The base model uses about 10 ReLU activations; when implemented as described in the first paper, that number jumps to 14.
The input images are 216 x 64 and each output label is one of 8 classes.
Here is the full model implementation.
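For reference, the tensor shapes I'm working with look roughly like this (the channel count and the one-hot label layout are assumptions for illustration, not taken from the model code):

import numpy as np

# Assumed shapes: batch of 16, single-channel 216 x 64 inputs, per-pixel one-hot labels over 8 classes
x_batch = np.zeros((16, 216, 64, 1), dtype=np.float32)  # input images
y_batch = np.zeros((16, 216, 64, 8), dtype=np.float32)  # one-hot segmentation labels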
I have written a custom training step, since the paper calls for one:
@tf.function
def train_step(batch_size, x_batch, y_batch, loss_func):
    with tf.GradientTape() as tape:
        # The model returns two heads: logits for this minibatch
        logits_strong, logits_weak = model(x_batch, training=True)
        # logits = tf.concat([logits_strong[0:batch_size//2], logits_weak[batch_size//2:batch_size]], 0)
        # First half of the batch carries strong labels, second half weak labels
        loss_strong_value = loss_func(y_batch[0:batch_size//2], logits_strong[0:batch_size//2])
        loss_weak_value = loss_func(y_batch[batch_size//2:batch_size], logits_weak[batch_size//2:batch_size])
        loss_value = loss_strong_value + loss_weak_value
        # loss_value = loss_func(y_batch, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y_batch[0:batch_size//2], logits_strong[0:batch_size//2])
    train_acc_metric.update_state(y_batch[batch_size//2:], logits_weak[batch_size//2:])
    # train_acc_metric.update_state(y_batch, logits)
    return loss_value
def train(model, start_epoch, num_epochs, train_dataset, optimizer, model_path,
          train_acc_metric, loss_fn=customized_loss, model_weights=None):
    """
    Run a loop over the epochs, with an inner loop over each minibatch that gets logits_strong and logits_weak.
    Drop the second half of logits_strong and the first half of logits_weak, compute the cross-entropy loss
    separately for each half, and add them.
    Finally, compute grads and apply them.
    Save the model weights every 10 epochs or so.
    Save losses and accuracy for each epoch and plot them after training is done.
    NOTE: every minibatch must contain strong labels in its first half and weak labels in its second half.
    DO NOT SHUFFLE. (See the dataset sketch after this function.)
    Parameters: model, start_epoch, number of epochs, optimizer, path to model, metric for train acc,
    model weights, loss func.
    """
    train_acc = []
    batch_size = 16
    epochs = num_epochs
    end_epoch = start_epoch + num_epochs
    if model_weights:
        load_status = model.load_weights(model_path + f"/weights/{model_weights}")
        load_status.assert_consumed()
    for epoch in range(start_epoch, end_epoch):
        print(f"\nStart of epoch {epoch}")
        start = time.time()
        # Iterate over the batches of the dataset.
        for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
            loss_value = train_step(batch_size, x_batch_train, y_batch_train, loss_fn)
            # Log every 5 batches.
            if step % 5 == 0:
                print(
                    "Training loss (for one batch) at step %d: %.4f"
                    % (step, float(np.sum(loss_value)))
                )
                print("Seen so far: %s samples" % ((step + 1) * batch_size))
        train_acc_epoch = train_acc_metric.result()
        train_acc.append(train_acc_epoch)
        print("Training acc over epoch: %.4f" % (float(train_acc_epoch),))
        print("Time taken: %.2fs" % (time.time() - start))
        # Reset training metrics at the end of each epoch
        train_acc_metric.reset_states()
        if epoch % 10 == 0:
            model.save_weights(model_path + f"/weights/ckpt_DB_{start_epoch}_{end_epoch}")
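As the NOTE in the docstring says, every minibatch has to hold strong labels in its first half and weak labels in its second half, so the dataset is built without shuffling. A rough sketch of how that can be done with tf.data (dummy arrays stand in for my real data; this is not my exact pipeline):

import numpy as np
import tensorflow as tf

# Dummy stand-ins for the strong- and weak-labelled sets (shapes assumed)
x_strong = np.zeros((80, 216, 64, 1), np.float32)
y_strong = np.zeros((80, 216, 64, 8), np.float32)
x_weak = np.zeros((80, 216, 64, 1), np.float32)
y_weak = np.zeros((80, 216, 64, 8), np.float32)

strong_ds = tf.data.Dataset.from_tensor_slices((x_strong, y_strong)).batch(8)
weak_ds = tf.data.Dataset.from_tensor_slices((x_weak, y_weak)).batch(8)

def merge(strong, weak):
    # Concatenate along the batch axis: first 8 samples strong, last 8 weak -- no shuffling anywhere
    return (tf.concat([strong[0], weak[0]], axis=0),
            tf.concat([strong[1], weak[1]], axis=0))

train_dataset = tf.data.Dataset.zip((strong_ds, weak_ds)).map(merge)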
Here are the optimizer and loss details:
import tensorflow as tf
from tensorflow.keras import backend as K

optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9, nesterov=True)

smooth = 1.0  # smoothing constant for the dice coefficient (defined elsewhere in my code; value assumed here)

# Calculation of the dice coefficient based on actual and predicted labels
def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred)

# Combined loss of weighted multi-class logistic loss and dice loss
def customized_loss(y_true, y_pred):
    return (1 * K.categorical_crossentropy(y_true, y_pred)) + (0.5 * dice_coef_loss(y_true, y_pred))  # + 0.01*np.linalg.norm())
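A quick way to sanity-check the shape and magnitude of this loss is to call it on dummy tensors (one-hot labels and softmax outputs are assumptions here):

import numpy as np
import tensorflow as tf

# Dummy batch: 8 images of 216 x 64, one-hot labels over 8 classes, softmaxed predictions
y_true = tf.one_hot(np.random.randint(0, 8, size=(8, 216, 64)), depth=8)
y_pred = tf.nn.softmax(tf.random.normal((8, 216, 64, 8)), axis=-1)

loss = customized_loss(y_true, y_pred)
# K.categorical_crossentropy only reduces the class axis, so this is a per-pixel
# loss map of shape (8, 216, 64) rather than a scalar; np.sum over it in the
# training log therefore adds up roughly 110k per-pixel terms.
print(loss.shape)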
When I try to train it (on a relatively small dataset), this is the output:
Start of epoch 0
Training loss (for one batch) at step 0: 566345.3750
Seen so far: 16 samples
Training loss (for one batch) at step 5: 1526504.7500
Seen so far: 96 samples
Training loss (for one batch) at step 10: 1538868.5000
Seen so far: 176 samples
Training loss (for one batch) at step 15: 1445873.7500
Seen so far: 256 samples
Training loss (for one batch) at step 20: 1514306.7500
Seen so far: 336 samples
Training loss (for one batch) at step 25: 1492221.5000
Seen so far: 416 samples
Training loss (for one batch) at step 30: 1438761.3750
Seen so far: 496 samples
Training acc over epoch: 0.8664
Time taken: 13.09s
Start of epoch 1
Training loss (for one batch) at step 0: 1411657.2500
Seen so far: 16 samples
Training loss (for one batch) at step 5: 1526504.7500
Seen so far: 96 samples
Training loss (for one batch) at step 10: 1538868.5000
Seen so far: 176 samples
Training loss (for one batch) at step 15: 1445873.7500
Seen so far: 256 samples
Training loss (for one batch) at step 20: 1514306.7500
Seen so far: 336 samples
Training loss (for one batch) at step 25: 1492221.5000
Seen so far: 416 samples
Training loss (for one batch) at step 30: 1438761.3750
Seen so far: 496 samples
Training acc over epoch: 0.8944
Time taken: 10.71s
Start of epoch 2
Training loss (for one batch) at step 0: 1411657.2500
Seen so far: 16 samples
Training loss (for one batch) at step 5: 1526504.7500
Seen so far: 96 samples
Training loss (for one batch) at step 10: 1538868.5000
Seen so far: 176 samples
Training loss (for one batch) at step 15: 1445873.7500
Seen so far: 256 samples
Training loss (for one batch) at step 20: 1514306.7500
Seen so far: 336 samples
Training loss (for one batch) at step 25: 1492221.5000
Seen so far: 416 samples
Training loss (for one batch) at step 30: 1438761.3750
Seen so far: 496 samples
Training acc over epoch: 0.8944
Time taken: 10.69s
From epoch 2 onward, every epoch up to 20 gives exactly the same loss and accuracy for the same batches. I'm also suspicious of the high accuracy right from the start. I have already tried lowering the learning rate to 0.001, but nothing changes.
Could this be a divergence issue? Dying ReLUs? Most importantly, how do I fix it?