Are training and validation losses plotted per sample or per batch?

artificial-intelligence  convolutional-neural-network  objective-function  supervised-learning  pytorch
2021-10-31 20:11:10

I am training a CNN on some data, where the training set has 21700 samples and the test set has 653 samples, and say I use a batch_size of 500.

I have been searching for a long time but cannot find a definite answer: when plotting the loss to check whether the model is overfitting, do I plot it like this:

for j in range(num_epochs):
    # <some training code: take a gradient descent step>
    total_loss = 0
    for i in range(num_batches_train):
        batch_loss = criterion(output, target)  # loss for this batch
        total_loss += batch_loss
    Losses_Train_Per_Epoch.append(total_loss / num_samples_train)

The last line is where I need help. Should it instead be:

Losses_Train_Per_Epoch.append(total_loss / num_batches_train)

and the same for Losses_Validation_Per_Epoch, then plotting both curves:

plt.plot(Losses_Train_Per_Epoch)
plt.plot(Losses_Validation_Per_Epoch)

So, basically, what I am asking is: should I divide by num_samples, num_batches, or batch_size? Which one?

1 Answer

You want to compute the average loss over all batches, so divide the sum of the batch losses by the number of batches.

In your case:

You have a training set of 21700 samples and a batch size of 500. This means you take 21700 / 500 ≈ 43 training iterations, so in each epoch the model is updated 43 times (the remaining 200 samples are dropped if the last partial batch is discarded). Given the way you compute the training loss, num_batches_train is what you need to divide by.
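The iteration count can be checked directly. A quick sketch using the figures from the question (whether you get 43 or 44 updates depends on whether the final partial batch is dropped or kept):

```python
import math

num_samples = 21700
batch_size = 500

# Floor division drops the final partial batch (21700 - 43*500 = 200 samples)
full_batches = num_samples // batch_size
print(full_batches)   # 43

# Rounding up keeps the partial batch as one extra, smaller iteration
all_batches = math.ceil(num_samples / batch_size)
print(all_batches)    # 44
```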

Note: I am not sure exactly what you want to plot, but I assume it is the training and validation loss:

training_loss = []
validation_loss = []
training_steps = num_samples // batch_size
validation_steps = num_validation_samples // batch_size

for epoch in range(num_epochs):

    # Training steps
    total_loss = 0
    for b in range(training_steps):
        batch_loss = ...  # compute batch loss
        total_loss += batch_loss
    training_loss.append(total_loss / training_steps)

    # Validation steps
    total_loss = 0
    for b in range(validation_steps):
        batch_loss = ...  # compute batch validation loss
        total_loss += batch_loss
    validation_loss.append(total_loss / validation_steps)

# Plot training and validation curves
plt.plot(range(num_epochs), training_loss)
plt.plot(range(num_epochs), validation_loss)
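To see that this loop produces one averaged value per epoch for each curve, here is a self-contained sketch of the same structure with synthetic per-batch losses standing in for `criterion(output, target)` (the random values, seed, and step counts are made up for the example):

```python
import random

num_epochs = 3
training_steps = 43     # 21700 // 500
validation_steps = 1    # 653 // 500

training_loss = []
validation_loss = []

random.seed(0)
for epoch in range(num_epochs):
    # Training: sum per-batch losses, then divide by the number of batches
    total = 0.0
    for b in range(training_steps):
        batch_loss = random.random()  # stand-in for criterion(output, target)
        total += batch_loss
    training_loss.append(total / training_steps)

    # Validation: same averaging, appended to the *validation* list
    total = 0.0
    for b in range(validation_steps):
        batch_loss = random.random()
        total += batch_loss
    validation_loss.append(total / validation_steps)

# One averaged value per epoch for each curve
print(len(training_loss), len(validation_loss))  # 3 3
```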

An alternative is to store the batch losses in a list and take their mean. You can use this if you are not sure what to divide by:

import numpy as np

...

for epoch in range(num_epochs):

    list_of_batch_losses = []  # initialize list that is going to store batch losses

    # Training steps
    for b in range(training_steps):
        batch_loss = ...  # compute batch loss
        list_of_batch_losses.append(batch_loss)  # store loss in a list

    epoch_loss = np.mean(list_of_batch_losses)
    training_loss.append(epoch_loss)

    ...

plt.plot(range(num_epochs), training_loss)
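One caveat with a plain mean of per-batch losses: since 21700 is not a multiple of 500, a kept final batch has only 200 samples, and an unweighted mean gives that batch the same weight as a full one. Weighting by batch size recovers the true per-sample average. A sketch with made-up loss values, assuming each batch loss is already the mean over its batch:

```python
import numpy as np

batch_losses = [0.9, 0.8, 0.7]   # per-batch mean losses (made-up values)
batch_sizes = [500, 500, 200]    # last batch is smaller

# Unweighted mean treats every batch equally
unweighted = np.mean(batch_losses)

# Weighted mean recovers the average loss per sample
weighted = np.average(batch_losses, weights=batch_sizes)

print(round(unweighted, 4))  # 0.8
print(round(weighted, 4))    # 0.825
```

With equal-sized batches the two agree exactly, which is why dividing by the number of batches is fine in the common case.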