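I am trying to perform a simple linear regression with PyTorch Lightning, using a network with a single neuron. The network should learn a simple function: y = -4x.
My dataset has 1000 points sampled from the line y = -4x, with a small amount of Gaussian noise added. The dataset looks like this:
I am facing a strange problem: the model converges only when the batch size is small enough and when I do not shuffle the data in each batch.
The plots below show the slope, the intercept, and the final training loss the model reached after 100 epochs, as a function of batch size, when the data is not shuffled: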


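We can see that the model converges to the correct solution (intercept = 0, slope = -4) only when the batch size is small enough.
Below is the same experiment, but this time with the data shuffled in each batch:
We can see that the model does not converge to the correct solution for any batch size!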
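For reference, an ordinary least-squares fit on data generated the same way gives a sense of what a converged result should look like. This is only a minimal sketch: it uses NumPy instead of my `LineFunction` dataset, and the fixed seed and variable names are assumptions made for illustration, not part of my experiment.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility (assumption)

n = 1000
x = np.linspace(0.0, 1.0, n)
# Same line and noise scale as in the experiment: y = -4x + 0 + 0.05 * N(0, 1)
y = -4.0 * x + 0.0 + 0.05 * rng.standard_normal(n)

# Degree-1 polynomial fit; np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

With this little noise the closed-form fit should land very close to slope = -4 and intercept = 0, so the data itself does not appear to be the problem.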

Here is the code used to generate the results of this experiment:
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, Dataset
from torch.nn import functional as F
import numpy as np
import matplotlib.pyplot as plt


class LineFunction(Dataset):
    # Points on the line y = a * x + b, with optional Gaussian noise.
    def __init__(self, a, b, n, x0=0, x1=1, noise=0):
        self._a = a
        self._b = b
        self._x = torch.linspace(x0, x1, n, requires_grad=False)
        self._x = self._x.unsqueeze(1)
        self._noise = torch.distributions.normal.Normal(torch.tensor([0.0]), torch.tensor([1.0])).sample((n,)) * noise
        self._noise = self._noise.unsqueeze(1)

    def __len__(self):
        return len(self._x)

    def __getitem__(self, idx):
        x = self._x[idx]
        y = self._a * x + self._b + self._noise[idx]
        return x, y


class LinReg(pl.LightningModule):
    # A single linear layer with one input and one output.
    def __init__(self):
        super().__init__()
        self.layer_1 = torch.nn.Linear(1, 1)

    def forward(self, x):
        return self.layer_1(x)

    def training_step(self, train_batch, batch_idx):
        x, y = train_batch
        preds = self.forward(x)
        loss = F.mse_loss(preds, y)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, val_batch, batch_idx):
        x, y = val_batch
        preds = self.forward(x)
        loss = F.mse_loss(preds, y)
        self.log('val_loss', loss)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-2)
        return optimizer


# Sweep over batch sizes and record the fitted parameters and the final loss.
batch_sizes = [int(x) for x in np.linspace(10, 1000, 100)]
slopes = []
intercepts = []
train_loss = []
shuffle = True  # set to False for the unshuffled experiment

for batch_size in batch_sizes:
    train_dataloader = DataLoader(LineFunction(-4, 0, 1000, noise=0.05), batch_size=batch_size, shuffle=shuffle)
    test_dataloader = DataLoader(LineFunction(-4, 0, 1000, noise=0.05), batch_size=batch_size, shuffle=shuffle)
    model = LinReg()
    trainer = pl.Trainer(max_epochs=100)
    trainer.fit(model, train_dataloader, test_dataloader)
    slopes.append(float(model.layer_1.weight.data))
    intercepts.append(float(model.layer_1.bias.data))
    train_loss.append(trainer.logged_metrics['train_loss'])

# One panel per measure, each plotted against batch size.
fig, axes = plt.subplots(nrows=3, figsize=(15, 10))
for ax, measure, measure_name in zip(axes, [slopes, intercepts, train_loss], ['slope', 'intercept', 'train_loss']):
    ax.set_xlabel('batch size')
    ax.set_ylabel(measure_name)
    ax.plot(batch_sizes, measure)
plt.show()
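I'm a bit stumped here. Why doesn't this simple model always converge? I have also tried different learning rates, but that did not seem to solve the problem.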
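One thing that may be worth checking is the shape of the tensors that reach the loss. The sketch below rebuilds a single batch by hand, using the same noise construction as `LineFunction.__init__`. It assumes that the `DataLoader`'s default collation simply stacks the items, and `n`, `batch_size`, `xs`, `ys`, and `diff` are illustrative names that do not appear in my code.

```python
import torch

n, batch_size = 1000, 4

# Same construction as in LineFunction.__init__.
x = torch.linspace(0, 1, n).unsqueeze(1)                    # (n, 1)
noise = torch.distributions.normal.Normal(
    torch.tensor([0.0]), torch.tensor([1.0])
).sample((n,)) * 0.05                                       # already (n, 1)
noise = noise.unsqueeze(1)                                  # now (n, 1, 1)

# Mimic __getitem__ for the first few indices, then stack them into a batch
# (assumed to match what the default collate function does).
xs = torch.stack([x[i] for i in range(batch_size)])
ys = torch.stack([-4 * x[i] + 0 + noise[i] for i in range(batch_size)])

preds = torch.nn.Linear(1, 1)(xs)
diff = preds - ys  # the elementwise difference a squared-error loss is built on

print("x batch:", tuple(xs.shape))
print("y batch:", tuple(ys.shape))
print("preds:  ", tuple(preds.shape))
print("diff:   ", tuple(diff.shape))
```

If the target batch does carry an extra dimension compared with the predictions, the difference broadcasts to a batch-by-batch grid, so every prediction would be compared with every target in the batch instead of only its own. That might be consistent with the dependence on batch size and shuffling, although I have not confirmed that it explains everything.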