I'm building a neural network to recognise hand gestures from the leapGestRecog dataset on Kaggle. While training it, I've run into a problem.
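I augmented my data by adding a mirrored version of every image with the corresponding label. Each image is 120x320 pixels, grayscale, and my batch size is about 100 (my memory doesn't allow more). I'm using PyTorch, and I have split the data into 24000 training images, 10000 validation images, and 6000 test images. The model I built looks like this: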
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, input_size=32, hidden_size=64, n_classes=10):
        """ Define our model """
        super(Model, self).__init__()
        # Three conv blocks: each 3x3 conv (padding=1) keeps the spatial size,
        # each 2x2 max-pool halves it.
        self.conv1 = nn.Conv2d(1, input_size, kernel_size=(3,3), stride=(1,1), padding=1)
        self.relu1 = nn.ReLU()
        self.maxp1 = nn.MaxPool2d(kernel_size=(2,2))
        self.conv2 = nn.Conv2d(input_size, hidden_size, kernel_size=(3,3), padding=1)
        self.relu2 = nn.ReLU()
        self.maxp2 = nn.MaxPool2d(kernel_size=(2,2))
        self.conv3 = nn.Conv2d(hidden_size, 128, kernel_size=3, padding=1)
        self.maxp3 = nn.MaxPool2d(kernel_size=(2,2))
        # Classifier head on the flattened 128 x 15 x 40 feature map
        self.l1 = nn.Linear(128 * 15 * 40, 128)
        self.relul = nn.LeakyReLU()
        self.l2 = nn.Linear(128, n_classes)
        self.soft = nn.Softmax(1)

    def forward(self, x):
        """ The forward pass of our model """
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.maxp1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxp2(x)
        x = self.conv3(x)
        x = self.maxp3(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 128 * 15 * 40)
        x = self.l1(x)
        x = self.relu2(x)
        x = self.l2(x)
        x = self.relul(x)
        x = self.soft(x)
        return x
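The `128 * 15 * 40` input size of the first linear layer can be checked with a little arithmetic. The following is a minimal sketch, assuming the 120x320 input stated above and the standard output-size formulas for `nn.Conv2d` and `nn.MaxPool2d`; the helper names `conv_out`, `pool_out`, and `flattened_size` are made up for illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of one spatial dimension after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2):
    """Output length after max-pooling (stride defaults to the kernel size)."""
    return size // kernel

def flattened_size(height, width, channels=128, blocks=3):
    """Apply `blocks` rounds of 3x3 conv + 2x2 pool, then flatten."""
    for _ in range(blocks):
        height = pool_out(conv_out(height))
        width = pool_out(conv_out(width))
    return height, width, channels * height * width

h, w, flat = flattened_size(120, 320)
print(h, w, flat)  # 15 40 76800
```

So the size passed to `nn.Linear` matches what the three conv/pool blocks produce for a 120x320 image.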
The training procedure is fairly standard for a PyTorch neural network.
import numpy as np

def train_model(model, train_data, valid_data, learning_rate, num_epochs, optimizer, criterion):
    """ Training procedure of the model together with accuracy and loss for both data sets """
    train_loss = np.zeros(num_epochs)
    valid_loss = np.zeros(num_epochs)
    train_accuracy = np.zeros(num_epochs)
    valid_accuracy = np.zeros(num_epochs)
    # begin training
    for epoch in range(num_epochs):
        train_losses = []
        train_correct = 0
        total_items = 0
        valid_losses = []
        valid_correct = 0
        for images, labels in train_data:
            images = images.float()
            labels = labels.long()
            optimizer.zero_grad()
            # add to GPU, hopefully
            images = images.to(device)
            labels = labels.to(device)
            # forward pass
            outputs = model.forward(images)
            loss = criterion(outputs, labels)
            # backward pass
            loss.backward()
            optimizer.step()
            # statistics
            train_losses.append(loss.item())
            _, predicted = torch.max(outputs.data, 1)
            train_correct += (predicted == labels).sum().item()
            total_items += labels.size(0)
        train_loss[epoch] = np.mean(train_losses)
        train_accuracy[epoch] = (1 * train_correct/total_items)
        # validation pass, no gradients needed
        with torch.no_grad():
            correct_val = 0
            total_val = 0
            for images, labels in valid_data:
                images = images.float()
                labels = labels.long()
                images = images.to(device)
                labels = labels.to(device)
                outputs = model.forward(images)
                loss = criterion(outputs, labels)
                valid_losses.append(loss.item())
                _, predicted = torch.max(outputs.data, 1)
                correct_val += (predicted == labels).sum().item()
                total_val += labels.size(0)
        valid_loss[epoch] = np.mean(valid_losses)
        valid_accuracy[epoch] = (1 * correct_val/total_val)
        print("Epoch: [{},{}], train accuracy: {:.4f}, valid accuracy: {:.4f}, train loss: {:.4f}, valid loss: {:.4f}"
              .format(num_epochs, epoch+1, train_accuracy[epoch], valid_accuracy[epoch], train_loss[epoch], valid_accuracy[epoch]))
    return model, train_accuracy, train_loss, valid_accuracy, valid_loss
This is how I call the training function.
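# Assumed definition: `device` is used throughout but its definition is not shown in the post.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

network = Model()
network = network.to(device)
optimizer = torch.optim.SGD(network.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()
# `train` and `valid` are the training and validation data loaders (not shown)
model, train_accuracy, train_loss, valid_accuracy, valid_loss = train_model(
    model=network, train_data=train, valid_data=valid, learning_rate=0.01,
    num_epochs=100, optimizer=optimizer, criterion=criterion)
print("Ready")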
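The `train` and `valid` loaders passed in above are not shown. Below is a minimal sketch of how the mirroring and the 24000/10000/6000 split described at the top could be set up, assuming the images are already a tensor of shape (N, 1, 120, 320). The helper `add_mirrored`, the random stand-in data, and the fixed seed are hypothetical and not the original loading code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

def add_mirrored(images, labels):
    """Append a horizontally flipped copy of every image, reusing its label."""
    mirrored = torch.flip(images, dims=[-1])  # flip along the width axis
    return torch.cat([images, mirrored]), torch.cat([labels, labels])

# Stand-in data (assumption): 20 random grayscale "images", 10 classes.
images = torch.rand(20, 1, 120, 320)
labels = torch.arange(20) % 10

aug_images, aug_labels = add_mirrored(images, labels)
dataset = TensorDataset(aug_images, aug_labels)

# 24/10/6 out of 40 mirrors the 24000/10000/6000 proportions.
generator = torch.Generator().manual_seed(0)
train_set, valid_set, test_set = random_split(dataset, [24, 10, 6], generator=generator)

# Batch size of about 100, as stated in the question.
train = DataLoader(train_set, batch_size=100, shuffle=True)
valid = DataLoader(valid_set, batch_size=100)
```

One thing to keep in mind with this ordering: splitting after mirroring can put an image in the training set and its mirror in the validation set. Whether that happened in the actual setup is not known from the post.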
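Now, no matter what I do, my training loss and accuracy, as well as my validation loss and accuracy, stay exactly the same from epoch to epoch.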
Epoch: [100,1], train accuracy: 0.0999, valid accuracy: 0.1050, train loss: 2.3612, valid loss: 0.1050
Epoch: [100,2], train accuracy: 0.1000, valid accuracy: 0.1050, train loss: 2.3611, valid loss: 0.1050
Epoch: [100,3], train accuracy: 0.1000, valid accuracy: 0.1050, train loss: 2.3611, valid loss: 0.1050
Epoch: [100,4], train accuracy: 0.1000, valid accuracy: 0.1050, train loss: 2.3611, valid loss: 0.1050
Epoch: [100,5], train accuracy: 0.1000, valid accuracy: 0.1050, train loss: 2.3611, valid loss: 0.1050
Epoch: [100,6], train accuracy: 0.1000, valid accuracy: 0.1050, train loss: 2.3611, valid loss: 0.1050
Epoch: [100,7], train accuracy: 0.1000, valid accuracy: 0.1050, train loss: 2.3611, valid loss: 0.1050
Epoch: [100,8], train accuracy: 0.1000, valid accuracy: 0.1050, train loss: 2.3611, valid loss: 0.1050
Epoch: [100,9], train accuracy: 0.1000, valid accuracy: 0.1050, train loss: 2.3611, valid loss: 0.1050
Epoch: [100,10], train accuracy: 0.1000, valid accuracy: 0.1050, train loss: 2.3611, valid loss: 0.1050
Epoch: [100,11], train accuracy: 0.1000, valid accuracy: 0.1050, train loss: 2.3611, valid loss: 0.1050
Epoch: [100,12], train accuracy: 0.1000, valid accuracy: 0.1050, train loss: 2.3611, valid loss: 0.1050
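As a sanity check on these numbers (a sketch, not a confirmed diagnosis): with 10 balanced classes, pure guessing gives 10% accuracy and a cross-entropy of ln(10). The model above ends in `nn.Softmax`, and `nn.CrossEntropyLoss` applies its own log-softmax on top of whatever it receives, so the criterion sees values in [0, 1] rather than raw scores. The code below assumes a hypothetical scenario in which the model puts all its probability on one and the same class for every image, and computes what the loss would then be.

```python
import math

n_classes = 10

# Baseline: uniform guessing over 10 classes.
chance_loss = math.log(n_classes)

# Assumed scenario: the softmax output is 1 for one fixed class, 0 elsewhere.
# CrossEntropyLoss then computes log(sum(exp(p))) - p[target] on that vector.
log_norm = math.log(math.exp(1.0) + (n_classes - 1) * math.exp(0.0))
loss_when_right = log_norm - 1.0  # target is the predicted class
loss_when_wrong = log_norm - 0.0  # target is any other class

accuracy = 1.0 / n_classes  # balanced classes, always the same prediction
expected = accuracy * loss_when_right + (1 - accuracy) * loss_when_wrong

print(f"chance: {chance_loss:.3f}, saturated single-class: {expected:.3f}")
# chance is about 2.303; the saturated case comes out at about 2.361
```

The logged training loss of roughly 2.361 together with 10% accuracy is consistent with that saturated scenario, and slightly above the ln(10) chance level. This only shows the numbers fit; it has not been verified on the actual model.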
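Maybe I'm underfitting. To counter that I increased the complexity of the model, and I also tried changing the learning rate from 0.001 to 0.01 and so on, but whatever I try, I can't fix it. Does anyone know what is going wrong and, more importantly, can someone help me solve it?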
