Layer shape calculation in convolutional neural networks (PyTorch)

Tags: data-mining, cnn, torch

2021-10-12 06:02:59

How can you determine the input size a network expects (the image size, i.e. the tensor size)? For example, for this network from the PyTorch tutorial:

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify it with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

The expected input size is not stated explicitly. In addition, the comment

self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension

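is unclear. How does the shape (16 * 6 * 6, 120) relate to the image size (for example 32x32, as the tutorial's author claims)? From reading the code alone, I cannot find a way to work out the input size the network expects.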

1 Answer

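With convolutional layers in PyTorch, you do not need to specify the input size, only the number of channels (the depth). You do, however, need to specify it for fully connected layers. So when you define the input dimension of the first linear layer, you must know the size of the images you will feed in.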
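To illustrate the point, here is a minimal sketch that runs only the convolution and pooling part of the network on two image sizes. The two `nn.Conv2d` layers are copied from the question; the helper name `conv_features` and the 64x64 test size are assumptions made for this example. The convolutional part accepts both sizes, but the number of flattened features differs, which is why `nn.Linear` has to be sized for one specific input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same convolution layers as in the question's Net
conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)


def conv_features(x):
    # Hypothetical helper: the conv/pool part of Net.forward, without the linear layers
    x = F.max_pool2d(F.relu(conv1(x)), 2)
    x = F.max_pool2d(F.relu(conv2(x)), 2)
    return x


with torch.no_grad():
    out_32 = conv_features(torch.zeros(1, 1, 32, 32))
    out_64 = conv_features(torch.zeros(1, 1, 64, 64))

# Number of features per sample after flattening everything but the batch dimension
flat_32 = out_32.flatten(1).shape[1]
flat_64 = out_64.flatten(1).shape[1]

print(out_32.shape, flat_32)  # torch.Size([1, 16, 6, 6]) 576
print(out_64.shape, flat_64)  # torch.Size([1, 16, 14, 14]) 3136
```

Only the 32x32 case produces the 576 features that `nn.Linear(16 * 6 * 6, 120)` expects; passing the 64x64 result to that layer would raise a shape mismatch error.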

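The output size of convolution and pooling layers follows a standard formula, which is documented in several places. For an input of size $n$ with padding $p$, dilation $d$, kernel size $k$ and stride $s$, the output size is $\left \lfloor{\frac{n + 2\times p - d\times(k-1)-1}{s}+1}\right \rfloor$.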

If you feed in an image of size 32x32, the layer-by-layer output of this model is:

conv1: $6$ feature maps of size $\left \lfloor{\frac{32 + 2\times0 -1\times(3-1)-1}{1}+1}\right \rfloor = 30$
max_pool2d: $6$ feature maps of size $\left \lfloor{\frac{30 + 2\times0 -1\times(2-1)-1}{2}+1}\right \rfloor = 15$
conv2: $16$ feature maps of size $\left \lfloor{\frac{15+ 2\times0 -1\times(3-1)-1}{1}+1}\right \rfloor = 13$
max_pool2d: $16$ feature maps of size $\left \lfloor{\frac{13+ 2\times0 -1\times(2-1)-1}{2}+1}\right \rfloor = 6$
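The same arithmetic can be sketched in plain Python, without PyTorch. The function names `out_size` and `trace_shapes` are made up for this example, and the layer list assumes the settings used in the question's network: 3x3 convolutions with stride 1, 2x2 max pooling with stride 2, no padding, and dilation 1.

```python
def out_size(n, kernel, stride=1, padding=0, dilation=1):
    # Output size along one spatial dimension; // performs the floor
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def trace_shapes(n):
    # (name, kernel size, stride) for each layer of the question's network
    layers = [
        ("conv1", 3, 1),
        ("max_pool2d", 2, 2),
        ("conv2", 3, 1),
        ("max_pool2d", 2, 2),
    ]
    sizes = []
    for _name, kernel, stride in layers:
        n = out_size(n, kernel, stride)
        sizes.append(n)
    return sizes


sizes = trace_shapes(32)
flat = 16 * sizes[-1] ** 2  # 16 channels times the final height and width

print(sizes)  # [30, 15, 13, 6]
print(flat)   # 576
```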

Therefore, the flattened size of the output just before the first linear layer is $16\times6\times6 = 576$:

 self.fc1 = nn.Linear(16 * 6 * 6, 120)

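By carrying out all the size calculations in reverse, you can work out that the input size is $32\times32$. Strictly speaking, because the pooling layers round down, any square input from $30\times30$ to $33\times33$ also ends up as $6\times6$ feature maps, so the calculation narrows the input to that range, and $32\times32$ is the size the tutorial uses.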
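One way to check the reverse calculation is a brute-force search over candidate input sizes. This is only a sketch: the helper names `out_size` and `final_size` and the search range 10 to 64 are assumptions for the example, and the layer settings (3x3 convolutions with stride 1, 2x2 pooling with stride 2, no padding) are taken from the question's network.

```python
def out_size(n, kernel, stride):
    # Output size along one dimension with no padding and dilation 1
    return (n - (kernel - 1) - 1) // stride + 1


def final_size(n):
    # conv1 -> max_pool2d -> conv2 -> max_pool2d
    n = out_size(n, 3, 1)
    n = out_size(n, 2, 2)
    n = out_size(n, 3, 1)
    n = out_size(n, 2, 2)
    return n


# Square input sizes whose final feature maps are 6x6, as fc1 requires
valid = [n for n in range(10, 65) if final_size(n) == 6]
print(valid)  # [30, 31, 32, 33]
```

Because of the floor in each pooling step, several neighbouring input sizes map to the same final size; 32 is the one among them that the tutorial states.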
