Determining the size of the FC layer after Conv layers in PyTorch

2021-09-27 10:10:39

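I am learning PyTorch and CNNs, but I am confused about how to calculate the number of inputs to the first FC layer after the Conv2D layers.

My network architecture is shown below, along with my reasoning based on the calculation explained here.

The input image will have shape (1 x 28 x 28).

The first Conv layer has stride 1, padding 0, depth 6, and we use a (4 x 4) kernel. So the output will be (6 x 24 x 24), because the new volume is (28 - 4 + 2*0)/1.

Then we pool it with a (2 x 2) kernel and stride 2, so we get an output of (6 x 11 x 11), because the new volume is (24 - 2)/2.

The same goes for the second Conv layer and pooling layer, but this time with a (3 x 3) kernel in the Conv layer, which finally produces (16 x 3 x 3) feature maps.

My assumption is that the first linear layer should have 144 inputs (16 * 3 * 3), but when I compute the inputs programmatically I get 400. What am I missing?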

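import torch.nn as nn
import torch.nn.functional as F

# `classes` (the collection of class labels) is assumed to be defined elsewhere.
class Net(nn.Module):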
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 4)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(400, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, len(classes))
    
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
    def num_flat_features(self, x):
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features # 400, not 144

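Related but less important: do people use reasoning to arrive at good kernel sizes, numbers of layers and numbers of pooling layers, or does everyone just look at what the SOTA papers did?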

4 Answers

Hello, and welcome to Stack Exchange!

The answer to your question is simple: you are not using the correct formula.

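The formula you used is (assuming square inputs)

$W'=\frac{W-F+2P}{S}$

but the correct formula is

$W'=\frac{W-F+2P}{S}+1$

where $W$ is the input width, $F$ the kernel size, $P$ the padding and $S$ the stride. The result is rounded down when the division is not exact.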

Now, if we redo your calculation starting from the $(1 \times 28 \times 28)$ input:

$W^{(1)}=28-4+1=25\\ W^{(2)}=\lfloor\frac{25-2}{2}+1\rfloor=12\\ W^{(3)}=12-3+1=10\\ W^{(4)}=\lfloor\frac{10-2}{2}+1\rfloor=5$

Given that the second convolutional layer has 16 output channels (or feature maps), you can indeed compute the number of inputs as $16\cdot5^2=400$.
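The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch rather than part of the original answer: the helper name `conv_output_size` is made up for illustration, and it assumes square inputs and kernels with no dilation.

```python
def conv_output_size(w, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (square input, no dilation)."""
    # Integer division implements the rounding down used in the calculation.
    return (w - kernel + 2 * padding) // stride + 1


w = 28
w = conv_output_size(w, kernel=4)            # conv1: 4x4 kernel, stride 1 -> 25
w = conv_output_size(w, kernel=2, stride=2)  # max pool: 2x2, stride 2     -> 12
w = conv_output_size(w, kernel=3)            # conv2: 3x3 kernel, stride 1 -> 10
w = conv_output_size(w, kernel=2, stride=2)  # max pool: 2x2, stride 2     -> 5

fc_inputs = 16 * w * w  # 16 output channels from conv2
print(fc_inputs)  # 400
```

The same helper works for pooling layers because max pooling follows the same size formula as a convolution with the given kernel and stride.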

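If you are willing to pass the CNN an extra input argument, you can compute this automatically. The input dimension for MNIST is input_dim=(1,28,28), so I can compute it like this: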

import torch
from torch import nn

import functools
import operator

class CNN(nn.Module):
    """Basic Pytorch CNN implementation"""

    def __init__(self, in_channels, out_channels, input_dim):
        nn.Module.__init__(self)
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=20, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),

            nn.Conv2d(in_channels=20, out_channels=50, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
        )

        num_features_before_fcnn = functools.reduce(operator.mul, list(self.feature_extractor(torch.rand(1, *input_dim)).shape))

        self.classifier = nn.Sequential(
            nn.Linear(in_features=num_features_before_fcnn, out_features=100),
            nn.Linear(in_features=100, out_features=out_channels),
        )

    def forward(self, x):
        batch_size = x.size(0)

        out = self.feature_extractor(x)
        out = out.view(batch_size, -1)  # flatten the vector
        out = self.classifier(out)
        return out
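The same dummy-forward-pass idea can be pulled out into a small standalone helper. The following is a minimal sketch under stated assumptions: `count_flat_features` is a hypothetical name (not a PyTorch function), and the layer stack mirrors the network from the question rather than the class above.

```python
import torch
from torch import nn


def count_flat_features(module, input_dim):
    """Run one dummy sample through `module` and count its output elements."""
    with torch.no_grad():  # no gradients needed for a shape probe
        out = module(torch.zeros(1, *input_dim))
    return out[0].numel()  # elements per sample, batch dimension excluded


# Layer stack from the question: conv(4x4) -> pool -> conv(3x3) -> pool.
features = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=4),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(6, 16, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)

n = count_flat_features(features, (1, 28, 28))
print(n)  # 400
```

Probing with a batch of one keeps the cost negligible, and the result can be passed directly as `in_features` to the first `nn.Linear`.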

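You can use torch.nn.AdaptiveMaxPool2d to set a specific output size.

For example, if I set nn.AdaptiveMaxPool2d((5,7)), I force the feature map to be 5x7. You can then multiply that by the out_channels of the preceding Conv2d layer.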

https://pytorch.org/docs/stable/nn.html#torch.nn.AdaptiveMaxPool2d

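import torch.nn as nn
import torch.nn.functional as F

# `classes` (the collection of class labels) is assumed to be defined elsewhere.
class Net(nn.Module):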
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 4)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.adapt = nn.AdaptiveMaxPool2d((5,7))
        self.fc1 = nn.Linear(16*5*7, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, len(classes))

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = self.adapt(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*7)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
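To see that the adaptive pooling layer fixes the spatial size regardless of what comes in, here is a small check. It is only a sketch; the two input sizes are arbitrary values chosen for illustration.

```python
import torch
from torch import nn

pool = nn.AdaptiveMaxPool2d((5, 7))

# Two 16-channel feature maps with different spatial sizes.
small = torch.rand(1, 16, 10, 10)
large = torch.rand(1, 16, 30, 45)

print(pool(small).shape)  # torch.Size([1, 16, 5, 7])
print(pool(large).shape)  # torch.Size([1, 16, 5, 7])

# Flattening everything except the batch dimension gives 16 * 5 * 7 features.
flat = pool(large).flatten(start_dim=1)
print(flat.shape)  # torch.Size([1, 560])
```

Because the pooled size no longer depends on the input resolution, the `in_features` of the first linear layer can be hard-coded as `16*5*7`.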

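I added a method to my PyTorch model that automatically determines the number of input neurons for the first linear layer. I hope it helps anyone who struggles with the calculation.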

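import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # in_channels: color channels of the input; out_channels: number of filters
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 64, 5)
        # Flattened feature count after the conv/pool stack, computed once.
        self.neurons = self.linear_input_neurons()

        self.fc1 = nn.Linear(self.neurons, 1000)
        self.fc2 = nn.Linear(1000, 500)
        self.fc3 = nn.Linear(500, classes)  # `classes`: number of classes, assumed defined elsewhere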

    def forward(self, x):
        x = self.maxpool(F.relu(self.conv1(x.float())))
        x = self.maxpool(F.relu(self.conv2(x.float())))
        x = x.view(-1, self.neurons)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x

    # Apply the conv/pool stack that precedes the linear layers and return the size of the resulting 4-D tensor.
    def size_after_relu(self, x):
        x = self.maxpool(F.relu(self.conv1(x.float())))
        x = self.maxpool(F.relu(self.conv2(x.float())))

        return x.size()


    # Multiply all elements of the size returned by the method above to get the flattened feature count.
    def linear_input_neurons(self):
        size = self.size_after_relu(torch.rand(1, 1, 64, 32)) # image size: 64x32
        m = 1
        for i in size:
            m *= i

        return int(m)
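The value this method returns for the 64x32 dummy image can be cross-checked by hand with the output-size formula, applied to height and width separately. This is a rough sketch, and `out_size` is a hypothetical helper name introduced only for this check.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial axis (no dilation)."""
    return (n - kernel + 2 * padding) // stride + 1


h, w = 64, 32
h, w = out_size(h, 3), out_size(w, 3)        # conv1, 3x3 kernel  -> 62 x 30
h, w = out_size(h, 2, 2), out_size(w, 2, 2)  # max pool, stride 2 -> 31 x 15
h, w = out_size(h, 5), out_size(w, 5)        # conv2, 5x5 kernel  -> 27 x 11
h, w = out_size(h, 2, 2), out_size(w, 2, 2)  # max pool, stride 2 -> 13 x 5

linear_inputs = 64 * h * w  # 64 output channels from conv2
print(linear_inputs)  # 4160
```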
