
What is the difference between Conv1D and Conv2D?

Cross Validated · machine-learning · neural-networks · conv-neural-network · keras

2022-02-03 02:34:14

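I was going through the keras convolution docs and found two types of convolution, Conv1D and Conv2D. After some web searching, this is what I understand about them: Conv1D is used for sequences and Conv2D is used for images.

I always thought convolutional neural networks were used only for images, and I visualized a CNN this way:

An image is treated as a large matrix, and a filter slides over this matrix and computes a dot product at each position. I believe this is what keras calls Conv2D. If Conv2D works this way, then what is the mechanism of Conv1D, and how can we picture it?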

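4 Answers

I'd like to explain the difference visually and in detail (see the comments in the code), using a very simple approach.

Let's first look at Conv2D in TensorFlow.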

import tensorflow as tf

c1 = [[0, 0, 1, 0, 2], [1, 0, 2, 0, 1], [1, 0, 2, 2, 0], [2, 0, 0, 2, 0], [2, 1, 2, 2, 0]]
c2 = [[2, 1, 2, 1, 1], [2, 1, 2, 0, 1], [0, 2, 1, 0, 1], [1, 2, 2, 2, 2], [0, 1, 2, 0, 1]]
c3 = [[2, 1, 1, 2, 0], [1, 0, 0, 1, 0], [0, 1, 0, 0, 0], [1, 0, 2, 1, 0], [2, 2, 1, 1, 1]]
data = tf.transpose(tf.constant([[c1, c2, c3]], dtype=tf.float32), (0, 2, 3, 1))
# we transpose [batch, in_channels, in_height, in_width] to [batch, in_height, in_width, in_channels]
# where batch = 1, in_channels = 3 (c1, c2, c3, or x[:, :, 0], x[:, :, 1], x[:, :, 2] in the gif), and in_height and in_width are both 5 (the size of the blue matrices without padding)
f2c1 = [[0, 1, -1], [0, -1, 0], [0, -1, 1]]
f2c2 = [[-1, 0, 0], [1, -1, 0], [1, -1, 0]]
f2c3 = [[-1, 1, -1], [0, -1, -1], [1, 0, 0]]
filters = tf.transpose(tf.constant([[f2c1, f2c2, f2c3]], dtype=tf.float32), (2, 3, 1, 0))
# we transpose [out_channels, in_channels, filter_height, filter_width] to [filter_height, filter_width, in_channels, out_channels]
# out_channels is 1 (it is 2 in the gif, but here we only use one filter, W1), in_channels is 3 because data has three channels (c1, c2, c3), and filter_height and filter_width are both 3 (the size of the filter W1)
# f2c1, f2c2, f2c3 are w1[:, :, 0], w1[:, :, 1] and w1[:, :, 2] in the gif
output = tf.squeeze(tf.nn.conv2d(data, filters, strides=2, padding=[[0, 0], [1, 1], [1, 1], [0, 0]]))
# this is just o[:, :, 1] in the gif
# <tf.Tensor: id=93, shape=(3, 3), dtype=float32, numpy=
# array([[-8., -8., -3.],
#        [-3.,  1.,  0.],
#        [-3., -8., -5.]], dtype=float32)>
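
For readers without TensorFlow at hand, the same sliding dot product can be sketched in plain NumPy. This is only an illustration of the arithmetic, not how TensorFlow implements it: `conv2d_naive` is a hypothetical helper name, and it assumes a single image, a single filter, and symmetric zero padding.

```python
import numpy as np

def conv2d_naive(x, w, stride=1, pad=0):
    """Slide one filter w (kh, kw, C) over one image x (H, W, C)."""
    # Zero-pad height and width only; channels are never padded.
    x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    kh, kw, _ = w.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Elementwise product over the window and ALL channels, then sum.
            out[i, j] = np.sum(x[r:r + kh, c:c + kw, :] * w)
    return out

c1 = [[0, 0, 1, 0, 2], [1, 0, 2, 0, 1], [1, 0, 2, 2, 0], [2, 0, 0, 2, 0], [2, 1, 2, 2, 0]]
c2 = [[2, 1, 2, 1, 1], [2, 1, 2, 0, 1], [0, 2, 1, 0, 1], [1, 2, 2, 2, 2], [0, 1, 2, 0, 1]]
c3 = [[2, 1, 1, 2, 0], [1, 0, 0, 1, 0], [0, 1, 0, 0, 0], [1, 0, 2, 1, 0], [2, 2, 1, 1, 1]]
f2c1 = [[0, 1, -1], [0, -1, 0], [0, -1, 1]]
f2c2 = [[-1, 0, 0], [1, -1, 0], [1, -1, 0]]
f2c3 = [[-1, 1, -1], [0, -1, -1], [1, 0, 0]]

x = np.stack([c1, c2, c3], axis=-1).astype(float)        # (5, 5, 3): height, width, channels
w = np.stack([f2c1, f2c2, f2c3], axis=-1).astype(float)  # (3, 3, 3): one filter, three channels
out2d = conv2d_naive(x, w, stride=2, pad=1)
print(out2d)
```

Each output cell is a single number even though the input has three channels, because the filter always covers the full channel depth and only moves along height and width.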

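Conv1D is a special case of Conv2D, as stated in this paragraph from the TensorFlow documentation of Conv1D:

Internally, this op reshapes the input tensors and invokes tf.nn.conv2d. For example, if data_format does not start with "NC", a tensor of shape [batch, in_width, in_channels] is reshaped to [batch, 1, in_width, in_channels], and the filter is reshaped to [1, filter_width, in_channels, out_channels]. The result is then reshaped back to [batch, out_width, out_channels] (where out_width is a function of the stride and padding as in conv2d) and returned to the caller.

Let's see how a Conv1D problem can be turned into a Conv2D problem. Since Conv1D is commonly used in NLP scenarios, we can illustrate this with the NLP example below.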

cat = [0.7, 0.4, 0.5]
sitting = [0.2, -0.1, 0.1]
there = [-0.5, 0.4, 0.1]
dog = [0.6, 0.3, 0.5]
resting = [0.3, -0.1, 0.2]
here = [-0.5, 0.4, 0.1]
sentence = tf.constant([[cat, sitting, there, dog, resting, here]])
# sentence[:,:,0] is equivalent to x[:,:,0] or c1 in the first example and the same for sentence[:,:,1] and sentence[:,:,2]
data = tf.reshape(sentence, (1, 1, 6, 3))
# we reshape [batch, in_width, in_channels] to [batch, 1, in_width, in_channels] according to the quote above
# each dimension in the embedding is a channel(three in_channels)
f3c1 = [0.6, 0.2]
# equivalent to f2c1 in the first code snippet or w1[:,:,0] in the gif
f3c2 = [0.4, -0.1]
# equivalent to f2c2 in the first code snippet or w1[:,:,1] in the gif
f3c3 = [0.5, 0.2]
# equivalent to f2c3 in the first code snippet or w1[:,:,2] in the gif
# filters = tf.constant([[f3c1, f3c2, f3c3]])
# [out_channels, in_channels, filter_width]: [1, 3, 2]
# here we also have only one filter and also three channels in it. Please compare these three with the three channels in W1 for the Conv2D in the gif
filter1D = tf.transpose(tf.constant([[f3c1, f3c2, f3c3]]), (2, 1, 0))
# shape: [2, 3, 1] for the conv1d example
filters = tf.reshape(filter1D, (1, 2, 3, 1))  # tf.expand_dims(filter1D, 0) would express this more directly
# transpose [out_channels, in_channels, filter_width] to [filter_width, in_channels, out_channels], then reshape the result to [1, filter_width, in_channels, out_channels], as described in the quote from the TensorFlow conv1d doc
output = tf.squeeze(tf.nn.conv2d(data, filters, strides=(1, 1, 2, 1), padding="VALID"))
# the numbers for strides are for [batch, 1, in_width, in_channels] of the data input
# <tf.Tensor: id=119, shape=(3,), dtype=float32, numpy=array([0.9       , 0.09999999, 0.12      ], dtype=float32)>

Now let's do the same thing using Conv1D (also in TensorFlow):

output = tf.squeeze(tf.nn.conv1d(sentence, filter1D, stride=2, padding="VALID"))
# <tf.Tensor: id=135, shape=(3,), dtype=float32, numpy=array([0.9       , 0.09999999, 0.12      ], dtype=float32)>
# here stride defaults to be for the in_width

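We can see that the 2D in Conv2D means each channel in the input and in the filter is 2-dimensional (as in the gif example), while the 1D in Conv1D means each channel in the input and in the filter is 1-dimensional (as in the cat and dog NLP example).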
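
The cat and dog example can also be reproduced without TensorFlow. The sketch below is a plain NumPy illustration under stated assumptions: `conv1d_naive` is a hypothetical helper name, it handles one sentence and one filter, and it uses "VALID" behaviour (no padding).

```python
import numpy as np

def conv1d_naive(x, w, stride=1):
    """x: (length, channels); w: (filter_width, channels) -> (out_length,)."""
    k = w.shape[0]
    out_len = (x.shape[0] - k) // stride + 1
    # The window moves along the sequence axis only; it always covers all channels.
    return np.array([np.sum(x[i * stride:i * stride + k] * w) for i in range(out_len)])

sentence = np.array([
    [0.7, 0.4, 0.5],    # cat
    [0.2, -0.1, 0.1],   # sitting
    [-0.5, 0.4, 0.1],   # there
    [0.6, 0.3, 0.5],    # dog
    [0.3, -0.1, 0.2],   # resting
    [-0.5, 0.4, 0.1],   # here
])
# Row p holds the weights applied to the p-th word in the window,
# i.e. [f3c1[p], f3c2[p], f3c3[p]] from the example above.
w = np.array([[0.6, 0.4, 0.5],
              [0.2, -0.1, 0.2]])

out1d = conv1d_naive(sentence, w, stride=2)
print(out1d)  # approximately [0.9, 0.1, 0.12], up to floating-point rounding
```

With a filter width of 2 and a stride of 2, the filter looks at the word pairs (cat, sitting), (there, dog) and (resting, here), which is why six words produce three outputs.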

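Convolution is a mathematical operation in which you "summarize" a tensor, matrix, or vector into a smaller one. If your input is one-dimensional, you summarize along that one dimension; if the tensor has n dimensions, you can summarize along all n of them. Conv1D and Conv2D summarize (convolve) along one or two dimensions, respectively.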

For example, you can convolve a vector into a shorter vector as follows. Take a "long" vector A with n elements and convolve it, using a weight vector W with m elements, into a "short" (summary) vector B with n-m+1 elements:

$b_i=\sum_{j=0}^{m-1} a_{i+j} w_j$

where

$i=1,\dots,n-m+1$
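
The formula translates almost directly into code. The following is a minimal sketch, assuming 0-based indexing (so i runs from 0 to n-m rather than from 1 to n-m+1); `conv_valid` is a hypothetical helper name, and the numbers are made up for illustration.

```python
import numpy as np

def conv_valid(a, w):
    """b[i] = sum_j a[i + j] * w[j], for every i where the window fits."""
    n, m = len(a), len(w)
    return np.array([sum(a[i + j] * w[j] for j in range(m))
                     for i in range(n - m + 1)])

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # n = 5
w = np.array([0.5, 0.25, 0.25])           # m = 3
b = conv_valid(a, w)
print(b)                                  # n - m + 1 = 3 elements

# NumPy's built-in sliding dot product gives the same result.
print(np.correlate(a, w, mode="valid"))
```

Note that this is the sliding dot product (cross-correlation) without flipping the weights, which is the operation deep learning libraries call convolution.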

So, if you have a vector of length n and your weight vector also has length n, with $w_i=1/n$, then the convolution produces a scalar (a vector of length 1) equal to the average of all the values in the input. It is a sort of degenerate convolution, if you wish. If the same kind of weight vector is one element shorter than the input, you get a moving average of length 2 in the output, and so on:

$\begin{bmatrix} a:&a_1 & a_2 & a_3\\ w:&1/2 & 1/2&\\ w:&&1/2 & 1/2\\ \end{bmatrix}=\begin{bmatrix} b:&\frac{a_1+a_2} 2 & \frac{a_2+a_3} 2 \end{bmatrix}$
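
Both cases can be checked numerically. This is a small sketch with made-up input values, using `np.correlate` as the sliding dot product.

```python
import numpy as np

a = np.array([2.0, 4.0, 10.0])  # n = 3

# Weights one element shorter than the input: a moving average of length 2.
moving_avg = np.correlate(a, np.full(2, 1 / 2), mode="valid")
print(moving_avg)  # [(2 + 4) / 2, (4 + 10) / 2]

# Weights as long as the input, w_i = 1/n: the degenerate case, a single mean.
mean_only = np.correlate(a, np.full(3, 1 / 3), mode="valid")
print(mean_only, a.mean())
```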

You can do the same thing for a 3-dimensional tensor in the same way:

$b_{ikl}=\sum_{j_1=0}^{m_1-1}\sum_{j_2=0}^{m_2-1}\sum_{j_3=0}^{m_3-1} a_{i+j_1,k+j_2,l+j_3}\, w_{j_1j_2j_3}$

where

$i=1,\dots,n_1-m_1+1,\quad k=1,\dots,n_2-m_2+1,\quad l=1,\dots,n_3-m_3+1$
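
The multi-dimensional version can be sketched the same way. The example below assumes NumPy 1.20 or newer (for `sliding_window_view`) and 0-based indices; `conv_valid_nd` is a hypothetical helper name, and the random data is only there to exercise the shapes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_valid_nd(a, w):
    """b[i, k, l] = sum over (j1, j2, j3) of a[i+j1, k+j2, l+j3] * w[j1, j2, j3]."""
    # windows has shape (output shape) + (w.shape): one window per output cell.
    windows = sliding_window_view(a, w.shape)
    window_axes = tuple(range(a.ndim, 2 * a.ndim))
    return np.sum(windows * w, axis=window_axes)

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 5, 6))   # n1, n2, n3
w = rng.normal(size=(2, 3, 2))   # m1, m2, m3
b = conv_valid_nd(a, w)
print(b.shape)  # (n1 - m1 + 1, n2 - m2 + 1, n3 - m3 + 1) = (3, 3, 5)
```

The same function works for 1-D and 2-D inputs as long as `a` and `w` have the same number of dimensions, which is the sense in which Conv1D and Conv2D are the same operation applied along one or two axes.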

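This 1D convolution is a cost saver. It works the same way, but on a one-dimensional array whose elements are multiplied by the kernel. To visualize it, think of a matrix consisting of a single row or a single column: when we multiply, we get an array of the same shape but with lower or higher values, so it helps to maximize or minimize the intensity of the values.

This image may help you:

For details, see https://www.youtube.com/watch?v=qVP574skyuM

I'll use a PyTorch perspective; the logic, however, stays the same.

When using Conv1d(), we have to keep in mind that we will most likely be working with 2-dimensional inputs, such as one-hot-encoded DNA sequences or black-and-white pictures.

The only difference between the more conventional Conv2d() and Conv1d() is that the latter uses a 1-dimensional kernel, as shown in the picture below.

Here, the height of the input data becomes the "depth" (or in_channels), and our rows become the kernel size. For example,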

import torch
import torch.nn as nn

tensor = torch.randn(1, 100, 4)  # [batch, in_channels, length]
output = nn.Conv1d(in_channels=100, out_channels=1, kernel_size=1, stride=1)(tensor)
# output.shape == torch.Size([1, 1, 4])

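We can see that the kernel automatically spans the full height of the picture (just as, in Conv2d(), the depth of the kernel automatically spans the image's channels), so all that is left for us to specify is the kernel size along the rows.

We just have to remember that if we assume a 2-dimensional input, our filters become our columns and our rows become the kernel size.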
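
To make the "kernel spans the full height" point concrete, here is a rough NumPy sketch of what a channels-first 1D convolution computes. It is an illustration under assumptions, not PyTorch's implementation: `conv1d_channels_first` is a hypothetical helper name, the batch dimension is dropped, and there is no bias term (nn.Conv1d adds one by default).

```python
import numpy as np

def conv1d_channels_first(x, weight):
    """x: (in_channels, length); weight: (out_channels, in_channels, kernel_size)."""
    out_channels, in_channels, k = weight.shape
    assert x.shape[0] == in_channels, "the kernel must cover every input row"
    out_len = x.shape[1] - k + 1
    out = np.zeros((out_channels, out_len))
    for t in range(out_len):
        # Each kernel sees ALL rows (channels) and k neighbouring columns at once.
        out[:, t] = np.tensordot(weight, x[:, t:t + k], axes=([1, 2], [0, 1]))
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))          # 100 rows treated as channels, length 4
weight = rng.normal(size=(1, 100, 1))  # one filter, kernel_size = 1
out = conv1d_channels_first(x, weight)
print(out.shape)  # (1, 4)

# A one-hot-encoded DNA-like input: 4 rows (A, C, G, T), 8 positions.
dna = np.eye(4)[:, rng.integers(0, 4, size=8)]
motif_scores = conv1d_channels_first(dna, rng.normal(size=(2, 4, 3)))
print(motif_scores.shape)  # (2, 6): two filters, 8 - 3 + 1 positions
```

The only free spatial parameter is `kernel_size` along the length axis; the extent along the rows is fixed by `in_channels`.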
