How to calculate the output shape of conv2d_transpose?

data-mining neural-network tensorflow convolutional-neural-network generative-models
2021-09-14 02:55:01

I have currently written a GAN to generate MNIST digits, but the generator doesn't want to work. First I sample z of shape 100 per batch and feed it through a layer to get a shape of (7, 7, 256). Then a conv2d_transpose layer should turn this into 28, 28, 1 (which is basically an MNIST picture).

I have two questions. 1.) This code doesn't work, obviously. Do you have any clue why? 2.) I understand quite well how transposed convolution works, but I can't find any resource to calculate the output size given the input, stride, and kernel size specific to TensorFlow. The useful information I found is https://arxiv.org/pdf/1603.07285v1.pdf, but the padding in TensorFlow, for example, works very differently. Can you help me?

mb_size = 32 #Size of image batch to apply at each iteration.
X_dim = 784
z_dim = 100
h_dim = 7*7*256
dropoutRate = 0.7
alplr = 0.2 #leaky Relu
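
(The code above and below references a few helpers, lrelu, xavier_init, and sample_z, that are not shown in the question. A minimal sketch of what they presumably look like, added here as an assumption so the rest of the snippet can be read in context:)

import tensorflow as tf
import numpy as np

# Assumed helper functions (not shown in the question); typical GAN-tutorial versions.
def xavier_init(size):
    # Xavier/Glorot-style weight initialization
    return tf.random_normal(shape=size, stddev=1.0 / np.sqrt(size[0] / 2.0))

def lrelu(x, alpha):
    # leaky ReLU activation
    return tf.maximum(alpha * x, x)

def sample_z(m, n):
    # uniform noise for the generator input
    return np.random.uniform(-1.0, 1.0, size=[m, n])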


def generator(z, G_W1, G_b1, keepProb, first_shape):

    G_W1 = tf.Variable(xavier_init([z_dim, h_dim]))
    G_b1 = tf.Variable(tf.zeros(shape=[h_dim]))    


    G_h1 = lrelu(tf.matmul(z, G_W1) + G_b1, alplr)
    G_h1Drop = tf.nn.dropout(G_h1, keepProb)  # drop out

    X = tf.reshape(G_h1Drop, shape=first_shape)
    out = create_new_trans_conv_layer(X, 256, INPUT_CHANNEL, [3, 3], [2,2], "transconv1", [-1, 28, 28, 1])    
    return out




# new transposed cnn
def create_new_trans_conv_layer(input_data, num_input_channels, num_output_channels, filter_shape, stripe, name, output_shape):
    # setup the filter input shape for tf.nn.conv_2d
    conv_filt_shape = [filter_shape[0], filter_shape[1], num_output_channels, num_input_channels]


    # initialise weights and bias for the filter
    weights = tf.Variable(tf.truncated_normal(conv_filt_shape, stddev=0.03),
                          name=name + '_W')
    bias = tf.Variable(tf.truncated_normal([num_input_channels]), name=name + '_b')

    # setup the convolutional layer operation
    conv1 = tf.nn.conv2d_transpose(input_data, weights, output_shape, [1, stripe[0], stripe[1], 1], padding='SAME')

    # add the bias
    conv1 += bias

    # apply a ReLU non-linear activation

    conv1 = lrelu(conv1, alplr)

    return conv1


...


    _, G_loss_curr = sess.run(
        [G_solver, G_loss],
        feed_dict={z: sample_z(mb_size, z_dim), keepProb: 1.0})  # training generator

4 Answers

Here is the correct formula for calculating the output size of tf.layers.conv2d_transpose():

# Padding==Same:
H = H1 * stride

# Padding==Valid
H = (H1-1) * stride + HF

where H = output size, H1 = input size, and HF = height of the filter.

e.g., if `H1` = 7, Stride = 3, and Kernel size = 4, 

With padding=="same", output size = 21, 
with padding=="valid", output size = 22
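
(As a quick arithmetic check of the example above, and not part of the original answer, here are the two formulas as a small Python helper:)

def deconv_output_size(h_in, stride, kernel, padding):
    # output height/width of tf.layers.conv2d_transpose per the formulas above
    if padding == "same":
        return h_in * stride
    elif padding == "valid":
        return (h_in - 1) * stride + kernel
    raise ValueError("unknown padding: %s" % padding)

print(deconv_output_size(7, 3, 4, "same"))   # 21
print(deconv_output_size(7, 3, 4, "valid"))  # 22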

To test this (verified in tf 1.4.0):

import tensorflow as tf
import numpy as np

x = tf.placeholder(dtype=tf.float32, shape=(None, 7, 7, 32))
dcout = tf.layers.conv2d_transpose(x, 64, 4, 3, padding="valid")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    xin = np.random.rand(1,7,7,32)
    out = sess.run(dcout, feed_dict={x:xin})
    print(out.shape)

Looking at the source code, tf.keras.Conv2DTranspose calls the function deconv_output_length when computing its output size. There is a subtle difference between the accepted answer and what you find there:

def deconv_output_length(input_length, filter_size, padding,
                         output_padding=None, stride=0, dilation=1):
  """Determines output length of a transposed convolution given input length.
  Arguments:
      input_length: Integer.
      filter_size: Integer.
      padding: one of `"same"`, `"valid"`, `"full"`.
      output_padding: Integer, amount of padding along the output dimension.
          Can be set to `None` in which case the output length is inferred.
      stride: Integer.
      dilation: Integer.
  Returns:
      The output length (integer).
  """
  assert padding in {'same', 'valid', 'full'}
  if input_length is None:
    return None

  # Get the dilated kernel size
  filter_size = filter_size + (filter_size - 1) * (dilation - 1)

  # Infer length if output padding is None, else compute the exact length
  if output_padding is None:
    if padding == 'valid':
      # note the call to `max` below!
      length = input_length * stride + max(filter_size - stride, 0)
    elif padding == 'full':
      length = input_length * stride - (stride + filter_size - 2)
    elif padding == 'same':
      length = input_length * stride

  else:
    if padding == 'same':
      pad = filter_size // 2
    elif padding == 'valid':
      pad = 0
    elif padding == 'full':
      pad = filter_size - 1

    length = ((input_length - 1) * stride + filter_size - 2 * pad +
              output_padding)
  return length

I have added a comment above the call to max.

The formula for padding == 'valid' is H = H1 * stride + max(HF - stride, 0), which only differs from @Manish P's answer when stride > HF. This tripped me up, so I thought I would post it here.
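
To make that difference concrete (the numbers below are mine, not from the answer), take HF = 2 and stride = 3, so that stride > HF:

H1, HF, stride = 7, 2, 3

# formula from the accepted answer (padding == 'valid')
print((H1 - 1) * stride + HF)               # 20

# formula from deconv_output_length (padding == 'valid')
print(H1 * stride + max(HF - stride, 0))    # 21

The second value is what deconv_output_length (and hence the Keras layer) returns.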

Instead of using tf.nn.conv2d_transpose you can use tf.layers.conv2d_transpose. It is a wrapper layer that does not require you to pass the output shape. Alternatively, if you want to calculate the output shape yourself, you can use the formula:

H = (H1 - 1)*stride + HF - 2*padding
H - height of output image i.e H = 28 
H1 - height of input image i.e H1 = 7 
HF - height of filter
padding - amount of padding (0 for 'valid', HF // 2 for 'same')
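
As a sketch of how this formula reaches the 28 the question is after (the kernel size and stride here are my own choice, not from the answer): with stride = 4, HF = 4 and padding = 0, H = (7 - 1)*4 + 4 = 28:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, 7, 7, 256))
# (7 - 1)*4 + 4 - 2*0 = 28, so one 'valid' transposed conv with kernel 4 and stride 4 gives 28x28
y = tf.layers.conv2d_transpose(x, filters=1, kernel_size=4, strides=4, padding="valid")
print(y.shape)  # (?, 28, 28, 1)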

The answers here give working numbers, but they don't mention that there are multiple possible output shapes for the transposed convolution operation. In fact, if the output shape were completely determined by the other parameters, there would be no need to specify it at all.

The output size of a convolution operation is

# padding=="SAME" 
conv_out = ceil(conv_in/stride)

# padding=="VALID" 
conv_out = ceil((conv_in-k+1)/stride)    

where conv_in is the input size and k is the kernel size. In the OP's link these padding schemes are called "half padding" and "no padding", respectively.
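
(Restating those two formulas as a small helper, added here for convenience and not part of the original answer:)

from math import ceil

def conv_output_size(conv_in, k, stride, padding):
    # forward convolution output size, as used by tf.nn.conv2d
    if padding == "SAME":
        return ceil(conv_in / float(stride))
    elif padding == "VALID":
        return ceil((conv_in - k + 1) / float(stride))
    raise ValueError("unknown padding: %s" % padding)

print(conv_output_size(21, 4, 3, "SAME"))    # 7
print(conv_output_size(22, 4, 3, "VALID"))   # 7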

When calling

tf.nn.conv2d_transpose(value, filter, output_shape, strides)

we need the output_shape argument to be the shape of a tensor that, if convolved with filter using strides, would produce a tensor with the same shape as value. Because of rounding, there is more than one such shape when stride > 1. Specifically, for VALID padding we need

dconv_in - 1 <= (dconv_out - k)/s < dconv_in
==>
(dconv_in - 1)*s + k <= dconv_out < dconv_in*s + k

If dconv_in = 7, k = 4, and stride = 3:

# with SAME padding
dconv_out = 19 or 20 or 21

# with VALID padding
dconv_out = 22 or 23 or 24
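
A brute-force check (my own sketch, using the forward-convolution formulas from earlier in this answer) confirms that these are exactly the sizes that convolve back down to 7:

from math import ceil

dconv_in, k, s = 7, 4, 3

# candidate output sizes whose forward convolution gives dconv_in again
same_ok = [n for n in range(1, 40) if ceil(n / float(s)) == dconv_in]
valid_ok = [n for n in range(1, 40) if ceil((n - k + 1) / float(s)) == dconv_in]

print(same_ok)    # [19, 20, 21]
print(valid_ok)   # [22, 23, 24]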

The tf.layers API calculates output_shape automatically (it appears to pick the smallest possible value for VALID padding and the largest possible value for SAME padding). This is usually convenient, but it can also lead to shape mismatches if you are trying to recover the shape of a previously convolved tensor, for example in an autoencoder. For example:

import tensorflow as tf
import numpy as np


k=22
cin = tf.placeholder(tf.float32, shape=(None, k+1,k+1,64))
w1 = tf.placeholder(tf.float32, shape=[4,4,64,32])
cout = tf.nn.conv2d(cin, w1, strides=(1,3,3,1), padding="VALID")               
f_dict={cin:np.random.rand(1,k+1,k+1,64),
        w1:np.random.rand(4,4,64,32)}

dcout1 = tf.nn.conv2d_transpose(cout, w1, strides=(1,3,3,1), 
        padding="VALID", output_shape=[1,k,k,64])
dcout2 = tf.nn.conv2d_transpose(cout, w1, strides=(1,3,3,1), 
        padding="VALID", output_shape=[1,k+1,k+1,64])
dcout_layers = tf.layers.conv2d_transpose(cout, 64, 4, 3, padding="VALID")


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    inp_shape = sess.run(cin, feed_dict=f_dict).shape
    conv_shape = sess.run(cout, feed_dict=f_dict).shape
    lyrs_shape = sess.run(dcout_layers, feed_dict=f_dict).shape
    nn_shape1 = sess.run(dcout1, feed_dict=f_dict).shape
    nn_shape2 = sess.run(dcout2, feed_dict=f_dict).shape


    print("original input shape:", inp_shape)
    print("shape after convolution:", conv_shape)
    print("recovered output shape using tf.layers:", lyrs_shape)
    print("one possible recovered output shape using tf.nn:", nn_shape1)
    print("another possible recovered output shape using tf.nn:", nn_shape2)

>>> original input shape: (1, 23, 23, 64)
>>> shape after convolution: (1, 7, 7, 32)
>>> recovered output shape using tf.layers: (1, 22, 22, 64)
>>> one possible recovered output shape using tf.nn: (1, 22, 22, 64)
>>> another possible recovered output shape using tf.nn: (1, 23, 23, 64)