VAE generates poor images. Is it because of an unbalanced loss function?

data-mining  machine-learning  python  loss-function  autoencoder  vae
2022-02-04 19:15:52

I am training a variational autoencoder with TensorFlow.keras on the CelebA dataset.

The problem I am facing is that the generated images are not diverse enough and look rather poor.

(New) examples:

[image: generated sample images]

My thoughts

  • I think this happens because the reconstruction and KL losses are unbalanced.
  • I read this question and followed its solution: I read about KL annealing and tried to implement it myself (roughly the idea sketched below), but it didn't work.
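
For reference, by KL annealing I mean ramping a weight on the KL term from 0 up to 1 during training. A rough sketch of that idea (illustrative only, not my exact attempt; the callback name and schedule are made up):

from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import Callback

kl_weight = K.variable(0.0)  # weight on the KL term, annealed from 0 towards 1

def total_loss(y_true, y_pred):
    # r_loss and kl_loss as defined further down (shown here without the class)
    return K.mean(r_loss(y_true, y_pred) + kl_weight * kl_loss(y_true, y_pred))

class KLAnnealing(Callback):
    # Linearly ramps the KL weight from 0 to 1 over the first `anneal_epochs` epochs
    def __init__(self, anneal_epochs=10):
        super().__init__()
        self.anneal_epochs = anneal_epochs

    def on_epoch_begin(self, epoch, logs=None):
        K.set_value(kl_weight, min(1.0, epoch / self.anneal_epochs))

The callback would then be passed via the callbacks argument of fit().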

Notes:

  • This is my first time working with autoencoders, so I may be missing something obvious.

  • A programmatic/technical solution, rather than a theoretical one full of equations and heavy math, would be greatly appreciated.

The loss function:

def r_loss(self, y_true, y_pred):
    # Reconstruction loss: per-image mean squared error over all pixels
    return K.mean(K.square(y_true - y_pred), axis=[1, 2, 3])

def kl_loss(self, y_true, y_pred):
    # KL divergence between N(mean, exp(log_var)) and N(0, I), summed over latent dims
    return -0.5 * K.sum(1 + self.sd_layer - K.square(self.mean_layer) - K.exp(self.sd_layer), axis=1)

def total_loss(self, y_true, y_pred):
    # Total loss: reconstruction + KL, averaged over the batch
    return K.mean(self.r_loss(y_true, y_pred) + self.kl_loss(y_true, y_pred))

The encoder:

    def build_encoder(self):

        conv_filters = [32, 64, 64, 64]
        conv_kernel_size = [3, 3, 3, 3]
        conv_strides = [2, 2, 2, 2]

        # Number of Conv layers
        n_layers = len(conv_filters)

        # Define model input
        x = self.encoder_input

        # Add convolutional layers
        for i in range(n_layers):
            x = Conv2D(filters=conv_filters[i],
                       kernel_size=conv_kernel_size[i],
                       strides=conv_strides[i],
                       padding='same',
                       name='encoder_conv_' + str(i)
                       )(x)
            if self.use_batch_norm: # True
                x = BatchNormalization()(x)

            x = LeakyReLU()(x)

            if self.use_dropout: # False
                x = Dropout(rate=0.25)(x)

        # Required for reshaping latent vector while building Decoder
        self.shape_before_flattening = K.int_shape(x)[1:]

        x = Flatten()(x)

        self.mean_layer = Dense(self.encoder_output_dim, name='mu')(x)
        self.sd_layer = Dense(self.encoder_output_dim, name='log_var')(x)


        # Defining a function for sampling
        def sampling(args):
            mean_mu, log_var = args
            epsilon = K.random_normal(shape=K.shape(mean_mu), mean=0., stddev=1.)
            return mean_mu + K.exp(log_var / 2) * epsilon

        # Using a Keras Lambda layer to include the sampling function
        # as a layer in the model
        encoder_output = Lambda(sampling, name='encoder_output')([self.mean_layer, self.sd_layer])

        return Model(self.encoder_input, encoder_output, name="VAE_Encoder")

The decoder:

def build_decoder(self):
    conv_filters = [64, 64, 32, 3]
    conv_kernel_size = [3, 3, 3, 3]
    conv_strides = [2, 2, 2, 2]

    n_layers = len(conv_filters)

    # Define model input
    decoder_input = self.decoder_input

    # To get an exact mirror image of the encoder
    x = Dense(np.prod(self.shape_before_flattening))(decoder_input)
    x = Reshape(self.shape_before_flattening)(x)

    # Add convolutional layers
    for i in range(n_layers):
        x = Conv2DTranspose(filters=conv_filters[i],
                            kernel_size=conv_kernel_size[i],
                            strides=conv_strides[i],
                            padding='same',
                            name='decoder_conv_' + str(i)
                            )(x)

        # Adding a sigmoid layer at the end to restrict the outputs
        # between 0 and 1
        if i < n_layers - 1:
            x = LeakyReLU()(x)
        else:
            x = Activation('sigmoid')(x)

    # Define model output
    self.decoder_output = x

    return Model(decoder_input, self.decoder_output, name="VAE_Decoder")

The combined model:

def build_autoencoder(self):
    self.encoder = self.build_encoder()
    self.decoder = self.build_decoder()

    # Input to the combined model will be the input to the encoder.
    # Output of the combined model will be the output of the decoder.
    self.autoencoder = Model(self.encoder_input, self.decoder(self.encoder(self.encoder_input)),
                             name="Variational_Auto_Encoder")

    self.autoencoder.compile(optimizer=self.adam_optimizer, loss=self.total_loss,
                             metrics=[self.total_loss],
                             experimental_run_tf_function=False)
    self.autoencoder.summary()

Edit:

The latent size is 256, and the sampling method is as follows:

def generate(self, image=None):
    if not os.path.exists(self.sample_dir):
        os.makedirs(self.sample_dir)
    if image is None:
        img = np.random.normal(size=(9, self.encoder_output_dim))

        prediction = self.decoder.predict(img)

        op = np.vstack((np.hstack((prediction[0], prediction[1], prediction[2])),
                        np.hstack((prediction[3], prediction[4], prediction[5])),
                        np.hstack((prediction[6], prediction[7], prediction[8]))))
        print(op.shape)
        op = cv2.resize(op, (self.input_size * 9, self.input_size * 9), interpolation=cv2.INTER_AREA)
        op = cv2.cvtColor(op, cv2.COLOR_BGR2RGB)
        cv2.imshow("generated", op)
        cv2.imwrite(self.sample_dir + "generated" + str(r(0, 9999)) + ".jpg", (op * 255).astype("uint8"))

    else:
        img = cv2.imread(image, cv2.IMREAD_UNCHANGED)
        img = cv2.resize(img, (self.input_size, self.input_size), interpolation=cv2.INTER_AREA)
        img = img.astype("float32")
        img = img / 255

        prediction = self.autoencoder.predict(img.reshape(1, self.input_size, self.input_size, 3))
        img = cv2.resize(prediction[0][:, :, ::-1], (960, 960), interpolation=cv2.INTER_AREA)

        cv2.imshow("prediction", img)

        cv2.imwrite(self.sample_dir + "generated" + str(r(0, 9999)) + ".jpg", (img * 255).astype("uint8"))

1 Answer

The problem is in your sampling procedure. The purpose of a VAE is to train a neural network, the decoder, that takes samples z drawn from a normal distribution p(z) and maps them to images x such that those images follow the original image distribution p(x). The job of the encoder is essentially to facilitate training of the decoder, but it is not needed for sampling.

What you are doing instead is sampling images whose pixel values are normally distributed, which have nothing to do with p(x), and mapping those to the latent space. The encoder is trained to map images to the latent space, not noise, so the resulting encodings are poor.

Compared to images from p(x), images with normally distributed pixel values are probably all "wrong" in a similar way, so they are mapped to a similar region of the latent space and therefore produce similar outputs.

To generate new samples you only need the decoder, so do not sample images with normally distributed pixel values; instead, sample 256-dimensional normally distributed vectors and pass only those through the decoder.
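
Concretely, sampling needs only the decoder (a minimal sketch; the decoder model and the latent size of 256 are taken from your code):

import numpy as np

z = np.random.normal(size=(9, 256))  # latent vectors drawn from N(0, I), i.e. p(z)
generated = decoder.predict(z)       # decode them directly; the encoder is not involved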

On a side note, it seems a bit odd to me that you do not use a fully connected layer with a nonlinearity at the end of the encoder / at the beginning of the decoder. If it works with just a linear mapping from the last feature map to the latent space, fine, but intuitively I would expect at least one fully connected layer with a nonlinear activation there. Then again, if it works, don't worry about it.
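
For example, the end of the encoder could look like this (a sketch only; the hidden width of 512 is an arbitrary choice):

x = Flatten()(x)
x = Dense(512)(x)    # extra fully connected layer
x = LeakyReLU()(x)   # nonlinearity before projecting to the latent parameters
self.mean_layer = Dense(self.encoder_output_dim, name='mu')(x)
self.sd_layer = Dense(self.encoder_output_dim, name='log_var')(x)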