Can't get this autoencoder network to work properly (with convolutional and max-pool layers)

Cross Validated machine-learning neural-networks dimensionality-reduction unsupervised-learning autoencoders
2022-03-17 03:08:51

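Autoencoder networks seem to be far trickier than ordinary MLP classifier networks. After several attempts with Lasagne, all I get in the reconstructed output is something resembling a blurry average of all the images in the MNIST database, with no distinction as to which digit was actually fed in.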

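The network structure I chose is the following cascade of layers:

  1. Input layer (28x28)
  2. 2D convolutional layer, filter size 7x7
  3. Max-pooling layer, size 3x3, stride 2x2
  4. Dense (fully connected) flattening layer, 10 units (this is the bottleneck)
  5. Dense (fully connected) layer, 121 units
  6. Reshape layer to 11x11
  7. 2D convolutional layer, filter size 3x3
  8. 2D upscaling layer, factor 2
  9. 2D convolutional layer, filter size 3x3
  10. 2D upscaling layer, factor 2
  11. 2D convolutional layer, filter size 5x5
  12. Feature max pooling (from 31x28x28 to 28x28)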
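
The spatial sizes implied by this list can be checked by hand or with a small sketch like the one below. It assumes unpadded ("valid") stride-1 convolutions and pooling that drops incomplete border windows, which as far as I know are the Lasagne defaults; the helper names are made up for illustration.

```python
def conv_out(size, kernel):
    """Side length after an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool, stride):
    """Side length after pooling that ignores incomplete border windows."""
    return (size - pool) // stride + 1


def trace_shapes(input_size=28):
    """Return (layer description, side length) pairs for the layer list."""
    steps = [("input", input_size)]
    size = conv_out(input_size, 7)          # 22
    steps.append(("conv 7x7", size))
    size = pool_out(size, 3, 2)             # 10
    steps.append(("maxpool 3x3, stride 2", size))
    # The dense layers (10 units, then 121 units) discard the spatial layout;
    # the 121 units are reshaped into a single 11x11 map.
    size = 11
    steps.append(("reshape to 11x11", size))
    size = conv_out(size, 3)                # 9
    steps.append(("conv 3x3 (first)", size))
    size *= 2                               # 18
    steps.append(("upscale x2 (first)", size))
    size = conv_out(size, 3)                # 16
    steps.append(("conv 3x3 (second)", size))
    size *= 2                               # 32
    steps.append(("upscale x2 (second)", size))
    size = conv_out(size, 5)                # 28
    steps.append(("conv 5x5", size))
    return steps


if __name__ == "__main__":
    for name, side in trace_shapes():
        print("%-24s %dx%d" % (name, side, side))
```

Under those assumptions the last convolution lands back on 28x28, so the feature max pooling over the 31 channels yields one 28x28 image per sample.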

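All 2D convolutional layers have untied biases, sigmoid activations, and 31 filters.

All fully connected layers have sigmoid activations.

The loss function is squared error and the update rule is Adagrad. Each training step uses a chunk of 100 randomly selected samples, and I run 1000 such steps (they are called epochs in the code).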
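
A side note on the squared-error loss: among all outputs that ignore the input, the per-pixel mean of the training images has the lowest squared error. So a blurry average is what you would expect to see if the decoder had effectively stopped using the bottleneck. The sketch below only illustrates that baseline on random stand-in data (not MNIST); it does not prove that this is what happens in the network above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: 500 random 28x28 "images" with values in [0, 1). Not MNIST.
images = rng.random((500, 28, 28))


def mse_of_constant_output(constant_image, data):
    """Mean squared error when the same image is returned for every input."""
    return float(np.mean((data - constant_image) ** 2))


mean_image = images.mean(axis=0)

loss_mean = mse_of_constant_output(mean_image, images)
# Two other input-independent guesses, for comparison.
loss_first_sample = mse_of_constant_output(images[0], images)
loss_flat_gray = mse_of_constant_output(np.full((28, 28), 0.5), images)

print("mean image:   %.5f" % loss_mean)
print("first sample: %.5f" % loss_first_sample)
print("flat gray:    %.5f" % loss_flat_gray)
```

Comparing the training loss against `loss_mean` computed on the real data would be one way to tell whether the network does any better than this baseline.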

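Here is an illustration of the problem: the top row shows some samples fed to the network as input, and the bottom row shows the reconstructions:

[Image: autoencoder input and output]

For completeness, here is the code I used: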

import theano.tensor as T
import theano
import sys
sys.path.insert(0,'./Lasagne') # local checkout of Lasagne
import lasagne
from theano import pp
from theano import function
import gzip
import numpy as np
from sklearn.preprocessing import OneHotEncoder
import matplotlib.pyplot as plt
def load_mnist():

    def load_mnist_images(filename):
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=16)
        # The inputs are vectors now, we reshape them to monochrome 2D images,
        # following the shape convention: (examples, channels, rows, columns)
        data = data.reshape(-1, 1, 28, 28)
        # The inputs come as bytes, we convert them to float32 in range [0,1].
        # (Actually to range [0, 255/256], for compatibility to the version
        # provided at http://deeplearning.net/data/mnist/mnist.pkl.gz.)
        return data / np.float32(256)

    def load_mnist_labels(filename):
        # Read the labels in Yann LeCun's binary format.
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=8)
        # The labels are vectors of integers now, that's exactly what we want.
        return data

    X_train = load_mnist_images('train-images-idx3-ubyte.gz')
    y_train = load_mnist_labels('train-labels-idx1-ubyte.gz')
    X_test = load_mnist_images('t10k-images-idx3-ubyte.gz')
    y_test = load_mnist_labels('t10k-labels-idx1-ubyte.gz')
    return X_train, y_train, X_test, y_test

def plot_filters(conv_layer):
    W = conv_layer.get_params()[0]
    W_fn = theano.function([],W)
    params = W_fn()
    ks = np.squeeze(params)
    kstack = np.vstack(ks)
    plt.imshow(kstack,interpolation='none')
    plt.show()

def main():

    #theano.config.exception_verbosity="high"
    #theano.config.optimizer='None'

    X_train, y_train, X_test, y_test = load_mnist()
    ohe = OneHotEncoder()

    y_train = ohe.fit_transform(np.expand_dims(y_train,1)).toarray()
    chunk_len = 100
    visamount = 10
    num_epochs = 1000
    num_filters=31
    dropout_p=.0
    print "X_train.shape",X_train.shape,"y_train.shape",y_train.shape
    input_var = T.tensor4('X')
    output_var = T.tensor4('X')
    conv_nonlinearity = lasagne.nonlinearities.sigmoid
    net = lasagne.layers.InputLayer((chunk_len,1,28,28), input_var)
    conv1 = net = lasagne.layers.Conv2DLayer(net,num_filters,(7,7),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.MaxPool2DLayer(net,(3,3),stride=(2,2))
    net = lasagne.layers.DropoutLayer(net,p=dropout_p)
    #conv2_layer = lasagne.layers.Conv2DLayer(dropout_layer,num_filters,(3,3),nonlinearity=conv_nonlinearity)
    #pool2_layer = lasagne.layers.MaxPool2DLayer(conv2_layer,(3,3),stride=(2,2))
    net = lasagne.layers.DenseLayer(net,10,nonlinearity=lasagne.nonlinearities.sigmoid)

    #augment_layer1 = lasagne.layers.DenseLayer(reduction_layer,33,nonlinearity=lasagne.nonlinearities.sigmoid)
    net = lasagne.layers.DenseLayer(net,121,nonlinearity=lasagne.nonlinearities.sigmoid)

    net = lasagne.layers.ReshapeLayer(net,(chunk_len,1,11,11))

    net = lasagne.layers.Conv2DLayer(net,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.Upscale2DLayer(net,2)

    net = lasagne.layers.Conv2DLayer(net,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
    #pool_after0 = lasagne.layers.MaxPool2DLayer(conv_after1,(3,3),stride=(2,2))
    net = lasagne.layers.Upscale2DLayer(net,2)

    net = lasagne.layers.DropoutLayer(net,p=dropout_p)

    #conv_after2 = lasagne.layers.Conv2DLayer(upscale_layer1,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
    #pool_after1 = lasagne.layers.MaxPool2DLayer(conv_after2,(3,3),stride=(1,1))
    #upscale_layer2 = lasagne.layers.Upscale2DLayer(pool_after1,4)

    net = lasagne.layers.Conv2DLayer(net,num_filters,(5,5),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.FeaturePoolLayer(net,num_filters,pool_function=theano.tensor.max)
    print "output_shape:",lasagne.layers.get_output_shape(net)
    params = lasagne.layers.get_all_params(net, trainable=True)
    prediction = lasagne.layers.get_output(net)
    loss = lasagne.objectives.squared_error(prediction, output_var)
    #loss = lasagne.objectives.binary_crossentropy(prediction, output_var)
    aggregated_loss = lasagne.objectives.aggregate(loss)
    updates = lasagne.updates.adagrad(aggregated_loss,params)
    train_fn = theano.function([input_var, output_var], loss, updates=updates)

    test_prediction = lasagne.layers.get_output(net, deterministic=True)
    predict_fn = theano.function([input_var], test_prediction)

    print "starting training..."
    for epoch in range(num_epochs):
        selected = list(set(np.random.random_integers(0,59999,chunk_len*4)))[:chunk_len]
        X_train_sub = X_train[selected,:]
        _loss = train_fn(X_train_sub, X_train_sub)
        print("Epoch %d: Loss %g" % (epoch + 1, np.sum(_loss) / len(X_train)))
        """
        chunk = X_train[0:chunk_len,:,:,:]
        result = predict_fn(chunk)
        vis1 = np.hstack([chunk[j,0,:,:] for j in range(visamount)])
        vis2 = np.hstack([result[j,0,:,:] for j in range(visamount)])
        plt.imshow(np.vstack([vis1,vis2]))
        plt.show()
        """
    print "done."

    chunk = X_train[0:chunk_len,:,:,:]
    result = predict_fn(chunk)
    print "chunk.shape",chunk.shape
    print "result.shape",result.shape
    plot_filters(conv1)
    for i in range(chunk_len/visamount):
        vis1 = np.hstack([chunk[i*visamount+j,0,:,:] for j in range(visamount)])
        vis2 = np.hstack([result[i*visamount+j,0,:,:] for j in range(visamount)])
        plt.imshow(np.vstack([vis1,vis2]))
        plt.show()
    import ipdb; ipdb.set_trace()

if __name__ == "__main__":
    main()

Any ideas on how to improve this network to get a reasonably functioning autoencoder?

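Problem solved!

I got it working with a quite different implementation: leaky rectifiers instead of the sigmoid function in the convolutional layers, only 2 (!!) nodes in the bottleneck layer, and convolutions with 1x1 kernels at the end.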
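
I have not pinned down exactly why the leaky rectifier helps. One common explanation, offered here only as a guess, is that the sigmoid's derivative is at most 0.25 and shrinks quickly for large inputs, while the leaky rectifier's slope is either 1 or the leakiness. A minimal NumPy sketch comparing the two, using the same leakiness of 0.1 as the code below:

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)), never above 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def leaky_rectify(x, leakiness=0.1):
    """x for positive inputs, leakiness * x otherwise."""
    return np.where(x > 0, x, leakiness * x)


def leaky_rectify_grad(x, leakiness=0.1):
    """Slope of the leaky rectifier: 1 for positive inputs, leakiness otherwise."""
    return np.where(x > 0, 1.0, leakiness)


# A few arbitrary pre-activation values, chosen only for illustration.
pre_activations = np.array([-6.0, -2.0, 0.5, 2.0, 6.0])
sig_g = sigmoid_grad(pre_activations)
leaky_g = leaky_rectify_grad(pre_activations)

for x, gs, gl in zip(pre_activations, sig_g, leaky_g):
    print("x=%5.1f  sigmoid'=%.4f  leaky'=%.1f" % (x, gs, gl))
```

With several sigmoid layers stacked, these small factors multiply during backpropagation, which may be part of why the first version trained so poorly.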

Here are some of the reconstructions:

[Image: reconstruction results]

The code:

import theano.tensor as T
import theano
import sys
sys.path.insert(0,'./Lasagne') # local checkout of Lasagne
import lasagne
from theano import pp
from theano import function
import theano.tensor.nnet
import gzip
import numpy as np
from sklearn.preprocessing import OneHotEncoder
import matplotlib.pyplot as plt
def load_mnist():

    def load_mnist_images(filename):
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=16)
        # The inputs are vectors now, we reshape them to monochrome 2D images,
        # following the shape convention: (examples, channels, rows, columns)
        data = data.reshape(-1, 1, 28, 28)
        # The inputs come as bytes, we convert them to float32 in range [0,1].
        # (Actually to range [0, 255/256], for compatibility to the version
        # provided at http://deeplearning.net/data/mnist/mnist.pkl.gz.)
        return data / np.float32(256)

    def load_mnist_labels(filename):
        # Read the labels in Yann LeCun's binary format.
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=8)
        # The labels are vectors of integers now, that's exactly what we want.
        return data

    X_train = load_mnist_images('train-images-idx3-ubyte.gz')
    y_train = load_mnist_labels('train-labels-idx1-ubyte.gz')
    X_test = load_mnist_images('t10k-images-idx3-ubyte.gz')
    y_test = load_mnist_labels('t10k-labels-idx1-ubyte.gz')
    return X_train, y_train, X_test, y_test

def main():

    X_train, y_train, X_test, y_test = load_mnist()
    ohe = OneHotEncoder()

    y_train = ohe.fit_transform(np.expand_dims(y_train,1)).toarray()
    chunk_len = 100
    num_epochs = 10000
    num_filters=7
    input_var = T.tensor4('X')
    output_var = T.tensor4('X')
    #conv_nonlinearity = lasagne.nonlinearities.sigmoid
    #conv_nonlinearity = lasagne.nonlinearities.rectify
    conv_nonlinearity = lasagne.nonlinearities.LeakyRectify(.1)
    softplus = theano.tensor.nnet.softplus
    #conv_nonlinearity = theano.tensor.nnet.softplus
    net = lasagne.layers.InputLayer((chunk_len,1,28,28), input_var)
    conv1 = net = lasagne.layers.Conv2DLayer(net,num_filters,(7,7),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.MaxPool2DLayer(net,(3,3),stride=(2,2))
    net = lasagne.layers.DenseLayer(net,2,nonlinearity=lasagne.nonlinearities.sigmoid)
    net = lasagne.layers.DenseLayer(net,49,nonlinearity=lasagne.nonlinearities.sigmoid)
    net = lasagne.layers.ReshapeLayer(net,(chunk_len,1,7,7))
    net = lasagne.layers.Conv2DLayer(net,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.MaxPool2DLayer(net,(3,3),stride=(1,1))
    net = lasagne.layers.Upscale2DLayer(net,4)
    net = lasagne.layers.Conv2DLayer(net,num_filters,(3,3),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.MaxPool2DLayer(net,(3,3),stride=(1,1))
    net = lasagne.layers.Upscale2DLayer(net,4)
    net = lasagne.layers.Conv2DLayer(net,num_filters,(5,5),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.Conv2DLayer(net,num_filters,(1,1),nonlinearity=conv_nonlinearity,untie_biases=True)
    net = lasagne.layers.FeaturePoolLayer(net,num_filters,pool_function=theano.tensor.max)
    net = lasagne.layers.Conv2DLayer(net,1,(1,1),nonlinearity=conv_nonlinearity,untie_biases=True)
    print "output shape:",net.output_shape
    params = lasagne.layers.get_all_params(net, trainable=True)
    prediction = lasagne.layers.get_output(net)
    loss = lasagne.objectives.squared_error(prediction, output_var)
    #loss = lasagne.objectives.binary_hinge_loss(prediction, output_var)
    aggregated_loss = lasagne.objectives.aggregate(loss)
    #updates = lasagne.updates.adagrad(aggregated_loss,params)
    updates = lasagne.updates.nesterov_momentum(aggregated_loss,params,0.5)#.005
    train_fn = theano.function([input_var, output_var], loss, updates=updates)

    test_prediction = lasagne.layers.get_output(net, deterministic=True)
    predict_fn = theano.function([input_var], test_prediction)

    print "starting training..."
    for epoch in range(num_epochs):
        selected = list(set(np.random.random_integers(0,59999,chunk_len*4)))[:chunk_len]
        X_train_sub = X_train[selected,:]
        _loss = train_fn(X_train_sub, X_train_sub)
        print("Epoch %d: Loss %g" % (epoch + 1, np.sum(_loss) / len(X_train)))
    print "done."

    chunk = X_train[0:chunk_len,:,:,:]
    result = predict_fn(chunk)
    print "chunk.shape",chunk.shape
    print "result.shape",result.shape
    visamount = 10
    for i in range(10):
        vis1 = np.hstack([chunk[i*visamount+j,0,:,:] for j in range(visamount)])
        vis2 = np.hstack([result[i*visamount+j,0,:,:] for j in range(visamount)])
        plt.imshow(np.vstack([vis1,vis2]))
        plt.show()

    import ipdb; ipdb.set_trace()
if __name__ == "__main__":
    main()

1 Answer

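You may get more insight by visualizing the weights rather than only the reconstructions. I had a similar problem when my biases were misconfigured. Everything below is written from my experience writing my own learning library. You can see the code on Github at http://github.com/josephcatrambone/aij.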
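
For the Lasagne network in the question, a rough sketch of such a weight visualization could tile the first layer's filters into one image. The helper name `tile_filters` and the random stand-in weights are made up for illustration; the only assumption about the real weights is the shape (filters, 1, height, width).

```python
import numpy as np


def tile_filters(weights, columns=8, pad=1):
    """Arrange filters of shape (n, 1, h, w) into a single 2D image grid."""
    weights = np.asarray(weights, dtype=float)
    n, _, h, w = weights.shape
    rows = int(np.ceil(n / columns))
    grid = np.zeros((rows * (h + pad) - pad, columns * (w + pad) - pad))
    for index in range(n):
        kernel = weights[index, 0]
        # Rescale each filter to [0, 1] so faint filters stay visible.
        span = kernel.max() - kernel.min()
        if span > 0:
            kernel = (kernel - kernel.min()) / span
        else:
            kernel = np.zeros_like(kernel)
        r, c = divmod(index, columns)
        top, left = r * (h + pad), c * (w + pad)
        grid[top:top + h, left:left + w] = kernel
    return grid


# Random stand-in for the first conv layer's weights: 31 filters of 7x7.
demo_weights = np.random.default_rng(0).normal(size=(31, 1, 7, 7))
demo_grid = tile_filters(demo_weights)
print(demo_grid.shape)
```

With the real network, the weights could presumably be read with `conv1.W.get_value()` and the grid shown with `plt.imshow(grid, cmap='gray', interpolation='none')`. Filters that all look alike, or like noise, after training are a hint that something upstream of the reconstruction is off.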

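Here is a screenshot of my program with no biases. I was in a hurry to finish this post, so this is probably after only about ten epochs:

[Image: weights only, no biases.]

The weight update is done by the following: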

weights.add_i(positiveProduct.subtract(negativeProduct).elementMultiply(learningRate / (float) batchSize));
//visibleBias.add_i(batch.subtract(negativeVisibleProbabilities).meanRow().elementMultiply(learningRate));
//hiddenBias.add_i(positiveHiddenProbabilities.subtract(negativeHiddenProbabilities).meanRow().elementMultiply(learningRate));

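If I uncomment the visible bias code, I get the following result:

[Image: correct visible bias.]

If I mess up the sign of the visible bias code (subtract instead of add):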

visibleBias.subtract_i(batch.subtract(negativeVisibleProbabilities).meanRow().elementMultiply(learningRate));

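I get this image:

[Image: reversed bias sign.]

That snowballs and eventually ends up looking like what you have above. Check the sign of your error function.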
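
To make the sign convention concrete, here is a rough NumPy sketch of the same three updates. It is not taken from the library: it assumes that `positiveProduct` and `negativeProduct` are the visible-by-hidden products of the data phase and the reconstruction phase, and that `meanRow()` averages over the samples in the batch. The function name and the toy numbers are made up.

```python
import numpy as np


def cd_updates(batch, neg_visible_probs, pos_hidden_probs, neg_hidden_probs,
               learning_rate):
    """Return the increments to ADD to the weights and to both biases."""
    batch_size = batch.shape[0]
    positive_product = batch.T @ pos_hidden_probs              # data phase
    negative_product = neg_visible_probs.T @ neg_hidden_probs  # reconstruction phase
    d_weights = learning_rate * (positive_product - negative_product) / batch_size
    # Data minus reconstruction: positive where the reconstruction is too dark.
    d_visible_bias = learning_rate * (batch - neg_visible_probs).mean(axis=0)
    d_hidden_bias = learning_rate * (pos_hidden_probs - neg_hidden_probs).mean(axis=0)
    return d_weights, d_visible_bias, d_hidden_bias


# Toy case: every pixel in the data is on, but the reconstruction is only 0.2.
batch = np.ones((4, 3))
neg_visible = np.full((4, 3), 0.2)
pos_hidden = np.full((4, 2), 0.6)
neg_hidden = np.full((4, 2), 0.5)

d_w, d_vb, d_hb = cd_updates(batch, neg_visible, pos_hidden, neg_hidden,
                             learning_rate=0.1)
print(d_vb)
```

In this toy case the visible bias increment is positive, so adding it pushes the reconstruction up toward the data. Subtracting it would push the reconstruction further away on every step, which is the snowballing described above.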