Huge props to @amoeba for putting together this great example. I just want to show that the autoencoder training and reconstruction procedure described in that post can be done in R with similar ease. The autoencoder below is set up so that it emulates amoeba's example as closely as possible: the same optimizer and overall architecture. The exact costs are not reproducible because the TensorFlow back-end is not seeded in a comparable way.
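For readers who do want run-to-run comparable losses: depending on the installed keras/TensorFlow versions, the back-end can be seeded before any other keras call, either with use_session_with_seed() (TF1-style sessions) or with tensorflow::set_random_seed() on newer installs. A minimal, version-dependent sketch, not used in the run below:
# Seed the back-end before any other keras call (version-dependent sketch)
library(keras)
use_session_with_seed(42)          # also disables GPU and CPU parallelism, so training is slower
# tensorflow::set_random_seed(42)  # alternative on newer, TF2-based installs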
Initialization
library(keras)
library(rARPACK) # for svds(), the truncated SVD
rm(list=ls())
mnist = dataset_mnist()
x_train = mnist$train$x
y_train = mnist$train$y
x_test = mnist$test$x
y_test = mnist$test$y
# reshape & rescale
dim(x_train) = c(nrow(x_train), 784)
dim(x_test) = c(nrow(x_test), 784)
x_train = x_train / 255
x_test = x_test / 255
PCA
mus = colMeans(x_train)
x_train_c = sweep(x_train, 2, mus)
x_test_c = sweep(x_test, 2, mus)
digitSVDS = svds(x_train_c, k = 2)
ZpcaTEST = x_test_c %*% digitSVDS$v # PCA projection of test data
Autoencoder
model = keras_model_sequential()
model %>%
layer_dense(units = 512, activation = 'elu', input_shape = c(784)) %>%
layer_dense(units = 128, activation = 'elu') %>%
layer_dense(units = 2, activation = 'linear', name = "bottleneck") %>%
layer_dense(units = 128, activation = 'elu') %>%
layer_dense(units = 512, activation = 'elu') %>%
layer_dense(units = 784, activation='sigmoid')
model %>% compile(
loss = loss_mean_squared_error, optimizer = optimizer_adam())
history = model %>% fit(x_train, x_train, epochs = 5, batch_size = 128,
verbose = 2, validation_data = list(x_test, x_test))
# Unsurprisingly, a 3-year-old laptop is slower than a desktop:
# Train on 60000 samples, validate on 10000 samples
# Epoch 1/5
# - 14s - loss: 0.0570 - val_loss: 0.0488
# Epoch 2/5
# - 15s - loss: 0.0470 - val_loss: 0.0449
# Epoch 3/5
# - 15s - loss: 0.0439 - val_loss: 0.0426
# Epoch 4/5
# - 15s - loss: 0.0421 - val_loss: 0.0413
# Epoch 5/5
# - 14s - loss: 0.0408 - val_loss: 0.0403
# Extract the encoder: the trained model's input through to the bottleneck layer
autoencoder = keras_model(model$input, model$get_layer('bottleneck')$output)
ZencTEST = autoencoder$predict(x_test) # bottleneck representation of test data
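A quick sanity check, assuming the objects above are still in scope: both low-dimensional representations should have the same shape.
dim(ZpcaTEST) # 10000 x 2, PCA scores of the test set
dim(ZencTEST) # 10000 x 2, bottleneck activations of the test set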
Plotting the PCA projection side by side with the bottleneck representation
par(mfrow=c(1,2))
myCols = colorRampPalette(c('green', 'red', 'blue', 'orange', 'steelblue2',
'darkgreen', 'cyan', 'black', 'grey', 'magenta') )
# colour the first 5000 test points by their digit label
plot(ZpcaTEST[1:5000,], col = myCols(10)[y_test[1:5000] + 1],
pch = 16, xlab = 'Score 1', ylab = 'Score 2', main = 'PCA')
legend('bottomright', col = myCols(10), legend = seq(0, 9, by = 1), pch = 16)
plot(ZencTEST[1:5000,], col = myCols(10)[y_test[1:5000] + 1],
pch = 16, xlab = 'Score 1', ylab = 'Score 2', main = 'Autoencoder')
legend('bottomleft', col = myCols(10), legend = seq(0, 9, by = 1), pch = 16)
Reconstructions
We can reconstruct the digits in the usual fashion. (The top row shows the original digits, the middle row the PCA reconstructions and the bottom row the autoencoder reconstructions.)
Renc = predict(model, x_test) # autoencoder reconstruction
Rpca = sweep(ZpcaTEST %*% t(digitSVDS$v), 2, mus, FUN = "+") # PCA reconstruction, means added back
dev.off()
par(mfcol=c(3,9), mar = c(1, 1, 0, 0))
myGrays = gray(1:256 / 256)
for(u in seq_len(9) ){
image( matrix( x_test[u,], 28,28, byrow = TRUE)[,28:1], col = myGrays,
xaxt='n', yaxt='n')
image( matrix( Rpca[u,], 28,28, byrow = TRUE)[,28:1], col = myGrays ,
xaxt='n', yaxt='n')
image( matrix( Renc[u,], 28,28, byrow = TRUE)[,28:1], col = myGrays,
xaxt='n', yaxt='n')
}
As mentioned before, more epochs and deeper and/or more cleverly trained networks will give much better results. For example, the PCA reconstruction error for k = 9 is approximately 0.0356; we can get an almost identical error (0.0359) from the autoencoder described above simply by increasing the training epochs from 5 to 25. In this use case, the 2 autoencoder-derived components give a reconstruction error similar to that of 9 principal components. Cool!
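A sketch of how those two error figures could be checked, assuming the objects defined above are still in scope and that the model has been re-fit with epochs = 25 (the object names below are otherwise hypothetical):
# PCA reconstruction error with k = 9 components
digitSVDS9 = svds(x_train_c, k = 9)
Rpca9 = sweep(x_test_c %*% digitSVDS9$v %*% t(digitSVDS9$v), 2, mus, FUN = "+")
mean((x_test - Rpca9)^2) # ~0.0356 per the discussion above
# autoencoder reconstruction error after the longer (25-epoch) training run
mean((x_test - predict(model, x_test))^2) # ~0.0359 per the discussion above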