Deep learning: using dropout in autoencoders?

I am working with autoencoders and am confused about a few things. I am trying out different kinds of autoencoder, for example:

Fully connected autoencoder
Convolutional autoencoder
Denoising autoencoder

I have two datasets. The first is numerical, with float and integer values. The second is a text dataset with text and date values.

The numerical dataset looks like this:

date          id              check_in    check_out   coke_per    permanent_values    temp
13/9/2017     142453390001    134.2       43.1        13          87                  21
14/9/2017     142453390005    132.2       46.1        19          32                  41
15/9/2017     142453390002    120.2       42.1        33          99                  54
16/9/2017     142453390004    100.2       41.1        17          39

And my text dataset looks like this:

data              text
13/9/2017         i totally understand this conversation about farmer market and the organic products, a nice conversation ’cause prices are cheaper than traditional
14/9/2017         The conversation was really great. But I think I need much more practice. I need to improve my listening a lot. Now I’m very worried because I thought that I’d understand more. Although, I understood but I had to repeat and repeat. See you!!!

My questions are:

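Should I normalize my numerical values before feeding them into any type of autoencoder? If they are int and float values, do I still need to normalize them?
Which activation function should I use in the autoencoder? Some articles and research papers say "sigmoid" and others say "relu".
Should I use dropout in every layer? For example, suppose my autoencoder looks like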

Encoder (1000 --> 500 --> 256 --> 128) --> Decoder (128 --> 256 --> 500 --> 784)

Would it be something like this?

encoder(dropout(1000,500) --> dropout( 500,256) --> dropout (256,128) )----> decoder(dropout(128,256),dropout(256,500),dropout(500,784))
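
For concreteness, a minimal sketch of what "dropout in every layer" could mean is given below. It is plain Python rather than a real framework, and it rests on several assumptions that are not in the question: `dropout(a, b)` is read as "drop units of the a-dimensional input, then apply a dense a -> b layer"; the helper names (`dropout`, `dense_relu`, `init_layers`, `forward`), the 0.2 rate, the ReLU activation, and the shrunken layer sizes are all hypothetical; and the "inverted" form of dropout (survivors scaled by 1/(1 - rate)) is used.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, scale the rest by 1/(1-rate)."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is a no-op at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def dense_relu(inputs, weights, biases):
    """Fully connected layer followed by ReLU (an assumed choice of activation)."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layers(sizes, rng):
    """Small random weights for each consecutive (n_in, n_out) pair in `sizes`."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(x, layers, rate, rng, training=True):
    """Apply dropout to the input of every layer, then the dense layer itself."""
    h = x
    for weights, biases in layers:
        h = dense_relu(dropout(h, rate, rng, training), weights, biases)
    return h

# Hypothetical shrunken sizes standing in for
# 1000 -> 500 -> 256 -> 128 -> 256 -> 500 -> 784, so the sketch runs instantly.
SIZES = [10, 5, 3, 2, 3, 5, 8]
rng = random.Random(0)
layers = init_layers(SIZES, rng)
x = [rng.random() for _ in range(SIZES[0])]

train_out = forward(x, layers, rate=0.2, rng=rng, training=True)   # units dropped at random
eval_out = forward(x, layers, rate=0.2, rng=rng, training=False)   # deterministic, no dropout
print(len(train_out), len(eval_out))
```

Whether dropout belongs on every layer, only on some, or only on the input (as in a denoising autoencoder) is exactly what is being asked; the sketch only pins down the structure.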

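For the text dataset, if I use word2vec or any other embedding to convert the text into vectors, every word will have float values. Should I normalize that data as well?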

text (Hello, how are you) -> word2vec(text) ----> ([1854.92002, 54112.89774, 5432.9923, 5323.98393])

Should I normalize those values, or use them directly in the autoencoder?
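
To pin down what "normalize" might mean here, the sketch below shows three common options in plain Python: min-max scaling and z-score standardization for a numeric column, and L2 normalization for an embedding vector. The function names are made up for illustration, the sample values are copied from the `check_in` column and the word2vec vector above, and nothing here is meant to say which option (if any) is the right one for an autoencoder.

```python
import math
import statistics

def min_max_scale(column):
    """Rescale a column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def standardize(column):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    if std == 0.0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else list(vector)

# Sample values taken from the post.
check_in = [134.2, 132.2, 120.2, 100.2]
embedding = [1854.92002, 54112.89774, 5432.9923, 5323.98393]

check_in_scaled = min_max_scale(check_in)   # values in [0, 1]
check_in_z = standardize(check_in)          # zero mean, unit variance
embedding_unit = l2_normalize(embedding)    # unit-length vector

print(check_in_scaled)
print(check_in_z)
print(embedding_unit)
```

Min-max scaling pairs naturally with a sigmoid output layer because both live in [0, 1], while standardized inputs can be negative, so the choice of normalization and the choice of output activation are probably linked.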