Training an LSTM with and without resetting states

Cross Validated, python, lstm, keras

2022-03-04 15:54:15

I am new to deep learning and Keras, and I would like to know what the difference is between these two ways of training an LSTM RNN.

1:
for i in range(10): #training
    model.fit(trainX, trainY, epochs=1, batch_size=batch_size, verbose=0, 
              shuffle=False)
    model.reset_states()

2:
model.fit(trainX, trainY, epochs=10, batch_size=batch_size, verbose=0, 
          shuffle=False)

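In both cases, isn't the network trained on the whole dataset 10 times? I realize that in the example we could reset the states on every pass over the data, but even when I remove the reset instruction the results are very different. I am confused.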

1 Answer

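Yes, you are right. In both cases the model is trained for 10 epochs. In each epoch, all of the examples in the training data flow through the network. The batch size determines the number of examples after which the model's weights (parameters) are updated.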
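As a rough illustration of the point about batch size, the arithmetic below counts the weight updates in both cases. The dataset size and batch size are hypothetical values chosen for the example, not taken from the question.

```python
import math

n_examples = 1000   # hypothetical dataset size
batch_size = 100    # hypothetical batch size
epochs = 10         # as in the question

# One weight update per batch, so this many updates in every epoch
updates_per_epoch = math.ceil(n_examples / batch_size)

# Case 1: ten separate fit() calls of one epoch each
updates_case_1 = sum(updates_per_epoch for _ in range(epochs))

# Case 2: a single fit() call of ten epochs
updates_case_2 = epochs * updates_per_epoch

print(updates_per_epoch, updates_case_1, updates_case_2)
```

Under these assumptions both cases perform the same number of updates, so any difference between them has to come from something else, such as the handling of the recurrent states.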

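The difference between the first and the second case is that the first lets you do some processing outside the fit() method between epochs, such as model.reset_states(). However, similar processing can also be applied inside fit() in the second case through a custom callback class, which can include, for example, the on_epoch_begin, on_epoch_end, on_batch_begin and on_batch_end functions.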
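A minimal sketch of such a callback might look as follows. It assumes the Keras 2.x API, in which the model exposes reset_states(); other Keras versions may differ. The class name ResetStatesAfterEpoch is made up for illustration, and the model and training arrays mentioned in the usage comment are assumed to exist already.

```python
from keras.callbacks import Callback


class ResetStatesAfterEpoch(Callback):
    """Hypothetical helper: reset a stateful model's states after each epoch."""

    def on_epoch_end(self, epoch, logs=None):
        # Keras attaches the model to the callback when it is passed to fit().
        # Assumption: the model provides reset_states(), as in Keras 2.x.
        self.model.reset_states()


# Usage sketch (model, x_train, y_train, epochs and batch_size assumed to exist):
# model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
#           shuffle=False, callbacks=[ResetStatesAfterEpoch()])
```

With this callback, a single fit() call over several epochs should behave like the loop in the first case that calls model.reset_states() after every one-epoch fit().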

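Regarding the question of why the two cases give completely different results once model.reset_states() is removed from the first case: that should not happen. If you reset the model states between epochs in one case but not in the other, you will get different results from the two cases. If you do not reset the states between epochs in either case, the results (the loss after a given number of epochs) will be identical, provided that you seed the pseudorandom number generator before importing Keras and restart the Python interpreter between running the two cases. I verified this with the following example, whose goal is to learn a pure sine wave from a noisy one. The code snippets below were run with Python 3.5, NumPy 1.12.1, Keras 2.0.4 and Matplotlib 2.0.2: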
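To see why resetting matters at all, here is a toy illustration in plain NumPy. It is not an LSTM and does not use Keras: it is a simple tanh recurrent cell with arbitrary random weights, used only to show how a carried-over state changes the forward pass. All names in it are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.RandomState(0)
W = rng.normal(size=(1, 4))   # input-to-hidden weights (arbitrary)
U = rng.normal(size=(4, 4))   # hidden-to-hidden weights (arbitrary)
x = np.sin(np.arange(0, 5, 0.5)).reshape(-1, 1)  # one short input sequence


def run_pass(x, h):
    """Feed x through the cell step by step, starting from state h."""
    outputs = []
    for x_t in x:
        h = np.tanh(x_t @ W + h @ U)
        outputs.append(h.copy())
    return np.array(outputs), h


h0 = np.zeros(4)

# Without reset: the second pass starts from the state left by the first pass.
out_a1, h_last = run_pass(x, h0)
out_a2, _ = run_pass(x, h_last)

# With reset: every pass starts again from the zero state.
out_b1, _ = run_pass(x, h0)
out_b2, _ = run_pass(x, h0)

print(np.allclose(out_b1, out_b2))  # compares the two passes with reset
print(np.allclose(out_a1, out_a2))  # compares the two passes without reset
```

The two passes agree only when the state is reset in between. A stateful LSTM behaves analogously: the same data produces different activations, and therefore different gradients, depending on the state it starts from.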

import numpy as np

# Needed for reproducible results
np.random.seed(1)

from keras.models import Sequential
from keras.layers import LSTM, Dense

# Generate example data
# -----------------------------------------------------------------------------
# Clean sine wave as the target
y_train = np.sin(np.arange(start=0, stop=10, step=0.01))
noise = np.random.normal(loc=0, scale=0.1, size=len(y_train))
# Build the noisy input as a new array so that y_train stays clean
x_train = y_train + noise

n_examples = len(x_train)
n_features = 1
n_outputs = 1
time_steps = 1

x_train = np.reshape(x_train, (n_examples, time_steps, n_features))
y_train = np.reshape(y_train, (n_examples, n_outputs))

# Initialize LSTM
# -----------------------------------------------------------------------------
batch_size = 100
model = Sequential()
model.add(LSTM(units=10, input_shape=(time_steps, n_features),
               return_sequences=True, stateful=True, batch_size=batch_size))
model.add(LSTM(units=10, return_sequences=False, stateful=True))
model.add(Dense(units=n_outputs, activation='linear'))
model.compile(loss='mse', optimizer='adadelta')

# Train LSTM
# -----------------------------------------------------------------------------
epochs = 70

# Case 1
for i in range(epochs):
    model.fit(x_train, y_train, epochs=1, batch_size=batch_size, verbose=2,
              shuffle=False)

# !!! To get exactly the same results between the cases, do the following:
# !!!  * To record the loss of the 1st case, run all the code until here.
# !!!  * To record the loss of the 2nd case,
# !!!    restart Python, comment out the 1st case and run all the code.

# Case 2
model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, verbose=2,
          shuffle=False)
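One way to follow the instructions in the comments above is to save each run's losses to disk and compare the files afterwards. The helpers below are a sketch in plain NumPy; the function names and file paths are hypothetical. In Keras, fit() returns a History object whose history['loss'] list holds the per-epoch losses, as noted in the trailing comments.

```python
import numpy as np


def save_losses(path, losses):
    """Store a list of per-epoch losses as a .npy file."""
    np.save(path, np.asarray(losses, dtype=float))


def losses_match(path_a, path_b):
    """Return True if two saved loss curves are exactly equal."""
    a, b = np.load(path_a), np.load(path_b)
    return a.shape == b.shape and bool(np.array_equal(a, b))


# Case 1: each fit() call covers one epoch, so collect one value per call:
#     losses.append(model.fit(...).history['loss'][0])
# Case 2: a single fit() call holds all epochs:
#     losses = model.fit(...).history['loss']
# Then, in each run: save_losses('case1.npy', losses) or
# save_losses('case2.npy', losses), and finally
# losses_match('case1.npy', 'case2.npy').
```

An exact comparison is used here because the answer claims identical results; np.allclose could be swapped in if small floating-point differences are acceptable.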

In addition, here is a visualization of the results of either case when the states are not reset:

import matplotlib.pyplot as plt

plt.style.use('ggplot')
ax = plt.figure(figsize=(10, 6)).add_subplot(111)
ax.plot(x_train[:, 0], label='x_train', color='#111111', alpha=0.8, lw=3)
ax.plot(y_train[:, 0], label='y_train', color='#E69F00', alpha=1, lw=3)
ax.plot(model.predict(x_train, batch_size=batch_size)[:, 0],
        label='Predictions for x_train after %i epochs' % epochs,
        color='#56B4E9', alpha=0.8, lw=3)
plt.legend(loc='lower right')

On the Keras website, the statefulness of RNNs is discussed in the recurrent layers documentation and in the FAQ.

Edit: The above solution currently works with the Theano backend but not with the TensorFlow backend.
