The role of lags in LSTM networks

time series, neural networks, Markov process, lstm, recurrent neural network

2022-04-13 15:40:09

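LSTM networks are supposed to be about memory: they retain the information that matters for prediction.

If so, why do we still need to consider lagged inputs?

My hypothesis is that, if the model is complex enough, an LSTM will somehow remember the previous inputs when they are relevant. (This is similar to the trick of converting a second- or higher-order Markov chain into a first-order Markov chain.)

However, my experiments show that lagged terms matter regardless of how complex or simple the LSTM model is.

How can this be explained?

Edit:

By lagged inputs, I mean the following:

$X_t$ is my time series. I want to predict $X_{t+1}$. I know that $X_{t+1}$ depends on $X_t$ and also on $X_{t-1}$. I would expect an LSTM to learn this even when it is fed only $X_t$.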
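As a concrete illustration of that dependence, here is a minimal sketch using the repeating pattern 0, 1, 2, 3, 2, 1 (the series used in the code further down). The helper name `successors` is made up for this illustration. It tabulates which values follow each history of a given length, so a history with more than one possible successor is one that cannot be predicted from that history alone.

```python
from collections import defaultdict

series = [0, 1, 2, 3, 2, 1] * 20

def successors(series, order):
    """Map each history of length `order` to the set of values that follow it."""
    table = defaultdict(set)
    for t in range(order - 1, len(series) - 1):
        history = tuple(series[t - order + 1 : t + 1])
        table[history].add(series[t + 1])
    return dict(table)

first_order = successors(series, 1)    # history = (X_t,)
second_order = successors(series, 2)   # history = (X_{t-1}, X_t)

# Histories that do not determine the next value uniquely
ambiguous_1 = {h: s for h, s in first_order.items() if len(s) > 1}
ambiguous_2 = {h: s for h, s in second_order.items() if len(s) > 1}

print(ambiguous_1)  # the values 1 and 2 each have two possible successors
print(ambiguous_2)  # empty: the pair (X_{t-1}, X_t) always determines X_{t+1}
```

With $X_t$ alone, the values 1 and 2 are ambiguous (the series may be going up or down), so a model that sees only the current value has to carry the direction in some internal state.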

Some code:

from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
data = [0,1,2,3,2,1]*20
import numpy as np
def shape_it(X):
    return np.expand_dims(X.reshape((-1,1)),2)

n_data = len(data)
data = np.matrix(data)
n_train = int(0.8*n_data)
X_train = shape_it(data[:,:n_train])
Y_train = shape_it(data[:,1:(n_train+1)])
X_test = shape_it(data[:,n_train:-1])
Y_test = shape_it(data[:,(n_train+1):])

model = Sequential()
model.add(SimpleRNN(units=2,activation='relu',input_shape=(None,1)))
model.add(Dense(units=5,activation='relu'))
model.add(Dense(units=1,activation='relu'))

model.compile(optimizer='adam',loss='mean_squared_error')
model.fit(X_train,Y_train.reshape(-1,1),epochs=5000,batch_size=n_train)

import matplotlib.pyplot as plt

plt.plot(model.predict(X_test).reshape(-1,1))
plt.plot(Y_test.reshape(-1,1))

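The result is shown in the figure below:

Note that I would expect this to be perfectly accurate, but it clearly is not.

An ideal answer would include an RNN configuration that learns this series accurately without being given $X_{t-1}$ as an input.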

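1 Answer

Update

Your example is interesting. For one thing, it is constructed in such a way that you really need only one parameter, and its value is 1: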

$$y_t=\beta+w\,y_{t-1},\qquad \beta=0,\quad w=1$$

Your training set is small (96 observations), but with a three-layer network you have a lot of parameters. It is very easy to overfit.
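To put rough numbers on this, the sketch below counts trainable parameters for the layer sizes that appear in the code in this post. It assumes the usual layer definitions with bias terms (a simple recurrent layer has input weights, recurrent weights and biases; a dense layer has weights and biases). The helper names are made up for this illustration.

```python
def simple_rnn_params(units, input_dim):
    """Input weights + recurrent weights + biases of a simple recurrent layer."""
    return units * input_dim + units * units + units

def dense_params(units, input_dim):
    """Weights + biases of a fully connected layer."""
    return units * input_dim + units

# Layer sizes as they appear in the code in this post; the series has one feature.
question_model = simple_rnn_params(2, 1) + dense_params(5, 2) + dense_params(1, 5)
first_answer_model = simple_rnn_params(12, 1) + dense_params(12, 12) + dense_params(1, 12)
regularized_model = simple_rnn_params(4, 1) + dense_params(1, 4)

print(question_model, first_answer_model, regularized_model)  # 29 337 29
```

These counts can be compared with the 96 training observations and with the single parameter in the equation above.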

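The most interesting part is your test code. It is not clear whether you are trying to make a sequence of one-step-ahead forecasts or a dynamic multi-step forecast.

In a one-step forecast, you predict time $t$ and get $\hat y_t=f(x_t)=f(y_{t-1})$. So you always use the latest observed information to forecast one step ahead, and then move on to the next time period.

Note how I used $y_{t-1}$ above and not $\hat y_{t-1}$. This is an important distinction: in a one-step forecast you always use the observed value from the previous step. A dynamic forecast, in contrast, uses the previous forecast to produce the next one: $\hat y_t=f(\hat y_{t-1})$. That is why it is called dynamic.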
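The difference is easy to see with a toy model. In the sketch below, `f` is a hypothetical, slightly biased forecaster for a series that grows by 1 each step; it is not one of the networks discussed here. One-step forecasts restart from an observed value every time, so the error stays constant, while dynamic forecasts feed each forecast back in, so the error accumulates.

```python
def one_step_forecasts(f, observed):
    """y_hat[t] = f(y[t-1]): every forecast starts from an observed value."""
    return [f(y_prev) for y_prev in observed[:-1]]

def dynamic_forecasts(f, start, n_steps):
    """y_hat[t] = f(y_hat[t-1]): each forecast is the input to the next one."""
    forecasts = []
    y_prev = start
    for _ in range(n_steps):
        y_prev = f(y_prev)
        forecasts.append(y_prev)
    return forecasts

def f(y):
    # Hypothetical model: the true rule is y + 1, this one is off by 0.1
    return y + 1.1

observed = [0, 1, 2, 3, 4]
targets = observed[1:]

one_step = one_step_forecasts(f, observed)
dynamic = dynamic_forecasts(f, observed[0], len(targets))

one_step_errors = [abs(p - y) for p, y in zip(one_step, targets)]
dynamic_errors = [abs(p - y) for p, y in zip(dynamic, targets)]

print([round(e, 2) for e in one_step_errors])  # [0.1, 0.1, 0.1, 0.1]
print([round(e, 2) for e in dynamic_errors])   # [0.1, 0.2, 0.3, 0.4]
```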

So, first, I rearranged and modified your code so that it produces both one-step and dynamic forecasts for comparison. The output is below:

import numpy as np
import matplotlib.pyplot as plt
from numpy.random import seed
from keras import regularizers
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
from sklearn.metrics import mean_squared_error

# the pattern 0,1,2,3,2,1 repeated 20 times (120 observations)
data = [0,1,2,3,2,1]*20

def shape_it(X):
    # (1, N) matrix -> array of shape (N, 1, 1): N samples, 1 timestep, 1 feature
    return np.expand_dims(X.reshape((-1,1)),2)


n_data = len(data)
data = np.matrix(data)
n_train = int(0.8*n_data)



X_train = shape_it(data[:,:n_train])
Y_train = shape_it(data[:,1:(n_train+1)])
X_test = shape_it(data[:,n_train:-1])
Y_test = shape_it(data[:,(n_train+1):])



plt.plot(X_train.reshape(-1,1))
plt.plot(Y_train.reshape(-1,1))
plt.show()


plt.plot(X_test.reshape(-1,1))
plt.plot(Y_test.reshape(-1,1))
plt.show()


model = Sequential()
batch_size = 1
model.add(SimpleRNN(12, batch_input_shape=(batch_size, X_train.shape[1], X_train.shape[2]),stateful=True))
model.add(Dense(12))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

epochs = 1000
for i in range(epochs):
    model.fit(X_train, np.reshape(Y_train,(-1,)), epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
    model.reset_states()


# build state
model.reset_states()
model.predict(X_train, batch_size=batch_size)

predictions = list()

for i in range(len(X_test)):
    # make one-step forecast
    X = X_test[i]
    X = X.reshape(1, 1, 1)
    yhat = model.predict(X, batch_size=batch_size)[0,0]

    # store forecast
    predictions.append(yhat)
    expected = Y_test[ i ]
    print('Month=%d, Predicted=%f, Expected=%f' % (i+1, yhat, expected))

# report performance
rmse = np.sqrt(mean_squared_error(Y_test.reshape(len(Y_test)), predictions))
print('Test RMSE: %.3f' % rmse)
# line plot of observed vs predicted
plt.plot(Y_test.reshape(len(Y_test)))
plt.plot(predictions)
plt.show()

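Now we get the picture you expected. Your original code had several problems. First, ReLU is not a good idea for this particular problem: you have a linear problem, so a "linear" or default activation should work better. Second, the network has to be stateful: pass stateful=True to the recurrent layer, call fit with shuffle=False, and reset the state after each epoch. Finally, I changed the prediction code so that it produces one-step forecasts.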
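Why statefulness matters here can be sketched with a toy single-unit recurrent cell. The weights below are hand-picked for illustration, are not trained, and are not taken from any model in this post; the code is not the Keras implementation either. Because every sample is a sequence of length 1, a cell whose state is reset for each sample can only react to the current input, whereas a cell that carries its state across samples can tell the two occurrences of the same value apart.

```python
import math

# Hand-picked illustrative weights (assumptions, not trained values)
W_X, W_H = 0.5, 0.9

def step(x, h_prev):
    """One update of a single-unit recurrent cell: h_t = tanh(W_X*x_t + W_H*h_{t-1})."""
    return math.tanh(W_X * x + W_H * h_prev)

def run_stateless(xs):
    """The hidden state starts from 0 for every sample (length-1 sequences)."""
    return [step(x, 0.0) for x in xs]

def run_stateful(xs):
    """The hidden state is carried over from one sample to the next."""
    outputs, h = [], 0.0
    for x in xs:
        h = step(x, h)
        outputs.append(h)
    return outputs

xs = [0, 1, 2, 3, 2, 1]
stateless = run_stateless(xs)
stateful = run_stateful(xs)

# The value 1 occurs at positions 1 (going up) and 5 (going down).
print(stateless[1] == stateless[5])  # True: same input, same output
print(stateful[1] == stateful[5])    # False: the carried state differs
```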

This isn't bad, but it is only a one-step forecast. Next, we'll try dynamic forecasting as described earlier.

# build state
model.reset_states()
model.predict(X_train, batch_size=batch_size)

dynpredictions = list()
dyhat = X_test[0]


for i in range(len(X_test)):
    # make a dynamic forecast: feed the previous forecast (not yhat from the one-step loop) back in
    dyhat = dyhat.reshape(1, 1, 1)
    dyhat = model.predict(dyhat, batch_size=batch_size)[0,0]

    # store forecast
    dynpredictions.append(dyhat)
    expected = Y_test[ i ]
    print('Month=%d, Predicted Dynamically=%f, Expected=%f' % (i+1, dyhat, expected))


drmse = np.sqrt(mean_squared_error(Y_test.reshape(len(Y_test)), dynpredictions))
print('Test Dynamic RMSE: %.3f' % drmse)
# line plot of observed vs predicted
plt.plot(Y_test.reshape(len(Y_test)))
plt.plot(dynpredictions)
plt.show()

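As the figure below shows, the dynamic forecast doesn't look so good. Recall that we are now out of sample and, unlike in the one-step forecast, we do not use any observations beyond observation #96. Still, we'd like to solve this problem, because the pattern is so obvious to us that we'd expect the NN to pick it up too.

I'll try a different NN that has only one hidden layer and uses regularization to combat overfitting, as shown below.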

seed(1)

modelR = Sequential()
batch_size = 1
modelR.add(SimpleRNN(4, batch_input_shape=(batch_size, X_train.shape[1], X_train.shape[2]),stateful=True,
                     kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.)))
modelR.add(Dense(1,kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.)))
modelR.compile(loss='mean_squared_error', optimizer='adam')

epochs = 1000
for i in range(epochs):
    modelR.fit(X_train, np.reshape(Y_train,(-1,)), epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
    modelR.reset_states()

# build state
modelR.reset_states()
modelR.predict(X_train, batch_size=batch_size)

predictions = list()

for i in range(len(X_test)):
    # make one-step forecast
    X = X_test[i]
    X = X.reshape(1, 1, 1)
    yhat = modelR.predict(X, batch_size=batch_size)[0,0]

    # store forecast
    predictions.append(yhat)
    expected = Y_test[ i ]
    print('Month=%d, Predicted=%f, Expected=%f' % (i+1, yhat, expected))

# report performance
rmse = np.sqrt(mean_squared_error(Y_test.reshape(len(Y_test)), predictions))
print('Test RMSE: %.3f' % rmse)
# line plot of observed vs predicted
plt.plot(Y_test.reshape(len(Y_test)))
plt.plot(predictions)
plt.show()

The new model still handles one-step forecasting, as shown below.

Now let's try dynamic forecasting.

# build state
modelR.reset_states()
modelR.predict(X_train, batch_size=batch_size)

dynpredictions = list()
dyhat = X_test[0]


for i in range(len(X_test)):
    # make a dynamic forecast: feed the previous forecast back in as the next input
    dyhat = dyhat.reshape(1, 1, 1)
    dyhat = modelR.predict(dyhat, batch_size=batch_size)[0,0]

    # store forecast
    dynpredictions.append(dyhat)
    expected = Y_test[ i ]
    print('Month=%d, Predicted Dynamically=%f, Expected=%f' % (i+1, dyhat, expected))


drmse = np.sqrt(mean_squared_error(Y_test.reshape(len(Y_test)), dynpredictions))
print('Test Dynamic RMSE: %.3f' % drmse)
# line plot of observed vs predicted
plt.plot(Y_test.reshape(len(Y_test)))
plt.plot(dynpredictions)
plt.show()

Now the dynamic forecast seems to work too!
