我正在构建一个用于时间序列回归的 lstm 模型。为了验证我的模型实现并理解 keras,我使用了一个玩具问题来确保我理解正在发生的事情。问题是我不明白这里发生了什么。
当我拟合模型时,训练损失总是大于验证损失,即使对于平衡的训练/验证集(每个 5000 个样本)也是如此:
据我了解,这两条曲线应该正好相反,这样训练损失将成为验证损失的上限。
预测在这里或多或少是可以的。但是我仍然想了解发生了什么,因为我在我的实际问题中看到了类似的损失行为,但那里的预测是垃圾。所以我怀疑,这个模型发生了一些我不理解的事情。
这是我的玩具问题的代码:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
#create testdata
nEpochs = 12
nTimestepsPerSeq = 5
nFeatures = 5
def generate_examples(nSamples, nTimestepsPerSeq, nFeatures):
X = np.random.random((nSamples, nTimestepsPerSeq, nFeatures))
#make feature 1 categorical: [0,1,2]
X[:,:,0] = np.random.randint(0,3, X[:,:,0].shape)
#make feature 2 categorical: [-1, 0,1]
X[:,:,1] = np.random.randint(-1,2, X[:,:,1].shape)
#shift feature 3 by a constant
X[:,:,2] = X[:,:,2] + 2
#calc output
Y = np.zeros((1, nSamples))
#combine features and introduce non-linearity
Y = X[:,-1,0]*np.mean(X[:,-1,3]) + X[:,-1,2]*np.mean(X[:,-1,4]) + \
(X[:,-1,0]*X[:,-1,1]*np.mean(X[:,-1,2]))**2
#add uniform noise
Y = Y*np.random.uniform(0.95,1.05,size=Y.shape)
#reshape for scaler instance:
# ValueError: Expected 2D array, got 1D array instead:
# array=[ 1.27764489 27.56604355 1.39317709 ..., 1.57210734 8.18834281
# 1.66174279].
# Reshape your data either using array.reshape(-1, 1) if your data has a single fe
# ature or array.reshape(1, -1) if it contains a single sample.
Y = Y.reshape((-1,1))
return X,Y
Xtrain,Ytrain = generate_examples(5000, nTimestepsPerSeq, nFeatures)
Xval,Yval = generate_examples(5000, nTimestepsPerSeq, nFeatures)
Xtest,Ytest = generate_examples(20, nTimestepsPerSeq, nFeatures)
#scale input data
for i in range(0,nFeatures):
#scaler = StandardScaler()
scaler = MinMaxScaler()
scaler = scaler.fit(Xtrain[:,:,i])
Xtrain[:,:,i] = scaler.transform(Xtrain[:,:,i])
Xval[:,:,i] = scaler.transform(Xval[:,:,i])
Xtest[:,:,i] = scaler.transform(Xtest[:,:,i])
targetScaler = MinMaxScaler()
targetScaler = targetScaler.fit(Ytrain)
#transform target
Ytrain = targetScaler.transform(Ytrain)
Yval = targetScaler.transform(Yval)
Ytest = targetScaler.transform(Ytest)
# defining the LSTM model
model = Sequential()
model.add(LSTM(200, input_shape=(Xtrain.shape[1], Xtrain.shape[2]), return_sequences=True))
model.add(LSTM(200))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam', metrics=['acc'])
# fitting the model
history = model.fit(Xtrain, Ytrain, epochs=nEpochs, batch_size=50, validation_data=(Xval, Yval), shuffle=True, verbose=2)
#test model
yhat = model.predict(Xtest)
print("pediction vs truth:")
for i in range(0,10):
print(yhat[i], Ytest[i])
# summarize history for loss
plt.subplot(1,1,1)
plt.plot(history.history['loss'], '.-')
plt.plot(history.history['val_loss'], '.-')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()
编辑:我添加了一些实验的输出:
Epoch 1/12
- 6s - loss: 0.1056 - acc: 2.0000e-04 - val_loss: 0.0680 - val_acc: 0.0000e+00
Epoch 2/12
...
Epoch 11/12
- 4s - loss: 0.0033 - acc: 4.0000e-04 - val_loss: 0.0020 - val_acc: 0.0000e+00
Epoch 12/12
- 4s - loss: 0.0016 - acc: 4.0000e-04 - val_loss: 0.0016 - val_acc: 0.0000e+00
pediction vs truth:
[ 0.25022525] [ 0.25465108]
[ 0.98761547] [ 0.91661543]
[ 1.06177747] [ 0.95979166]
[ 0.0835482] [ 0.03742919]
[ 0.02432941] [ 0.01149685]
[ 0.00915699] [ 0.00887351]
[ 0.2765356] [ 0.27340488]
[ 0.02941256] [ 0.01685951]
[-0.01059875] [ 0.00157809]
[ 0.04762106] [ 0.01983566]