数据挖掘 - 我的 Keras 双向 LSTM 模型给出了糟糕的预测 - 吾爱随笔录

我的 Keras 双向 LSTM 模型给出了糟糕的预测

数据挖掘喀拉斯张量流 rnn lstm

2021-09-28 00:15:41

我正在尝试使用双向 LSTM 预测组成一段音乐的音符的速度（动态）值，非常接近这篇博客文章：http: //imanmalik.com/cs/2017/06/05/neural -style.html

我认为我已经正确设置了所有内容，但是在预测我的模型时似乎只会产生乱码。这是在验证样本上进行的预测图（y 轴始终是时间，x 轴是，对于I.，一个音符激活数组；对于II. 预测速度和对于III. 真实的预期速度）。如您所见，预测的速度值（在中间的图表中）对于所有时间步长都是相同的（除了开始和结束时的一些，但这在图中很难看到）：

对于这个特定的预测，我已经用 44 批每批 4 个样本和 1 个 epoch 训练了我的模型，但我也尝试用不同的批量大小训练它 20 个 epoch，它似乎没有给出更好的结果。

Using TensorFlow backend.
Loading numpy data ...
173 train sequences
10 test sequences
x_train shape: (173,)
y_train shape: (173,)
Setting up model ...
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
bidirectional_1 (Bidirection (4, None, 128)            123392
_________________________________________________________________
bidirectional_2 (Bidirection (4, None, 128)            98816
_________________________________________________________________
bidirectional_3 (Bidirection (4, None, 128)            98816
_________________________________________________________________
time_distributed_1 (TimeDist (4, None, 88)             11352
=================================================================
Total params: 332,376
Trainable params: 332,376
Non-trainable params: 0
_________________________________________________________________
None
Training model ...
Epoch 1/1
2018-03-10 14:47:27.650226: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX
44/44 [==============================] - 2145s 49s/step - loss: 0.0783 - mean_squared_error: 0.0783
Saving model ...
Loss and metrics: [0.002714238129556179, 0.0027141105383634567]

因此，换句话说，我最终在 1 个 epoch 后得到了 0.0027 的验证 MSE。这是我的 Keras 模型的代码：

number_of_notes = 88
input_size = number_of_notes * 2 # 88 notes * 2 states (pressed, sustained).
output_size = number_of_notes # 88 notes.

units = 64
dropout = 0.2 # Drop 20% of units for linear transformation of inputs.

model = Sequential()
model.add(Bidirectional(LSTM(units, return_sequences=True, dropout=dropout),
                        input_shape=(None,input_size),
                        batch_input_shape=(args.batch_size,None,input_size)))
model.add(Bidirectional(LSTM(units, return_sequences=True, dropout=dropout)))
model.add(Bidirectional(LSTM(units, return_sequences=True, dropout=dropout)))
model.add(TimeDistributed(Dense(output_size, activation='sigmoid')))
model.compile(loss='mse', optimizer=Adam(lr=0.001, clipnorm=10), metrics=['mse'])

print('Training model ...')

number_of_train_batches = np.ceil(len(x_train)/float(args.batch_size))
model.fit_generator(batch_generator(x_train, y_train),
                    steps_per_epoch=number_of_train_batches,
                    epochs=args.epochs)

以下是我的预测：

prediction_data = np.load(path)

# Copy prediction input N times to create a batch of the right size.
tiled = np.tile(prediction_data, [args.batch_size,1,1])

raw_prediction = model.predict(tiled, batch_size=args.batch_size)[0]
prediction = (raw_prediction * 127).astype(int) # Float -> MIDI velocity.

所以我要问的是，我的模型可能在哪里出错？为了获得更好的结果，我应该尝试更改哪些参数？或者我怎样才能改变我的模型或训练设置以获得更好的结果？等任何和所有的帮助表示赞赏！

3个回答

由于音乐的时间维度，我认为您不能使用双向 LSTM 进行预测。我的意思是后向层必须首先预测最新值，并且只有在预测后才能看到给出上下文的序列 - 这就像你看一部反转的电影，你必须猜测第一帧的样子而不知道其余部分.

另外，查看imanmalik模型的 stylenet。进行预测的解释 lstm 是单向的。只有流派网络单元是双向的，但它们已经将预测值作为输入。

最初的博客文章提到解释层减少了模型的整体参数大小，虽然我不知道减少了多少。看起来你的网络没有这个，而且感觉真的很大；只有 173 个训练样例，即使它们的维度很大，对于一个拥有 300k+ 样例的网络来说，感觉也只是一个小数目。您可以尝试使用较小版本的网络，看看它是否可以在您拥有的数据上进行训练？

另外，你能解释一下你复制输入数据的步骤吗 $N$ 次？

由于我的一位同事的一些极好的建议，我最终解决了这个问题，这相当于：

将模型缩减到最基本的部分，然后看看你是否得到了合理的输出。如果你这样做了，一个一个地添加元素/层，看看哪里开始出错了。

在这里，我认为问题在于最后一层的 Sigmoid 激活。我现在的模型运行良好，看起来像这样：

model = Sequential()
model.add(Bidirectional(LSTM(output_size, activation='relu', return_sequences=True, dropout=dropout),
                        merge_mode='sum',
                        input_shape=(None, input_size),
                        batch_input_shape=(batch_size, None, input_size)))
model.add(Bidirectional(LSTM(output_size, activation='relu', return_sequences=True,
                             dropout=dropout), merge_mode='sum'))
model.add(Bidirectional(LSTM(output_size, activation='relu', return_sequences=True,
                             dropout=dropout), merge_mode='sum'))
model.compile(loss='mse', optimizer=Adam(
    lr=0.001, clipnorm=1), metrics=['mse'])

其它你可能感兴趣的问题

上一篇多类分类的评价方法下一篇基本时间序列分类示例