I am trying to use a bidirectional LSTM to predict the velocity (dynamics) values of the notes that make up a piece of music, following this blog post quite closely: http://imanmalik.com/cs/2017/06/05/neural-style.html
I think I have set everything up correctly, but when predicting, my model only seems to produce garbage. Here is a plot of a prediction on a validation sample (the y-axis is always time; the x-axis shows, for I., the note-activation array, for II. the predicted velocities, and for III. the true expected velocities). As you can see, the predicted velocity values (in the middle plot) are the same for all time steps (apart from a few at the beginning and end, although that is hard to see in the plot):
For this particular prediction I trained my model for 1 epoch with 44 batches of 4 samples each, but I have also tried training it for 20 epochs with different batch sizes, and that did not seem to give any better results. Here is the training output:
Using TensorFlow backend.
Loading numpy data ...
173 train sequences
10 test sequences
x_train shape: (173,)
y_train shape: (173,)
Setting up model ...
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
bidirectional_1 (Bidirection (4, None, 128)            123392
_________________________________________________________________
bidirectional_2 (Bidirection (4, None, 128)            98816
_________________________________________________________________
bidirectional_3 (Bidirection (4, None, 128)            98816
_________________________________________________________________
time_distributed_1 (TimeDist (4, None, 88)             11352
=================================================================
Total params: 332,376
Trainable params: 332,376
Non-trainable params: 0
_________________________________________________________________
None
Training model ...
Epoch 1/1
2018-03-10 14:47:27.650226: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX
44/44 [==============================] - 2145s 49s/step - loss: 0.0783 - mean_squared_error: 0.0783
Saving model ...
Loss and metrics: [0.002714238129556179, 0.0027141105383634567]
So, in other words, I end up with a validation MSE of 0.0027 after 1 epoch. Here is the code for my Keras model:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Bidirectional, TimeDistributed, Dense
from keras.optimizers import Adam

number_of_notes = 88
input_size = number_of_notes * 2  # 88 notes * 2 states (pressed, sustained).
output_size = number_of_notes     # 88 notes.
units = 64
dropout = 0.2  # Drop 20% of units for the linear transformation of the inputs.

model = Sequential()
model.add(Bidirectional(LSTM(units, return_sequences=True, dropout=dropout),
                        input_shape=(None, input_size),
                        batch_input_shape=(args.batch_size, None, input_size)))
model.add(Bidirectional(LSTM(units, return_sequences=True, dropout=dropout)))
model.add(Bidirectional(LSTM(units, return_sequences=True, dropout=dropout)))
model.add(TimeDistributed(Dense(output_size, activation='sigmoid')))
model.compile(loss='mse', optimizer=Adam(lr=0.001, clipnorm=10), metrics=['mse'])

print('Training model ...')
number_of_train_batches = np.ceil(len(x_train) / float(args.batch_size))
model.fit_generator(batch_generator(x_train, y_train),
                    steps_per_epoch=number_of_train_batches,
                    epochs=args.epochs)
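(batch_generator itself is not shown above; it just has to yield (inputs, targets) pairs of shape (batch_size, timesteps, 176) and (batch_size, timesteps, 88), forever, as fit_generator expects. A stripped-down sketch of that idea, assuming the sequences in each batch are zero-padded to a common length and leaving out shuffling:)

def batch_generator(x, y, batch_size=args.batch_size):
    # Loop forever, as fit_generator expects.
    while True:
        for start in range(0, len(x), batch_size):
            batch_x = x[start:start + batch_size]
            batch_y = y[start:start + batch_size]
            # Zero-pad every sequence in the batch to the length of the longest one.
            max_len = max(len(seq) for seq in batch_x)
            padded_x = np.zeros((len(batch_x), max_len, input_size))
            padded_y = np.zeros((len(batch_y), max_len, output_size))
            for i, (seq_x, seq_y) in enumerate(zip(batch_x, batch_y)):
                padded_x[i, :len(seq_x)] = seq_x
                padded_y[i, :len(seq_y)] = seq_y
            yield padded_x, padded_y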
And here is how I make my predictions:
prediction_data = np.load(path)
# Copy the prediction input batch_size times to create a batch of the right size.
tiled = np.tile(prediction_data, [args.batch_size, 1, 1])
# Predict on the tiled batch and keep only the first (identical) sequence.
raw_prediction = model.predict(tiled, batch_size=args.batch_size)[0]
prediction = (raw_prediction * 127).astype(int)  # Float in [0, 1] -> MIDI velocity.
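(Not shown in the plotting code, but the flatness of the middle panel can also be checked numerically, for example by looking at how much each note's predicted velocity varies across time steps:)

# Per-note standard deviation across time steps; values near zero mean the
# predicted velocity for that note barely changes over the whole piece.
per_note_std = raw_prediction.std(axis=0)
print('max per-note std:', per_note_std.max())
print('distinct prediction rows:', np.unique(prediction, axis=0).shape[0])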
So what I am asking is: where might my model be going wrong? Which parameters should I try changing, or how could I alter my model or training setup, to get better results? Any and all help is appreciated!
