This question invites people to share their personal experience with the `keras_self_attention` module. I have also summarized the problems I ran into and the solutions I found or received in the answers.
Background
I am building a classifier on time-series data. The input has shape (batch, steps, features).
The flawed code is shown below; the hyperparameter values and validation arrays are placeholders I filled in so the snippet is self-contained.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Bidirectional, Masking, LSTM
from keras_self_attention import SeqSelfAttention

# Placeholder hyperparameters (the original values were not recorded)
units = 32
dropout = 0.2
recurrent_dropout = 0.2
num_classes = 2
batch_size = 32
epochs = 10

# Dummy data with shape (batch, steps, features); labels one-hot encoded for categorical_crossentropy
X_train = np.random.rand(700, 50, 34)
y_train = tf.keras.utils.to_categorical(np.random.choice([0, 1], 700), num_classes)
X_val = np.random.rand(100, 50, 34)
y_val = tf.keras.utils.to_categorical(np.random.choice([0, 1], 100), num_classes)
X_test = np.random.rand(100, 50, 34)
y_test = tf.keras.utils.to_categorical(np.random.choice([0, 1], 100), num_classes)

model = tf.keras.models.Sequential()
model.add(Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout)))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(X_val, y_val),
                    verbose=1)
yhat = model.predict(X_test)
Problems
1. The model raised an `IndexError` at the `SeqSelfAttention` layer.
Solution: `return_sequences` was not set to `True` in the `LSTM` layer feeding the attention, so `SeqSelfAttention` received a 2-D tensor instead of a sequence (see the sketch below).
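A minimal sketch of the fix (the 32 units are a placeholder; the 50×34 shape matches the dummy data above):

```python
from tensorflow.keras.layers import Input, Bidirectional, LSTM
from keras_self_attention import SeqSelfAttention

inputs = Input(shape=(50, 34))
# return_sequences=True keeps the time dimension, giving (batch, 50, 2 * 32)
x = Bidirectional(LSTM(32, return_sequences=True))(inputs)
# SeqSelfAttention can now attend over the 50 steps; the output stays (batch, 50, 64)
x = SeqSelfAttention(attention_activation='sigmoid')(x)
print(x.shape)  # (None, 50, 64)
```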
2. The model raised a `ValueError` about incompatible shapes just before the `Dense` layer.
Solution: add a `Flatten()` layer before the `Dense` layer; with `return_sequences=True` the attention output is still 3-D, so it has to be flattened before the final classifier (see the corrected model below).
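Putting both fixes together, a sketch of the corrected model (unit counts are placeholders; the `Masking` layer is left out here because the dummy data has no padded steps to mask):

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Bidirectional, LSTM, Flatten
from keras_self_attention import SeqSelfAttention

num_classes = 2

model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(50, 34)))                      # (batch, steps, features)
# Fix 1: return_sequences=True so the attention layer gets a 3-D sequence
model.add(Bidirectional(LSTM(32, return_sequences=True)))      # (batch, 50, 64)
model.add(SeqSelfAttention(attention_activation='sigmoid'))    # (batch, 50, 64)
# Fix 2: collapse the time dimension before the classifier head
model.add(Flatten())                                           # (batch, 3200)
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
```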
*3. How do multiplicative attention and multi-head attention work?
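For reference: `keras-self-attention` exposes an `attention_type` argument (additive is the default; `SeqSelfAttention.ATTENTION_TYPE_MUL` switches to multiplicative scoring), and recent versions of tf.keras (2.4+) ship a built-in `MultiHeadAttention` layer. A rough sketch with illustrative shapes and head counts; check the library documentation for the exact semantics:

```python
from tensorflow.keras.layers import Input, Bidirectional, LSTM, MultiHeadAttention
from keras_self_attention import SeqSelfAttention

inputs = Input(shape=(50, 34))
x = Bidirectional(LSTM(32, return_sequences=True))(inputs)     # (batch, 50, 64)

# Multiplicative (Luong-style) attention: scores come from a bilinear product of
# hidden states rather than the additive (Bahdanau-style) default.
mul_out = SeqSelfAttention(attention_type=SeqSelfAttention.ATTENTION_TYPE_MUL,
                           attention_activation='sigmoid')(x)

# Multi-head self-attention: several scaled dot-product heads attend in parallel
# over learned projections of the same sequence, and their outputs are combined.
mha_out = MultiHeadAttention(num_heads=4, key_dim=16)(x, x)    # query = value = x
```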
*4. Does stacking multiple self_attention (SA) layers help improve accuracy? Should it be LSTM+LSTM+SA+SA or LSTM+SA+LSTM+SA?
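For concreteness, the two stackings in the question could be written as below (unit counts are placeholders, and I am not claiming either ordering is better; that would have to be measured on the actual data):

```python
import tensorflow as tf
from tensorflow.keras.layers import Bidirectional, LSTM, Flatten, Dense
from keras_self_attention import SeqSelfAttention

def lstm_lstm_sa_sa(steps=50, features=34, num_classes=2):
    # LSTM + LSTM + SA + SA: attention sits entirely on top of the recurrent stack
    return tf.keras.models.Sequential([
        tf.keras.Input(shape=(steps, features)),
        Bidirectional(LSTM(32, return_sequences=True)),
        Bidirectional(LSTM(32, return_sequences=True)),
        SeqSelfAttention(attention_activation='sigmoid'),
        SeqSelfAttention(attention_activation='sigmoid'),
        Flatten(),
        Dense(num_classes, activation='softmax'),
    ])

def lstm_sa_lstm_sa(steps=50, features=34, num_classes=2):
    # LSTM + SA + LSTM + SA: attention is interleaved with the recurrent layers
    return tf.keras.models.Sequential([
        tf.keras.Input(shape=(steps, features)),
        Bidirectional(LSTM(32, return_sequences=True)),
        SeqSelfAttention(attention_activation='sigmoid'),
        Bidirectional(LSTM(32, return_sequences=True)),
        SeqSelfAttention(attention_activation='sigmoid'),
        Flatten(),
        Dense(num_classes, activation='softmax'),
    ])
```

Both variants compile and train with the same loss and optimizer as the model above.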