Using the Keras self-attention module

data-mining machine-learning deep-learning time-series
2021-10-02 11:32:32

This question is a call for people to share their personal experience with the keras_self_attention module.

I have also summarized the problems I ran into, together with the solutions I found myself or received in the answers.


Background

I am building a classifier on time-series data. The input has the shape (batch, steps, features).

The flawed code is shown below.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Bidirectional, Masking, LSTM
from keras_self_attention import SeqSelfAttention

X_train = np.random.rand(700, 50, 34)
y_train = np.random.choice([0, 1], 700)
X_test = np.random.rand(100, 50, 34)
y_test = np.random.choice([0, 1], 100)

# hyperparameters (units, dropout, recurrent_dropout, num_classes, batch_size,
# epochs) and the validation data (X_val, y_val) are assumed to be defined elsewhere
model = tf.keras.models.Sequential()
model.add(Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout)))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(X_val, y_val),
                    verbose=1)

yhat = model.predict_prob(X_test)

Problems

1. The model raised an IndexError at the `SeqSelfAttention` layer.

Solution: This happened because `return_sequences` in the last `LSTM` layer was not set to `True`.

2. The model raised a ValueError before the `Dense` layer because of incompatible shapes.

Solution: Add a `Flatten()` layer before the `Dense` layer.

*3. How do multiplicative attention and multi-head attention work?
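A minimal sketch of what question 3 refers to, assuming the CyberZHG keras-self-attention package (which exposes `SeqSelfAttention.ATTENTION_TYPE_MUL` for multiplicative scoring) and TensorFlow 2.4 or later (which ships `keras.layers.MultiHeadAttention`); the layer sizes are placeholders, not recommendations:

import tensorflow as tf
from tensorflow import keras
from keras_self_attention import SeqSelfAttention

# Multiplicative (dot-product) scoring instead of the library's default additive scoring
mul_model = tf.keras.models.Sequential([
    keras.layers.Masking(mask_value=0.0, input_shape=(50, 34)),
    keras.layers.Bidirectional(keras.layers.LSTM(20, return_sequences=True)),
    SeqSelfAttention(attention_type=SeqSelfAttention.ATTENTION_TYPE_MUL,
                     attention_activation='sigmoid'),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation='sigmoid'),
])

# Multi-head attention runs several attention heads in parallel and merges them.
# keras.layers.MultiHeadAttention takes separate query/value tensors, so the
# functional API is the easiest way to use it for self-attention (query = value).
inputs = keras.Input(shape=(50, 34))
x = keras.layers.Bidirectional(keras.layers.LSTM(20, return_sequences=True))(inputs)
x = keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)(x, x)
x = keras.layers.Flatten()(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
mha_model = keras.Model(inputs, outputs)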

*4. Does stacking multiple self-attention (SA) layers help accuracy? Should it be LSTM+LSTM+SA+SA or LSTM+SA+LSTM+SA?
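Whether extra self-attention layers improve accuracy is something only experiments on the data can tell, but the mechanics of stacking are straightforward: `SeqSelfAttention` returns a sequence of the same length, so the next LSTM can consume it directly. A minimal sketch of the interleaved LSTM+SA+LSTM+SA variant (layer sizes are placeholders):

from tensorflow import keras
from tensorflow.keras.layers import Bidirectional, Dense, Flatten, LSTM, Masking
from keras_self_attention import SeqSelfAttention

stacked = keras.models.Sequential([
    Masking(mask_value=0.0, input_shape=(50, 34)),
    # every LSTM that feeds an attention (or another recurrent) layer must
    # return sequences so that the time axis is preserved
    Bidirectional(LSTM(20, return_sequences=True)),
    SeqSelfAttention(attention_activation='sigmoid'),   # output keeps shape (steps, features)
    Bidirectional(LSTM(20, return_sequences=True)),
    SeqSelfAttention(attention_activation='sigmoid'),
    Flatten(),
    Dense(1, activation='sigmoid'),
])
stacked.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

The LSTM+LSTM+SA+SA ordering is built the same way by moving the first SeqSelfAttention below the second LSTM; both compile, so the choice between them is purely empirical.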

1 Answer

It seems that the SeqSelfAttention layer expects all time steps, i.e. return_sequences=True in the preceding LSTM. The example on the library's homepage shows the same thing (link).

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout, Bidirectional, Masking, LSTM
from keras_self_attention import SeqSelfAttention

X_train = np.random.rand(700, 50, 34)
y_train = np.random.choice([0, 1], 700)
X_test = np.random.rand(100, 50, 34)
y_test = np.random.choice([0, 1], 100)

model = tf.keras.models.Sequential()
model.add(keras.layers.Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
# return_sequences=True keeps the output of every time step, which is what SeqSelfAttention expects
model.add(keras.layers.Bidirectional(LSTM(20, dropout=0.25, recurrent_dropout=0.1, return_sequences=True)))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
# Flatten collapses the (steps, features) output into a vector before the Dense classifier
model.add(keras.layers.Flatten())
model.add(Dense(1, activation='sigmoid'))

model.summary()

model.compile(loss='binary_crossentropy')
model.fit(X_train, y_train, epochs=2)
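One loose end from the question's snippet: tf.keras models have no predict_prob method, so the last line of the original code fails even after the layer fixes. With a sigmoid output, model.predict already returns the class-1 probability; a simple thresholding step (the 0.5 cutoff is an assumption) to get hard labels might look like this:

yhat = model.predict(X_test)            # shape (100, 1), probabilities in [0, 1]
y_pred = (yhat > 0.5).astype('int32')   # hard 0/1 class labels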
