Using the Keras self-attention module

data-mining machine-learning deep-learning time-series
2021-10-02 11:32:32

This question is a call for people to share their personal experience with the keras_self_attention module.

I have also summarized the problems I ran into, together with the solutions I found myself or received in the answers.


Background

I am building a classifier on time-series data. The input has the shape (batch, steps, features).

The flawed code is shown below.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Bidirectional, Masking, LSTM
from keras_self_attention import SeqSelfAttention

X_train = np.random.rand(700, 50, 34)
y_train = np.random.choice([0, 1], 700)
X_test = np.random.rand(100, 50, 34)
y_test = np.random.choice([0, 1], 100)

# hyperparameters (units, dropout, recurrent_dropout, num_classes, batch_size,
# epochs) and the validation data (X_val, y_val) are assumed to be defined elsewhere
model = tf.keras.models.Sequential()
model.add(Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout)))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(X_val, y_val),
                    verbose=1)

yhat = model.predict_prob(X_test)

Problems

1. The model raised an IndexError at the `SeqSelfAttention` layer.

Solution: This happened because `return_sequences` in the last `LSTM` layer was not set to `True`.

2. The model raised a ValueError before the `Dense` layer because of incompatible shapes.

Solution: Add a `Flatten()` layer before the `Dense` layer.

*3. How do multiplicative attention and multi-head attention work?
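A minimal sketch of what question 3 refers to, assuming the CyberZHG keras-self-attention package (which exposes `SeqSelfAttention.ATTENTION_TYPE_MUL` for multiplicative scoring) and TensorFlow 2.4 or later (which ships `keras.layers.MultiHeadAttention`); the layer sizes are placeholders, not recommendations:

import tensorflow as tf
from tensorflow import keras
from keras_self_attention import SeqSelfAttention

# Multiplicative (dot-product) scoring instead of the library's default additive scoring
mul_model = tf.keras.models.Sequential([
    keras.layers.Masking(mask_value=0.0, input_shape=(50, 34)),
    keras.layers.Bidirectional(keras.layers.LSTM(20, return_sequences=True)),
    SeqSelfAttention(attention_type=SeqSelfAttention.ATTENTION_TYPE_MUL,
                     attention_activation='sigmoid'),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation='sigmoid'),
])

# Multi-head attention runs several attention heads in parallel and merges them.
# keras.layers.MultiHeadAttention takes separate query/value tensors, so the
# functional API is the easiest way to use it for self-attention (query = value).
inputs = keras.Input(shape=(50, 34))
x = keras.layers.Bidirectional(keras.layers.LSTM(20, return_sequences=True))(inputs)
x = keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)(x, x)
x = keras.layers.Flatten()(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
mha_model = keras.Model(inputs, outputs)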

*4. Does stacking multiple self-attention (SA) layers help accuracy? Should it be LSTM+LSTM+SA+SA or LSTM+SA+LSTM+SA?
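Whether extra self-attention layers improve accuracy is something only experiments on the data can tell, but the mechanics of stacking are straightforward: `SeqSelfAttention` returns a sequence of the same length, so the next LSTM can consume it directly. A minimal sketch of the interleaved LSTM+SA+LSTM+SA variant (layer sizes are placeholders):

from tensorflow import keras
from tensorflow.keras.layers import Bidirectional, Dense, Flatten, LSTM, Masking
from keras_self_attention import SeqSelfAttention

stacked = keras.models.Sequential([
    Masking(mask_value=0.0, input_shape=(50, 34)),
    # every LSTM that feeds an attention (or another recurrent) layer must
    # return sequences so that the time axis is preserved
    Bidirectional(LSTM(20, return_sequences=True)),
    SeqSelfAttention(attention_activation='sigmoid'),   # output keeps shape (steps, features)
    Bidirectional(LSTM(20, return_sequences=True)),
    SeqSelfAttention(attention_activation='sigmoid'),
    Flatten(),
    Dense(1, activation='sigmoid'),
])
stacked.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

The LSTM+LSTM+SA+SA ordering is built the same way by moving the first SeqSelfAttention below the second LSTM; both compile, so the choice between them is purely empirical.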

1 Answer

It seems that the SeqSelfAttention layer expects all time steps, i.e. return_sequences=True in the preceding LSTM. The example on the library's homepage shows the same thing (link).

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout, Bidirectional, Masking, LSTM
from keras_self_attention import SeqSelfAttention

X_train = np.random.rand(700, 50, 34)
y_train = np.random.choice([0, 1], 700)
X_test = np.random.rand(100, 50, 34)
y_test = np.random.choice([0, 1], 100)

model = tf.keras.models.Sequential()
model.add(keras.layers.Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
# return_sequences=True keeps the output of every time step, which is what SeqSelfAttention expects
model.add(keras.layers.Bidirectional(LSTM(20, dropout=0.25, recurrent_dropout=0.1, return_sequences=True)))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
# Flatten collapses the (steps, features) output into a vector before the Dense classifier
model.add(keras.layers.Flatten())
model.add(Dense(1, activation='sigmoid'))

model.summary()

model.compile(loss='binary_crossentropy')
model.fit(X_train, y_train, epochs=2)
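One loose end from the question's snippet: tf.keras models have no predict_prob method, so the last line of the original code fails even after the layer fixes. With a sigmoid output, model.predict already returns the class-1 probability; a simple thresholding step (the 0.5 cutoff is an assumption) to get hard labels might look like this:

yhat = model.predict(X_test)            # shape (100, 1), probabilities in [0, 1]
y_pred = (yhat > 0.5).astype('int32')   # hard 0/1 class labels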
