The general formulation of attention with queries, keys, and values corresponds to the retrieval view of attention: you have queries that you use to retrieve values, based on the keys those values are associated with.
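Written out in a standard (paper-agnostic) notation: a query $q$ is scored against each key $k_i$, the scores are normalized with a softmax, and the result is a weighted mixture of the values:

$$\operatorname{Attn}(q, K, V) = \sum_i \alpha_i v_i, \qquad \alpha_i = \frac{\exp\big(\operatorname{score}(q, k_i)\big)}{\sum_j \exp\big(\operatorname{score}(q, k_j)\big)}.$$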
With RNNs, attention was used in sequence-to-sequence models such as machine translation. (Time series forecasting is usually framed as sequence labeling instead.) Attention in an RNN decoder is a special case of this setup:
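Concretely, the query is the decoder's previous hidden state $h^{dec}_{t-1}$, while the keys and values are both the encoder outputs $h^{enc}_1, \dots, h^{enc}_S$. The additive (Bahdanau-style) score that the code below relies on has the classic form

$$\operatorname{score}(q, k) = v_a^\top \tanh(W_1 q + W_2 k),$$

and the context vector at step $t$ is $c_t = \sum_s \alpha_s\, h^{enc}_s$. (Keras's AdditiveAttention layer applies the tanh-and-sum part of this score and leaves any $W_1$, $W_2$ projections to the surrounding model.)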
An RNN decoder implemented in Keras then looks like this (based on the TensorFlow tutorial):
import tensorflow as tf


class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, dec_units):
        super().__init__()
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(
            self.dec_units, return_sequences=True,
            return_state=True, recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)
        self.attention = tf.keras.layers.AdditiveAttention()

    def call(self, x, hidden, enc_output):
        # x is the previous output token: (batch_size, 1)
        # hidden is the previous decoder state, used as the query:
        # (batch_size, 1, dec_units)
        # enc_output are the encoder outputs, used as both keys and values:
        # (batch_size, src_length, hidden_size)
        context_vector = self.attention([hidden, enc_output])
        # context_vector shape == (batch_size, 1, hidden_size)

        # x shape after passing through the embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)

        # context_vector already carries a time axis of length 1, so it can be
        # concatenated directly; x shape after concatenation ==
        # (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([context_vector, x], axis=-1)

        # passing the concatenated vector to the GRU; the previous state enters
        # only through attention, so the GRU itself starts from a zero state
        # output shape == (batch_size, 1, dec_units)
        output, state = self.gru(x)

        # output shape after the reshape == (batch_size, dec_units)
        output = tf.reshape(output, (-1, output.shape[2]))

        # x shape == (batch_size, vocab_size)
        x = self.fc(output)
        return x, state
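As a quick sanity check of the shapes, a single decoding step could be driven like the sketch below. All sizes are made-up illustration values; note that AdditiveAttention requires the query and key dimensions to match, so enc_output's last dimension is set to dec_units here.

batch_size, src_length = 64, 16
vocab_size, embedding_dim, dec_units = 8000, 256, 512

decoder = Decoder(vocab_size, embedding_dim, dec_units)

# stand-ins for the encoder outputs and the initial decoder state/input
enc_output = tf.random.normal((batch_size, src_length, dec_units))
hidden = tf.zeros((batch_size, 1, dec_units))
x = tf.zeros((batch_size, 1), dtype=tf.int32)  # e.g. the <start> token id

logits, state = decoder(x, hidden, enc_output)
# logits: (batch_size, vocab_size), state: (batch_size, dec_units)

# feed the new state back in as the next step's query
hidden = tf.expand_dims(state, 1)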