I am new to PyTorch and am trying to implement a character-level LSTM seq2seq model. What I want to do is this: each sequence is the list of characters of a particular word, and several words form a minibatch, which also represents a sentence. Now, as I understand it, each sequence (in my case, a list of character embeddings) should end with one final hidden state. So if there are two character sequences (two words), there should be two final hidden states, one per word. I am not even considering variable-length sequences yet, and I also don't understand why a variable sequence length would be a problem at all: shouldn't the LSTM simply keep looping as long as there are elements left in each particular sequence? The number of iterations shouldn't have to be static, right? This is the code I tried:
import torch
import torch.nn as nn

character_embedding = nn.Embedding(17, 5)
# LSTM with input embedding dimension 5 and expected hidden state dimension 3
lstm = nn.LSTM(5,3)
# each row is a word; there are two words with the same number of characters
char_embeds=character_embedding(torch.tensor([[1,2,3,4,5],[4,5,6,7,8]]))
# out will contain all the hidden states for each character, and hidden should contain the final hidden state for each sequence
out, hidden=lstm(char_embeds)
print("char_embeds: ")
print(char_embeds)
print("hidden: ")
print(hidden[0])
Output:
char_embeds:
tensor([[[ 1.0157, -0.2197, 1.6615, -1.2916, -0.6116],
[ 0.5630, -0.9618, 0.7287, -0.5727, 1.6796],
[ 0.9902, -0.5408, 0.9785, -1.1090, 1.1126],
[ 0.7472, 0.0440, 1.0629, -0.7375, 0.0828],
[ 0.6632, -0.4523, 0.5051, 2.6031, 0.3798]],
[[ 0.7472, 0.0440, 1.0629, -0.7375, 0.0828],
[ 0.6632, -0.4523, 0.5051, 2.6031, 0.3798],
[-0.6522, -3.2626, 0.7967, -1.0322, 0.4667],
[-0.5086, 0.5142, -0.7141, -1.5352, 0.4177],
[-0.0582, 1.3398, -0.2829, 0.1392, 1.0709]]],
grad_fn=<EmbeddingBackward>)
hidden:
tensor([[[-0.2774, 0.0724, -0.4297],
[-0.4580, 0.1563, -0.5811],
[-0.5492, -0.2314, 0.3473],
[-0.0772, 0.2474, -0.1026],
[-0.1042, 0.4394, -0.3582]]], grad_fn=<StackBackward>)
Here I expected two hidden states, since there are two sequences. But I got five hidden states instead. What are those? What am I missing?
My second question is: why can't an LSTM handle variable-length sequences?
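To make it clearer what I expected (regarding the first question), here is a small sketch, reusing the character_embedding and lstm modules defined above, where I feed each word through the LSTM separately. This gives exactly one final hidden state per word, which is what I hoped the batched call would also do:

# What I expected: one final hidden state per word,
# illustrated here by feeding each word through the LSTM separately
for word in [torch.tensor([1, 2, 3, 4, 5]), torch.tensor([4, 5, 6, 7, 8])]:
    # shape (5, 1, 5): 5 characters, batch of one word, embedding size 5
    embeds = character_embedding(word).unsqueeze(1)
    out, (h_n, c_n) = lstm(embeds)
    print(h_n.shape)  # torch.Size([1, 1, 3]) -- a single final hidden state for this word

So from the batched call I expected a final hidden state of shape (1, 2, 3), one slice per word, rather than the (1, 5, 3) tensor I actually get.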