Problem training a simple RNN to generate words

artificial-intelligence recurrent-neural-network pytorch
2021-11-06 22:36:28

After finishing Andrew Ng's Coursera course, I wanted to implement a simple RNN again, this time to generate dinosaur names from a text file containing about 800 of them. The Coursera version was done in NumPy; here is a link to a Jupyter notebook (not my repo) with the strategy and the full assignment: here

I started a similar implementation, but in PyTorch. The model is as follows:

import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size):
        super(RNN, self).__init__()
        self.hiddenWx1 = nn.Linear(input_size, 100)  # input -> 100 hidden units
        self.hiddenWx2 = nn.Linear(100, input_size)  # back down to vocabulary size
        self.z1 = nn.Linear(input_size, 100)
        self.z2 = nn.Linear(100, input_size)
        self.tanh = nn.Tanh()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, input, hidden):
        layer = self.hiddenWx1(input)
        layer = self.hiddenWx2(layer)
        a_next = self.tanh(layer)
        z = self.z1(a_next)
        z = self.z2(z)
        y_next = self.softmax(z)
        return y_next, a_next
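As a quick shape check (purely illustrative; 27 is the vocabulary size used below), the module can be called with a dummy input and a zero hidden state:

model = RNN(27)
x = torch.zeros(1, 27)       # dummy input of vocabulary size
hidden = torch.zeros(1, 27)  # dummy hidden state
y, a = model(x, hidden)
print(y.shape, a.shape)      # both torch.Size([1, 27])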

Here is the main training loop:

for word in examples:  # for every dinosaur name
    model.zero_grad()
    hidden = torch.zeros(1, len(ix_to_char))  # initialise the hidden state to zeros; ix_to_char is given below
    word_vector = word_tensor(word)  # convert each letter of the current name into a one-hot tensor
    output = torch.zeros(1, len(ix_to_char))  # the first input is a zero vector
    loss = 0
    counter = 0
    true = torch.LongTensor(len(word))  # will contain the index of each letter; if word is "badu" => [2,1,4,21,0]

    measured = torch.zeros(len(word))  # will contain the vectors returned by the model for each letter (softmax output)

    for t in range(len(word_vector)):  # for each letter of the current word
        true[counter] = char_to_ix[word[counter]]  # char_to_ix returns the index of a letter in the dictionary

        output, hidden = model(output, hidden)

        if counter == 0:
            measured = output
        else:  # measured is a tensor containing the probability distributions for each step
            measured = torch.cat((measured, output), dim=0)
        counter += 1

    loss = nn.CrossEntropyLoss()(measured, true)
    loss.backward()
    optimizer.step()
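For context, word_tensor, examples, and optimizer are not defined in the snippet above; here is a minimal sketch of what they might look like, assuming the names file has one lowercase name per line and the trailing '\n' is kept as the end-of-name marker (the file name, optimizer choice, and learning rate are guesses):

import torch.optim as optim

with open("dinos.txt") as f:                 # hypothetical file name
    examples = [line.lower() for line in f]  # keep the trailing '\n' as the end-of-name marker

def word_tensor(word):
    # one row per letter, one-hot over the 27-character vocabulary
    tensor = torch.zeros(len(word), len(ix_to_char))
    for i, ch in enumerate(word):
        tensor[i][char_to_ix[ch]] = 1
    return tensor

model = RNN(len(ix_to_char))
optimizer = optim.SGD(model.parameters(), lr=0.01)  # assumed optimizer and learning rate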

The letter dictionary (ix_to_char) is as follows:

{0:'\n', 1:'a', 2:'b', 3:'c', 4:'d', 5:'e', 6:'f', 7:'g', 8:'h', 9:'i', 10:'j', 11:'k', 12:'l', 13:'m', 14:'n', 15:'o', 16:'p', 17:'q', 18:'r', 19:'s', 20:'t', 21:'u', 22:'v', 23:'w', 24:'x', 25:'y', 26:'z'}
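The inverse mapping char_to_ix used in the training loop is not shown in the post; assuming it is simply the inverse of ix_to_char, it can be built in one line:

char_to_ix = {ch: ix for ix, ch in ix_to_char.items()}  # e.g. char_to_ix['b'] == 2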

Every 2000 epochs I sample a few new words with the function below, which uses torch.multinomial to pick each letter according to the softmax probabilities returned by the model:

def sampling(model):
    idx = -1
    counter = 0
    newline_character = char_to_ix['\n']

    x = torch.zeros(1, len(ix_to_char))       # the first input is a zero vector
    hidden = torch.zeros(1, len(ix_to_char))  # initial hidden state
    generated_word = ""

    while idx != newline_character and counter != 35:
        x, hidden = model(x, hidden)
        counter += 1
        idx = torch.multinomial(x, 1)  # draw a letter index from the softmax distribution
        generated_word += ix_to_char[idx.item()]
    if counter == 35:
        generated_word += '\n'
    print(generated_word)
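The outer epoch loop is not shown; assuming it simply wraps the per-word training loop above, the periodic sampling described here would look roughly like this (num_epochs and the exact counting are assumptions):

num_epochs = 20000  # assumed value
for epoch in range(1, num_epochs + 1):
    for word in examples:
        ...  # per-word training loop shown above
    if epoch % 2000 == 0:
        for _ in range(5):  # print a few sampled names
            sampling(model)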

Here are the first results displayed:

epoch:1, loss:3.256033420562744
aaasaaauasaaasasauaaaaapsaaaasaaaaa

aaaaaaaaaaaaasaaaoaaaaaauaaaaaaaaaa

taaaauasaasaaaaasaaasaauaaaaaaaausa

uaasaaaaauaaaasasssaauaaaaasaaaaaaa

auaaaaaaaassasaaauaaaaaaaaasasaaaas

epoch:2, loss:3.199960231781006
aaasaaassussssusssussssssssssusssss

aasaaassssssssssssasusssissssssssss

sasaaassssuosasssssssssssssssssssss

aasassasassusssssssssussssssssssuss

oasaasassssssussssssssussssssssssss

epoch:3, loss:3.263746500015259
aaaaaaasaaaasaaaaasaaaasaaaaaaaaaaa

aaaaaaasaaaaaaaaaaaaaaaaaaaaaaaaaaa

aaaaaaaaaaaaaaaaaaaaaauaaaaaaaaaaas

aaaaaaaasaaaasraaaaaaaaaaaaaaaaaaaa

aaaaaaaaaaaaauusaaaaauaaaaaaaaaaaaa

It does not work, and I don't know how to fix the problem. Without any training at all, the sampling function seems to work, since the returned words look completely random:

hbtpsbykkxvlah

ttiwlzxdxabzmbdvsapsnwwpaoiasotalft

My post is perhaps a bit long, but so far I can't see what is wrong with my program.

Thanks a lot for your help.

1 Answer

Your forward function does not use the previous hidden state.

Look: you pass hidden in, but you never use it.

def forward(self, input, hidden):  # hidden is received here...
    layer = self.hiddenWx1(input)
    layer = self.hiddenWx2(layer)
    a_next = self.tanh(layer)      # ...but a_next depends only on input
    z = self.z1(a_next)
    z = self.z2(z)
    y_next = self.softmax(z)
    return y_next, a_next          # the returned state carries no memory of previous steps
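Here is a minimal sketch of how the hidden state could be wired back in, following the classic recurrence a_next = tanh(Wax·x + Waa·a_prev) from the Coursera exercise (the layer names and the hidden size of 100 are assumptions; with these shapes, the hidden state must then be initialised as torch.zeros(1, hidden_size) in both the training loop and sampling, instead of vocabulary size):

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size=100):
        super(RNN, self).__init__()
        self.Wax = nn.Linear(input_size, hidden_size)   # input -> hidden
        self.Waa = nn.Linear(hidden_size, hidden_size)  # previous hidden -> hidden
        self.Wya = nn.Linear(hidden_size, input_size)   # hidden -> output
        self.tanh = nn.Tanh()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, input, hidden):
        a_next = self.tanh(self.Wax(input) + self.Waa(hidden))  # the recurrence now actually uses hidden
        y_next = self.softmax(self.Wya(a_next))
        return y_next, a_next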