After finishing Andrew Ng's Coursera course, I wanted to implement a simple RNN again, this time to generate dinosaur names from a text file containing about 800 of them. The Coursera version was done in NumPy; here is a link to a Jupyter notebook (not my repo) with the strategy and the full assignment: 这里
I started a similar implementation, but in PyTorch. The model is as follows:
class RNN(nn.Module):
    def __init__(self, input_size):
        super(RNN, self).__init__()
        print("oo")
        self.hiddenWx1 = nn.Linear(input_size, 100)
        self.hiddenWx2 = nn.Linear(100, input_size)
        self.z1 = nn.Linear(input_size, 100)
        self.z2 = nn.Linear(100, input_size)
        self.tanh = nn.Tanh()
        self.softmax = torch.nn.Softmax(dim=1)

    def forward(self, input, hidden):
        layer = self.hiddenWx1(input)
        layer = self.hiddenWx2(layer)
        a_next = self.tanh(layer)
        z = self.z1(a_next)
        z = self.z2(z)
        y_next = self.softmax(z)
        return y_next, a_next
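For reference, the recurrence I am trying to reproduce from the Coursera notebook is a_next = tanh(Wax·x + Waa·a_prev + b), where the previous hidden state enters the computation. A minimal PyTorch cell following that formula could look like this (a sketch with my own layer names, not the notebook's code):

```python
import torch
import torch.nn as nn

class RefRNNCell(nn.Module):
    """Sketch of the notebook's recurrence: a_next = tanh(Wax@x + Waa@a_prev + b)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.Wax = nn.Linear(input_size, hidden_size, bias=False)
        self.Waa = nn.Linear(hidden_size, hidden_size)  # carries the bias b
        self.Wya = nn.Linear(hidden_size, input_size)   # output projection

    def forward(self, x, a_prev):
        a_next = torch.tanh(self.Wax(x) + self.Waa(a_prev))
        logits = self.Wya(a_next)  # raw scores; softmax would be applied outside
        return logits, a_next
```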
And here is the main training loop:
for word in examples:  # for every dinosaur name
    model.zero_grad()
    hidden = torch.zeros(1, len(ix_to_char))  # initialise hidden to zeros; ix_to_char is below
    word_vector = word_tensor(word)  # convert each letter of the current name into one-hot tensors
    output = torch.zeros(1, len(ix_to_char))  # first input is null
    loss = 0
    counter = 0
    true = torch.LongTensor(len(word))  # will contain the index of each letter; if word is "badu" => [2, 1, 4, 21, 0]
    measured = torch.zeros(len(word))  # will contain the vectors returned by the model for each letter (softmax output)
    for t in range(len(word_vector)):  # for each letter of the current word
        true[counter] = char_to_ix[word[counter]]  # char_to_ix returns the index of a letter in the dictionary
        output, hidden = model(output, hidden)
        if counter == 0:
            measured = output
        else:  # measured is a tensor containing the probability distributions
            measured = torch.cat((measured, output), dim=0)
        counter += 1
    loss = nn.CrossEntropyLoss()(measured, true)
    loss.backward()
    optimizer.step()
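For reference, nn.CrossEntropyLoss applies log-softmax internally and expects raw (pre-softmax) class scores; the equivalence with LogSoftmax + NLLLoss can be checked on random data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 27)           # 4 letters, 27-class raw scores
targets = torch.tensor([2, 1, 4, 0])  # class indices

# CrossEntropyLoss == log-softmax followed by negative log-likelihood
ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)
assert torch.allclose(ce, nll)
```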
The letter dictionary (ix_to_char) is:
{0: '\n', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z'}
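The helpers char_to_ix and word_tensor used above are not shown; roughly, they are the inverse mapping of ix_to_char and a one-hot encoder. A sketch of what I mean (my own versions — the real ones may differ):

```python
import torch

# ix_to_char as above: 0 is newline, 1..26 are 'a'..'z'
ix_to_char = {0: '\n', **{i + 1: chr(ord('a') + i) for i in range(26)}}
char_to_ix = {c: i for i, c in ix_to_char.items()}  # inverse mapping

def word_tensor(word):
    """One-hot encode each letter of `word` as a (len(word), 1, vocab) tensor."""
    vocab = len(ix_to_char)
    out = torch.zeros(len(word), 1, vocab)
    for t, ch in enumerate(word):
        out[t, 0, char_to_ix[ch]] = 1.0
    return out
```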
Every 2000 epochs, I sample a few new words with the function below, using torch.multinomial to pick a letter according to the softmax probabilities returned by the model:
def sampling(model):
    idx = -1
    counter = 0
    newline_character = char_to_ix['\n']
    x = torch.zeros(1, len(ix_to_char))
    hidden = torch.zeros(1, len(ix_to_char))
    generated_word = ""
    while idx != newline_character and counter != 35:
        x, hidden = model(x, hidden)
        #print(x)
        counter += 1
        idx = torch.multinomial(x, 1)
        #print(idx.item())
        generated_word += ix_to_char[idx.item()]
    if counter == 35:
        generated_word += '\n'
    print(generated_word)
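The letter choice relies on torch.multinomial, which draws an index according to the weights in the given row, so letters with higher softmax probability are drawn more often. A small standalone check:

```python
import torch

torch.manual_seed(0)
probs = torch.tensor([[0.7, 0.2, 0.1]])  # one row of softmax probabilities

# on a 2-D input, multinomial returns one index per row
idx = torch.multinomial(probs, num_samples=1)
assert idx.shape == (1, 1)
assert 0 <= idx.item() < 3

# over many draws, index 0 (prob 0.7) dominates
draws = torch.multinomial(probs.squeeze(0), 10000, replacement=True)
assert (draws == 0).float().mean().item() > 0.5
```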
Here are the first results it prints:
epoch:1, loss:3.256033420562744
aaasaaauasaaasasauaaaaapsaaaasaaaaa
aaaaaaaaaaaaasaaaoaaaaaauaaaaaaaaaa
taaaauasaasaaaaasaaasaauaaaaaaaausa
uaasaaaaauaaaasasssaauaaaaasaaaaaaa
auaaaaaaaassasaaauaaaaaaaaasasaaaas
epoch:2, loss:3.199960231781006
aaasaaassussssusssussssssssssusssss
aasaaassssssssssssasusssissssssssss
sasaaassssuosasssssssssssssssssssss
aasassasassusssssssssussssssssssuss
oasaasassssssussssssssussssssssssss
epoch:3, loss:3.263746500015259
aaaaaaasaaaasaaaaasaaaasaaaaaaaaaaa
aaaaaaasaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaauaaaaaaaaaaas
aaaaaaaasaaaasraaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaauusaaaaauaaaaaaaaaaaaa
It doesn't work, and I don't know how to fix it. Without any training at all, the sampling function seems to work, since the returned words look completely random:
hbtpsbykkxvlah
ttiwlzxdxabzmbdvsapsnwwpaoiasotalft
My post may be a bit long, but so far I can't figure out what is wrong with my program.
Thanks a lot for your help.