I'm trying to build a neural network that learns to write text character by character, trained on David Copperfield (from Project Gutenberg).
It starts out well, then forgets punctuation around epoch 25 and degenerates into nonsense at epoch 26. I've been struggling to find a starting point for fixing this. I've read research papers on clipping gradients to keep them from vanishing or exploding, but I'm having trouble figuring out, first, how to visualize the gradients to see what is going wrong, and second, how to clip them to an appropriate value.
I have checkpoint models saved for all 50 epochs (there's a sketch at the end of the post of how I've started inspecting them).
Following a research paper on gradient clipping in LSTMs, I've already tried clipping the gradients at 5, but it didn't change anything. I don't really have the budget to find the best value experimentally, but if that's the only way, I'll make it work.
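To make the question concrete, below is the kind of thing I think I need in order to look at the gradients, sketched in plain TensorFlow 1.x (which TFLearn runs on top of). The loss_op argument is just a stand-in for whatever op TFLearn actually builds internally, and I haven't figured out how to hook any of this into SequenceGenerator:

import tensorflow as tf

def inspect_gradients(loss_op, clip_norm=5.0):
    # Gradient of the loss with respect to every trainable variable.
    params = tf.trainable_variables()
    grads = tf.gradients(loss_op, params)
    # Single L2 norm over all gradients -- the number I want to watch
    # across epochs to see whether it explodes around epoch 25.
    global_norm = tf.global_norm(grads)
    # Rescale the gradients so their joint norm is at most clip_norm;
    # as far as I can tell this is what clip_gradients=5 is supposed to do.
    clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm)
    return global_norm, clipped_grads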
I've been working on this for a long time, but I'm self-taught and feel a bit out of my depth here. A nudge in the right direction from someone who knows this area would be much appreciated.
Epoch 20:
I saw Traddles in service about the house, the person something alike, the great hand of the counter was Mr. Omer's prospective customer, who answered he was going, so his man's house distracted him at the door, and after hearing her voice for the last time she took heart, and Mr. Micawber, having drunk so much beer, when I was that man. I could be looked at in such a way as to make all my black long to be bound, and almost satisfied with more of his arms, for I was so glad that she asked my time and herself, how many years I had passed in such a green window the most night eyes are dear, but is that still you? I began to say dear, she was not the word now, grew a better back hand, and if that hand he spoke of sat for weeks, then there might be relief in looking,
Epoch 25:
I finished with myself, so to speak, in the shadow of our father's dream and among the men, but what he said was the men. I think he has got beyond the wish to be unloved in what might be called more sorrow.
Epoch 26:
spi , i ed o eoa ti i. nhsaae st?hhthn
t cha,ptr ieto t wo a hw ne s uawpai ,y na aet dttte t?atsh oh hi au aaa ddn e haf es t.rooe wt etdt s
ta ,, iteahe a, dt os rhr tis m elrii ea ao ty otatp ya rta ty o he, ee , ss i. hn eo te aoo se shu te senea e tt sd ew , ie s I ee ihi hnts a an r rv otso ta eshne tta o tt
eomlh arnt led n sa aaeh tww n ee th ha ,tdeh te nntt i atnr,e wt eee hb atn oea ae ei t -.de eaesen atehh heas ef en to hr d , eh In.io st watn t htih e io tt axhss , esr ohsmtldal er e
n, t,dtthan,ths hdhe oa oh tbh t ot is oe tr et ttnm an ot ng ca ds tr .s that t ewehson n sr oe se iee httst bit tt .tn I he ow so tt t,w sttt nta tai o
The code is below. (I started in plain TensorFlow but switched to TFLearn for simplicity. I'm willing to learn whatever framework has the tools to solve this, though; I'm a self-taught student, so learning really is the only goal here.)
import time
start_script_time = time.time()
import numpy as np
import tflearn
import random
import pickle

''' create data '''
log_file = 'dickens_log.txt'

def my_log(text, filename=log_file):
    text = str(text)
    print(text)
    with open(filename, 'a', newline='\n') as file:
        file.write(text + '\n')
try:
    book_name = 'as_loaded.txt'
    book = open(book_name, errors='ignore',
                encoding='ascii', newline='\n').read()
except:
    book_name = 'copperfield.txt'
    book = open(book_name, errors='ignore',
                encoding='utf-8', newline='\n').read()
    #book = book.replace('\r', '')
    #book = book.replace('\n', ' ')
    with open('as_loaded.txt', 'w', newline='\n') as file:
        file.write(book)

# make smaller slice for quickly testing code on CPU
# book = book[0:1500]
# del(book_name)
# length of strings in the training set
string_length = 30
def process_book(book, string_length, redundant_step=3):
    # Remember to pickle the dictionary as a binary file. This is pretty
    # critical for loading your model on a different machine than you
    # trained on.
    try:
        pickle_ld = open('charDict.pi', 'rb')
        charDict = pickle.load(pickle_ld)
        pickle_ld.close()
    except:
        # dictionary of character-number pairs
        chars = sorted(list(set(book)))
        charDict = dict((c, i) for i, c in enumerate(chars))
        #charDict.pop('\r')
        pickle_sv = open('charDict.pi', 'wb')
        pickle.dump(charDict, pickle_sv)
        pickle_sv.close()
    len_chars = len(charDict)
    # train is a string input and target is the expected next character
    train = []
    target = []
    for i in range(0, len(book)-string_length, redundant_step):
        train.append(book[i:i+string_length])
        target.append(book[i+string_length])
    # create containers for data with appropriate dimensions
    # 3D (n_samples, sample_size, n_categories)
    X = np.zeros((len(train), string_length, len_chars), dtype=np.bool)
    # 2D (n_samples, n_categories)
    y = np.zeros((len(train), len_chars), dtype=np.bool)
    # fill arrays
    for i, string in enumerate(train):
        for j, char in enumerate(string):
            # X is a sparse 3D tensor where a 1 signals that the character
            # at that index of the 3rd dimension is present
            X[i, j, charDict[char]] = 1
        y[i, charDict[target[i]]] = 1
    return charDict, X, y
charDict, X, y = process_book(book, string_length)
''' build the network '''
# number of hidden units in each LSTM layer
lstm_hidden = 512
drop_rate = 0.5
net = tflearn.input_data(shape=(None, string_length, len(charDict)))
# input shape is the length of the strings by the number of characters
# leading None is necessary if no placeholders
net = tflearn.lstm(net, lstm_hidden, return_seq=True)
net = tflearn.dropout(net, drop_rate)
# You have to use a separate dropout layer. There's a glitch where tflearn
# will drop out all the time, not just during training, making prediction
# impossible.
net = tflearn.lstm(net, lstm_hidden, return_seq=True)
net = tflearn.dropout(net, drop_rate)
net = tflearn.lstm(net, lstm_hidden, return_seq=False)
net = tflearn.dropout(net, drop_rate)
net = tflearn.fully_connected(net, len(charDict), activation='softmax')
net = tflearn.regression(net, optimizer='adam',
                         loss='categorical_crossentropy',
                         learning_rate=0.005)
# https://www.quora.com/What-is-gradient-clipping-and-why-is-it-necessary
model = tflearn.SequenceGenerator(net, dictionary=charDict,
                                  seq_maxlen=string_length,
                                  clip_gradients=5,
                                  checkpoint_path='model_checkpoint_v3')
my_log('Character dictionary for ' + book_name)
my_log(charDict)
my_log('charDict length: ' + str(len(charDict)))
my_log('&&&&&&&&&&&&&&&&&')
def random_seed_test(book, temp=0.5, gen_length=300):
    my_log('#######################')
    seed_no = random.randint(0, len(book) - string_length)
    seed = book[seed_no : seed_no + string_length]
    my_log('(temp ' + str(temp) + ') ' + 'Seed: "' + seed + '"')
    my_log('++++++++++++++++++++++')
    my_log(model.generate(seq_length=gen_length, temperature=temp,
                          seq_seed=seed))
    my_log('#######################')
# If you train one epoch at a time in a loop, you can get an idea
# of how the model progressed. With other ML problems, error rate and
# accuracy reveal a lot, but with this problem performance is subjective.
for epoch in range(50):
    start_epoch = time.time()
    my_log('======================================================')
    my_log('Begin epoch %d' % (epoch+1))
    model.fit(X, y, validation_set=0.1, batch_size=128, n_epoch=1)
    my_log('End epoch %d' % (epoch+1))
    epoch_time = time.time() - start_epoch
    my_log('This epoch took ' + str(epoch_time) + ' seconds.')
    random_seed_test(book, temp=0.5, gen_length=1000)
    random_seed_test(book, temp=0.75, gen_length=1000)
    random_seed_test(book, temp=1.0, gen_length=1000)
    my_log('End epoch %d' % (epoch+1))
    my_log('======================================================')
full_time = time.time() - start_script_time
my_log('This program took ' + str(full_time) + ' seconds.')
model.save('dickens_compute_4.model')
my_log('finished')
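In case it helps, this is how I've started poking at the saved checkpoints to see whether the weights themselves blow up between epochs 25 and 26. The checkpoint filename below is only an example of what TFLearn writes out for me (the numeric suffix is the training step, not the epoch), and I'm not sure this is even the right thing to look at:

import numpy as np
import tensorflow as tf

# Example file produced by checkpoint_path='model_checkpoint_v3'
reader = tf.train.NewCheckpointReader('model_checkpoint_v3-25000')
for name in sorted(reader.get_variable_to_shape_map()):
    values = reader.get_tensor(name)
    # Largest absolute weight and overall L2 norm for each variable
    print(name, np.abs(values).max(), np.linalg.norm(values))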