I'm new to PyTorch and am trying to create word embeddings. I started from the example below, which works fine and finishes relatively quickly.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

CONTEXT_SIZE = 2
EMBEDDING_DIM = 10
# We will use Shakespeare Sonnet 2
test_sentence = """When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery so gazed on now,
Will be a totter'd weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eating shame, and thriftless praise.
How much more praise deserv'd thy beauty's use,
If thou couldst answer 'This fair child of mine
Shall sum my count, and make my old excuse,'
Proving his beauty by succession thine!
This were to be new made when thou art old,
And see thy blood warm when thou feel'st it cold.""".split()
# we should tokenize the input, but we will ignore that for now
# build a list of tuples. Each tuple is ([ word_i-2, word_i-1 ], target word)
trigrams = [([test_sentence[i], test_sentence[i + 1]], test_sentence[i + 2])
            for i in range(len(test_sentence) - 2)]
# print the first 3, just so you can see what they look like
print(trigrams[:3])

vocab = set(test_sentence)
word_to_ix = {word: i for i, word in enumerate(vocab)}


class NGramLanguageModeler(nn.Module):

    def __init__(self, vocab_size, embedding_dim, context_size):
        super(NGramLanguageModeler, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        embeds = self.embeddings(inputs).view((1, -1))
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs


losses = []
loss_function = nn.NLLLoss()
model = NGramLanguageModeler(len(vocab), EMBEDDING_DIM, CONTEXT_SIZE)
optimizer = optim.SGD(model.parameters(), lr=0.001)

for epoch in range(10):
    total_loss = torch.Tensor([0])
    for context, target in trigrams:

        # Step 1. Prepare the inputs to be passed to the model (i.e., turn the words
        # into integer indices and wrap them in variables)
        context_idxs = torch.tensor([word_to_ix[w] for w in context], dtype=torch.long)

        # Step 2. Recall that torch *accumulates* gradients. Before passing in a
        # new instance, you need to zero out the gradients from the old
        # instance
        model.zero_grad()

        # Step 3. Run the forward pass, getting log probabilities over next
        # words
        log_probs = model(context_idxs)

        # Step 4. Compute your loss function. (Again, Torch wants the target
        # word wrapped in a variable)
        loss = loss_function(log_probs, torch.tensor([word_to_ix[target]], dtype=torch.long))

        # Step 5. Do the backward pass and update the gradient
        loss.backward()
        optimizer.step()

        # Get the Python number from a 1-element Tensor by calling tensor.item()
        total_loss += loss.item()
    losses.append(total_loss)
print(losses)  # The loss decreased every iteration over the training data!
When I swap in my own medium-sized corpus, training takes a very long time, because the example above has no notion of mini-batches. So I decided to try adding mini-batching to the process.
First, I convert the context IDs, together with the targets, into 2D tensors:
context_idxs = []
targets = []
for context, target in trigrams:
    # Step 1. Prepare the inputs to be passed to the model (i.e., turn the words
    # into integer indices and wrap them in variables)
    context_idxs.append(torch.Tensor([word_to_ix[w] for w in context]))
    targets.append(torch.Tensor([word_to_ix[target]]))
context_ids = torch.stack(context_idxs, dim=0)
target_ids = torch.stack(targets, dim=0)
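As a quick sanity check (not part of the tutorial, just added here to show what I end up with), the stacked tensors look like this, where N is len(trigrams):

print(context_ids.shape)   # torch.Size([N, 2])
print(target_ids.shape)    # torch.Size([N, 1])
print(context_ids.dtype)   # torch.float32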
Next I try to run with mini-batches like this:
current_start = 0
keep_going = True
while keep_going:
    if current_start + MINI_BATCH < len(target_ids):
        minibatchids = slice(current_start, current_start + MINI_BATCH - 1)
        print(minibatchids)
        current_start = current_start + MINI_BATCH
    else:
        minibatchids = slice(current_start, len(target_ids))
        print(minibatchids)
        keep_going = False

    model.zero_grad()

    # Step 3. Run the forward pass, getting log probabilities over next
    # minibatch of words
    log_probs = model(context_ids[minibatchids])
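For reference, here is what those print(minibatchids) calls output with a hypothetical MINI_BATCH = 4 and only 10 training examples (made-up numbers, just to illustrate the slices the loop generates):

slice(0, 3, None)
slice(4, 7, None)
slice(8, 10, None)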
PyTorch throws the following error:
Traceback (most recent call last):
  File "/mnt/data/projects/PyTorchTutorial/IntroTorch/PyTorch_WordEmbedding_BeigeBook.py", line 102, in <module>
    log_probs = model(context_ids[minibatchids])
  File "/home/david/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/data/projects/PyTorchTutorial/IntroTorch/PyTorch_WordEmbedding_BeigeBook.py", line 25, in forward
    embeds = self.embeddings(inputs).view((1, -1))
  File "/home/david/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/david/miniconda3/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 108, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/david/miniconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1076, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got CPUFloatTensor instead (while checking arguments for embedding)
I've tried to highlight above where my code differs from the tutorial. I'm happy to add more detail, but I'm not sure what else would be useful.
This leaves me with a few questions:
- Is what I'm attempting even possible?
If so:
- Am I on the right track?
- Is there an example of a mini-batch implementation anywhere? (I couldn't find one.)
- What does the error mean?
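In case it is useful, this minimal stand-alone snippet (with arbitrary sizes, not my real model) reproduces the same kind of RuntimeError from just the nn.Embedding call:

import torch
import torch.nn as nn

emb = nn.Embedding(10, 3)                            # arbitrary vocab size and embedding dim
ok = emb(torch.tensor([[1, 2]], dtype=torch.long))   # works: indices are a LongTensor
bad = emb(torch.Tensor([[1.0, 2.0]]))                # raises a RuntimeError about 'indices' needing scalar type Long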