Loss function not working (RNN)

data-mining machine-learning neural-network
2022-02-23 01:40:10

I'm building an RNN by following Siraj Raval's video implementation. I've adapted it to use my own dataset instead of importing one from a file. When the program reaches the loss function, it reports:

loss += -np.log(ps[t][y[t], 0])
IndexError: index 7 is out of bounds for axis 0 with size 4

What does this mean, and how do I fix it? Also, in Siraj's version he computes the loss as:

loss += -np.log(ps[t][y[t], 0])

Shouldn't it instead be

loss += -np.log(ps[t]) * [y[t], 0]

since cross-entropy loss is L = -y ln(yhat)?
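(For reference: with a one-hot target y, the cross-entropy sum L = -Σᵢ yᵢ ln(ŷᵢ) has exactly one nonzero term, the one at the target's integer index, which is what ps[t][y[t], 0] selects. So the indexed form is the full sum specialized to one-hot labels. A standalone check with made-up numbers:

```python
import numpy as np

# A softmax output over 4 classes and an integer class label
ps = np.array([[0.1], [0.2], [0.6], [0.1]])  # shape (4, 1)
target = 2

# One-hot encoding of the target
y_onehot = np.zeros((4, 1))
y_onehot[target] = 1.0

# Full cross-entropy: L = -sum(y * log(yhat))
full = -np.sum(y_onehot * np.log(ps))

# Indexed form: picks out the same single term
indexed = -np.log(ps[target, 0])

assert np.isclose(full, indexed)  # both equal -log(0.6)
```

Note that `-np.log(ps[t]) * [y[t], 0]` would instead multiply the whole log-probability vector by the list `[y[t], 0]`, which is not the cross-entropy.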

My code:

import numpy as np

# Data Processing
x = np.array([
    # t/no. of inputs         
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],     # Samples
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24]]).T

# Model Parameters
numInputs = x.shape[1] # Yields 4
numNeurons = 8 # Yields 8
numEntries = x.shape[0] # Yields 6

u = np.random.random((numNeurons, numInputs))
v = np.random.random((numInputs, numNeurons))
w = np.random.random((numNeurons, numNeurons))
bh = np.zeros((numNeurons, 1))
bo = np.zeros((numInputs, 1))
atimeline = [] # Contains 6 timesteps' worth of a's
yhattimeline = [] # Contains 6 timesteps' worth of yhat's
hprev = np.zeros((8, 1)) # Contains previous a state

# Training
def loss(x, y, hprev):
    xs, hs, ys, ps = {}, {}, {}, {}
    hs[-1] = np.copy(hprev) # Copies hprev so hprev can still be used
                            # Adds key: -1, value: hprev, to the dict
    loss = 0
    xs[0] = np.zeros((numInputs, 1)) # Sets t = 0 to 0's
    for t in range(numEntries):
        xs[t + 1] = x[t] # Adds in data from x to the dict
        hs[t] = np.tanh(np.dot(u, xs[t]) + np.dot(w, hs[t - 1]) + bh)
        ys[t] = np.dot(v, hs[t]) + bo
        ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t])) # Softmax
        loss += -np.log(ps[t][y[t], 0])
    du, dv, dw = np.zeros_like(u), np.zeros_like(v), np.zeros_like(w)
    dbh, dbo = np.zeros_like(bh), np.zeros_like(bo)
    dhnext = np.zeros_like(hs[0])
    for t in reversed(range(numEntries)):
        dy = np.copy(ps[t])
        dy[y[t]] -= 1 # Derived from dL/dyhat
        dv += np.dot(dy, hs[t].T)
        dbo += dy
        dh = np.dot(v.T, dy) + dhnext
        dhraw = (1 - hs[t] * hs[t]) * dh # tanh
        dbh += dhraw
        du += np.dot(dhraw, xs[t].T)
        dw += np.dot(dhraw, xs[t - 1].T)
        dhnext = np.dot(w.T, dhraw)
    return du, dv, dw, dbo, dbh, hs[numEntries - 1]

u, v, w, dbo, dbh, hprev = loss(x, x, hprev)
1 Answer

What's happening is that

xs[0] = np.zeros((numInputs, 1))

produces an array of shape (4, 1), while

xs[t + 1] = x[t]

produces a one-dimensional array of shape (4,). To fix this, I transposed x[t] so that it now has the same (4, 1) shape as above:

xs[t + 1] = np.array([x[t]]).T

An annoying bug that was hard to track down, but with a fairly simple fix.
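The mismatch can be reproduced in isolation with the same x as in the question; the sketch below also shows reshape(-1, 1), an equivalent alternative to wrapping and transposing:

```python
import numpy as np

x = np.array([
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24]]).T  # shape (6, 4)

# Indexing a single row of a 2-D array yields a 1-D array
print(x[0].shape)                # (4,)

# Wrapping in a list makes it (1, 4); transposing gives the column (4, 1)
print(np.array([x[0]]).T.shape)  # (4, 1)

# Equivalent, arguably more idiomatic fix
print(x[0].reshape(-1, 1).shape)  # (4, 1)
```

Without the column shape, expressions like np.dot(u, xs[t]) + bh silently broadcast a (8,) result against the (8, 1) bias into an (8, 8) matrix, which is how the out-of-bounds index later surfaces in ps[t].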