RNN in pseudocode

data-mining  neural-network  lstm  rnn  implementation
2021-09-27 18:25:41

A few years ago I got a much better understanding of classic MLP neural networks by writing an implementation from scratch (using only Python + Numpy, no tensorflow). Now I would like to do the same for recurrent neural networks.

For a standard MLP NN with dense layers, the forward pass can be summed up as:

def predict(x0):
    x = x0
    for i in range(numlayers-1):
        y = dot(W[i], x) + B[i]     # W[i] is a weight matrix, B[i] the biases 
        x = activation[i](y)
    return x
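For concreteness, a minimal self-contained version of that loop might look like this; the layer sizes, random initialization and tanh activations below are just placeholder assumptions to make the sketch runnable:

import numpy as np

# Assumed toy setup: layers of sizes 4 -> 8 -> 2, tanh everywhere
layer_sizes = [4, 8, 2]
numlayers = len(layer_sizes)
W = [np.random.randn(layer_sizes[i+1], layer_sizes[i]) for i in range(numlayers-1)]
B = [np.zeros(layer_sizes[i+1]) for i in range(numlayers-1)]
activation = [np.tanh] * (numlayers-1)

def predict(x0):
    x = x0
    for i in range(numlayers-1):
        y = np.dot(W[i], x) + B[i]   # affine transform of the previous layer's output
        x = activation[i](y)         # elementwise non-linearity
    return x

print(predict(np.random.randn(4)).shape)   # (2,)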

For a single layer, the idea is just:

output_vector = activation(W[i] * input_vector + B[i])

What would the equivalent be for a simple RNN layer, e.g. SimpleRNN?


More precisely, let's take an RNN layer like this as an example:
input shape: (None, 250, 32)
output shape: (None, 100)
Given an input x of shape (250, 32), which pseudocode can I use to produce the output y of shape (100,), using the weights and so on, of course?

2 Answers

A simple RNN cell follows this pattern:

Given the following data:
    input data:         X
    weights:            wx
    recursive weights:  wRec

Initialize initial hidden state to 0

For each state, one by one:
    Update new hidden state as: (Input data * weights) + (Hidden state * recursive weights)

In Python code:

import numpy as np

def compute_states(X, wx, wRec):
    """
    Unfold the network and compute all state activations
    given the input X, input weights (wx), and recursive weights
    (wRec). Return the state activations in a matrix; the last
    column S[:,-1] contains the final activations.
    """
    # Initialise a matrix that holds all states for all input sequences.
    # The initial state s_0 is set to 0; each of the others depends on the previous one.
    S = np.zeros((X.shape[0], X.shape[1]+1))

    # Compute each state k from the previous state ( S[:,k] ) and current input ( X[:,k] ), 
    # by use of the input weights (wx) and recursive weights (wRec).
    for k in range(0, X.shape[1]):
        S[:,k+1] = (X[:,k] * wx) + (S[:,k] * wRec)

    return S
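For example, a toy run of the function above (this linear example uses scalar weights, so wx and wRec are plain numbers; the values are made up):

X = np.array([[1., 2., 3., 4.],      # one sequence of 4 timesteps
              [0., 1., 0., 1.]])     # another sequence
wx, wRec = 0.5, 0.9                  # assumed scalar weights

S = compute_states(X, wx, wRec)
print(S.shape)       # (2, 5): initial state plus one state per timestep
print(S[:, -1])      # final state of each sequence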

This is a cleaner version of the code I found here.

Does this help you?

In other words, this shows what the forward pass of an RNN looks like: the input at each step is combined with a value carried over from the previous step (here that value is prev_s). First the weights are initialized, then the forward pass is performed. I have marked the line you are looking for.

U = np.random.uniform(0, 1, (hidden_dim, T))           # input -> hidden weights
W = np.random.uniform(0, 1, (hidden_dim, hidden_dim))  # hidden -> hidden (recurrent) weights
V = np.random.uniform(0, 1, (output_dim, hidden_dim))  # hidden -> output weights


for i in range(Y.shape[0]):
    x, y = X[i], Y[i]

    layers = []
    prev_s = np.zeros((hidden_dim, 1))   # initial hidden state, all zeros

    # Gradient accumulators for the backward pass (not used in this excerpt)
    dU = np.zeros(U.shape)
    dV = np.zeros(V.shape)
    dW = np.zeros(W.shape)

    dU_t = np.zeros(U.shape)
    dV_t = np.zeros(V.shape)
    dW_t = np.zeros(W.shape)

    dU_i = np.zeros(U.shape)
    dW_i = np.zeros(W.shape)

    # forward pass
    for t in range(T):
        new_input = np.zeros(x.shape)
        new_input[t] = x[t]              # keep only the t-th input element
        mulu = np.dot(U, new_input)      # input contribution
        mulw = np.dot(W, prev_s)         # contribution of the previous hidden state
        add = mulw + mulu
        s = sigmoid(add)                 # new hidden state
        mulv = np.dot(V, s)              # <-- the line you are looking for
        layers.append({'s': s, 'prev_s': prev_s})
        prev_s = s
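Note that sigmoid is not defined in this excerpt; presumably it is the usual logistic function:

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))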

So the marked line can roughly be read as: mulv = np.dot(V, s) is the output weights multiplied by the current state (same as before, with s playing the role of input_vector). The difference is that s itself is computed from the previous output and the current input through their own weights, i.e.

mulu = np.dot(U, new_input)
mulw = np.dot(W, prev_s)
add = mulw + mulu
s = sigmoid(add)

That is why we start with three weight matrices in the first place.
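Putting the same pattern into the shapes from the question (250 timesteps of 32 features in, a hidden/output size of 100), a minimal sketch of a SimpleRNN-style forward pass could look like the following. The weight shapes and the tanh activation are assumptions (tanh is the usual default for a simple RNN); this is meant to illustrate the idea, not to match any particular library's internals exactly:

import numpy as np

T, input_dim, hidden_dim = 250, 32, 100

# Assumed weights: input -> hidden, hidden -> hidden, and a bias
Wx = np.random.randn(hidden_dim, input_dim)
Wh = np.random.randn(hidden_dim, hidden_dim)
b  = np.zeros(hidden_dim)

def simple_rnn_forward(x):            # x has shape (250, 32)
    h = np.zeros(hidden_dim)          # initial hidden state
    for t in range(T):
        h = np.tanh(np.dot(Wx, x[t]) + np.dot(Wh, h) + b)   # update hidden state
    return h                          # shape (100,): the layer's output y

y = simple_rnn_forward(np.random.randn(250, 32))
print(y.shape)   # (100,)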