RNN in pseudocode

data-mining  neural-network  lstm  rnn  implementation
2021-09-27 18:25:41

A few years ago I got a much better understanding of classic MLP neural networks by writing an implementation from scratch (using only Python + Numpy, no tensorflow). Now I would like to do the same for recurrent neural networks.

For a standard MLP NN with dense layers, the forward pass can be summed up as:

def predict(x0):
    x = x0
    for i in range(numlayers-1):
        y = dot(W[i], x) + B[i]     # W[i] is a weight matrix, B[i] the biases 
        x = activation[i](y)
    return x
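For concreteness, a minimal self-contained version of that loop might look like this; the layer sizes, random initialization and tanh activations below are just placeholder assumptions to make the sketch runnable:

import numpy as np

# Assumed toy setup: layers of sizes 4 -> 8 -> 2, tanh everywhere
layer_sizes = [4, 8, 2]
numlayers = len(layer_sizes)
W = [np.random.randn(layer_sizes[i+1], layer_sizes[i]) for i in range(numlayers-1)]
B = [np.zeros(layer_sizes[i+1]) for i in range(numlayers-1)]
activation = [np.tanh] * (numlayers-1)

def predict(x0):
    x = x0
    for i in range(numlayers-1):
        y = np.dot(W[i], x) + B[i]   # affine transform of the previous layer's output
        x = activation[i](y)         # elementwise non-linearity
    return x

print(predict(np.random.randn(4)).shape)   # (2,)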

For a single layer, the idea is just:

output_vector = activation(W[i] * input_vector + B[i])

What would the equivalent be for a simple RNN layer, e.g. SimpleRNN?


More precisely, let's take an RNN layer like this as an example:
input shape: (None, 250, 32)
output shape: (None, 100)
Given an input x of shape (250, 32), which pseudocode can I use to produce the output y of shape (100,), using the weights and so on, of course?

2 Answers

A simple RNN cell follows this pattern:

Given the following data:
    input data:         X
    weights:            wx
    recursive weights:  wRec

Initialize initial hidden state to 0

For each state, one by one:
    Update new hidden state as: (Input data * weights) + (Hidden state * recursive weights)

In Python code:

import numpy as np

def compute_states(X, wx, wRec):
    """
    Unfold the network and compute all state activations
    given the input X, input weights (wx), and recursive weights
    (wRec). Return the state activations in a matrix; the last
    column S[:,-1] contains the final activations.
    """
    # Initialise a matrix that holds all states for all input sequences.
    # The initial state s_0 is set to 0; each of the others depends on the previous one.
    S = np.zeros((X.shape[0], X.shape[1]+1))

    # Compute each state k from the previous state ( S[:,k] ) and current input ( X[:,k] ), 
    # by use of the input weights (wx) and recursive weights (wRec).
    for k in range(0, X.shape[1]):
        S[:,k+1] = (X[:,k] * wx) + (S[:,k] * wRec)

    return S
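For example, a toy run of the function above (this linear example uses scalar weights, so wx and wRec are plain numbers; the values are made up):

X = np.array([[1., 2., 3., 4.],      # one sequence of 4 timesteps
              [0., 1., 0., 1.]])     # another sequence
wx, wRec = 0.5, 0.9                  # assumed scalar weights

S = compute_states(X, wx, wRec)
print(S.shape)       # (2, 5): initial state plus one state per timestep
print(S[:, -1])      # final state of each sequence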

This is a cleaner version of the code I found here.

Does this help you?

In other words, this shows what the forward pass of an RNN looks like: the input at each step is combined with a value carried over from the previous step (here that value is prev_s). First the weights are initialized, then the forward pass is performed. I have marked the line you are looking for.

U = np.random.uniform(0, 1, (hidden_dim, T))           # input -> hidden weights
W = np.random.uniform(0, 1, (hidden_dim, hidden_dim))  # hidden -> hidden (recurrent) weights
V = np.random.uniform(0, 1, (output_dim, hidden_dim))  # hidden -> output weights


for i in range(Y.shape[0]):
    x, y = X[i], Y[i]

    layers = []
    prev_s = np.zeros((hidden_dim, 1))   # initial hidden state, all zeros

    # Gradient accumulators for the backward pass (not used in this excerpt)
    dU = np.zeros(U.shape)
    dV = np.zeros(V.shape)
    dW = np.zeros(W.shape)

    dU_t = np.zeros(U.shape)
    dV_t = np.zeros(V.shape)
    dW_t = np.zeros(W.shape)

    dU_i = np.zeros(U.shape)
    dW_i = np.zeros(W.shape)

    # forward pass
    for t in range(T):
        new_input = np.zeros(x.shape)
        new_input[t] = x[t]              # keep only the t-th input element
        mulu = np.dot(U, new_input)      # input contribution
        mulw = np.dot(W, prev_s)         # contribution of the previous hidden state
        add = mulw + mulu
        s = sigmoid(add)                 # new hidden state
        mulv = np.dot(V, s)              # <-- the line you are looking for
        layers.append({'s': s, 'prev_s': prev_s})
        prev_s = s
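Note that sigmoid is not defined in this excerpt; presumably it is the usual logistic function:

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))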

So the marked line can roughly be read as: mulv = np.dot(V, s) is the output weights multiplied by the current state (same as before, with s playing the role of input_vector). The difference is that s itself is computed from the previous output and the current input through their own weights, i.e.

mulu = np.dot(U, new_input)
mulw = np.dot(W, prev_s)
add = mulw + mulu
s = sigmoid(add)

That is why we start with three weight matrices in the first place.
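Putting the same pattern into the shapes from the question (250 timesteps of 32 features in, a hidden/output size of 100), a minimal sketch of a SimpleRNN-style forward pass could look like the following. The weight shapes and the tanh activation are assumptions (tanh is the usual default for a simple RNN); this is meant to illustrate the idea, not to match any particular library's internals exactly:

import numpy as np

T, input_dim, hidden_dim = 250, 32, 100

# Assumed weights: input -> hidden, hidden -> hidden, and a bias
Wx = np.random.randn(hidden_dim, input_dim)
Wh = np.random.randn(hidden_dim, hidden_dim)
b  = np.zeros(hidden_dim)

def simple_rnn_forward(x):            # x has shape (250, 32)
    h = np.zeros(hidden_dim)          # initial hidden state
    for t in range(T):
        h = np.tanh(np.dot(Wx, x[t]) + np.dot(Wh, h) + b)   # update hidden state
    return h                          # shape (100,): the layer's output y

y = simple_rnn_forward(np.random.randn(250, 32))
print(y.shape)   # (100,)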