What changes does the backpropagation algorithm make to the weights of a neural network?

data-mining machine-learning neural-network training algorithm backpropagation
2021-10-14 05:00:55

I have seen the formulas for the backpropagation algorithm used to minimize the error of a neural network, but I am not quite sure what changes it makes to the individual weights.

Let's assume a simple neural network as follows:

[figure: the example network, 2 inputs feeding a hidden layer of 2 neurons feeding 1 output neuron]

All weights between connected nodes are initialized to 1.

There is no activation function, so the dot product of the previous layer's inputs and weights is passed on to the next layer unchanged.

Here, the raw inputs are 1 and 1, so in this case the final output comes out to 4. Suppose we want to change the weights so that the "corrected" output exactly matches the desired target output of 1.
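(For reference, this is the forward pass implied by the figure, written out explicitly, assuming all weights are 1 and all biases are 0 as in the answer's code below:

a1 = 1*1 + 1*1 = 2
a2 = 1*1 + 1*1 = 2
h  = 1*2 + 1*2 = 4)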

[figure: the same network, with the target output of 1 and the post-backpropagation weight values left as unknowns]

My question is: after applying the changes prescribed by the backpropagation algorithm, what will the resulting weight values be (as shown in the figure above)?

Thank you.

1 Answer

As already mentioned in the comments, I strongly recommend doing this kind of calculation by hand (at least at first, and for simple cases like this one).

In any case, here is some Python code you can use to do what you want.

Note: the result depends heavily on the learning rate and the number of epochs.

import numpy as np # I only need numpy for this

# Cost function 
def J(y_true, y_pred):
    """
        Cost or Loss function
    """ 
    return ((y_true - y_pred) ** 2).mean()

class NeuralNetwork:
    """
        A neural network with:
            - 2 inputs
            - a hidden layer with 2 neurons (a1, a2)
            - an output layer with 1 neuron (h)
    """

    def __init__(self):
        # Initialize weights and biases
        self.beta0 = np.ones(shape=(2,3))
        self.beta1 = np.ones(shape=(1,3))
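        # Column 0 of each array holds the bias term (set to 0 below); columns 1 and 2 hold the weights (all 1)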
        self.beta0[0,0] = 0
        self.beta0[1,0] = 0
        self.beta1[0,0] = 0
    

    def feedforward(self, x):
        a11 = self.beta0[0,1] * x[0] + self.beta0[0,2] * x[1] + self.beta0[0,0]
        a12 = self.beta0[1,1] * x[0] + self.beta0[1,2] * x[1] + self.beta0[1,0]
        h = self.beta1[0,1] * a11 + self.beta1[0,2] * a12 + self.beta1[0,0]
        return h

    def train(self, ground_truth_dataset, epoch, lr):
        # ground_truth_dataset: has shape (n, 3), where n is the number of items.
        # epoch: the number of times to loop through the entire ground truth dataset
        # lr: learning rate

        epochs = []
        min_losses = []
        avg_losses = []
        max_losses = []

        for ep in range(epoch):
            costs = []
            for item in ground_truth_dataset:
                # input
                x1, x2 = item[:2]

                # real result
                y_true = item[2]

                # ====== Feed forward ======
                # Neuron a1
                z1 = self.beta0[0,1] * x1 + self.beta0[0,2] * x2 + self.beta0[0,0]
                a1 = z1 # identity used here for clarity, since there is no activation function

                # Neuron a2
                z2 = self.beta0[1,1] * x1 + self.beta0[1,2] * x2 + self.beta0[1,0]
                a2 = z2

                # Neuron h
                z = self.beta1[0,1] * a1 + self.beta1[0,2] * a2 + self.beta1[0,0]
                h = z
                y_pred = h

                cost = J(y_true, y_pred)
                costs.append(cost)

                # ====== Back propagation ======
                # Calculate gradients for OUTPUT layer
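                # dJ/dy_pred: derivative of the squared-error cost J = (y_true - y_pred)**2 with respect to the prediction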
                dJ_dy_pred = -2 * (y_true - y_pred)

                dJ_db101 = dJ_dy_pred * a1
                dJ_db102 = dJ_dy_pred * a2
                dJ_db100 = dJ_dy_pred

                # Calculate gradients for HIDDEN layer
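                # Chain rule: dJ/dw = dJ/dy_pred * dy_pred/da * da/dw, where dy_pred/da1 = beta1[0,1],
                # dy_pred/da2 = beta1[0,2], and da/dw is the corresponding input (x1, x2, or 1 for a bias)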
                dJ_db001 = dJ_dy_pred * self.beta1[0,1] * x1
                dJ_db002 = dJ_dy_pred * self.beta1[0,1] * x2
                dJ_db011 = dJ_dy_pred * self.beta1[0,2] * x1
                dJ_db012 = dJ_dy_pred * self.beta1[0,2] * x2
                dJ_db000 = dJ_dy_pred * self.beta1[0,1]
                dJ_db010 = dJ_dy_pred * self.beta1[0,2]

                # Update weights and biases
                self.beta0[0,1] -= lr * dJ_db001
                self.beta0[0,2] -= lr * dJ_db002
                self.beta0[1,1] -= lr * dJ_db011
                self.beta0[1,2] -= lr * dJ_db012
                self.beta1[0,1] -= lr * dJ_db101
                self.beta1[0,2] -= lr * dJ_db102

                self.beta0[0,0] -= lr * dJ_db000
                self.beta0[1,0] -= lr * dJ_db010
                self.beta1[0,0] -= lr * dJ_db100

            epochs.append(ep)
            min_losses.append(min(costs))
            avg_losses.append(sum(costs) / len(costs))
            max_losses.append(max(costs))
            if ep % 10 == 0:
                print("Epoch {}: min_loss = {}, avg_loss = {}, max_loss = {}".format(
                    ep, min_losses[ep], avg_losses[ep], max_losses[ep]))

                print(f"w_1 = {self.beta0[0,1]}, w_2 = {self.beta0[0,2]}, w_3 = {self.beta0[1,1]}, w_4 = {self.beta0[1,2]}, w_5 = {self.beta1[0,1]}, w_6 = {self.beta1[0,2]}")

        return (epochs, min_losses, avg_losses, max_losses)
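As a rough sanity check on the code above, the very first update can be worked out by hand (assuming the single training example [1, 1, 1] and learning_rate = 0.1 used below; w_1 ... w_6 are the names printed by train):

Forward pass: a1 = a2 = 1*1 + 1*1 + 0 = 2 and h = 1*2 + 1*2 + 0 = 4, so the cost is (1 - 4)^2 = 9 and dJ/dy_pred = -2*(1 - 4) = 6.
Output layer: dJ/dw_5 = dJ/dw_6 = 6 * 2 = 12, so w_5 and w_6 move from 1 to 1 - 0.1*12 = -0.2.
Hidden layer: dJ/dw_1 = ... = dJ/dw_4 = 6 * 1 * 1 = 6, so w_1 ... w_4 move from 1 to 1 - 0.1*6 = 0.4.
Each bias gradient is 6, so all three biases move from 0 to -0.6.

Subsequent epochs keep adjusting the weights and biases in the same way.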


Now you are ready to train your network. You need to specify all the training parameters and provide a training set:

%matplotlib inline
%config InlineBackend.figure_format = "retina"

import matplotlib.pyplot as plt


ground_truth_dataset = [
  [1, 1, 1]
]

epoch = 10
learning_rate = 0.1

n = NeuralNetwork()

epochs, min_losses, avg_losses, max_losses = n.train(ground_truth_dataset, epoch, learning_rate)


plt.figure(figsize=(16,10))
plt.ylabel("Loss")
plt.xlabel("Epoch")
plt.plot(epochs, min_losses, label="Min loss")
plt.plot(epochs, avg_losses, label="Avg loss")
plt.plot(epochs, max_losses, label="Max loss")
plt.legend(loc="center")
plt.show();

Enjoy!

Additional note: you may notice that your weights all come out the same. That is because you initialized them symmetrically rather than randomly. Also, having no activation function means there is no non-linearity, which means your 2-layer network is equivalent to a linear regressor.
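A minimal sketch of that last point (my own illustration, using arbitrary example values, not taken from the trained network): two linear layers composed without an activation collapse into a single linear map, so the network cannot express anything a single linear layer could not.

import numpy as np

# Arbitrary example weights/biases with the same shapes as the network above
W1 = np.array([[0.4, 0.4],
               [0.4, 0.4]])    # hidden layer: 2 inputs -> 2 neurons
b1 = np.array([-0.6, -0.6])
W2 = np.array([[-0.2, -0.2]])  # output layer: 2 neurons -> 1 output
b2 = np.array([-0.6])

x = np.array([1.0, 1.0])

# Two linear layers applied in sequence ...
h_two_layers = W2 @ (W1 @ x + b1) + b2

# ... give exactly the same result as one combined linear layer
W = W2 @ W1
b = W2 @ b1 + b2
h_one_layer = W @ x + b

print(h_two_layers, h_one_layer)  # identical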