Neural network XOR gate classification

data-mining  neural-network  backpropagation
2022-03-06 07:13:15

I wrote a simple neural network that should predict the XOR gate function. I think I have the math right, but the loss does not decrease and stays around 0.6. Can anyone help me figure out why?

import numpy as np
import matplotlib.pyplot as plt

train_X = np.array([[0,0],[0,1],[1,0],[1,1]]).T
train_Y = np.array([[0,1,1,0]])
test_X = np.array([[0,0],[0,1],[1,0],[1,1]]).T
test_Y = np.array([[0,1,1,0]])

learning_rate = 0.1
S = 5

def sigmoid(z):
    return 1/(1+np.exp(-z))

def sigmoid_derivative(z):
    return sigmoid(z)*(1-sigmoid(z))

S0, S1, S2 = 2, 5, 1
m = 4

w1 = np.random.randn(S1, S0) * 0.01
b1 = np.zeros((S1, 1))
w2 = np.random.randn(S2, S1) * 0.01
b2 = np.zeros((S2, 1))

for i in range(1000000):
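    # forward pass through the 2-5-1 network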
    Z1 = np.dot(w1, train_X) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(w2, A1) + b2
    A2 = sigmoid(Z2)

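    # binary cross-entropy cost, averaged over the m = 4 examples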
    J = np.sum(-train_Y * np.log(A2) + (train_Y-1) * np.log(1-A2)) / m

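    # backward pass: gradients of J with respect to the parameters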
    dZ2 = A2 - train_Y
    dW2 = np.dot(dZ2, A1.T) / m
    dB2 = np.sum(dZ2, axis = 1, keepdims = True) / m
    dZ1 = np.dot(w2.T, dZ2) * sigmoid_derivative(Z1)
    dW1 = np.dot(dZ1, train_X.T) / m
    dB1 = np.sum(dZ1, axis = 1, keepdims = True) / m

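    # gradient descent parameter updates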
    w1 = w1 - dW1 * 0.03
    w2 = w2 - dW2 * 0.03
    b1 = b1 - dB1 * 0.03
    b2 = b2 - dB2 * 0.03

    print(J)
2 Answers

I fixed this with three changes; a combined sketch of the corrected loop follows the list:

  1. The initial weights are too small; remove the `* 0.01` scaling.
  2. I think the biases should also be initialized randomly:

    w1 = np.random.randn(S1, S0)
    b1 = np.random.randn(S1, 1)
    w2 = np.random.randn(S2, S1)
    b2 = np.random.randn(S2, 1)

  3. You are using a hardcoded learning rate of 0.03; change the update lines to use the `learning_rate` variable, and you can also increase it:

    learning_rate = 0.1 ...

    w1 = w1 - dW1 * learning_rate
    w2 = w2 - dW2 * learning_rate
    b1 = b1 - dB1 * learning_rate
    b2 = b2 - dB2 * learning_rate
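Putting the three changes together, here is a minimal sketch of the corrected training loop. It reuses the variable names from the question; the `costs` list, the 20000-iteration count, and the final prints are my additions for illustration, and the exact number of iterations needed can vary with the random initialization:

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def sigmoid_derivative(z):
        return sigmoid(z) * (1 - sigmoid(z))

    train_X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]).T  # shape (2, 4)
    train_Y = np.array([[0, 1, 1, 0]])                      # shape (1, 4)

    S0, S1, S2 = 2, 5, 1
    m = train_X.shape[1]
    learning_rate = 0.1

    # changes 1 and 2: full-scale random initialization for weights and biases
    w1 = np.random.randn(S1, S0)
    b1 = np.random.randn(S1, 1)
    w2 = np.random.randn(S2, S1)
    b2 = np.random.randn(S2, 1)

    costs = []
    for i in range(20000):
        # forward pass
        Z1 = np.dot(w1, train_X) + b1
        A1 = sigmoid(Z1)
        Z2 = np.dot(w2, A1) + b2
        A2 = sigmoid(Z2)

        # cross-entropy cost
        J = np.sum(-train_Y * np.log(A2) - (1 - train_Y) * np.log(1 - A2)) / m
        costs.append(J)

        # backward pass
        dZ2 = A2 - train_Y
        dW2 = np.dot(dZ2, A1.T) / m
        dB2 = np.sum(dZ2, axis=1, keepdims=True) / m
        dZ1 = np.dot(w2.T, dZ2) * sigmoid_derivative(Z1)
        dW1 = np.dot(dZ1, train_X.T) / m
        dB1 = np.sum(dZ1, axis=1, keepdims=True) / m

        # change 3: use the learning_rate variable instead of the hardcoded 0.03
        w1 -= learning_rate * dW1
        w2 -= learning_rate * dW2
        b1 -= learning_rate * dB1
        b2 -= learning_rate * dB2

    print(J)           # the cost should now fall well below 0.6
    print(A2.round())  # ideally matches train_Y: [[0, 1, 1, 0]]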

The figure below shows what I got for the progression of J:

[figure: plot of the cost J decreasing over training iterations]
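For reference, a minimal way to reproduce such a plot with matplotlib, assuming J is appended to a `costs` list on each iteration as in the sketch above:

    import matplotlib.pyplot as plt

    plt.plot(costs)
    plt.xlabel("iteration")
    plt.ylabel("cost J")
    plt.title("Training cost over iterations")
    plt.show()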

Just out of curiosity, here is what happens to J if we swap the sigmoid function for ReLU (in the code I did not change the names of the defs...). ReLU learns much faster.

[figure: plot of the cost J with ReLU, converging much faster]


    def sigmoid(z):
        return np.maximum(0, z)

    def sigmoid_derivative(x):
        x[x <= 0] = 0
        x[x > 0] = 1
        return x
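One caveat: if ReLU also replaces the sigmoid on the output layer, A2 is no longer guaranteed to lie in (0, 1), so the log terms in J can misbehave. Below is a sketch of a common variant (my restructuring, not the original code) that keeps sigmoid at the output and uses ReLU only in the hidden layer, reusing the data, initialization, and `sigmoid` from the sketch above:

    def relu(z):
        return np.maximum(0, z)

    def relu_derivative(z):
        return (z > 0).astype(float)

    for i in range(20000):
        # forward pass: ReLU hidden layer, sigmoid output layer
        Z1 = np.dot(w1, train_X) + b1
        A1 = relu(Z1)
        Z2 = np.dot(w2, A1) + b2
        A2 = sigmoid(Z2)

        # backward pass: only the hidden-layer derivative changes
        dZ2 = A2 - train_Y
        dW2 = np.dot(dZ2, A1.T) / m
        dB2 = np.sum(dZ2, axis=1, keepdims=True) / m
        dZ1 = np.dot(w2.T, dZ2) * relu_derivative(Z1)
        dW1 = np.dot(dZ1, train_X.T) / m
        dB1 = np.sum(dZ1, axis=1, keepdims=True) / m

        w1 -= learning_rate * dW1
        w2 -= learning_rate * dW2
        b1 -= learning_rate * dB1
        b2 -= learning_rate * dB2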

Here is the same thing in tf, from my GitHub repo.

import tensorflow as tf    

input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]  # XOR input
output_data = [[0.], [1.], [1.], [0.]]  # XOR output

n_input = tf.placeholder(tf.float32, shape=[None, 2], name="n_input")
n_output = tf.placeholder(tf.float32, shape=[None, 1], name="n_output")

hidden_nodes = 5

b_hidden_1 = tf.Variable(tf.random_normal([hidden_nodes]), name="hidden_bias")
W_hidden_1 = tf.Variable(tf.random_normal([2, hidden_nodes]), name="hidden_weights")
hidden_1 = tf.sigmoid(tf.matmul(n_input, W_hidden_1) + b_hidden_1)

W_output = tf.Variable(tf.random_normal([hidden_nodes, 1]), name="output_weights")  # output layer's weight matrix
output = tf.sigmoid(tf.matmul(hidden_1, W_output))  # calc output layer's activation

cross_entropy = tf.square(n_output - output)  # squared error instead of cross-entropy: simpler, but also works

loss = tf.reduce_mean(cross_entropy)  # mean the error over the batch
optimizer = tf.train.AdamOptimizer(0.01)  # Adam optimizer with a "stepsize" of 0.01
train = optimizer.minimize(loss)  # let the optimizer train

init = tf.global_variables_initializer()

sess = tf.Session()  # create the session and therefore the graph
sess.run(init)  # initialize all variables  

for epoch in range(0, 1000):
    # run the training operation
    cvalues = sess.run([train, loss, W_hidden_1, b_hidden_1, W_output],
                       feed_dict={n_input: input_data, n_output: output_data})

    if epoch % 200 == 0:
        print("")
        print("step: {:>3}".format(epoch))
        print("loss: {}".format(cvalues[1]))

print("")
print("input: {} | output: {}".format(input_data[0], sess.run(output, feed_dict={n_input: [input_data[0]]})))
print("input: {} | output: {}".format(input_data[1], sess.run(output, feed_dict={n_input: [input_data[1]]})))
print("input: {} | output: {}".format(input_data[2], sess.run(output, feed_dict={n_input: [input_data[2]]})))