Neural network for binary classification of the XOR gate

data-mining neural-network backpropagation
2022-02-13 08:38:29

I have written this neural network for the XOR function. The output is not correct; it does not classify the test inputs properly. Can anyone tell me why?

import numpy as np
import pandas as pd
x=np.array([[0,0],[0,1],[1,0],[1,1]])
y=np.array([[0],[1],[1],[0]])
np.random.seed(0) 
theta1=np.random.rand(2,8)
theta2=np.random.rand(8,1)
np.random.seed(0)
b1=np.random.rand(4,8)
b2=np.random.rand(4,1)
alpha=0.01
lamda=0.01

for i in range(1,2000):
    z1=x.dot(theta1)+b1

    h1=1/(1+np.exp(-z1))
    z2=h1.dot(theta2)+b2
    h2=1/(1+np.exp(-z2))

    dh2=h2-y
    #back prop

    dz2=dh2*(1-dh2)
    H1=np.transpose(h1)
    dw2=np.dot(H1,dz2)
    db2=np.sum(dz2)

    W2=np.transpose(theta2)
    dh1=np.dot(dz2,W2)
    dz1=dh1*(1-dh1)

    X=np.transpose(x)
    dw1=np.dot(X,dz1)

    db1=np.sum(dz1)


    dw2=dw2-lamda*theta2

    dw1=dw1-lamda*theta1
    theta1=theta1-alpha*dw1
    theta2=theta2-alpha*dw2
    b1+=-alpha*db1

    b2+=-alpha*db2


#prediction
#test inputs
input1=np.array([[0,0],[1,1],[0,1],[1,0]])
z1=np.dot(input1,theta1)

h1=1/(1+np.exp(-z1))
z2=np.dot(h1,theta2)

h2=1/(1+np.exp(-z2))

Expected output = [0], [0], [1], [1]
Actual output:
[[ 0.95678049]
 [ 0.99437206]
 [ 0.98686979]
 [ 0.98628204]]

That's all of it.

1 Answer

There are a few mistakes in the code, so I will give a revised version here with comments.

Setup

import numpy as np
import pandas as pd
x=np.array([[0,0],[0,1],[1,0],[1,1]])
y=np.array([[0],[1],[1],[0]])
np.random.seed(0)

# Optional, but a good idea to have +ve and -ve weights
theta1=np.random.rand(2,8)-0.5
theta2=np.random.rand(8,1)-0.5

# Necessary - the bias terms should have same number of dimensions
# as the layer. For some reason you had one bias vector per example.
# (You could still use np.random.rand(8) and np.random.rand(1))
b1=np.zeros(8)
b2=np.zeros(1)

alpha=0.01
# Regularisation not necessary for XOR, because you have a complete training set.
# You could have lamda=0.0, but I have left a value here just to show it works.
lamda=0.001
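
As a quick sanity check on those bias shapes (this snippet is my addition, not part of the revised training code), you can confirm that a per-layer bias broadcasts across the batch of 4 examples, whereas your original (4,8) and (4,1) biases tied one bias row to each training example:

# Hypothetical shape check, using the arrays defined above
z1 = x.dot(theta1) + b1   # (4,2)·(2,8) -> (4,8); b1 with shape (8,) broadcasts over rows
assert z1.shape == (4, 8)
z2 = (1/(1+np.exp(-z1))).dot(theta2) + b2   # (4,8)·(8,1) -> (4,1); b2 broadcasts likewise
assert z2.shape == (4, 1)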

Training - forward propagation

# More iterations than you might think! This is because we have
# so little training data, we need to repeat it a lot.
for i in range(1,40000):
    z1=x.dot(theta1)+b1
    h1=1/(1+np.exp(-z1))
    z2=h1.dot(theta2)+b2
    h2=1/(1+np.exp(-z2))

Training - back propagation

    # This dz term assumes binary cross-entropy loss
    dz2 = h2-y 
    # You could also have stuck with squared error loss, the extra h2 terms
    # are the derivative of the sigmoid transfer function. 
    # It converges slower though:
    # dz2 = (h2-y) * h2 * (1-h2)

    # This is just the same as you had before, but with fewer temporary variables
    dw2 = np.dot(h1.T, dz2)
    db2 = np.sum(dz2, axis=0)

    # The derivative of sigmoid is h1 * (1-h1), NOT dh1*(1-dh1)
    dz1 = np.dot(dz2, theta2.T) * h1 * (1-h1)
    dw1 = np.dot(x.T, dz1)
    db1 = np.sum(dz1, axis=0)

    # The L2 regularisation terms ADD to the gradients of the weights
    dw2 += lamda * theta2
    dw1 += lamda * theta1

    theta1 += -alpha * dw1
    theta2 += -alpha * dw2

    b1 += -alpha * db1
    b2 += -alpha * db2
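
To spell out why the cross-entropy gradient has no extra sigmoid factor (a standard derivation, added here for clarity): with binary cross-entropy L = -(y·log(h2) + (1-y)·log(1-h2)) and h2 = sigmoid(z2),

$$
\frac{\partial L}{\partial z_2}
= \frac{\partial L}{\partial h_2}\cdot\frac{\partial h_2}{\partial z_2}
= \frac{h_2 - y}{h_2(1 - h_2)}\cdot h_2(1 - h_2)
= h_2 - y
$$

which is exactly the dz2 = h2-y used above. The squared-error variant keeps the h2*(1-h2) factor, which is why it converges more slowly when the sigmoid saturates.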

Prediction

Here is where you can kick yourself: you forgot to use the biases!

input1=np.array([[0,0],[1,1],[0,1],[1,0]])
z1=np.dot(input1,theta1)+b1
h1=1/(1+np.exp(-z1))
z2=np.dot(h1,theta2)+b2
h2=1/(1+np.exp(-z2))

print(h2)

When I run the code above, I get the correct output:

[[ 0.01031446]
 [ 0.0201576 ]
 [ 0.9824826 ]
 [ 0.98584079]]

In summary, your three big mistakes were: the wrong dimensions for the bias vectors in the setup; an incorrect derivative of the sigmoid function (you used the right form, but on the wrong variable); and forgetting to use the biases entirely when predicting at the end. The other details are still worth noting, but would not have stopped you from getting something that works.
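
If you want hard 0/1 class labels rather than sigmoid probabilities, one common final step (my addition; 0.5 is an assumed decision threshold, not something from the original code) is:

# Threshold the sigmoid outputs at 0.5 to get class labels
predictions = (h2 > 0.5).astype(int)
print(predictions)  # with the outputs above this gives [[0] [0] [1] [1]]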