Neural network for learning bitwise XOR

data-mining python neural-network keras deep-learning
2021-09-17 01:43:51

I am trying to build a deep neural network to learn the coordinate-wise bitwise XOR of two matrices, but it performs very poorly.

For example, with 2 bits its accuracy stays around 0.5. Here is the code snippet:

from keras.layers import Dense, Activation
from keras.layers import Input
import numpy as np 
from keras.layers.merge import concatenate
from keras.models import Model


size=1
data1 = np.random.choice([0, 1], size=(50000,size,size))
data2 = np.random.choice([0, 1], size=(50000,size,size))
labels  = np.bitwise_xor(data1, data2)
a = Input(shape=(size,size))
b = Input(shape=(size,size))
a1 = Dense(size, activation='sigmoid')(a)
b1 = Dense(size, activation='sigmoid')(b)
merged = concatenate([a1, b1])
hidden = Dense(1, activation='sigmoid')(merged)
hidden = Dense(3, activation='sigmoid')(hidden)
hidden = Dense(5, activation='relu')(hidden)
hidden = Dense(4, activation='sigmoid')(hidden)
hidden = Dense(3, activation='sigmoid')(hidden)
outputs = Dense(1, activation='relu')(hidden)

model = Model(inputs=[a, b], outputs=outputs)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit([data1, data2], np.array(labels), epochs=15, batch_size=32)

What is going on here?

Epoch 1/15
50000/50000 [==============================] - 7s 130us/step - loss: 0.7118 - acc: 0.5044
Epoch 2/15
50000/50000 [==============================] - 4s 78us/step - loss: 0.6933 - acc: 0.5023
Epoch 3/15
50000/50000 [==============================] - 4s 74us/step - loss: 0.6934 - acc: 0.5030
Epoch 4/15
50000/50000 [==============================] - 4s 86us/step - loss: 0.6935 - acc: 0.5002
Epoch 5/15
50000/50000 [==============================] - 4s 79us/step - loss: 0.6934 - acc: 0.5015
Epoch 6/15
50000/50000 [==============================] - 5s 96us/step - loss: 0.6935 - acc: 0.5030
Epoch 7/15
50000/50000 [==============================] - 5s 105us/step - loss: 0.6934 - acc: 0.5026
2 Answers

I think there may be a few things going on here.

You may have a reason, but I don't see why you shape the input data into three dimensions: size=(50000,size,size).
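For instance, if each sample is really just a vector of bits, a flat 2-D layout (an assumption on my part, not something the question states) would look like:

data1 = np.random.choice([0, 1], size=(50000, size))  # one row per sample, one column per bit
data2 = np.random.choice([0, 1], size=(50000, size))
a = Input(shape=(size,))
b = Input(shape=(size,))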

Likewise, you may have a reason, but I don't see why you run each input through its own layer (each with a single hidden unit) and only then merge the outputs, before sending the merged output through another series of layers:

a = Input(shape=(size,size))
b = Input(shape=(size,size))
a1 = Dense(size, activation='sigmoid')(a)
b1 = Dense(size, activation='sigmoid')(b)
merged = concatenate([a1, b1])

Moreover, I suspect that squeezing everything through a single hidden unit (the Dense(1) right after the merge) throttles the information flowing into the rest of the network, so it cannot learn the XOR function.
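To spell out the classic argument behind this (my addition): a single unit computes sigmoid(w1*a + w2*b + c). Fitting XOR would require w1 + c > 0 and w2 + c > 0 (for inputs (1,0) and (0,1)), which add up to w1 + w2 + 2c > 0, but also c < 0 and w1 + w2 + c < 0 (for inputs (0,0) and (1,1)), which add up to w1 + w2 + 2c < 0. Both cannot hold. Here the Dense(1) after the merge is the first layer that sees both inputs, and each branch before it is monotone in its own input, so essentially the same obstruction applies.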

Here is some code that worked for me:

from keras import models
from keras.layers import Dense
import numpy as np

Simulate the data:

X_1 = np.random.choice([0, 1], size = (50000, 1))
X_2 = np.random.choice([0, 1], size = (50000, 1))
X = np.concatenate((X_1, X_2), axis = 1)
Y = np.bitwise_xor(X[:, 0], X[:, 1])

The FNN model:

# Define model.
network_fnn = models.Sequential()
network_fnn.add(Dense(4, activation = 'relu', input_shape = (X.shape[1],)))
network_fnn.add(Dense(4, activation = 'relu'))
network_fnn.add(Dense(1, activation = 'sigmoid'))

# Compile model.
network_fnn.compile(optimizer = 'rmsprop', loss = 'binary_crossentropy', metrics = ['acc'])

# Fit model.
history_fnn = network_fnn.fit(X, Y, epochs = 5, batch_size = 32, verbose = True)
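As a quick sanity check (my addition, not part of the original answer), you can round the sigmoid outputs and compare them with the labels:

# Illustrative check: threshold the predicted probabilities at 0.5.
preds = (network_fnn.predict(X) > 0.5).astype(int).ravel()
print('accuracy:', (preds == Y).mean())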

The real reason is that you use activation='relu' in the output layer. For binary classification you must use sigmoid: binary cross-entropy expects a probability in (0, 1), while a ReLU output is unbounded above and has zero gradient whenever the unit outputs 0, so training stalls.

On top of that, you chose a poor architecture. I would suggest:

a = Input(shape=(size,size))
b = Input(shape=(size,size))
a1 = Dense(size, activation='sigmoid')(a)
b1 = Dense(size, activation='sigmoid')(b)
merged = concatenate([a1, b1])
hidden = Dense(5, activation='relu')(merged)
hidden = Dense(5, activation='relu')(hidden)
hidden = Dense(5, activation='relu')(hidden)

outputs = Dense(1, activation='sigmoid')(hidden)
model = Model(inputs=[a, b], outputs=outputs)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit([data1, data2], np.array(labels), epochs=15, batch_size=32)
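For what it's worth, once the output activation is a sigmoid, the task also fits in a much smaller network. The following is my own sketch, not part of the answer; it reuses size, data1, data2, and labels from the question and flattens the (size, size) inputs into plain bit vectors:

from keras.models import Model
from keras.layers import Input, Dense, Flatten, concatenate

a = Input(shape=(size, size))
b = Input(shape=(size, size))
# Flatten each input and concatenate the raw bits.
merged = concatenate([Flatten()(a), Flatten()(b)])
hidden = Dense(8, activation='relu')(merged)
outputs = Dense(size * size, activation='sigmoid')(hidden)

model = Model(inputs=[a, b], outputs=outputs)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Flatten the labels to match the (size * size) output vector.
model.fit([data1, data2], labels.reshape(len(labels), -1), epochs=15, batch_size=32)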