Why does the accuracy stay the same?

data-mining machine-learning keras tensorflow
2022-02-28 20:11:34

I'm new to machine learning and tried to create a simple model on my own. The idea is to train a model to predict whether a value is above or below a certain threshold.

I generate some random values on either side of the threshold and create the model:

import os
import random

import numpy as np
from keras import Sequential
from keras.layers import Dense
from random import shuffle

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
threshold = 50000
samples = 5000

train_data = []
for i in range(0, samples):
    train_data.append([random.randrange(0, threshold), 0])
    train_data.append([random.randrange(threshold, 2 * threshold), 1])

data_set = np.array(train_data)
shuffle(data_set)

input_value = data_set[:, 0:1]
expected_result = data_set[:, 1]


model = Sequential()
model.add(Dense(3, input_dim=1, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(1, activation='relu'))

# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit the keras model on the dataset
model.fit(input_value, expected_result, epochs=10, batch_size=5)

_, accuracy = model.evaluate(input_value, expected_result)
print('Accuracy: %.2f' % (accuracy*100))

The problem is that the accuracy always stays around 0.5, and if I check the training process I see something like this:

Epoch 1/10

    5/10000 [..............................] - ETA: 8:07 - loss: 6.4472 - acc: 0.6000
  230/10000 [..............................] - ETA: 12s - loss: 7.4283 - acc: 0.5391 
  455/10000 [>.............................] - ETA: 7s - loss: 7.8642 - acc: 0.5121 
  675/10000 [=>............................] - ETA: 5s - loss: 7.9277 - acc: 0.5081
  890/10000 [=>............................] - ETA: 4s - loss: 7.7693 - acc: 0.5180
 1095/10000 [==>...........................] - ETA: 4s - loss: 7.9045 - acc: 0.5096
 1305/10000 [==>...........................] - ETA: 3s - loss: 7.8306 - acc: 0.5142
 1515/10000 [===>..........................] - ETA: 3s - loss: 7.7558 - acc: 0.5188
 1730/10000 [====>.........................] - ETA: 3s - loss: 7.7516 - acc: 0.5191
 1920/10000 [====>.........................] - ETA: 2s - loss: 7.7149 - acc: 0.5214
 2120/10000 [=====>........................] - ETA: 2s - loss: 7.7245 - acc: 0.5208
 2340/10000 [======>.......................] - ETA: 2s - loss: 7.7422 - acc: 0.5197
 2565/10000 [======>.......................] - ETA: 2s - loss: 7.7668 - acc: 0.5181
 2785/10000 [=======>......................] - ETA: 2s - loss: 7.8015 - acc: 0.5160
 3000/10000 [========>.....................] - ETA: 2s - loss: 7.9032 - acc: 0.5097
 3210/10000 [========>.....................] - ETA: 2s - loss: 7.9134 - acc: 0.5090
 3435/10000 [=========>....................] - ETA: 2s - loss: 7.9629 - acc: 0.5060
 3660/10000 [=========>....................] - ETA: 1s - loss: 7.9578 - acc: 0.5063
 3875/10000 [==========>...................] - ETA: 1s - loss: 7.9696 - acc: 0.5055
 4085/10000 [===========>..................] - ETA: 1s - loss: 7.9861 - acc: 0.5045
 4305/10000 [===========>..................] - ETA: 1s - loss: 7.9823 - acc: 0.5048
 4530/10000 [============>.................] - ETA: 1s - loss: 7.9737 - acc: 0.5053
 4735/10000 [=============>................] - ETA: 1s - loss: 8.0063 - acc: 0.5033
 4945/10000 [=============>................] - ETA: 1s - loss: 7.9955 - acc: 0.5039
 5160/10000 [==============>...............] - ETA: 1s - loss: 7.9935 - acc: 0.5041
 5380/10000 [===============>..............] - ETA: 1s - loss: 7.9991 - acc: 0.5037
 5605/10000 [===============>..............] - ETA: 1s - loss: 8.0432 - acc: 0.5010
 5805/10000 [================>.............] - ETA: 1s - loss: 8.0466 - acc: 0.5008
 6020/10000 [=================>............] - ETA: 1s - loss: 8.0189 - acc: 0.5025
 6240/10000 [=================>............] - ETA: 1s - loss: 8.0151 - acc: 0.5027
 6470/10000 [==================>...........] - ETA: 0s - loss: 7.9843 - acc: 0.5046
 6695/10000 [===================>..........] - ETA: 0s - loss: 7.9760 - acc: 0.5052
 6915/10000 [===================>..........] - ETA: 0s - loss: 7.9926 - acc: 0.5041
 7140/10000 [====================>.........] - ETA: 0s - loss: 8.0004 - acc: 0.5036
 7380/10000 [=====================>........] - ETA: 0s - loss: 7.9848 - acc: 0.5046
 7595/10000 [=====================>........] - ETA: 0s - loss: 7.9752 - acc: 0.5052
 7805/10000 [======================>.......] - ETA: 0s - loss: 7.9568 - acc: 0.5063
 8035/10000 [=======================>......] - ETA: 0s - loss: 7.9557 - acc: 0.5064
 8275/10000 [=======================>......] - ETA: 0s - loss: 7.9802 - acc: 0.5049
 8515/10000 [========================>.....] - ETA: 0s - loss: 7.9748 - acc: 0.5052
 8730/10000 [=========================>....] - ETA: 0s - loss: 7.9944 - acc: 0.5040
 8955/10000 [=========================>....] - ETA: 0s - loss: 7.9934 - acc: 0.5041
 9190/10000 [==========================>...] - ETA: 0s - loss: 7.9854 - acc: 0.5046
 9430/10000 [===========================>..] - ETA: 0s - loss: 7.9975 - acc: 0.5038
 9650/10000 [===========================>..] - ETA: 0s - loss: 8.0190 - acc: 0.5025
 9865/10000 [============================>.] - ETA: 0s - loss: 8.0337 - acc: 0.5016
10000/10000 [==============================] - 3s 255us/step - loss: 8.0397 - acc: 0.5012

I've tried changing the number of layers and the number of nodes per layer, but the result is basically the same. What am I missing to make this work?

2 Answers

You have two different problems going on.

Use a sigmoid

First, for a binary classification problem you should set the activation of your last layer to sigmoid (or to a softmax over two output units, which is equivalent in the binary case). With relu as the final activation, the output is unbounded above and clamped to zero below, so it can never behave like a probability, and binary cross-entropy can't train it properly.
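For reference, a minimal sketch of that two-unit softmax variant; model_softmax and one_hot are illustrative names, and it assumes the arrays from the question's script are in scope:

from keras import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

# two output units with softmax: one probability per class,
# equivalent to a single sigmoid unit for binary classification
model_softmax = Sequential()
model_softmax.add(Dense(3, input_dim=1, activation='relu'))
model_softmax.add(Dense(5, activation='relu'))
model_softmax.add(Dense(2, activation='softmax'))

# softmax pairs with categorical cross-entropy and one-hot targets
model_softmax.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
one_hot = to_categorical(expected_result, num_classes=2)
model_softmax.fit(input_value, one_hot, epochs=10, batch_size=5)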

Scale your data

Second, when working with neural networks it's important to make sure your data is on a "reasonable" scale. A "reasonable" scale is usually somewhere in the ballpark of a zero-mean, unit-variance normal distribution. Your raw inputs here range from 0 to 100,000, which is far outside that.
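As a minimal sketch of that kind of standardization, assuming the input_value array from the question (the fix shown below simply divides by 100000 instead, which works just as well for this data):

# shift to zero mean and scale to unit variance
scaled_input = (input_value - input_value.mean()) / input_value.std()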

The effect of fixing these problems

Let's look at the effect of fixing these issues after 5 epochs of training.

If I change your last layer to sigmoid:

model.add(Dense(3, input_dim=1, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

I get ~94% accuracy:

Epoch 1/5
10000/10000 [==============================] - 1s 127us/sample - loss: 107.2031 - acc: 0.5594
Epoch 2/5
10000/10000 [==============================] - 1s 118us/sample - loss: 0.8730 - acc: 0.6688
Epoch 3/5
10000/10000 [==============================] - 1s 118us/sample - loss: 0.6432 - acc: 0.7455
Epoch 4/5
10000/10000 [==============================] - 1s 119us/sample - loss: 0.5688 - acc: 0.7899
Epoch 5/5
10000/10000 [==============================] - 1s 119us/sample - loss: 0.3340 - acc: 0.8631
10000/10000 [==============================] - 0s 10us/sample - loss: 0.2087 - acc: 0.9440
Accuracy: 94.40

If I keep the last activation as sigmoid but also scale your input values to between 0 and 1:

    train_data.append([random.randrange(0, threshold) / 100000, 0])
    train_data.append([random.randrange(threshold, 2 * threshold) / 100000, 1])

Then I get 99.8% accuracy:

Epoch 1/5
10000/10000 [==============================] - 1s 128us/sample - loss: 0.5206 - acc: 0.7013
Epoch 2/5
10000/10000 [==============================] - 1s 114us/sample - loss: 0.2051 - acc: 0.9732
Epoch 3/5
10000/10000 [==============================] - 1s 115us/sample - loss: 0.1083 - acc: 0.9943
Epoch 4/5
10000/10000 [==============================] - 1s 116us/sample - loss: 0.0697 - acc: 0.9953
Epoch 5/5
10000/10000 [==============================] - 1s 116us/sample - loss: 0.0512 - acc: 0.9967
10000/10000 [==============================] - 0s 10us/sample - loss: 0.0450 - acc: 0.9980
Accuracy: 99.80

The problem I see here is that you are using the same train_data for both training and evaluation of your model. I suggest you split off a shuffled subset of train_data as test_data, pass it as validation_data during training, and use it when you call evaluate to try out your model.
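A sketch of that suggestion using scikit-learn's train_test_split; a manual slice of the shuffled array would work just as well, and x_train, x_test, y_train, y_test are illustrative names:

from sklearn.model_selection import train_test_split

# hold out 20% of the data for evaluation
x_train, x_test, y_train, y_test = train_test_split(
    input_value, expected_result, test_size=0.2, shuffle=True)

model.fit(x_train, y_train, epochs=10, batch_size=5,
          validation_data=(x_test, y_test))

# evaluate on data the model has never seen during training
_, accuracy = model.evaluate(x_test, y_test)
print('Test accuracy: %.2f' % (accuracy * 100))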