Very poor results from an input-output mapping with an artificial neural network

data-mining neural-network deep-learning keras tensorflow
2022-02-21 18:33:54

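I would like to hear the opinion of ANN experts on a problem I am trying to solve. I have just started working with artificial neural networks and want to train an ANN with 3 inputs and 3 outputs on 3375 data points. The goal is to map the 3 inputs to the 3 outputs. For this I used a multilayer perceptron implemented with TensorFlow and Keras.

I thought ANNs are generally particularly good at this kind of input-output mapping. However, the results are very bad. I have changed everything many times, varying the settings (batch size, epochs, number of hidden layers, number of neurons, loss function) over a wide range, but the results are still very bad (e.g. val_mean_absolute_percentage_error: 2360328448.0000). The mapping is badly wrong and not useful at all. To my surprise, even inputs taken from the training dataset produce disastrous outputs.

That is why I would like to hear your thoughts. Am I doing something completely wrong, or is my assumption that ANNs are particularly good at this kind of input-output mapping simply not true in this case? Or is there a problem with the training data? I would greatly appreciate any comments and suggestions, because I don't know what else to try.

Here is the code: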

# For data manipulation
import numpy as np
import pandas as pd

#For plotting
from matplotlib import pyplot as plt

# For building model and loading dataset
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow import keras

# Load the data
dataframe = pd.read_csv("C:/Users/User1/Desktop/ANN_inputs_outputs.csv", sep=";")
dataset = dataframe.values


# Assign the columns of the dataframe to the input and output arrays for the ANN
# (column 0 is skipped; columns 1-3 are the inputs, columns 4-6 the outputs)
X_input_dataset = dataset[:, 1:4].astype("float32")
Y_output_dataset = dataset[:, 4:7].astype("float32")

# Create the model

# The input shape defines the number of input neurons (3 features)
input_shape = (3,)


# A Sequential model is all that is needed for a vanilla MLP
model = Sequential()

# Add the different layers (the inputs are already flat vectors, so no Flatten layer is needed)
model.add(keras.Input(shape=input_shape))
model.add(Dense(20, activation='relu'))
model.add(Dense(40, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(3, activation='linear'))

# Configure the model and start training

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_absolute_percentage_error'])
history = model.fit(X_input_dataset, Y_output_dataset, epochs=100, batch_size=10, verbose=1, validation_split=0.2)

# Plot training results
history_dict = history.history
print(history_dict.keys())


plt.plot(history.history['mean_absolute_percentage_error'])
plt.plot(history.history['val_mean_absolute_percentage_error'])
plt.title('Mean absolute percentage error')
plt.ylabel('Mean absolute percentage error')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Loss function')
plt.ylabel('mean squared error')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

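# Predict values for new inputs (passed as a NumPy array of shape (n_samples, 3))
x_new = np.array([(100000, 100000, 100000), (100000, 1000000, 500000),
                  (100000, 100000, 100000), (100000, 500000, 100000),
                  (100000, 100000, 100000), (500000, 100000, 100000)], dtype="float32")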

y_new = model.predict(x_new)
print(y_new)

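Unfortunately, the data is too large to share directly on StackExchange (I tried). That is why I uploaded the CSV file to File Dropper (CSV_File). If you don't want to download the data from there, please tell me another way to share it with you. I don't know whether this helps, but here you can at least see the first 100 of the 3375 data points (in the full data I varied each input and created all combinations of inputs):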

Input_1 Input_2 Input_3 Output_1    Output_2    Output_3
100000  100000  100000  81.63842992 336.0202553 142.6094997
100000  100000  200000  83.91274058 353.0797849 123.2756595
100000  100000  300000  86.49717207 366.4358367 107.3351762
100000  100000  400000  87.94279678 376.396602  95.92878625
100000  100000  500000  89.57430815 384.9555939 85.73828291
100000  100000  600000  92.65738103 396.8354166 70.77538736
100000  100000  700000  96.0171678  408.3277988 55.92321845
100000  100000  800000  100.5642366 420.7969577 38.90699073
100000  100000  900000  109.0237    438.4473815 12.79710349
100000  100000  1000000 114.2438266 446.0243584 0
100000  100000  1100000 114.2438266 446.0243584 0
100000  100000  1200000 114.2438266 446.0243584 0
100000  100000  1300000 114.2438266 446.0243584 0
100000  100000  1400000 114.2438266 446.0243584 0
100000  100000  1500000 114.2438266 446.0243584 0
100000  200000  100000  92.17726716 320.8186761 147.2722417
100000  200000  200000  93.98736653 336.6494039 129.6314145
100000  200000  300000  96.92805106 349.6806425 113.6594914
100000  200000  400000  98.58276913 360.6424603 101.0429556
100000  200000  500000  100.31333   368.9105132 91.04434172
100000  200000  600000  102.6334311 377.1300392 80.50471475
100000  200000  700000  105.7178567 388.244019  66.30630933
100000  200000  800000  108.9149247 398.1848881 53.16837219
100000  200000  900000  115.571269  411.5986127 33.09830325
100000  200000  1000000 127.0748864 430.1972029 2.996095751
100000  200000  1100000 128.2092221 432.0589629 0
100000  200000  1200000 128.2092221 432.0589629 0
100000  200000  1300000 128.2092221 432.0589629 0
100000  200000  1400000 128.2092221 432.0589629 0
100000  200000  1500000 128.2092221 432.0589629 0
100000  300000  100000  100.0917771 307.9756287 152.2007792
100000  300000  200000  102.9726253 323.9279352 133.3676245
100000  300000  300000  105.6062056 335.7776535 118.884326
100000  300000  400000  107.3121984 346.883184  106.0728025
100000  300000  500000  109.4540231 354.663097  96.15106489
100000  300000  600000  111.8786604 361.5557255 86.83379908
100000  300000  700000  114.7944686 371.5938132 73.87990318
100000  300000  800000  118.1373355 380.0257011 62.10514836
100000  300000  900000  122.8548691 390.9478707 46.46544517
100000  300000  1000000 133.347506  406.5063351 20.41434392
100000  300000  1100000 141.6791937 418.5889913 0
100000  300000  1200000 141.6791937 418.5889913 0
100000  300000  1300000 141.6791937 418.5889913 0
100000  300000  1400000 141.6791937 418.5889913 0
100000  300000  1500000 141.6791937 418.5889913 0
100000  400000  100000  109.503933  294.4172255 156.3470265
100000  400000  200000  112.000167  311.1933026 137.0747154
100000  400000  300000  114.2526188 322.9057599 123.1098063
100000  400000  400000  116.4791304 333.664824  110.1242305
100000  400000  500000  118.2910122 342.0030905 99.97408228
100000  400000  600000  120.2127847 349.3045772 90.75082313
100000  400000  700000  122.7641259 356.8196711 80.68438801
100000  400000  800000  126.3291166 365.4701912 68.46887722
100000  400000  900000  130.0423749 374.3468141 55.87899601
100000  400000  1000000 137.5204755 386.0880788 36.65963063
100000  400000  1100000 148.9375577 401.141397  10.18923033
100000  400000  1200000 152.8379613 407.4302237 0
100000  400000  1300000 152.8379613 407.4302237 0
100000  400000  1400000 152.8379613 407.4302237 0
100000  400000  1500000 152.8379613 407.4302237 0
100000  500000  100000  117.4879678 283.3733734 159.4068438
100000  500000  200000  121.0579184 298.9825928 140.2276737
100000  500000  300000  123.3707729 310.3330953 126.5643168
100000  500000  400000  125.8724146 320.3948833 114.0008871
100000  500000  500000  127.9615773 328.059964  104.2466436
100000  500000  600000  129.5606613 335.4906683 95.21685541
100000  500000  700000  131.4170772 343.7065728 85.14453506
100000  500000  800000  135.3015477 351.1570032 73.80963419
100000  500000  900000  137.8813788 359.0228767 63.36392947
100000  500000  1000000 144.8898942 370.7656611 44.61262969
100000  500000  1100000 154.7571144 383.9513348 21.55973576
100000  500000  1200000 164.1907262 396.0774588 0
100000  500000  1300000 164.1907262 396.0774588 0
100000  500000  1400000 164.1907262 396.0774588 0
100000  500000  1500000 164.1907262 396.0774588 0
100000  600000  100000  124.7561636 274.0110713 161.50095
100000  600000  200000  128.42286   288.8038063 143.0415186
100000  600000  300000  131.2377241 299.8006811 129.2297798
100000  600000  400000  133.8838584 309.2976404 117.0866862
100000  600000  500000  135.5491074 317.3956571 107.3234204
100000  600000  600000  137.8437737 324.061017  98.36339426
100000  600000  700000  139.5148534 331.0105966 89.74273491
100000  600000  800000  143.1729967 338.4279821 78.66720613
100000  600000  900000  146.6596817 344.9709227 68.63758054
100000  600000  1000000 150.9572297 353.3162164 55.9947389
100000  600000  1100000 159.4602916 366.5904292 34.21746416
100000  600000  1200000 171.026723  381.1619306 8.079531382
100000  600000  1300000 175.4286096 384.8395754 0
100000  600000  1400000 175.4286096 384.8395754 0
100000  600000  1500000 175.4286096 384.8395754 0
100000  700000  100000  132.1183934 264.1955984 163.9541932
100000  700000  200000  135.9907245 278.9421043 145.3353562
100000  700000  300000  138.9508032 289.1447258 132.1726561
100000  700000  400000  141.3695688 299.1572684 119.7413478
100000  700000  500000  143.2089855 306.5047858 110.5544137
100000  700000  600000  145.4980373 313.8396234 100.9305243
100000  700000  700000  147.7033751 319.6546207 92.91018914
100000  700000  800000  150.8276735 327.3557851 82.08472648
100000  700000  900000  153.528077  333.3811995 73.35890853
100000  700000  1000000 156.9484214 339.9871429 63.3326207
100000  700000  1100000 164.6661346 352.2010019 43.40104855
2 Answers

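The problem seems to be related to Keras's mean_absolute_percentage_error.

Check this SO answer (link).

In your case:

  • You have output=0 in the 3rd column of Y.
  • If you run the model on one Y column at a time, it works fine for the first two columns.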

np.any(dataset[:, 6] == 0)

Output: True
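To see why a single zero target is enough to produce a value like 2360328448, here is a small NumPy sketch that approximates how Keras computes MAPE: the absolute error is divided by the absolute target, with the denominator clipped to a small epsilon (assumed here to be the Keras default of 1e-7). The function name `keras_style_mape` and the sample values are hypothetical; they only mimic an Output_3-like column that contains 0.

```python
import numpy as np


def keras_style_mape(y_true, y_pred, epsilon=1e-7):
    """MAPE with the denominator clipped to epsilon (assumed to mirror Keras)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    # A target of exactly 0 is replaced by epsilon in the denominator,
    # so even a tiny absolute error becomes an enormous percentage.
    denominator = np.maximum(np.abs(y_true), epsilon)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denominator)


# Hypothetical Output_3-like targets and predictions (not the real data)
y_true = np.array([142.6, 123.3, 0.0])
y_pred = np.array([140.0, 125.0, 0.5])

print(keras_style_mape(y_true[:2], y_pred[:2]))  # non-zero targets only: about 1.6
print(keras_style_mape(y_true, y_pred))          # one zero target: about 1.7e8

# Count how many zero targets each output column contains
Y = np.array([[81.6, 336.0, 142.6],
              [114.2, 446.0, 0.0],
              [128.2, 432.1, 0.0]])
print((Y == 0).sum(axis=0))  # → [0 0 2]
```

An error of only 0.5 on a zero target contributes 0.5 / 1e-7 = 5,000,000 (before the factor 100), which swamps every other sample.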


I just added 1 to get rid of the 0s (and standardized the inputs). It works fine.

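X_input_dataset = dataset[:, 1:4]
# Standardize each input column (axis=0 gives one mean and std per column)
X_input_dataset = (X_input_dataset - X_input_dataset.mean(axis=0)) / X_input_dataset.std(axis=0)
Y_output_dataset = dataset[:, 4:7]

# Shift the third output by +1 so that no target is exactly 0
Y_output_dataset[:, -1] = Y_output_dataset[:, -1] + 1.0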


You can:
- handle the 0s
- use mse as the metric and compute MAPE separately
- write your own custom metric
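For the second option (train with MSE, evaluate MAPE yourself), a minimal NumPy sketch might compute the errors per output column from the arrays returned by `model.predict`, skipping targets that are exactly 0 when computing MAPE. The helper `per_column_errors` and the sample arrays are hypothetical, and excluding zeros is only one possible choice; the mean absolute error is reported alongside because it stays meaningful for zero targets.

```python
import numpy as np


def per_column_errors(y_true, y_pred):
    """Return (MAE, MAPE) per output column; MAPE ignores zero targets."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    abs_err = np.abs(y_true - y_pred)
    mae = abs_err.mean(axis=0)

    nonzero = y_true != 0
    # Percentage error only where the target is non-zero; NaN elsewhere
    pct = np.full_like(abs_err, np.nan)
    pct[nonzero] = 100.0 * abs_err[nonzero] / np.abs(y_true[nonzero])
    # nanmean skips the masked entries (a column with no non-zero targets gives NaN)
    mape = np.nanmean(pct, axis=0)
    return mae, mape


# Hypothetical targets and predictions; the last column contains a 0 target
y_true = np.array([[100.0, 400.0, 50.0],
                   [120.0, 440.0, 0.0]])
y_pred = np.array([[110.0, 380.0, 45.0],
                   [114.0, 462.0, 3.0]])

mae, mape = per_column_errors(y_true, y_pred)
print(mae)
print(mape)
```

Here the zero target in the third column still counts toward the MAE but is left out of the MAPE, so neither number explodes.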

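I didn't read the whole question because it is too long, so sorry, but from what I can see:

You need to normalize the dataset; ANNs do not work well with unnormalized data like this.

Normalize only the training data, and use the normalization parameters of the training data to normalize your test data.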
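A minimal NumPy sketch of that advice: shuffle and split first, compute the per-column mean and standard deviation on the training rows only, then apply those same statistics to the test rows. Shuffling matters here because Keras's `validation_split` takes the last fraction of the samples without shuffling, and the rows in this dataset are ordered by input combination. The function names, the 80/20 split, and the random placeholder arrays (standing in for the real CSV columns) are illustrative assumptions.

```python
import numpy as np


def train_test_split_np(X, Y, test_fraction=0.2, seed=0):
    """Shuffle the rows, then split them into train and test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]


def fit_standardizer(A):
    """Per-column mean and std, to be computed on training data only."""
    mean = A.mean(axis=0)
    std = A.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return mean, std


# Random placeholder data with the same shape as the question's dataset
rng = np.random.default_rng(1)
X = rng.uniform(1e5, 1.5e6, size=(3375, 3))
Y = rng.uniform(0.0, 450.0, size=(3375, 3))

X_train, X_test, Y_train, Y_test = train_test_split_np(X, Y)

x_mean, x_std = fit_standardizer(X_train)
y_mean, y_std = fit_standardizer(Y_train)

# Both sets are scaled with the *training* statistics
X_train_s = (X_train - x_mean) / x_std
X_test_s = (X_test - x_mean) / x_std
Y_train_s = (Y_train - y_mean) / y_std

# Predictions made in the scaled space (e.g. from model.predict(X_test_s))
# are mapped back with the same training statistics:
Y_back = Y_train_s * y_std + y_mean
```

The model would then be trained on `X_train_s` and `Y_train_s`; the test set never influences the scaling parameters.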

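Also, NNs are most useful when the dataset is large; I assume you are just practicing.

Once the data is normalized, look at the bias-variance tradeoff and tune your NN via the learning rate, the number of neurons, etc.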

https://stats.stackexchange.com/questions/7757/data-normalization-and-standardization-in-neural-networks