Unable to fit a simple function with a neural network

artificial-intelligence deep-learning
2021-11-06 06:06:00

I have been trying to fit a neural network to a simple function: the mass of a sphere. I have tried different architectures, for example one hidden layer and two hidden layers, always with 128 neurons per hidden layer, and trained them for 5000 epochs. The code is the usual one; just in case, I post one of the versions:

model = keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])
                        ,keras.layers.Dense(128, activation="relu")
                        ,keras.layers.Dense(1, activation="relu")])
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
history = model.fit(x, y, validation_split=0.2, epochs=5000)
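
The question does not show how x and y were generated. Judging from the plotting code further down (radii 0 to 19 and a 4/3·π·r³ ground truth), the training data was presumably built along these lines; this is an assumption, not part of the original post:

import numpy as np

# Assumed training data: radii 0..19 and the corresponding sphere masses
# (unit density), matching the ground truth used in the plot code below.
x = np.arange(20, dtype=float)
y = 4 / 3 * np.pi * x ** 3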

The results are shown in the plots below.

Plot: two hidden layers, 128 neurons each

Plot: one hidden layer, 128 neurons

I suspect I am making a mistake somewhere, because I have seen deep learning fit much more complex functions in fewer epochs. I would appreciate any hint that solves this problem and gets a good fit out of deep learning.

For clarity, here is the code for the plot.

import numpy as np
import matplotlib.pyplot as plt

rs = [r for r in range(20)]

def masas_circulo(x):
    # Predict the mass for each radius 0..x-1 with the trained model
    masas_circulos = []
    for r in range(x):
        masas_circulos.append(model.predict([r])[0][0])
    return masas_circulos

masas_circulos = masas_circulo(20)
esferas = [4/3 * np.pi * r**3 for r in range(20)]   # ground truth

plt.plot(rs, masas_circulos, label="DL")
plt.plot(rs, esferas, label="Real")
plt.title("Mass of a sphere.\nDL (1hl, 128 n, 5000 e) vs ground_truth")
plt.xlabel("Radius")
plt.ylabel("Sphere mass")
plt.legend();
2 Answers

You are trying to learn a cubic function whose values explode, and your problem is scaling. I was able to learn a much better approximation by scaling the data and using tanh as the activation function.

The code and results are below:


The flattening near X = 100 happens because of the tanh activation: the tanh output layer cannot produce scaled values above 1, so after inverting the scaler the predictions are capped at roughly mean(y) + std(y), which corresponds to a radius of about 100. ReLU would not work better here, because scaling produces negative target values. You could try a Leaky ReLU activation with various alpha values (a sketch of that variant follows the code below).

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from tensorflow import keras
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

def mass_of_sphere(R):
    return (4/3) * np.pi * (R**3)

X = np.linspace(1, 120, 500000)
y = [mass_of_sphere(x) for x in X]

X = np.array(X).reshape(-1, 1)
y = np.array(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

scaler_X = StandardScaler()
X_train = scaler_X.fit_transform(X_train)
X_test = scaler_X.transform(X_test)

scaler_y = StandardScaler()
y_train = scaler_y.fit_transform(y_train.reshape(-1, 1))
y_test = scaler_y.transform(y_test.reshape(-1, 1)).reshape(-1)

model = keras.Sequential([keras.layers.Dense(1),
                          keras.layers.Dense(128, activation = "tanh"),
                          keras.layers.Dense(1, activation = "tanh")])

early_stopping = keras.callbacks.EarlyStopping(monitor = 'val_loss', patience = 5)
model.compile(optimizer = 'rmsprop', loss = 'mse')
history = model.fit(X_train, y_train, epochs = 100, callbacks=[early_stopping], 
                    batch_size = 2048, validation_data=(X_test, y_test))

y_hat = scaler_y.inverse_transform(model.predict(X_test)).reshape(-1)
y_test = scaler_y.inverse_transform(y_test.reshape(-1, 1)).reshape(-1)

f, ax = plt.subplots(figsize = (12, 4))
ax.plot(sorted(scaler_X.inverse_transform(X_test).reshape(-1)), sorted(y_test), color = 'blue', label = 'Real')
ax.plot(sorted(scaler_X.inverse_transform(X_test).reshape(-1)), sorted(y_hat), color = 'orange', label = 'DL')
ax.legend()
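
For the Leaky ReLU suggestion above, only the activations need to change. Below is a minimal sketch reusing the same data, scalers and early stopping as in the snippet above; alpha=0.1 is an arbitrary starting value to tune, not something from the original answer (recent Keras releases name this argument negative_slope).

from tensorflow import keras

# Leaky ReLU keeps a small slope for negative inputs, so it can represent
# the negative values produced by standardising y.
leaky = keras.layers.LeakyReLU(alpha=0.1)

model = keras.Sequential([keras.layers.Dense(1),
                          keras.layers.Dense(128, activation=leaky),
                          keras.layers.Dense(1, activation=leaky)])

model.compile(optimizer='rmsprop', loss='mse')
history = model.fit(X_train, y_train, epochs=100, callbacks=[early_stopping],
                    batch_size=2048, validation_data=(X_test, y_test))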

A simpler model seems to be the best option.

import numpy as np
import tensorflow as tf
from tensorflow import keras

# A single linear layer trained on a handful of sample points
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd', loss='mean_squared_error')
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys2 = np.array([4/3*r*np.pi**3 for r in xs])
model.fit(xs, ys2, epochs=500, validation_split=0.2)

def masa_circulo(x):
    return 4/3*x*np.pi**3

Testing it graphically:

x = [x for x in range(1, int(1e6), int(1e3))]
y_masa_circulo = [masa_circulo(m) for m in x]
y_masa_predicha = [model.predict([m])[0] for m in x]

import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 2)
axes[0].plot(y_masa_circulo)
axes[0].set_title("y_masa_circulo")
axes[0].set_ylabel("y_masa_circulo")
axes[0].set_xlabel("Radio")

axes[1].plot(y_masa_predicha)
axes[1].set_title("y_masa_predicha")
axes[1].set_ylabel("y_masa_predicha")
axes[1].set_xlabel("Radio")

Plot: graphical comparison of function_results vs nn_results

There is no need to scale the data.