I'm very new to deep learning (I come from a mathematical PDE background), but I'm trying to solve some ODEs with a neural network (via TensorFlow). I've solved a few simple ones without trouble, but I'm now attempting something harder: the Airy equation
$$u''(x) - x\,u(x) = 0$$
with initial conditions $u(0) = 1$ and $u'(0) = 0$.
I'm mostly following this paper, and I write my solution as
$$u(x) = A + Bx + x^2 N(x, w),$$
where $N(x, w)$ is the output of the neural network. The loss function I use is just the mean-squared residual of the ODE over the $M$ training points, so it's very crude:
$$\text{loss} = \frac{1}{M} \sum_{i=1}^{M} \left( u''(x_i) - x_i\,u(x_i) \right)^2.$$
I'm having a hard time getting a good numerical solution for this particular equation. A typical result is shown below (orange is the exact solution, blue is mine).
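Since the loss needs $u''$, it helps to spell out the derivatives of the trial solution; this is just the product rule applied to the ansatz above, and it is exactly what the `dudx2` line in my code computes:
$$u'(x) = B + 2x\,N + x^2 N', \qquad u''(x) = 2N + 4x\,N' + x^2 N''.$$
By construction $u(0) = A$ and $u'(0) = B$, so the initial conditions are satisfied exactly regardless of the network weights.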
My current setup has just 2 hidden layers with 400 nodes each (one leaky ReLU, one ReLU), followed by a linear output layer. My input data is a uniform discretization of the domain. I'm using the Adam optimizer with a batch size of 32 and running for 400 epochs. No matter which parameters I tweak, I can't seem to capture the oscillatory behavior on the left side of the domain.
Does anyone have suggestions on how to improve the result? I'm very new to deep learning and neural networks. My code (TensorFlow 1.x) is included below in case it helps. I should also credit Emm and vijay m, whose 1-D approximation code is the basis for mine.
# Load modules
import tensorflow as tf
import numpy as np
import math, random
import matplotlib.pyplot as plt
from scipy import special
######################################################################
# Routine to solve u''(x) - x*u(x) = f(x), u(0)=A, u'(0)=B in the form
# u(x) = A + B*x + x^2*N(x,w)
# where N(x,w) is the output of the neural network.
######################################################################
# Create the arrays x and y, where x is a discretization of the domain (a,b) and y is the source term f(x)
N = 200
a = -6.0
b = 2.0
x = np.arange(a, b, (b-a)/N).reshape((N,1))
y = np.zeros((N, 1))  # source term f(x) = 0, stored as a column vector to match x
# Boundary conditions
A = 1.0
B = 0.0
# Define the number of neurons in each layer
n_nodes_hl1 = 400
n_nodes_hl2 = 400
# Define the number of outputs and the learning rate
n_classes = 1
learn_rate = 0.00003
# Define input / output placeholders
x_ph = tf.placeholder('float', [None, 1],name='input')
y_ph = tf.placeholder('float', [None, 1])
# Define standard deviation for the weights and biases
hl_sigma = 0.02
# Routine to compute the neural network (2 hidden layers)
def neural_network_model(data):
    hidden_1_layer = {'weights': tf.Variable(name='w_h1', initial_value=tf.random_normal([1, n_nodes_hl1], stddev=hl_sigma)),
                      'biases': tf.Variable(name='b_h1', initial_value=tf.random_normal([n_nodes_hl1], stddev=hl_sigma))}
    hidden_2_layer = {'weights': tf.Variable(name='w_h2', initial_value=tf.random_normal([n_nodes_hl1, n_nodes_hl2], stddev=hl_sigma)),
                      'biases': tf.Variable(name='b_h2', initial_value=tf.random_normal([n_nodes_hl2], stddev=hl_sigma))}
    output_layer = {'weights': tf.Variable(name='w_o', initial_value=tf.random_normal([n_nodes_hl2, n_classes], stddev=hl_sigma)),
                    'biases': tf.Variable(name='b_o', initial_value=tf.random_normal([n_classes], stddev=hl_sigma))}
    # (input_data * weights) + biases
    l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases'])
    l1 = tf.nn.leaky_relu(l1)
    l2 = tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases'])
    l2 = tf.nn.relu(l2)
    output = tf.add(tf.matmul(l2, output_layer['weights']), output_layer['biases'], name='output')
    return output
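# Note: each call to neural_network_model() creates a fresh set of weight and
# bias variables, so it should be called exactly once per graph (as is done in
# train_neural_network_batch below).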
batch_size = 32
# Feed batch data
def get_batch(inputX, inputY, batch_size):
    duration = len(inputX)
    for i in range(0, duration//batch_size):
        # Random offset in [0, 10) so each epoch sees slightly shifted batches.
        # For the last batch, idx+batch_size can exceed len(inputX), in which
        # case numpy slicing silently returns a shorter batch.
        idx = i*batch_size + np.random.randint(0, 10, (1))[0]
        yield inputX[idx:idx+batch_size], inputY[idx:idx+batch_size]
# Routine to train the neural network
def train_neural_network_batch(x_ph, predict=False):
    prediction = neural_network_model(x_ph)
    # tf.gradients returns a list with one tensor per input, hence the [0]
    pred_dx = tf.gradients(prediction, x_ph)[0]
    pred_dx2 = tf.gradients(pred_dx, x_ph)[0]
    # Compute u and its second derivative from the trial solution
    # u = A + B*x + x^2*N(x), so u'' = x^2*N'' + 4*x*N' + 2*N (product rule)
    u = A + B*x_ph + (x_ph*x_ph)*prediction
    dudx2 = (x_ph*x_ph)*pred_dx2 + 4.0*x_ph*pred_dx + 2.0*prediction
    # The cost function is the mean-squared residual of u''(x) - x*u(x) = f(x)
    cost = tf.reduce_mean(tf.square(dudx2 - x_ph*u - y_ph))
    optimizer = tf.train.AdamOptimizer(learn_rate).minimize(cost)
    # cycles feed forward + backprop
    hm_epochs = 400
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Train in each epoch with the whole data; get_batch already yields
        # all N//batch_size batches, so no extra inner loop is needed
        for epoch in range(hm_epochs):
            epoch_loss = 0
            for inputX, inputY in get_batch(x, y, batch_size):
                _, l = sess.run([optimizer, cost], feed_dict={x_ph: inputX, y_ph: inputY})
                epoch_loss += l
            if epoch % 10 == 0:
                print('Epoch', epoch, 'completed out of', hm_epochs, 'loss:', epoch_loss)
        # Predict on a perturbed grid to check whether the network has actually
        # learned; the 0.0 factor currently disables the perturbation
        x_valid = x + 0.0*np.random.normal(scale=0.1, size=(1))
        return sess.run(tf.squeeze(prediction), {x_ph: x_valid}), x_valid
# Train network
tf.set_random_seed(42)
pred, time = train_neural_network_batch(x_ph)
mypred = pred.reshape(N,1)
# Compute Airy functions for exact solution
ai, aip, bi, bip = special.airy(time)
# Numerical solution vs. exact solution
fig = plt.figure()
plt.plot(time, A + B*time + (time*time)*mypred)
# Exact solution of u''(x) - x*u(x) = 0 with u(0)=1, u'(0)=0
plt.plot(time, 0.5*(3.0**(1/6))*special.gamma(2/3)*(3**(1/2)*ai + bi))
plt.show()
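As a sanity check on the exact solution used in the plot (my own verification, using the same `scipy.special` functions as the script above), the combination $\frac{1}{2}\,3^{1/6}\,\Gamma(2/3)\,(\sqrt{3}\,\mathrm{Ai}(x) + \mathrm{Bi}(x))$ does satisfy both initial conditions:

from scipy import special
import numpy as np

# u(x) = 0.5 * 3**(1/6) * Gamma(2/3) * (sqrt(3)*Ai(x) + Bi(x))
c = 0.5 * (3.0 ** (1.0/6.0)) * special.gamma(2.0/3.0)
ai0, aip0, bi0, bip0 = special.airy(0.0)
print(c * (np.sqrt(3.0) * ai0 + bi0))    # u(0)  -> 1.0
print(c * (np.sqrt(3.0) * aip0 + bip0))  # u'(0) -> 0.0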