Solving an ODE with a Neural Network (via TensorFlow)

data-mining neural-network deep-learning tensorflow
2021-10-11 16:43:36

I am very new to deep learning (I come from a mathematical PDE background), but I am trying to solve some ODEs with a neural network (via TensorFlow). I have solved a few simple ones such as $u'(x) + u(x) = f(x)$ without trouble, but I am now attempting something harder: $u''(x) - x\,u(x) = 0$ with initial conditions $u(0) = A$ and $u'(0) = B$.

I am mostly following this paper, and I write my solution as $u_N(x) = A + Bx + x^2 N(x,w)$, where $N(x,w)$ is the output of the neural network. The loss function I use is just the residual of the ODE in a mean-squared sense, so it is fairly crude: $\ell(x,w) = \sum_{j=1}^{N} \left( u_N''(x_j) - x_j\, u_N(x_j) \right)^2$. I am having a hard time getting a good numerical solution to this particular equation. A typical result is shown below (orange is the exact solution, blue is my solution).

[Figure: orange line is the exact solution, blue line is the numerical solution]
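For the residual I also need the second derivative of the trial solution. Differentiating $u_N(x) = A + Bx + x^2 N(x,w)$ twice with the product rule gives

$$u_N''(x) = x^2\,N''(x,w) + 4x\,N'(x,w) + 2\,N(x,w),$$

which is the expression computed as dudx2 in the code below.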

My current setup has only 2 hidden layers with 400 nodes each (one leaky ReLU and one ReLU), followed by a linear output layer. My input data is a uniform discretization of the domain. I am using the Adam optimizer with a batch size of 32 and running for 400 epochs. No matter which parameters I tweak, I cannot seem to capture the oscillatory behavior on the left side of the domain correctly.

Does anyone have any suggestions for how to improve the result? I am very new to deep learning and neural networks. My code is included below in case it helps. I should also credit Emm and vijay m, whose 1-D approximation code forms the basis of mine.

# Load modules
import tensorflow as tf
import numpy as np
import math, random
import matplotlib.pyplot as plt
from scipy import special

######################################################################
# Routine to solve u''(x) - x*u(x) = f(x), u(0)=A, u'(0)=B in the form
#     u(x) = A + B*x + x^2*N(x,w)
# where N(x,w) is the output of the neural network.
######################################################################

# Create the arrays x and y, where x is a discretization of the domain (a,b) and y is the source term f(x)
N = 200
a = -6.0
b = 2.0
x = np.arange(a, b, (b-a)/N).reshape((N,1))
y = np.zeros((N, 1))

# Boundary conditions
A = 1.0
B = 0.0

# Define the number of neurons in each layer
n_nodes_hl1 = 400
n_nodes_hl2 = 400

# Define the number of outputs and the learning rate
n_classes = 1
learn_rate = 0.00003

# Define input / output placeholders
x_ph = tf.placeholder('float', [None, 1],name='input')
y_ph = tf.placeholder('float')

# Define standard deviation for the weights and biases
hl_sigma = 0.02

# Routine to compute the neural network (2 hidden layers)
def neural_network_model(data):
    hidden_1_layer = {'weights': tf.Variable(name='w_h1',initial_value=tf.random_normal([1, n_nodes_hl1], stddev=hl_sigma)),
                      'biases': tf.Variable(name='b_h1',initial_value=tf.random_normal([n_nodes_hl1], stddev=hl_sigma))}

    hidden_2_layer = {'weights': tf.Variable(name='w_h2',initial_value=tf.random_normal([n_nodes_hl1, n_nodes_hl2], stddev=hl_sigma)),
                      'biases': tf.Variable(name='b_h2',initial_value=tf.random_normal([n_nodes_hl2], stddev=hl_sigma))}

    output_layer = {'weights': tf.Variable(name='w_o',initial_value=tf.random_normal([n_nodes_hl2, n_classes], stddev=hl_sigma)),
                      'biases': tf.Variable(name='b_o',initial_value=tf.random_normal([n_classes], stddev=hl_sigma))}


    # (input_data * weights) + biases
    l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases'])
    l1 = tf.nn.leaky_relu(l1)   

    l2 = tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases'])
    l2 = tf.nn.relu(l2)

    output = tf.add(tf.matmul(l2, output_layer['weights']), output_layer['biases'], name='output')

    return output


batch_size = 32

# Feed batch data
def get_batch(inputX, inputY, batch_size):
    duration = len(inputX)
    for i in range(0,duration//batch_size):
        # Start each batch at a small random offset so batches differ between passes
        idx = i*batch_size + np.random.randint(0,10,(1))[0]

        yield inputX[idx:idx+batch_size], inputY[idx:idx+batch_size]


# Routine to train the neural network
def train_neural_network_batch(x_ph, predict=False):
    prediction = neural_network_model(x_ph)
    # First and second derivatives of N(x,w) with respect to x
    pred_dx = tf.gradients(prediction, x_ph)[0]
    pred_dx2 = tf.gradients(pred_dx, x_ph)[0]

    # Compute u and its second derivative
    u = A + B*x_ph + (x_ph*x_ph)*prediction
    # Product rule: (x^2*N)'' = x^2*N'' + 4x*N' + 2N
    dudx2 = (x_ph*x_ph)*pred_dx2 + 4.0*x_ph*pred_dx + 2.0*prediction

    # The cost function is just the residual of u''(x) - x*u(x) = 0, i.e. residual = u''(x)-x*u(x)
    cost = tf.reduce_mean(tf.square(dudx2-x_ph*u - y_ph))
    optimizer = tf.train.AdamOptimizer(learn_rate).minimize(cost)


    # cycles feed forward + backprop
    hm_epochs = 400

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Train in each epoch with the whole data
        for epoch in range(hm_epochs):

            epoch_loss = 0
            for step in range(N//batch_size):
                for inputX, inputY in get_batch(x, y, batch_size):
                    _, l = sess.run([optimizer,cost], feed_dict={x_ph:inputX, y_ph:inputY})
                    epoch_loss += l
            if epoch %10 == 0:
                print('Epoch', epoch, 'completed out of', hm_epochs, 'loss:', epoch_loss)


        # Evaluate on the (optionally perturbed) grid to check whether the network
        # has actually learned; the 0.0 factor currently disables the perturbation
        x_valid = x + 0.0*np.random.normal(scale=0.1,size=(1))
        return sess.run(tf.squeeze(prediction),{x_ph:x_valid}), x_valid


# Train network
tf.set_random_seed(42)
pred, time = train_neural_network_batch(x_ph)


mypred = pred.reshape(N,1)

# Compute Airy functions for exact solution
ai, aip, bi, bip = special.airy(time)

# Numerical solution vs. exact solution
fig = plt.figure()
plt.plot(time, A + B*time + (time*time)*mypred)
plt.plot(time, 0.5*(3.0**(1/6))*special.gamma(2/3)*(3**(1/2)*ai + bi))
plt.show()
1 Answer
  1. Avoid piecewise-linear activations (ReLU, leaky ReLU) for second- and higher-order ODEs, because the second and higher derivatives of these functions are identically zero. A tanh activation could be a good choice instead.

  2. I would also suggest a deeper network. In my experience, depth can significantly improve results when solving differential equations with neural networks; a sketch combining both suggestions follows below.
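
To make these suggestions concrete, here is a minimal sketch of a drop-in replacement for the question's neural_network_model that uses four tanh-activated hidden layers. The layer count and widths are arbitrary choices for illustration, not tested on this particular problem.

import tensorflow as tf

# Sketch only: a deeper network with smooth (tanh) activations, keeping the
# same TF1-style conventions as the question. Layer sizes are arbitrary.
n_hidden = [64, 64, 64, 64]   # four hidden layers instead of two
hl_sigma = 0.02

def neural_network_model(data):
    layer = data
    n_in = 1
    for i, n_out in enumerate(n_hidden):
        w = tf.Variable(tf.random_normal([n_in, n_out], stddev=hl_sigma), name='w_h%d' % i)
        b = tf.Variable(tf.random_normal([n_out], stddev=hl_sigma), name='b_h%d' % i)
        # tanh is smooth, so its second and higher derivatives do not vanish,
        # unlike (leaky) ReLU whose second derivative is zero almost everywhere
        layer = tf.nn.tanh(tf.add(tf.matmul(layer, w), b))
        n_in = n_out
    w_o = tf.Variable(tf.random_normal([n_in, 1], stddev=hl_sigma), name='w_o')
    b_o = tf.Variable(tf.random_normal([1], stddev=hl_sigma), name='b_o')
    return tf.add(tf.matmul(layer, w_o), b_o, name='output')

The rest of the training code can stay the same, since only the hidden layers change; the learning rate and number of epochs may need retuning for the deeper model.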