Approximating the sine function with a shallow neural network

Tags: machine learning, neural networks
2022-03-27 19:16:18

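I want to use a simple neural network with 1 to 3 layers to approximate a region of the sine function. However, I find that my models often converge to a state with more local extrema than the data has. Here is my most recent model architecture: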

layers:     x, h1, y

dimensions: 1, 128, 1

activations: tanh, tanh

error function: sum((y_predict - y)^2)

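Results after training on 4000 data points for 3000 iterations with learning rate = 4e-7:

[Figure: simple one-layer model with tanh activation]

Another, deeper model trained under the same conditions: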

layers:     x, h1, h2, h3, y

dimensions: 1, 32, 128, 32, 1

activations: tanh, tanh, tanh, tanh

error function: sum((y_predict - y)^2)

[Figure: fit produced by the deeper model]

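I often see overly complex output within the first 2 units of x, after which it settles down. What causes this noise, and how can I modify my architecture to fit the whole data range without overfitting this early part of the range?

Training is also highly variable (Y = loss, X = iteration):

[Figures: training example 1, training example 2]

I implemented the model in PyTorch: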

import torch
from torch.autograd import Variable
import numpy
from matplotlib import pyplot

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU


layer_size = 1, 128, 1
layer_functions = ["tanh","tanh"]

m_x = 4000
n_x = layer_size[0]
n_y = layer_size[-1]

x_raw = numpy.random.rand(m_x,n_x)*10 - 1
y_raw = (numpy.sin(x_raw))/2.5 + (numpy.random.randn(m_x,n_x)/20)


# Create Tensors to hold input and outputs, and wrap them in Variables.
x = Variable(torch.from_numpy(x_raw).type(dtype), requires_grad=False)
y = Variable(torch.from_numpy(y_raw).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
layer_weights = list()
for i in range(0,len(layer_size)-1):
    print(layer_size[i],layer_size[i+1])
    layer_weights.append(Variable(torch.randn(layer_size[i],layer_size[i+1]).type(dtype), requires_grad=True))

def forward_step(x,weights,activation):
    if activation == "sigmoid":
        fn = torch.nn.Sigmoid()
    elif activation == "tanh":
        fn = torch.nn.Tanh()
    elif activation == "relu":
        fn = torch.nn.ReLU()
    else:
        exit("ERROR: invalid activation function specified")

    output = fn(x.mm(weights))

    return output

y_pred = None
losses = list()
learning_rate = 4e-7
for t in range(3000):
    # Forward pass: compute predicted y using operations on Variables
    y_pred = forward_step(x,weights=layer_weights[0],activation=layer_functions[0])
    for i in range(1,len(layer_weights)):
        y_pred = forward_step(y_pred, weights=layer_weights[i], activation=layer_functions[i])

    # Compute and print loss using operations on Variables.
    loss = (y_pred-y).pow(2).sum()
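    # loss is a 0-dim tensor; .item() extracts the Python float
    # (loss.data[0] raises an error on current PyTorch versions).
    print(t, loss.item())
    losses.append(loss.item())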

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Variables with requires_grad=True.
    # After this call weights.grad will be Variables holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Update weights using gradient descent; w1.data and w2.data are Tensors,
    # w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are
    # Tensors.
    for i in range(0,len(layer_weights)):
        layer_weights[i].data -= learning_rate*layer_weights[i].grad.data

        # Manually zero the gradients after running the backward pass
        layer_weights[i].grad.data.zero_()


y_pred = y_pred.data.numpy()

print(y_pred.shape, y_raw.shape)

fig1 = pyplot.figure()
x_loss = list(range(len(losses)))
pyplot.plot(x_loss,losses)
pyplot.show()

fig2 = pyplot.figure()
pyplot.scatter(x_raw,y_raw,marker='o',s=0.2)
pyplot.scatter(x_raw,y_pred,marker='o',s=0.3)
pyplot.show()

1 Answer

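So I solved my own problem: the solution was to use a more advanced optimizer instead of vanilla gradient descent. PyTorch's torch.optim package, used alongside the torch.nn module, offers a range of optimizers that incorporate ideas such as "momentum", regularization, and learning rate decay, and that update the network's weights in a way that is more likely to find a local minimum.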
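To make the "momentum" idea a little more concrete, here is a minimal pure-Python sketch that compares plain gradient descent with a momentum update on a toy, badly conditioned quadratic. The objective, the learning rate, the momentum coefficient beta, and all function names are assumptions chosen for illustration; none of them come from the model above.

```python
# Toy objective (an assumption for illustration): f(w) = 0.5 * (1*w0^2 + 50*w1^2).
# The two directions have very different curvature, which slows plain gradient descent.

def loss(w):
    return 0.5 * (1.0 * w[0] ** 2 + 50.0 * w[1] ** 2)

def grad(w):
    return [1.0 * w[0], 50.0 * w[1]]

def plain_gd(w, lr, steps):
    """Vanilla gradient descent: w <- w - lr * grad."""
    w = list(w)
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def momentum_gd(w, lr, beta, steps):
    """Gradient descent with momentum: v <- beta * v + grad, then w <- w - lr * v."""
    w = list(w)
    v = [0.0 for _ in w]
    for _ in range(steps):
        g = grad(w)
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        w = [wi - lr * vi for wi, vi in zip(w, v)]
    return w

start = [1.0, 1.0]
w_plain = plain_gd(start, lr=0.03, steps=200)
w_mom = momentum_gd(start, lr=0.03, beta=0.9, steps=200)

print("plain GD loss:   ", loss(w_plain))
print("momentum GD loss:", loss(w_mom))
```

With these particular settings the momentum run ends at a lower loss than the plain run after the same number of steps. That is only a toy illustration of why such optimizers can help, not a claim about any specific network.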

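Update: for those interested, I created an interactive tutorial on this problem. It is a Jupyter notebook containing the minimal code needed to run this problem, and it leaves room for the user to improve the model's fit by experimenting with layers, optimizers, and so on: link

This page has some explanations. There is more in their (free) cs231n online lectures.

Another explanation, with nice animations.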

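Training with the Adam optimizer, 1000 iterations, loss = L1Loss: [Figure: Y = loss, X = iteration]

Resulting model (orange = prediction, blue = training data): [Figure: prediction]

Zoomed in: [Figure: zoomed-in prediction]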
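
For readers who want to try the setup reported above (Adam, 1000 iterations, L1Loss), the following is a minimal sketch of how it might look with torch.nn and torch.optim. The answer does not show its exact code, so the layer sizes, the learning rate of 1e-2, the random seed, and the use of torch.nn.Linear (which adds bias terms, unlike the weight-only layers in the question's code) with a linear output layer are all assumptions.

```python
import torch

torch.manual_seed(0)  # assumed seed, only for repeatability

# Synthetic data in the spirit of the question: x in [-1, 9), noisy scaled sine.
m_x = 4000
x = torch.rand(m_x, 1) * 10 - 1
y = torch.sin(x) / 2.5 + torch.randn(m_x, 1) / 20

# One hidden tanh layer of width 128. nn.Linear includes biases and the
# output layer is linear; both are choices made for this sketch.
model = torch.nn.Sequential(
    torch.nn.Linear(1, 128),
    torch.nn.Tanh(),
    torch.nn.Linear(128, 1),
)

loss_fn = torch.nn.L1Loss()  # mean absolute error
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # lr is an assumed value

losses = []
for t in range(1000):
    y_pred = model(x)          # forward pass
    loss = loss_fn(y_pred, y)
    losses.append(loss.item())

    optimizer.zero_grad()      # clear gradients from the previous step
    loss.backward()            # backpropagate
    optimizer.step()           # Adam update of all parameters

print("first loss:", losses[0], "last loss:", losses[-1])
```

Swapping torch.optim.Adam for another optimizer such as torch.optim.SGD with a momentum argument only changes the line that constructs the optimizer, which makes this kind of experiment cheap to run.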