How do I initialize a neural network with orthogonal matrices and a gain factor, as suggested by Saxe et al.?

Cross Validated · machine learning · neural networks · Python · optimization · convolutional neural networks
2022-04-11 23:56:06

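I am reading the Deep Learning book by Bengio, Goodfellow and Courville. In chapter 8 (the optimization chapter) they mention that Saxe et al. propose an initialization based on orthogonal matrices, with a gain factor g that depends on the nonlinearity. The chapter does not actually say how to perform this initialization. I tried reading the paper to work it out, but it seems to be somewhat beyond my level of (mathematical) sophistication. Does anyone know what the initialization they refer to actually does?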

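For example, it would be good to know:

  1. How does one choose the orthogonal matrices? Will any K orthogonal matrices do, one for each weight matrix?
  2. How does one choose g according to the nonlinearity?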

I probably should have mentioned this earlier, but I intend to use it with python/tensorflow if possible.


Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, Andrew M. Saxe, James L. McClelland, Surya Ganguli

1 Answer

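This is what Lasagne does, and it should answer both of your questions (in this excerpt `np` is NumPy, and `Initializer`, `get_rng` and `floatX` are Lasagne internals):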

class Orthogonal(Initializer):
    """Initialize weights as Orthogonal matrix.
    Orthogonal matrix initialization [1]_. For n-dimensional shapes where
    n > 2, the n-1 trailing axes are flattened. For convolutional layers, this
    corresponds to the fan-in, so this makes the initialization usable for
    both dense and convolutional layers.
    Parameters
    ----------
    gain : float or 'relu'
        Scaling factor for the weights. Set this to ``1.0`` for linear and
        sigmoid units, to 'relu' or ``sqrt(2)`` for rectified linear units, and
        to ``sqrt(2/(1+alpha**2))`` for leaky rectified linear units with
        leakiness ``alpha``. Other transfer functions may need different
        factors.
    References
    ----------
    .. [1] Saxe, Andrew M., James L. McClelland, and Surya Ganguli.
           "Exact solutions to the nonlinear dynamics of learning in deep
           linear neural networks." arXiv preprint arXiv:1312.6120 (2013).
    """
    def __init__(self, gain=1.0):
        if gain == 'relu':
            gain = np.sqrt(2)

        self.gain = gain

    def sample(self, shape):
        if len(shape) < 2:
            raise RuntimeError("Only shapes of length 2 or more are "
                               "supported.")

        flat_shape = (shape[0], np.prod(shape[1:]))
        a = get_rng().normal(0.0, 1.0, flat_shape)
        u, _, v = np.linalg.svd(a, full_matrices=False)
        # pick the one with the correct shape
        q = u if u.shape == flat_shape else v
        q = q.reshape(shape)
        return floatX(self.gain * q)
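To make the two answers concrete, here is a minimal NumPy-only sketch of the same sampling step, plus a small lookup for the gain values listed in the docstring. The names `gain_for` and `orthogonal` and the `seed` argument are my own, hypothetical choices and are not part of Lasagne. Any random Gaussian matrix works as the starting point; the SVD turns it into a matrix with orthonormal rows or columns, whichever fits the flattened shape.

```python
import numpy as np

def gain_for(nonlinearity, alpha=0.0):
    """Gain values as listed in the Lasagne docstring (hypothetical helper)."""
    if nonlinearity in ("linear", "sigmoid"):
        return 1.0
    if nonlinearity == "relu":
        return float(np.sqrt(2.0))
    if nonlinearity == "leaky_relu":
        return float(np.sqrt(2.0 / (1.0 + alpha ** 2)))
    raise ValueError("no gain listed for %r" % (nonlinearity,))

def orthogonal(shape, gain=1.0, seed=0):
    """NumPy-only sketch of the sampling step (seed is an added assumption)."""
    if len(shape) < 2:
        raise ValueError("Only shapes of length 2 or more are supported.")
    rng = np.random.RandomState(seed)
    # flatten all trailing axes so the SVD sees a 2-D matrix
    flat_shape = (shape[0], int(np.prod(shape[1:])))
    a = rng.normal(0.0, 1.0, flat_shape)
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    # u has orthonormal columns, vt has orthonormal rows; keep the one that fits
    q = u if u.shape == flat_shape else vt
    return gain * q.reshape(shape)

# dense layer with more rows than columns: the columns are orthonormal
w = orthogonal((6, 4))
print(np.allclose(w.T @ w, np.eye(4)))

# conv layer in Lasagne layout (out_channels, in_channels, rows, cols):
# after flattening, the rows are orthonormal up to the squared gain
g = gain_for("relu")
k = orthogonal((8, 3, 3, 3), gain=g).reshape(8, -1)
print(np.allclose(k @ k.T, g ** 2 * np.eye(8)))
```

For a non-square matrix only the shorter side can be orthonormal, which is why the check differs between the two cases.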

This RNN tutorial does the same thing (minus the gain):

import numpy

# orthogonal initialization for weights
# see Saxe et al. ICLR'14
def ortho_weight(ndim):
    W = numpy.random.randn(ndim, ndim)
    # the left singular vectors of a square matrix form an orthogonal matrix
    u, s, v = numpy.linalg.svd(W)
    return u.astype('float32')

So I believe it is correct (I hope so, since this is the code I use).
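One way to sanity-check this kind of code is to look at what the initialization is meant to achieve: a square orthogonal matrix leaves the norm of a vector unchanged, so a signal passed through many such linear layers neither explodes nor vanishes. The sketch below is only an illustration under my own assumptions (64 units, 50 linear layers, no nonlinearity, and a Gaussian baseline with variance 1/ndim); `ortho_weight` is rewritten here to take an explicit random state.

```python
import numpy as np

def ortho_weight(ndim, rng):
    # same recipe as the tutorial snippet, with an explicit random state
    w = rng.randn(ndim, ndim)
    u, _, _ = np.linalg.svd(w)
    return u

rng = np.random.RandomState(0)
ndim, depth = 64, 50
x = rng.randn(ndim)

h_ortho = x.copy()
h_gauss = x.copy()
for _ in range(depth):
    # orthogonal layer: the norm of the activations is preserved exactly
    h_ortho = ortho_weight(ndim, rng) @ h_ortho
    # scaled Gaussian layer: the norm is only preserved on average
    h_gauss = (rng.randn(ndim, ndim) / np.sqrt(ndim)) @ h_gauss

print("input norm:      ", np.linalg.norm(x))
print("orthogonal stack:", np.linalg.norm(h_ortho))
print("gaussian stack:  ", np.linalg.norm(h_gauss))
```

With the orthogonal stack the output norm matches the input norm up to floating-point error, while the Gaussian stack is not guaranteed to preserve it and typically drifts. With a nonlinearity in between, the gain factor is what compensates for the scaling the nonlinearity introduces.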


Regarding your remark that you intend to use this with python/tensorflow if possible, here is a version for TensorFlow:

import numpy as np
import tensorflow as tf

def orthogonal_initializer(scale=1.1):
    ''' From Lasagne and Keras. Reference: Saxe et al., http://arxiv.org/abs/1312.6120
    '''
    def _initializer(shape, dtype=tf.float32):
        # flatten all trailing axes so the SVD sees a 2-D matrix
        flat_shape = (shape[0], int(np.prod(shape[1:])))
        a = np.random.normal(0.0, 1.0, flat_shape)
        u, _, v = np.linalg.svd(a, full_matrices=False)
        # pick the one with the correct shape
        q = u if u.shape == flat_shape else v
        q = q.reshape(shape)
        # cast to the requested dtype instead of hard-coding float32
        return tf.constant(scale * q, dtype=dtype)
    return _initializer
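One caveat for convolutional layers: the snippet above flattens everything after `shape[0]`, which matches Lasagne's kernel layout with output channels first. The filters passed to `tf.nn.conv2d` are laid out as (filter_height, filter_width, in_channels, out_channels), so in TensorFlow it may be more appropriate to treat the last axis as the output axis. The NumPy sketch below shows that variant. The name `orthogonal_last_axis` and the `seed` argument are hypothetical, and this is an illustration rather than TensorFlow's own implementation. Recent TensorFlow releases also ship `tf.keras.initializers.Orthogonal`, which may make a hand-written initializer unnecessary.

```python
import numpy as np

def orthogonal_last_axis(shape, gain=1.0, seed=0):
    """Orthogonal init that treats the LAST axis as the output axis (sketch)."""
    rng = np.random.RandomState(seed)
    # flatten all leading axes; the last axis stays as the column dimension
    rows, cols = int(np.prod(shape[:-1])), int(shape[-1])
    a = rng.normal(0.0, 1.0, (rows, cols))
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    # keep whichever factor has the flattened shape
    q = u if u.shape == (rows, cols) else vt
    return (gain * q.reshape(shape)).astype(np.float32)

# 3x3 kernel, 16 input channels, 32 output channels, TensorFlow layout
kernel = orthogonal_last_axis((3, 3, 16, 32))
flat = kernel.reshape(-1, 32)  # shape (144, 32)
# each output channel's filter is orthogonal to the others, with unit norm
print(np.allclose(flat.T @ flat, np.eye(32), atol=1e-4))
```

The resulting array can be wrapped in `tf.constant` in the same way as in the snippet above.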