Where are AlexNet's 60 million parameters?

data-mining  neural-network  keras  cnn  convolutional-neural-network  alexnet
2022-02-15 14:09:51

In the abstract of the AlexNet paper, the authors claim 60 million parameters:

The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

When I implement the model with Keras, I get only about 25 million parameters.

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(96, 11, strides=4, activation="relu", input_shape=[227,227,3]),
    tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=(2,2)),
    tf.keras.layers.Conv2D(256, 5, activation="relu", padding="SAME"),
    tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=(2,2)),
    tf.keras.layers.Conv2D(384, 3, activation="relu", padding="SAME"),
    tf.keras.layers.Conv2D(384, 3, activation="relu", padding="SAME"),
    tf.keras.layers.Conv2D(256, 3, activation="relu", padding="SAME"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(1000, activation="softmax"),
])

Note that I removed the normalization and set the input to 227×227 instead of 224×224. See this question for details.
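As a quick sanity check of my own (not from the paper), here is why 227 works out while 224 does not: the first layer uses 11×11 kernels with stride 4 and no padding, so the output width is (input − kernel) / stride + 1.

def conv_output_size(input_size, kernel_size, stride, padding=0):
    # Standard output-size formula for a "valid" (unpadded) convolution.
    return (input_size + 2 * padding - kernel_size) / stride + 1

print(conv_output_size(224, 11, 4))  # 54.25 -> does not divide evenly
print(conv_output_size(227, 11, 4))  # 55.0  -> matches the 55x55x96 feature map in the summary below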

Here is the Keras summary:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 55, 55, 96)        34944     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 27, 27, 96)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 27, 27, 256)       614656    
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 256)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 13, 13, 384)       885120    
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 13, 13, 384)       1327488   
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 13, 13, 256)       884992    
_________________________________________________________________
dense (Dense)                (None, 13, 13, 4096)      1052672   
_________________________________________________________________
dense_1 (Dense)              (None, 13, 13, 4096)      16781312  
_________________________________________________________________
dense_2 (Dense)              (None, 13, 13, 1000)      4097000   
=================================================================
Total params: 25,678,184
Trainable params: 25,678,184
Non-trainable params: 0
_________________________________________________________________

I am nowhere near 60 million. So how do they arrive at 60 million parameters?

For reference, here is the model architecture as described in Sec. 3.5 of the paper:

The first convolutional layer filters the 224×224×3 input image with 96 kernels of size 11×11×3 with a stride of 4 pixels (this is the distance between the receptive field centers of neighboring neurons in a kernel map). The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5×5×48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3×3×256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3×3×192, and the fifth convolutional layer has 256 kernels of size 3×3×192. The fully-connected layers have 4096 neurons each.

1 Answer

I forgot to flatten between the last Conv2D layer and the first fully-connected layer.

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(96, 11, strides=4, activation="relu", input_shape=[227,227,3]),
    tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=(2,2)),
    tf.keras.layers.Conv2D(256, 5, activation="relu", padding="SAME"),
    tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=(2,2)),
    tf.keras.layers.Conv2D(384, 3, activation="relu", padding="SAME"),
    tf.keras.layers.Conv2D(384, 3, activation="relu", padding="SAME"),
    tf.keras.layers.Conv2D(256, 3, activation="relu", padding="SAME"),
    tf.keras.layers.MaxPooling2D(pool_size=(3,3), strides=(2,2)), # pool down to 6x6x256, as in the summary below
    tf.keras.layers.Flatten(), # <-- This layer
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(1000, activation="softmax"),
])

After adding it, I get about 62 million parameters:

Model: "alex_net"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              multiple                  34944     
_________________________________________________________________
conv2d_1 (Conv2D)            multiple                  614656    
_________________________________________________________________
conv2d_2 (Conv2D)            multiple                  885120    
_________________________________________________________________
conv2d_3 (Conv2D)            multiple                  1327488   
_________________________________________________________________
conv2d_4 (Conv2D)            multiple                  884992    
_________________________________________________________________
max_pooling2d (MaxPooling2D) multiple                  0         
_________________________________________________________________
flatten (Flatten)            multiple                  0         
_________________________________________________________________
dense (Dense)                multiple                  37752832  
_________________________________________________________________
dense_1 (Dense)              multiple                  16781312  
_________________________________________________________________
dense_2 (Dense)              multiple                  4097000   
=================================================================
Total params: 62,378,344
Trainable params: 62,378,344
Non-trainable params: 0
_________________________________________________________________
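The jump comes almost entirely from the first fully-connected layer. Keras applies a Dense layer to the last axis only, so without the Flatten it saw just 256 inputs; after pooling to 6×6×256 and flattening it sees 9216. A small check of my own (a Dense layer has inputs × units + units parameters):

without_flatten = 256 * 4096 + 4096          # 1,052,672  (the first summary above)
with_flatten = (6 * 6 * 256) * 4096 + 4096   # 37,752,832 (this summary)
print(without_flatten, with_flatten)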

Even though this was my own mistake, I am leaving it here in case it helps someone else understand.
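For anyone who wants to verify the total by hand, here is my own back-of-the-envelope breakdown of the single-tower Keras model above. Note that the paper's two-GPU version restricts conv2, conv4 and conv5 to kernel depths of 48, 192 and 192, which accounts for most of the gap between the 62.4 million here and the roughly 60 million quoted in the abstract.

# Per-layer parameter counts for the single-tower model above (my own arithmetic).
layers = {
    "conv1": 96 * (11 * 11 * 3) + 96,      #     34,944
    "conv2": 256 * (5 * 5 * 96) + 256,     #    614,656
    "conv3": 384 * (3 * 3 * 256) + 384,    #    885,120
    "conv4": 384 * (3 * 3 * 384) + 384,    #  1,327,488
    "conv5": 256 * (3 * 3 * 384) + 256,    #    884,992
    "fc6": 4096 * (6 * 6 * 256) + 4096,    # 37,752,832
    "fc7": 4096 * 4096 + 4096,             # 16,781,312
    "fc8": 1000 * 4096 + 1000,             #  4,097,000
}
for name, count in layers.items():
    print(f"{name}: {count:,}")
print("total:", f"{sum(layers.values()):,}")  # 62,378,344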