In many papers, people use a ResNet to extract features from images and then pass them to a transformer. I want to implement the same thing: extract the features and then classify them with a transformer. Here is what I did:
- Downloaded CIFAR-100
- Extracted features of shape (3, 3, 2048) from each image and built a training dataset from these features
- Defined the model:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_3 (InputLayer) [(None, 3, 3, 2048)] 0
__________________________________________________________________________________________________
flatten_2 (Flatten) (None, 18432) 0 input_3[0][0]
__________________________________________________________________________________________________
dense_20 (Dense) (None, 4608) 84939264 flatten_2[0][0]
__________________________________________________________________________________________________
dropout_20 (Dropout) (None, 4608) 0 dense_20[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape) (None, 3, 3, 512) 0 dropout_20[0][0]
__________________________________________________________________________________________________
patch_encoder_1 (PatchEncoder) (None, 3, 3, 512) 264192 reshape_1[0][0]
__________________________________________________________________________________________________
layer_normalization_17 (LayerNo (None, 3, 3, 512) 1024 patch_encoder_1[0][0]
__________________________________________________________________________________________________
multi_head_attention_8 (MultiHe (None, 3, 3, 512) 4200960 layer_normalization_17[0][0]
layer_normalization_17[0][0]
__________________________________________________________________________________________________
add_16 (Add) (None, 3, 3, 512) 0 multi_head_attention_8[0][0]
patch_encoder_1[0][0]
__________________________________________________________________________________________________
layer_normalization_18 (LayerNo (None, 3, 3, 512) 1024 add_16[0][0]
__________________________________________________________________________________________________
dense_22 (Dense) (None, 3, 3, 1024) 525312 layer_normalization_18[0][0]
__________________________________________________________________________________________________
dropout_21 (Dropout) (None, 3, 3, 1024) 0 dense_22[0][0]
__________________________________________________________________________________________________
dense_23 (Dense) (None, 3, 3, 512) 524800 dropout_21[0][0]
__________________________________________________________________________________________________
dropout_22 (Dropout) (None, 3, 3, 512) 0 dense_23[0][0]
__________________________________________________________________________________________________
add_17 (Add) (None, 3, 3, 512) 0 dropout_22[0][0]
add_16[0][0]
__________________________________________________________________________________________________
layer_normalization_19 (LayerNo (None, 3, 3, 512) 1024 add_17[0][0]
__________________________________________________________________________________________________
multi_head_attention_9 (MultiHe (None, 3, 3, 512) 4200960 layer_normalization_19[0][0]
layer_normalization_19[0][0]
__________________________________________________________________________________________________
add_18 (Add) (None, 3, 3, 512) 0 multi_head_attention_9[0][0]
add_17[0][0]
__________________________________________________________________________________________________
layer_normalization_20 (LayerNo (None, 3, 3, 512) 1024 add_18[0][0]
__________________________________________________________________________________________________
dense_24 (Dense) (None, 3, 3, 1024) 525312 layer_normalization_20[0][0]
__________________________________________________________________________________________________
dropout_23 (Dropout) (None, 3, 3, 1024) 0 dense_24[0][0]
__________________________________________________________________________________________________
dense_25 (Dense) (None, 3, 3, 512) 524800 dropout_23[0][0]
__________________________________________________________________________________________________
dropout_24 (Dropout) (None, 3, 3, 512) 0 dense_25[0][0]
__________________________________________________________________________________________________
add_19 (Add) (None, 3, 3, 512) 0 dropout_24[0][0]
add_18[0][0]
__________________________________________________________________________________________________
layer_normalization_21 (LayerNo (None, 3, 3, 512) 1024 add_19[0][0]
__________________________________________________________________________________________________
multi_head_attention_10 (MultiH (None, 3, 3, 512) 4200960 layer_normalization_21[0][0]
layer_normalization_21[0][0]
__________________________________________________________________________________________________
add_20 (Add) (None, 3, 3, 512) 0 multi_head_attention_10[0][0]
add_19[0][0]
__________________________________________________________________________________________________
layer_normalization_22 (LayerNo (None, 3, 3, 512) 1024 add_20[0][0]
__________________________________________________________________________________________________
dense_26 (Dense) (None, 3, 3, 1024) 525312 layer_normalization_22[0][0]
__________________________________________________________________________________________________
dropout_25 (Dropout) (None, 3, 3, 1024) 0 dense_26[0][0]
__________________________________________________________________________________________________
dense_27 (Dense) (None, 3, 3, 512) 524800 dropout_25[0][0]
__________________________________________________________________________________________________
dropout_26 (Dropout) (None, 3, 3, 512) 0 dense_27[0][0]
__________________________________________________________________________________________________
add_21 (Add) (None, 3, 3, 512) 0 dropout_26[0][0]
add_20[0][0]
__________________________________________________________________________________________________
layer_normalization_23 (LayerNo (None, 3, 3, 512) 1024 add_21[0][0]
__________________________________________________________________________________________________
multi_head_attention_11 (MultiH (None, 3, 3, 512) 4200960 layer_normalization_23[0][0]
layer_normalization_23[0][0]
__________________________________________________________________________________________________
add_22 (Add) (None, 3, 3, 512) 0 multi_head_attention_11[0][0]
add_21[0][0]
__________________________________________________________________________________________________
layer_normalization_24 (LayerNo (None, 3, 3, 512) 1024 add_22[0][0]
__________________________________________________________________________________________________
dense_28 (Dense) (None, 3, 3, 1024) 525312 layer_normalization_24[0][0]
__________________________________________________________________________________________________
dropout_27 (Dropout) (None, 3, 3, 1024) 0 dense_28[0][0]
__________________________________________________________________________________________________
dense_29 (Dense) (None, 3, 3, 512) 524800 dropout_27[0][0]
__________________________________________________________________________________________________
dropout_28 (Dropout) (None, 3, 3, 512) 0 dense_29[0][0]
__________________________________________________________________________________________________
add_23 (Add) (None, 3, 3, 512) 0 dropout_28[0][0]
add_22[0][0]
__________________________________________________________________________________________________
layer_normalization_25 (LayerNo (None, 3, 3, 512) 1024 add_23[0][0]
__________________________________________________________________________________________________
multi_head_attention_12 (MultiH (None, 3, 3, 512) 4200960 layer_normalization_25[0][0]
layer_normalization_25[0][0]
__________________________________________________________________________________________________
add_24 (Add) (None, 3, 3, 512) 0 multi_head_attention_12[0][0]
add_23[0][0]
__________________________________________________________________________________________________
layer_normalization_26 (LayerNo (None, 3, 3, 512) 1024 add_24[0][0]
__________________________________________________________________________________________________
dense_30 (Dense) (None, 3, 3, 1024) 525312 layer_normalization_26[0][0]
__________________________________________________________________________________________________
dropout_29 (Dropout) (None, 3, 3, 1024) 0 dense_30[0][0]
__________________________________________________________________________________________________
dense_31 (Dense) (None, 3, 3, 512) 524800 dropout_29[0][0]
__________________________________________________________________________________________________
dropout_30 (Dropout) (None, 3, 3, 512) 0 dense_31[0][0]
__________________________________________________________________________________________________
add_25 (Add) (None, 3, 3, 512) 0 dropout_30[0][0]
add_24[0][0]
__________________________________________________________________________________________________
layer_normalization_27 (LayerNo (None, 3, 3, 512) 1024 add_25[0][0]
__________________________________________________________________________________________________
multi_head_attention_13 (MultiH (None, 3, 3, 512) 4200960 layer_normalization_27[0][0]
layer_normalization_27[0][0]
__________________________________________________________________________________________________
add_26 (Add) (None, 3, 3, 512) 0 multi_head_attention_13[0][0]
add_25[0][0]
__________________________________________________________________________________________________
layer_normalization_28 (LayerNo (None, 3, 3, 512) 1024 add_26[0][0]
__________________________________________________________________________________________________
dense_32 (Dense) (None, 3, 3, 1024) 525312 layer_normalization_28[0][0]
__________________________________________________________________________________________________
dropout_31 (Dropout) (None, 3, 3, 1024) 0 dense_32[0][0]
__________________________________________________________________________________________________
dense_33 (Dense) (None, 3, 3, 512) 524800 dropout_31[0][0]
__________________________________________________________________________________________________
dropout_32 (Dropout) (None, 3, 3, 512) 0 dense_33[0][0]
__________________________________________________________________________________________________
add_27 (Add) (None, 3, 3, 512) 0 dropout_32[0][0]
add_26[0][0]
__________________________________________________________________________________________________
layer_normalization_29 (LayerNo (None, 3, 3, 512) 1024 add_27[0][0]
__________________________________________________________________________________________________
multi_head_attention_14 (MultiH (None, 3, 3, 512) 4200960 layer_normalization_29[0][0]
layer_normalization_29[0][0]
__________________________________________________________________________________________________
add_28 (Add) (None, 3, 3, 512) 0 multi_head_attention_14[0][0]
add_27[0][0]
__________________________________________________________________________________________________
layer_normalization_30 (LayerNo (None, 3, 3, 512) 1024 add_28[0][0]
__________________________________________________________________________________________________
dense_34 (Dense) (None, 3, 3, 1024) 525312 layer_normalization_30[0][0]
__________________________________________________________________________________________________
dropout_33 (Dropout) (None, 3, 3, 1024) 0 dense_34[0][0]
__________________________________________________________________________________________________
dense_35 (Dense) (None, 3, 3, 512) 524800 dropout_33[0][0]
__________________________________________________________________________________________________
dropout_34 (Dropout) (None, 3, 3, 512) 0 dense_35[0][0]
__________________________________________________________________________________________________
add_29 (Add) (None, 3, 3, 512) 0 dropout_34[0][0]
add_28[0][0]
__________________________________________________________________________________________________
layer_normalization_31 (LayerNo (None, 3, 3, 512) 1024 add_29[0][0]
__________________________________________________________________________________________________
multi_head_attention_15 (MultiH (None, 3, 3, 512) 4200960 layer_normalization_31[0][0]
layer_normalization_31[0][0]
__________________________________________________________________________________________________
add_30 (Add) (None, 3, 3, 512) 0 multi_head_attention_15[0][0]
add_29[0][0]
__________________________________________________________________________________________________
layer_normalization_32 (LayerNo (None, 3, 3, 512) 1024 add_30[0][0]
__________________________________________________________________________________________________
dense_36 (Dense) (None, 3, 3, 1024) 525312 layer_normalization_32[0][0]
__________________________________________________________________________________________________
dropout_35 (Dropout) (None, 3, 3, 1024) 0 dense_36[0][0]
__________________________________________________________________________________________________
dense_37 (Dense) (None, 3, 3, 512) 524800 dropout_35[0][0]
__________________________________________________________________________________________________
dropout_36 (Dropout) (None, 3, 3, 512) 0 dense_37[0][0]
__________________________________________________________________________________________________
add_31 (Add) (None, 3, 3, 512) 0 dropout_36[0][0]
add_30[0][0]
__________________________________________________________________________________________________
layer_normalization_33 (LayerNo (None, 3, 3, 512) 1024 add_31[0][0]
__________________________________________________________________________________________________
flatten_3 (Flatten) (None, 4608) 0 layer_normalization_33[0][0]
__________________________________________________________________________________________________
dropout_37 (Dropout) (None, 4608) 0 flatten_3[0][0]
__________________________________________________________________________________________________
dense_38 (Dense) (None, 2048) 9439232 dropout_37[0][0]
__________________________________________________________________________________________________
dropout_38 (Dropout) (None, 2048) 0 dense_38[0][0]
__________________________________________________________________________________________________
dense_39 (Dense) (None, 1024) 2098176 dropout_38[0][0]
__________________________________________________________________________________________________
dropout_39 (Dropout) (None, 1024) 0 dense_39[0][0]
__________________________________________________________________________________________________
dense_40 (Dense) (None, 100) 102500 dropout_39[0][0]
==================================================================================================
Total params: 138,869,348
Trainable params: 138,869,348
Non-trainable params: 0
You can find the PatchEncoder layer here: https://keras.io/examples/vision/image_classification_with_vision_transformer/
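For context, the feature-extraction step from the list above can be sketched roughly as follows. The 96x96 input size is an assumption on my part (ResNet50 downsamples by a factor of 32, so 96x96 inputs are what yield a (3, 3, 2048) feature map), and `weights=None` is used here only to keep the sketch self-contained; in practice you would load `weights="imagenet"`.

```python
import numpy as np
import tensorflow as tf

# ResNet50 without its classification head, used as a feature extractor.
# 96 / 32 = 3, so the output feature map has shape (3, 3, 2048).
extractor = tf.keras.applications.ResNet50(
    include_top=False,        # drop the ImageNet classifier head
    weights=None,             # "imagenet" in the real pipeline
    input_shape=(96, 96, 3),
)

# One dummy CIFAR-100 image (32x32), upscaled to the assumed 96x96.
batch = np.random.rand(1, 32, 32, 3).astype("float32")
resized = tf.image.resize(batch, (96, 96))
features = extractor(resized)
print(features.shape)  # (1, 3, 3, 2048)
```

Running the extractor over the whole dataset and stacking the outputs gives the (N, 3, 3, 2048) training set the model above consumes.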
Then I started training the model. I stopped training because the metrics were poor: https://pastebin.com/cHJjj9Lt
Should I train this model for more epochs?
After that, I defined a ResNet50 with global max pooling. Now the extracted features have shape (2048,). I started building a dataset for this shape, but I don't know how to define the model.
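The second extractor, with global max pooling, can be sketched like this (again with `weights=None` just to keep the example self-contained, and the 96x96 input size assumed as before):

```python
import numpy as np
import tensorflow as tf

# ResNet50 with pooling="max": the final (3, 3, 2048) feature map is
# collapsed by global max pooling into one 2048-d vector per image.
extractor = tf.keras.applications.ResNet50(
    include_top=False,
    weights=None,             # "imagenet" in the real pipeline
    pooling="max",            # global max pooling -> shape (2048,)
    input_shape=(96, 96, 3),
)

batch = np.random.rand(4, 96, 96, 3).astype("float32")
features = extractor(batch)
print(features.shape)  # (4, 2048)
```

This is the variant that produces the flat (2048,) features mentioned above.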
I have a few questions:
1) Which features should I pass to the transformer model: the ones with shape (3, 3, 2048) or the ones with shape (2048,)?
2) If I should use the features with shape (2048,), please help me create the model. I don't know how to create the first layer; I would copy the deeper layers from the link above.
3) If I should use the features with shape (3, 3, 2048), please tell me what I did wrong.
Please help me implement ResNet50 + Transformer. I would really appreciate any help.
I used the code from here: https://keras.io/examples/vision/image_classification_with_vision_transformer/