How do I propose regions of interest from convolutional feature maps?

Data mining, Python, Keras, convolutional neural networks
2021-10-15 03:35:22

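Question

Keras has no built-in region-of-interest (ROI) pooling. I know how to do max pooling, but I don't know how to get bounding boxes from the feature maps passed on by the convolutional layers.

Is there a way to implement a region proposal algorithm directly?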


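Example

Suppose we have the following architecture:

[Image: architecture diagram]

So we have a multi-input neural network architecture that eventually leads to an ROI MaxPool layer. There are three inputs: the screenshot, the text maps, and the candidates. Let's leave the candidates out. In Keras, the code would then look like this: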

from keras.models import Model
from keras.layers import Input, Dense, Conv2D, ZeroPadding2D, MaxPooling2D, BatchNormalization, concatenate
from keras.activations import relu
from keras.initializers import RandomUniform, Constant, TruncatedNormal

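# Network 1, Layer 1
# Assumption: the screenshot is RGB (3 channels); a channel count of 0
# would leave the first convolution with no input to work on.
screenshot = Input(shape=(1280, 1280, 3),
                   dtype='float32',
                   name='screenshot')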
# padded1 = ZeroPadding2D(padding=5, data_format=None)(screenshot)
conv1 = Conv2D(filters=96,
               kernel_size=11,
               strides=(4, 4),
               activation=relu,
               padding='same')(screenshot)
# conv1 = Conv2D(filters=96, kernel_size=11, strides=(4, 4), activation=relu, padding='same')(padded1)
pooling1 = MaxPooling2D(pool_size=(3, 3),
                        strides=(2, 2),
                        padding='same')(conv1)
normalized1 = BatchNormalization()(pooling1)  # https://stats.stackexchange.com/questions/145768/importance-of-local-response-normalization-in-cnn

# Network 1, Layer 2

# padded2 = ZeroPadding2D(padding=2, data_format=None)(normalized1)
conv2 = Conv2D(filters=256,
               kernel_size=5,
               activation=relu,
               padding='same')(normalized1)
# conv2 = Conv2D(filters=256, kernel_size=5, activation=relu, padding='same')(padded2)
normalized2 = BatchNormalization()(conv2)
# padded3 = ZeroPadding2D(padding=1, data_format=None)(normalized2)
conv3 = Conv2D(filters=384,
               kernel_size=3,
               activation=relu,
               padding='same',
               kernel_initializer=TruncatedNormal(stddev=0.01),
               bias_initializer=Constant(value=0.1))(normalized2)
# conv3 = Conv2D(filters=384, kernel_size=3, activation=relu, padding='same',
#               kernel_initializer=RandomUniform(stddev=0.1),
#               bias_initializer=Constant(value=0.1))(padded3)

# Network 2, Layer 1

textmaps = Input(shape=(160, 160, 128),
                 dtype='float32',
                 name='textmaps')
txt_conv1 = Conv2D(filters=48,
                   kernel_size=1,
                   activation=relu,
                   padding='same',
                   kernel_initializer=TruncatedNormal(stddev=0.01),
                   bias_initializer=Constant(value=0.1))(textmaps)

# (Network 1 + Network 2), Layer 1

merged = concatenate([conv3, txt_conv1], axis=-1)
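# ZeroPadding2D(padding=2) already pads for the 5x5 kernel, so the convolution
# uses padding='valid'; combining it with padding='same' would grow the
# 160x160 feature map to 164x164.
merged_padding = ZeroPadding2D(padding=2, data_format=None)(merged)
merged_conv = Conv2D(filters=96,
                     kernel_size=5,
                     activation=relu, padding='valid',
                     kernel_initializer=TruncatedNormal(stddev=0.01),
                     bias_initializer=Constant(value=0.1))(merged_padding)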

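If you look at the end of the code (and at the architecture itself), we concatenate the activations from two different Conv+ReLU layers and then pass the result to the ROI MaxPool layer.


Thanks!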

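1 Answer

To implement region proposals, you need two main parts:

  • A region proposal network (RPN) that generates a set of candidate bounding boxes. It can be implemented simply as two convolutional layers that 1) predict whether an object is present and 2) predict offsets relative to default (anchor) bounding boxes.

  • An ROI pooling layer that produces a fixed-size feature vector for proposals of arbitrary size.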
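As a rough illustration of the first part, here is a NumPy sketch of how anchors can be laid over a feature map and how predicted offsets turn them into candidate boxes. It assumes the usual Faster R-CNN offset parameterization (dx, dy, dw, dh); the function names, the anchor sizes, and the top_k value are hypothetical and are not taken from the linked implementations. In a real RPN, the deltas and scores would come from the two convolutional layers, and clipping to the image and non-maximum suppression would follow; both are left out here.

```python
import numpy as np


def make_anchors(feat_h, feat_w, stride, sizes=(64, 128, 256)):
    """Return square anchors as (cx, cy, w, h), one per feature-map cell and size.

    The result has shape (feat_h * feat_w * len(sizes), 4) and is expressed in
    input-image pixels. The sizes are illustrative defaults only.
    """
    # Centre of every feature-map cell, mapped back to image coordinates.
    cy, cx = np.meshgrid(
        (np.arange(feat_h) + 0.5) * stride,
        (np.arange(feat_w) + 0.5) * stride,
        indexing="ij",
    )
    per_size = [
        np.stack([cx, cy, np.full_like(cx, s), np.full_like(cx, s)], axis=-1)
        for s in sizes
    ]
    # (feat_h, feat_w, num_sizes, 4) flattened to a plain list of boxes.
    return np.stack(per_size, axis=2).reshape(-1, 4)


def decode(anchors, deltas):
    """Apply predicted (dx, dy, dw, dh) to anchors; return (x1, y1, x2, y2)."""
    cx = anchors[:, 0] + deltas[:, 0] * anchors[:, 2]
    cy = anchors[:, 1] + deltas[:, 1] * anchors[:, 3]
    w = anchors[:, 2] * np.exp(deltas[:, 2])
    h = anchors[:, 3] * np.exp(deltas[:, 3])
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)


def propose(anchors, deltas, scores, top_k=300):
    """Keep the top_k highest-scoring decoded boxes (no NMS in this sketch)."""
    boxes = decode(anchors, deltas)
    keep = np.argsort(-scores)[:top_k]
    return boxes[keep], scores[keep]
```

For the network in the question, the 1280x1280 screenshot is reduced to a 160x160 feature map, so the stride passed to make_anchors would be 8.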

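Here is an implementation of Faster R-CNN in Keras, and here is a detailed explanation of the model and code.

Here is the RPN implementation, and here is the ROI pooling implementation.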
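To make the ROI pooling step concrete, here is a minimal NumPy sketch of ROI max pooling on a single feature map. It only illustrates the computation and is not the code from the linked implementation: the function name roi_max_pool and its arguments are hypothetical, ROIs are assumed to be integer (x1, y1, x2, y2) boxes already expressed in feature-map cells (with x2 and y2 exclusive), and each ROI is assumed to be non-empty and to lie inside the map. Inside a Keras model, the same logic would have to be written as a custom layer using tensor operations.

```python
import numpy as np


def roi_max_pool(feature_map, rois, pool_size=(7, 7)):
    """Max-pool each ROI of a (H, W, C) feature map to a fixed spatial size.

    rois holds (x1, y1, x2, y2) in feature-map cells, x2/y2 exclusive.
    Returns an array of shape (num_rois, pool_h, pool_w, C).
    """
    pool_h, pool_w = pool_size
    channels = feature_map.shape[2]
    out = np.zeros((len(rois), pool_h, pool_w, channels), dtype=feature_map.dtype)

    for n, (x1, y1, x2, y2) in enumerate(rois):
        region = feature_map[int(y1):int(y2), int(x1):int(x2), :]
        h, w = region.shape[:2]
        for i in range(pool_h):
            # Bin edges: floor for the start, ceil for the end, at least one cell.
            y_lo = (i * h) // pool_h
            y_hi = max(-(-((i + 1) * h) // pool_h), y_lo + 1)
            for j in range(pool_w):
                x_lo = (j * w) // pool_w
                x_hi = max(-(-((j + 1) * w) // pool_w), x_lo + 1)
                # Maximum over the bin, taken separately for every channel.
                out[n, i, j] = region[y_lo:y_hi, x_lo:x_hi].max(axis=(0, 1))
    return out
```

Whatever the size of the proposal, the output always has the same shape, which is what lets the following dense layers accept it. Boxes given in image pixels would first be divided by the feature-map stride (8 for the 1280-to-160 reduction in the question) and rounded to integer cells.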