问题
Keras 没有直接实现感兴趣的区域池化。我知道如何执行最大池化,但我不知道如何从卷积层传递的特征图中获取边界框。
有没有办法直接实现区域提议算法?
例子
假设有这样的架构:
所以我们有一个多输入的神经网络架构,最终导致 ROI MaxPool 层。我们有三个输入,屏幕截图、文本图和候选,让我们把候选去掉。然后我们会在 Keras 中有这样的代码:
from keras.models import Model
from keras.layers import Input, Dense, Conv2D, ZeroPadding2D, MaxPooling2D, BatchNormalization, concatenate
from keras.activations import relu
from keras.initializers import RandomUniform, Constant, TruncatedNormal
# Network 1, Layer 1
screenshot = Input(shape=(1280, 1280, 0),
dtype='float32',
name='screenshot')
# padded1 = ZeroPadding2D(padding=5, data_format=None)(screenshot)
conv1 = Conv2D(filters=96,
kernel_size=11,
strides=(4, 4),
activation=relu,
padding='same')(screenshot)
# conv1 = Conv2D(filters=96, kernel_size=11, strides=(4, 4), activation=relu, padding='same')(padded1)
pooling1 = MaxPooling2D(pool_size=(3, 3),
strides=(2, 2),
padding='same')(conv1)
normalized1 = BatchNormalization()(pooling1) # https://stats.stackexchange.com/questions/145768/importance-of-local-response-normalization-in-cnn
# Network 1, Layer 2
# padded2 = ZeroPadding2D(padding=2, data_format=None)(normalized1)
conv2 = Conv2D(filters=256,
kernel_size=5,
activation=relu,
padding='same')(normalized1)
# conv2 = Conv2D(filters=256, kernel_size=5, activation=relu, padding='same')(padded2)
normalized2 = BatchNormalization()(conv2)
# padded3 = ZeroPadding2D(padding=1, data_format=None)(normalized2)
conv3 = Conv2D(filters=384,
kernel_size=3,
activation=relu,
padding='same',
kernel_initializer=TruncatedNormal(stddev=0.01),
bias_initializer=Constant(value=0.1))(normalized2)
# conv3 = Conv2D(filters=384, kernel_size=3, activation=relu, padding='same',
# kernel_initializer=RandomUniform(stddev=0.1),
# bias_initializer=Constant(value=0.1))(padded3)
# Network 2, Layer 1
textmaps = Input(shape=(160, 160, 128),
dtype='float32',
name='textmaps')
txt_conv1 = Conv2D(filters=48,
kernel_size=1,
activation=relu,
padding='same',
kernel_initializer=TruncatedNormal(stddev=0.01),
bias_initializer=Constant(value=0.1))(textmaps)
# (Network 1 + Network 2), Layer 1
merged = concatenate([conv3, txt_conv1], axis=-1)
merged_padding = ZeroPadding2D(padding=2, data_format=None)(merged)
merged_conv = Conv2D(filters=96,
kernel_size=5,
activation=relu, padding='same',
kernel_initializer=TruncatedNormal(stddev=0.01),
bias_initializer=Constant(value=0.1))(merged_padding)
如果您查看代码的末尾(以及架构本身),我们会从两个不同的 Conv+ReLu 层传递连接的激活,然后将其传递给 ROI MaxPool 层。
谢谢!