Consider a typical convolutional neural network, such as this example, which recognizes 10 different kinds of objects in the CIFAR-10 dataset:
https://github.com/tflearn/tflearn/blob/master/examples/images/convnet_cifar10.py
""" Convolutional network applied to CIFAR-10 dataset classification task.
References:
Learning Multiple Layers of Features from Tiny Images, A. Krizhevsky, 2009.
Links:
[CIFAR-10 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html)
"""
from __future__ import division, print_function, absolute_import
import tflearn
from tflearn.data_utils import shuffle, to_categorical
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression
from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation
# Data loading and preprocessing
from tflearn.datasets import cifar10
(X, Y), (X_test, Y_test) = cifar10.load_data()
X, Y = shuffle(X, Y)
Y = to_categorical(Y, 10)
Y_test = to_categorical(Y_test, 10)
# Real-time data preprocessing
img_prep = ImagePreprocessing()
img_prep.add_featurewise_zero_center()
img_prep.add_featurewise_stdnorm()
# Real-time data augmentation
img_aug = ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_rotation(max_angle=25.)
# Convolutional network building
network = input_data(shape=[None, 32, 32, 3],
data_preprocessing=img_prep,
data_augmentation=img_aug)
network = conv_2d(network, 32, 3, activation='relu')
network = max_pool_2d(network, 2)
network = conv_2d(network, 64, 3, activation='relu')
network = conv_2d(network, 64, 3, activation='relu')
network = max_pool_2d(network, 2)
network = fully_connected(network, 512, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 10, activation='softmax')
network = regression(network, optimizer='adam',
loss='categorical_crossentropy',
learning_rate=0.001)
# Train using classifier
model = tflearn.DNN(network, tensorboard_verbose=0)
model.fit(X, Y, n_epoch=50, shuffle=True, validation_set=(X_test, Y_test),
show_metric=True, batch_size=96, run_id='cifar10_cnn')
This is a CNN with several layers that ends in 10 outputs, one for each kind of object to be recognized.
But now consider a slightly different problem: suppose I only want to recognize one kind of object, and I also want to detect where it sits in the image frame. Say I want to distinguish between:
- The object is in the center
- The object is to the left of center
- The object is to the right of center
- There is no recognizable object
Suppose I build a CNN exactly like the one in the CIFAR-10 example, but with only 3 outputs:
- Center
- Left
- Right
And, of course, if none of the outputs fires, then there is no recognizable object.
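For concreteness, here is a minimal sketch of what I mean: the same layer stack as the script above, only with a 3-unit softmax head (I've dropped the preprocessing/augmentation arguments for brevity, and the names `net` / `position_model` are just my own illustrative choices):

```python
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

# Same feature extractor as the CIFAR-10 example above.
net = input_data(shape=[None, 32, 32, 3])
net = conv_2d(net, 32, 3, activation='relu')
net = max_pool_2d(net, 2)
net = conv_2d(net, 64, 3, activation='relu')
net = conv_2d(net, 64, 3, activation='relu')
net = max_pool_2d(net, 2)
net = fully_connected(net, 512, activation='relu')
net = dropout(net, 0.5)
# Only change: 3 output units (center / left / right) instead of 10.
net = fully_connected(net, 3, activation='softmax')
net = regression(net, optimizer='adam',
                 loss='categorical_crossentropy',
                 learning_rate=0.001)
position_model = tflearn.DNN(net, tensorboard_verbose=0)
```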
Suppose I have a large training corpus of images, with the same kind of object in many different positions within the image, the set is correctly grouped and annotated, and I train the CNN in the usual way.
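Concretely, I imagine encoding the labels and running training exactly as in the CIFAR-10 script. A sketch, using random placeholder arrays standing in for my hypothetical annotated corpus and the (arbitrary) integer convention 0 = center, 1 = left, 2 = right, plus the `position_model` defined above:

```python
import numpy as np
from tflearn.data_utils import shuffle, to_categorical

# Placeholder data standing in for the hypothetical annotated corpus:
# images plus one integer position label per image
# (0 = center, 1 = left, 2 = right).
X_pos = np.random.rand(1000, 32, 32, 3).astype(np.float32)
Y_pos = np.random.randint(0, 3, size=1000)

X_pos, Y_pos = shuffle(X_pos, Y_pos)
Y_pos = to_categorical(Y_pos, 3)   # one-hot over the 3 position classes

# Same training call as in the CIFAR-10 example, just with the
# 3-output model sketched above.
position_model.fit(X_pos, Y_pos, n_epoch=50, shuffle=True,
                   show_metric=True, batch_size=96,
                   run_id='object_position_cnn')
```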
Should I expect the CNN to just "magically" work? Or is a different kind of architecture needed to handle object position? If so, what are those architectures?