Text localization using convolutional neural networks

data-mining python neural-network tensorflow cnn
2022-03-07 07:16:34

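I am new to TensorFlow. I am trying to build a neural network that can recognize the letters and words in a document.

Following ICDAR2017, I split the task into 3 stages:

  1. Text localization
  2. Cropped word recognition
  3. End-to-end recognition

I am having trouble with the first stage, text localization. I am using an architecture called EAST.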

Architecture stages:

  1. Fully convolutional network
  2. NMS merging stage
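
The second stage can be illustrated with a plain non-maximum suppression routine. This is a simplified sketch, not the EAST implementation: it assumes axis-aligned boxes given as (x1, y1, x2, y2) tuples, whereas EAST merges rotated boxes or quadrilaterals. The names `iou` and `nms`, the sample boxes, and the 0.5 threshold are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two overlap heavily, the third is separate.
boxes = [(10, 10, 50, 30), (12, 11, 52, 31), (100, 40, 160, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

With these sample boxes, the second detection is suppressed by the first, and the third is kept because it overlaps neither.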

This is the model I started with. It works, but it has trouble detecting some characters and letters, for example:

[image]

Model layers:

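# Imports assumed for a TensorFlow 1.x / TF-Slim setup. mean_image_subtraction,
# unpool and FLAGS are not shown in the question and must be defined elsewhere.
import numpy as np
import tensorflow as tf
from tensorflow.contrib import slim
from nets import resnet_v1  # assumed: project-local ResNet v1 exposing 'pool2'..'pool5' end points

def model(images, weight_decay=1e-5, is_training=True):
        '''
        Define the model; we use slim's implementation of ResNet.
        '''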
        images = mean_image_subtraction(images)

        with slim.arg_scope(resnet_v1.resnet_arg_scope(weight_decay=weight_decay)):
            logits, end_points = resnet_v1.resnet_v1_50(images, is_training=is_training, scope='resnet_v1_50')

        with tf.variable_scope('feature_fusion', values=list(end_points.values())):
            batch_norm_params = {
            'decay': 0.997,
            'epsilon': 1e-5,
            'scale': True,
            'is_training': is_training
            }
            with slim.arg_scope([slim.conv2d],
                                activation_fn=tf.nn.relu,
                                normalizer_fn=slim.batch_norm,
                                normalizer_params=batch_norm_params,
                                weights_regularizer=slim.l2_regularizer(weight_decay)):
                f = [end_points['pool5'], end_points['pool4'],
                     end_points['pool3'], end_points['pool2']]
                for i in range(4):
                    print('Shape of f_{} {}'.format(i, f[i].shape))
                g = [None, None, None, None]
                h = [None, None, None, None]
                num_outputs = [None, 128, 64, 32]
                for i in range(4):
                    if i == 0:
                        h[i] = f[i]
                    else:
                        c1_1 = slim.conv2d(tf.concat([g[i-1], f[i]], axis=-1), num_outputs[i], 1)
                        h[i] = slim.conv2d(c1_1, num_outputs[i], 3)
                    if i <= 2:
                        g[i] = unpool(h[i])
                    else:
                        g[i] = slim.conv2d(h[i], num_outputs[i], 3)
                    print('Shape of h_{} {}, g_{} {}'.format(i, h[i].shape, i, g[i].shape))

                # Here we use a slightly different approach for the regression part:
                # a sigmoid first limits the regression range, and the same
                # is done for the angle map.
                F_score = slim.conv2d(g[3], 1, 1, activation_fn=tf.nn.sigmoid, normalizer_fn=None)
                # 4 channels for the axis-aligned bbox and 1 channel for the rotation angle
                geo_map = slim.conv2d(g[3], 4, 1, activation_fn=tf.nn.sigmoid, normalizer_fn=None) * FLAGS.text_scale
                angle_map = (slim.conv2d(g[3], 1, 1, activation_fn=tf.nn.sigmoid, normalizer_fn=None) - 0.5) * np.pi/2 # angle is between [-45, 45] degrees
                F_geometry = tf.concat([geo_map, angle_map], axis=-1)

        return F_score, F_geometry
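
The arithmetic of the two regression heads can be checked in isolation. The NumPy sketch below repeats the sigmoid-and-rescale step on a few hypothetical pre-activation values. `text_scale = 512` is an assumption standing in for `FLAGS.text_scale`, whose value is not given in the question.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


text_scale = 512  # assumed value; stands in for FLAGS.text_scale

# Hypothetical raw outputs of the final 1x1 convolutions.
raw = np.array([-20.0, 0.0, 20.0])

# Distances to the box edges are limited to (0, text_scale) pixels.
geo = sigmoid(raw) * text_scale

# Angles are limited to (-pi/4, pi/4) radians, i.e. (-45, 45) degrees.
angle = (sigmoid(raw) - 0.5) * np.pi / 2
angle_deg = np.degrees(angle)
```

One consequence of this parameterization is that text rotated by more than 45 degrees from the horizontal cannot be expressed by the angle head, and no predicted edge distance can exceed `text_scale` pixels.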

Someone suggested that I use Pyramid Networks. I don't know how to integrate Pyramid Networks with the EAST model in TensorFlow, or whether it is worth doing.
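
For context, here is a minimal NumPy sketch of the top-down merge used in Feature Pyramid Networks, assuming that is what "Pyramid Networks" refers to. It differs from the feature_fusion branch above mainly in that each level is projected to a common channel count with a 1x1 convolution and then added, whereas the EAST code concatenates and then convolves. The function names, the 64-channel width, the random weights, and the feature shapes are all illustrative assumptions, and the 3x3 smoothing convolution that FPN applies after each merge is omitted.

```python
import numpy as np


def conv1x1(x, w):
    """1x1 convolution on an (H, W, C_in) map with weights of shape (C_in, C_out)."""
    return x @ w


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def fpn_top_down(features, weights):
    """Merge feature maps ordered coarse to fine into pyramid levels.

    features: list of (H, W, C_i) arrays, each with twice the resolution of the previous one.
    weights:  list of (C_i, C_out) lateral projection matrices.
    """
    merged = conv1x1(features[0], weights[0])
    outputs = [merged]
    for feat, w in zip(features[1:], weights[1:]):
        # Upsample the coarser level and add the projected finer level.
        merged = upsample2x(merged) + conv1x1(feat, w)
        outputs.append(merged)
    return outputs


rng = np.random.default_rng(0)
# Hypothetical stand-ins for pool5, pool4, pool3, pool2 on a 128x128 input.
shapes = [(4, 4, 2048), (8, 8, 1024), (16, 16, 512), (32, 32, 256)]
features = [rng.standard_normal(s) for s in shapes]
weights = [rng.standard_normal((s[2], 64)) * 0.01 for s in shapes]

pyramid = fpn_top_down(features, weights)
```

In this sketch every pyramid level ends up with the same number of channels, so a detection head could be attached to any of them. Whether that helps on a given dataset is an empirical question that the sketch does not answer.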

I would also like to know what options there are for improving text localization accuracy on table-based data.

0 answers