Data mining - How do I resolve the label shape problem in TensorFlow when using one-hot encoding?

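I am using TensorFlow to recognize text in natural images with a convolutional neural network; the texts do not have a fixed number of characters. To train successfully, I should use one-hot encoding to convert the categorical labels to binary vectors. So for each label, I integer-encode every character and store the result in a NumPy array to create TFRecords. For example: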

import numpy as np

alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
TrainLabel = ["CNN in Tensorflow"]

# define a mapping of chars to integers
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
integer_encoded = [char_to_int[char] for char in TrainLabel[0]]

# pad with the index of the space character (52) up to 51 characters
if len(TrainLabel[0]) < 51:
    for j in range(51 - len(TrainLabel[0])):
        integer_encoded.append(52)

# one hot encode
onehot_encoded = []
for value in integer_encoded:
    letter = [0 for _ in range(len(alphabet))]
    letter[value] = 1
    onehot_encoded.append(letter)

label = np.array(onehot_encoded, np.float32)
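The loop-based encoding above can also be written in vectorized form. The following is a minimal sketch, not part of the original post, assuming the same alphabet and the same maximum length of 51; the helper name `encode_label` and the constants are hypothetical.

```python
import numpy as np

ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
MAX_LEN = 51                     # maximum number of characters per label
PAD_INDEX = ALPHABET.index(' ')  # 52: labels are padded with the space class


def encode_label(text, max_len=MAX_LEN):
    """Return (integer_ids, one_hot) for a text padded to max_len (hypothetical helper)."""
    ids = [ALPHABET.index(ch) for ch in text]
    # Pad short texts with the space class; longer texts are left unpadded.
    ids += [PAD_INDEX] * (max_len - len(ids))
    ids = np.array(ids, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for class i.
    one_hot = np.eye(len(ALPHABET), dtype=np.float32)[ids]
    return ids, one_hot


ids, one_hot = encode_label("CNN in Tensorflow")
print(ids.shape, one_hot.shape)
```

Keeping the integer ids alongside the one-hot matrix is a design choice for illustration: the ids are the form that a sparse loss expects, while the one-hot matrix is the form stored in the question.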

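51 is the maximum number of characters in a text, so any text shorter than 51 characters is padded with spaces up to 51 characters.

If we print the label, it looks like this: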

array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],  
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  1.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.]], dtype=float32)

After creating the batch queue, the labels have shape [batch_size, 2703]. 2703 comes from 51*53, where 53 is the number of classes.
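The flattened size of 2703 can be undone with a reshape. This is a small sketch, assuming the one-hot matrix was flattened in row-major order before being written to the TFRecord (the post does not show that step); `batch_size = 4` and the random labels are placeholder values.

```python
import numpy as np

batch_size, max_len, num_classes = 4, 51, 53  # batch_size is an arbitrary example value

# Stand-in for a batch read from the queue: flattened one-hot labels.
rng = np.random.default_rng(0)
ids = rng.integers(0, num_classes, size=(batch_size, max_len))
flat = np.eye(num_classes, dtype=np.float32)[ids].reshape(batch_size, -1)
print(flat.shape)

# Recover one one-hot row per character position.
per_char = flat.reshape(batch_size, max_len, num_classes)
print(per_char.shape)
```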

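My problem is with the loss function: the labels passed to tf.nn.sparse_softmax_cross_entropy_with_logits() must have shape [batch_size], but the labels I am using have shape [batch_size, 53] because I used one-hot encoding.

How should I handle this?

This is the error: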

(labels_static_shape.ndims, logits.get_shape().ndims)) ValueError: Rank mismatch: Rank of labels (received 2) should equal rank of logits minus 1 (received 2).
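The rank rule in this message can be illustrated in plain NumPy. The sketch below is not TensorFlow's implementation; it assumes logits with one row per character, i.e. shape [batch_size * 51, 53], which the post does not confirm. It shows that rank-1 integer labels, recovered from the one-hot rows with argmax, give the same per-row loss as the one-hot labels. All function names are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def dense_cross_entropy(one_hot, logits):
    # One-hot labels: same rank and shape as the logits.
    assert one_hot.shape == logits.shape
    return -(one_hot * log_softmax(logits)).sum(axis=-1)


def sparse_cross_entropy(class_ids, logits):
    # Integer labels: rank must be the rank of the logits minus 1.
    # This sketch only handles rank-2 logits.
    if class_ids.ndim != logits.ndim - 1:
        raise ValueError("Rank mismatch: labels rank %d, logits rank %d"
                         % (class_ids.ndim, logits.ndim))
    rows = np.arange(logits.shape[0])
    return -log_softmax(logits)[rows, class_ids]


rng = np.random.default_rng(0)
n, num_classes = 2 * 51, 53  # two labels of 51 characters each (example values)
logits = rng.normal(size=(n, num_classes))
class_ids = rng.integers(0, num_classes, size=n)
one_hot = np.eye(num_classes)[class_ids]

loss_dense = dense_cross_entropy(one_hot, logits)
loss_sparse = sparse_cross_entropy(one_hot.argmax(axis=-1), logits)
print(np.allclose(loss_dense, loss_sparse))
```

In TensorFlow, the corresponding options would presumably be either converting the labels with `tf.argmax(labels, axis=-1)` before calling the sparse loss, or using `tf.nn.softmax_cross_entropy_with_logits`, which accepts one-hot labels shaped like the logits.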