Data mining - How do I use lists in TensorFlow? - 吾爱随笔录

How do I use lists in TensorFlow?

data-mining tensorflow

2021-10-04 06:33:25

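I have many lists, for example [1,2,3,4], [2,3,4], [1,2], [2,3,4,6,8,10], and their lengths are obviously different.

How can I use them as input to a placeholder in TensorFlow?

As far as I have tried, the following setup raises an error.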

tf.constant([[1,2],[1,2,3]...],dtype=tf.int32)

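So I guess a placeholder cannot be fed with lists like the ones above.

Is there any solution?

Edit:

The following is my example. How can I make it run without errors?

4 Answers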

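When you create a NumPy array like this:

x_data = np.array( [[1,2],[4,5,6],[1,2,3,4,5,6]])

the internal NumPy dtype is "object" (NumPy 1.24 and later refuse to build this array unless you pass dtype=object explicitly):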

array([[1, 2], [4, 5, 6], [1, 2, 3, 4, 5, 6]], dtype=object)

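This cannot be used as a tensor in TensorFlow. In any case, a tensor must have the same size along each dimension: it cannot be "ragged", and its shape must be defined by a single number per dimension. TensorFlow basically assumes this for all of its data types. The designers of TensorFlow could in theory have written it to accept ragged arrays and included a conversion function, but that kind of automatic conversion is not always a good idea, because it can hide problems in the input code.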
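To make the shape argument concrete, here is a minimal NumPy-only sketch. The variable names are illustrative and not from the original answer. It compares a ragged nested list with a rectangular one.

```python
import numpy as np

# Ragged input: rows of different lengths. NumPy can only hold this as a
# 1-D array whose elements are Python list objects (dtype=object).
ragged = np.array([[1, 2], [4, 5, 6], [1, 2, 3, 4, 5, 6]], dtype=object)

# Rectangular input: every row has the same length, so the result is a
# genuine 2-D numeric array with one size per dimension.
regular = np.array([[1, 2, 3], [4, 5, 6]])

print(ragged.shape, ragged.dtype)   # one dimension only, object dtype
print(regular.shape)                # two dimensions: rows and columns
```

Only the second array has a shape of the kind an ordinary tensor needs, with a single number for each dimension.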

So you need to pad the input data to give it a usable shape. After a quick search, I found this approach on Stack Overflow, reproduced here as a change to your code:

import tensorflow as tf
import numpy as np

# TensorFlow 1.x API; under TensorFlow 2.x the same calls are available
# through tf.compat.v1 with eager execution disabled
x = tf.placeholder( tf.int32, [3,None] )
y = x * 2

with tf.Session() as session:
    # dtype=object is required for ragged input on NumPy 1.24 and later
    x_data = np.array( [[1,2],[4,5,6],[1,2,3,4,5,6]], dtype=object )

    # Get lengths of each row of data
    lens = np.array([len(row) for row in x_data])

    # Mask of valid places in each row
    mask = np.arange(lens.max()) < lens[:,None]

    # Setup output array and put elements from data into masked positions
    # (integer dtype to match the int32 placeholder)
    padded = np.zeros(mask.shape, dtype=np.int32)
    padded[mask] = np.hstack(list(x_data))

    # Call TensorFlow
    result = session.run(y, feed_dict={x:padded})

    # Remove the padding - the list function ensures we
    # create same datatype as input. It is not necessary in the case
    # where you are happy with a list of Numpy arrays instead
    result_without_padding = np.array(
       [list(result[i,0:lens[i]]) for i in range(lens.size)],
       dtype=object
    )
    print( result_without_padding )

The output is (the exact formatting depends on the NumPy version):

[[2, 4] [8, 10, 12] [2, 4, 6, 8, 10, 12]]

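You don't have to remove the padding at the end; do so only if you need to show the output in the same ragged array format. Also note that when you feed the resulting padded data to more complex routines, the zeros (or whatever other padding value you choose) may be used by any algorithm you have implemented.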
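As an illustration of that last point, the following NumPy-only sketch computes a per-row mean twice: once naively over the padded array, and once using the mask so that the padding zeros are ignored. The example data and variable names are assumptions chosen for illustration.

```python
import numpy as np

# Padded data and the true length of each row (illustrative values).
padded = np.array([[1, 2, 0, 0, 0, 0],
                   [4, 5, 6, 0, 0, 0],
                   [1, 2, 3, 4, 5, 6]], dtype=np.float64)
lens = np.array([2, 3, 6])

# True where a position holds real data, False where it holds padding.
mask = np.arange(padded.shape[1]) < lens[:, None]

# Naive mean: the padding zeros are counted as if they were data.
naive_mean = padded.mean(axis=1)

# Masked mean: sum only the valid positions and divide by the true length.
masked_mean = (padded * mask).sum(axis=1) / lens

print(naive_mean)
print(masked_mean)
```

The two results differ for every row that contains padding, which is why the lengths or the mask usually need to travel with the padded data.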

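If you have many short arrays and only one or two very long ones, you may want to consider a sparse tensor representation to save memory and speed up computation.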
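A rough sketch of what such a sparse representation looks like, written in plain NumPy: the ragged rows are stored as coordinate pairs, values, and a dense shape, which is the same three-part layout that tf.SparseTensor(indices, values, dense_shape) takes. The example rows and names are assumptions for illustration only.

```python
import numpy as np

rows = [[1, 2], [4, 5, 6], [1, 2, 3, 4, 5, 6]]

# One (row, column) coordinate per stored element.
indices = np.array([(r, c) for r, row in enumerate(rows)
                    for c in range(len(row))])

# The stored elements themselves, in the same order as the coordinates.
values = np.array([v for row in rows for v in row])

# Shape of the equivalent dense (padded) array.
dense_shape = (len(rows), max(len(row) for row in rows))

# Rebuild the dense array to check that nothing was lost.
dense = np.zeros(dense_shape, dtype=values.dtype)
dense[indices[:, 0], indices[:, 1]] = values
print(dense)
```

For data this small the coordinates cost more than the padding they replace; the saving only appears when the longest row is much longer than the typical one.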

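As an alternative to using padded arrays, you can feed all of your data as one big strand of spaghetti and then do the origami inside the TensorFlow graph.

Example: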

import tensorflow as tf
import numpy as np

sess = tf.InteractiveSession()

noodle = tf.placeholder(tf.float32, [None])
chop_indices = tf.placeholder(tf.int32, [None,2])

do_origami = lambda list_idx: tf.gather(noodle, tf.range(chop_indices[list_idx,0], chop_indices[list_idx,1]))

print( [do_origami(list_idx=i).eval({noodle:[1,2,3,2,3,6], chop_indices:[[0,2],[2,3],[3,6]]}).tolist() for i in range(3)] )

Result:

[[1.0, 2.0], [3.0], [2.0, 3.0, 6.0]]

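However, if you have a variable number of inner lists, then good luck. You cannot return a list from tf.while_loop, and you cannot simply use the list comprehension above, so you would have to run the computation separately for each inner list.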
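The values fed to noodle and chop_indices in the example above were written out by hand. A minimal NumPy sketch of how they could be derived from nested lists follows. The helper variable names are hypothetical and not part of the original answer.

```python
import numpy as np

lists = [[1, 2], [3], [2, 3, 6]]

# Length of each inner list, then the start and end offset of each one
# inside the flattened array.
lengths = np.array([len(inner) for inner in lists])
ends = np.cumsum(lengths)
starts = ends - lengths

# Flat data to feed to the `noodle` placeholder.
noodle_values = np.concatenate(
    [np.asarray(inner, dtype=np.float32) for inner in lists])

# One [start, end) pair per inner list, to feed to `chop_indices`.
chop_index_values = np.stack([starts, ends], axis=1)

# Slicing with the pairs recovers the original nested lists.
recovered = [noodle_values[s:e].tolist() for s, e in chop_index_values]
print(chop_index_values.tolist())
print(recovered)
```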

import tensorflow as tf

sess = tf.InteractiveSession()

my_list = tf.Variable(initial_value=[1,2,3,4,5])

init = tf.global_variables_initializer()

sess.run(init)

sess.run(my_list)

Result: array([1, 2, 3, 4, 5])

2021 update: ragged tensors are apparently supported now:

t = tf.ragged.constant([[1,2,3,4], [2,3,4], [1,2], [2,3,4,6,8,10]])

works fine.
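A short sketch of what can be done with such a ragged tensor, assuming TensorFlow 2.x with eager execution enabled (the default there). It uses the lists from the question; the variable names other than t are illustrative.

```python
import tensorflow as tf

# Assumes TensorFlow 2.x running eagerly.
t = tf.ragged.constant([[1, 2, 3, 4], [2, 3, 4], [1, 2], [2, 3, 4, 6, 8, 10]])

# Length of each row, kept by the ragged tensor itself.
row_lengths = t.row_lengths().numpy().tolist()

# Elementwise arithmetic works directly on the ragged values.
doubled = (t * 2).to_list()

# Convert to an ordinary rectangular tensor, padding short rows with zeros.
padded = t.to_tensor(default_value=0).numpy().tolist()

print(row_lengths)
print(doubled)
print(padded)
```

Calling to_tensor is only needed when a later operation requires a rectangular tensor; otherwise the data can stay ragged.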
