
How to feed a numpy array in batches in Keras

data-mining machine-learning neural-network deep-learning keras numpy

2021-10-07 01:49:49

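I have data in the following format:

1: Data numpy array (trainX)

A numpy array of numpy arrays of 3D np arrays. To be clearer, the format is: [[3d data], [3d data], [3d data], [3d data], ...]

2: Target array (trainY)

This consists of a numpy array of the corresponding target values for the array above.

The format is [target1, target2, target3]

The numpy arrays get very large, and since I will be using a deep neural network, there will also be many parameters that need to fit in memory.

How do I feed the numpy arrays trainX and trainY in batches?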

2 Answers

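You should implement a generator and feed it to model.fit_generator() (in TensorFlow 2.1 and later, model.fit() accepts generators directly and fit_generator is deprecated).

Your generator may look like this: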

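import numpy as np

BATCH_SIZE = 32  # example default; choose a size that fits in memory

def batch_generator(X, Y, batch_size=BATCH_SIZE):
    indices = np.arange(len(X))
    batch = []
    while True:
        # it might be a good idea to shuffle your data before each epoch
        np.random.shuffle(indices)
        for i in indices:
            batch.append(i)
            if len(batch) == batch_size:
                # index with the collected positions to build one batch
                yield X[batch], Y[batch]
                batch = []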

Then, somewhere in your code:

train_generator = batch_generator(trainX, trainY, batch_size = 64)
model.fit_generator(train_generator , ....)
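
Because the generator above loops forever, Keras cannot tell on its own where an epoch ends, so the number of batches per epoch normally has to be supplied through `steps_per_epoch`. The following is a minimal sketch of that calculation. The small random arrays are stand-ins for trainX and trainY, their shapes are assumptions made for illustration, and the `fit` call is left as a comment because no model is defined here.

```python
import numpy as np

def batch_generator(X, Y, batch_size=4):
    # Same idea as the generator above: reshuffle on every pass, yield full batches.
    indices = np.arange(len(X))
    batch = []
    while True:
        np.random.shuffle(indices)
        for i in indices:
            batch.append(i)
            if len(batch) == batch_size:
                yield X[batch], Y[batch]
                batch = []

# Dummy stand-ins for trainX / trainY: 10 samples of 3D data, one target each.
trainX = np.random.rand(10, 8, 8, 3)
trainY = np.arange(10)

batch_size = 4
# Number of full batches per pass; leftover samples roll over into the next pass.
steps_per_epoch = len(trainX) // batch_size

train_generator = batch_generator(trainX, trainY, batch_size=batch_size)
x_batch, y_batch = next(train_generator)
print(x_batch.shape, y_batch.shape)

# With a compiled model, the generator would be passed along the lines of:
# model.fit(train_generator, steps_per_epoch=steps_per_epoch)
```

Since the partially filled batch is carried over rather than dropped, every sample is still used over time even when the dataset size is not a multiple of the batch size.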

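UPD: To avoid loading all the data into memory up front, you can modify the generator to work only with the identifiers of your dataset and then load the data on demand: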

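# BATCH_SIZE is a module-level default batch size, as in the first generator
def batch_generator(ids, batch_size=BATCH_SIZE):
    batch = []
    while True:
        # shuffles the ids in place before each pass over the data
        np.random.shuffle(ids)
        for i in ids:
            batch.append(i)
            if len(batch) == batch_size:
                # only the samples of the current batch are read into memory
                yield load_data(batch)
                batch = []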

Your loader function may look like this:

def load_data(ids):
    # imread (from an image library of your choice) and targets are assumed
    # to be defined elsewhere
    X = []
    Y = []

    for i in ids:
        # read one or more samples from your storage, do pre-processing, etc.
        # for example:
        x = imread(f'image_{i}.jpg')
        ...
        y = targets[i]

        X.append(x)
        Y.append(y)

    return np.array(X), np.array(Y)
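
Since the question is about 3D numpy arrays rather than images, the same loader idea can be sketched with one `.npy` file per sample. Everything about the storage layout here is an assumption made for illustration: the temporary directory, the `sample_<id>.npy` naming scheme, the array shape, and the `targets` dictionary are all hypothetical.

```python
import os
import tempfile
import numpy as np

# Hypothetical storage layout: one .npy file per sample, named sample_<id>.npy.
data_dir = tempfile.mkdtemp()
targets = {}
for i in range(6):
    sample = np.full((4, 4, 3), i, dtype=np.float32)  # dummy 3D data
    np.save(os.path.join(data_dir, f"sample_{i}.npy"), sample)
    targets[i] = i % 2  # dummy target value

def load_data(ids):
    # Read only the requested samples from disk and stack them into one batch.
    X = [np.load(os.path.join(data_dir, f"sample_{i}.npy")) for i in ids]
    Y = [targets[i] for i in ids]
    return np.array(X), np.array(Y)

X_batch, Y_batch = load_data([0, 3, 5])
print(X_batch.shape, Y_batch)
```

With this layout only `batch_size` samples are held in memory at a time, at the cost of one file read per sample.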

Another approach, using the Keras Sequence class:

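import numpy as np
from tensorflow import keras  # assumption: TensorFlow's bundled Keras; adjust to your setup

class DataGenerator(keras.utils.Sequence):
    def __init__(self, x_data, y_data, batch_size):
        self.x, self.y = x_data, y_data
        self.batch_size = batch_size
        # np.ceil returns a float, so cast to get an integer batch count
        self.num_batches = int(np.ceil(len(x_data) / batch_size))
        # split the sample indices into num_batches nearly equal chunks
        self.batch_idx = np.array_split(np.arange(len(x_data)), self.num_batches)

    def __len__(self):
        # number of batches per epoch
        return len(self.batch_idx)

    def __getitem__(self, idx):
        # return the idx-th batch as (inputs, targets)
        batch_x = self.x[self.batch_idx[idx]]
        batch_y = self.y[self.batch_idx[idx]]
        return batch_x, batch_y

train_generator = DataGenerator(x_train, y_train, batch_size=128)
model.fit(train_generator,...)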
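
One detail of the class above that is easy to miss: `np.array_split` divides the indices into nearly equal chunks, so the batches are not necessarily exactly `batch_size` long. The sketch below compares this with plain fixed-size slicing. The sample count and batch size are arbitrary values chosen for illustration.

```python
import numpy as np

n_samples, batch_size = 10, 4
num_batches = int(np.ceil(n_samples / batch_size))

# Approach used in the class above: nearly equal chunks.
even_chunks = np.array_split(np.arange(n_samples), num_batches)

# Alternative: fixed-size slices, with a smaller final batch.
all_idx = np.arange(n_samples)
fixed_chunks = [all_idx[i:i + batch_size]
                for i in range(0, n_samples, batch_size)]

print([len(c) for c in even_chunks])   # [4, 3, 3]
print([len(c) for c in fixed_chunks])  # [4, 4, 2]
```

Both variants cover every sample once per epoch. The even split avoids a very small last batch, while fixed-size slicing keeps all but the last batch at exactly `batch_size`.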

