You should implement a generator and feed it to model.fit_generator().
Your generator could look like this:
import numpy as np

def batch_generator(X, Y, batch_size=BATCH_SIZE):
    # X and Y must be NumPy arrays so that X[batch] selects rows by index
    indices = np.arange(len(X))
    batch = []
    while True:
        # it might be a good idea to shuffle your data before each epoch
        np.random.shuffle(indices)
        for i in indices:
            batch.append(i)
            if len(batch) == batch_size:
                yield X[batch], Y[batch]
                batch = []
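Because `batch` is not reset between passes, an incomplete batch left over at the end of one shuffled pass is filled up with samples from the next pass. If you would rather keep each epoch self-contained, here is a minimal sketch of a variant (the name `epoch_batch_generator` and the `seed` parameter are hypothetical, and it uses NumPy's `default_rng`, available since NumPy 1.17) that reshuffles once per epoch and yields a shorter final batch instead:

```python
import numpy as np

def epoch_batch_generator(X, Y, batch_size=32, seed=None):
    """Hypothetical variant: reshuffle every epoch, never mix two epochs in one batch."""
    rng = np.random.default_rng(seed)
    n = len(X)
    while True:
        order = rng.permutation(n)  # fresh shuffled index order for this epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]  # the last slice may be shorter
            yield X[idx], Y[idx]

# Toy data: 10 samples, batch size 4 -> batches of 4, 4 and 2 per epoch.
X = np.arange(10).reshape(10, 1)
Y = np.arange(10)
gen = epoch_batch_generator(X, Y, batch_size=4, seed=0)
sizes = [len(next(gen)[1]) for _ in range(3)]
print(sizes)  # [4, 4, 2]
```

With this variant one epoch corresponds to `ceil(len(X) / batch_size)` steps, at the cost of one smaller batch per epoch.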
Then, somewhere in your code:
train_generator = batch_generator(trainX, trainY, batch_size=64)
model.fit_generator(train_generator, ....)
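Since the generator loops forever, Keras cannot tell where an epoch ends, so `fit_generator` needs a `steps_per_epoch` argument (in TensorFlow 2.x `fit_generator` is deprecated and `model.fit` accepts the generator directly with the same argument). The sketch below, using hypothetical toy data, repeats the generator so it runs on its own and checks that `len(trainX) // batch_size` steps cover every sample exactly once when the dataset size is a multiple of the batch size:

```python
import itertools
import numpy as np

def batch_generator(X, Y, batch_size=64):
    # same logic as the generator above, repeated so this sketch is self-contained
    indices = np.arange(len(X))
    batch = []
    while True:
        np.random.shuffle(indices)
        for i in indices:
            batch.append(i)
            if len(batch) == batch_size:
                yield X[batch], Y[batch]
                batch = []

trainX = np.random.rand(256, 8)  # hypothetical toy features
trainY = np.arange(256)          # labels equal to sample index, for checking coverage
batch_size = 64
steps_per_epoch = len(trainX) // batch_size  # number of full batches in one pass

gen = batch_generator(trainX, trainY, batch_size=batch_size)
labels = np.concatenate([y for _, y in itertools.islice(gen, steps_per_epoch)])
print(steps_per_epoch, sorted(labels.tolist()) == list(range(256)))  # 4 True
```

The training call would then pass this value, e.g. `model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch, epochs=...)`.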
UPD.:
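To avoid loading all of your data into memory up front, you can modify the generator to work only with the identifiers of your samples and load the data on demand: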
def batch_generator(ids, batch_size=BATCH_SIZE):
    batch = []
    while True:
        # note: this shuffles the caller's `ids` in place
        np.random.shuffle(ids)
        for i in ids:
            batch.append(i)
            if len(batch) == batch_size:
                yield load_data(batch)
                batch = []
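A minimal sketch of the same id-based generator that copies `ids` before shuffling, so the caller's list keeps its order, and takes the loader as a parameter. The names `id_batch_generator`, `load_fn` and `fake_load` are hypothetical; `fake_load` just builds arrays from the ids so the example runs without any files:

```python
import numpy as np

def id_batch_generator(ids, load_fn, batch_size=32):
    ids = np.array(ids)  # copy, so shuffling below does not reorder the caller's list
    batch = []
    while True:
        np.random.shuffle(ids)
        for i in ids:
            batch.append(i)
            if len(batch) == batch_size:
                yield load_fn(batch)  # only this batch's samples are materialized
                batch = []

def fake_load(batch_ids):
    # hypothetical stand-in for load_data: derives features from the ids themselves
    X = np.array([[i, i * 2] for i in batch_ids], dtype=np.float32)
    Y = np.array(batch_ids)
    return X, Y

all_ids = list(range(100))
gen = id_batch_generator(all_ids, fake_load, batch_size=25)
xb, yb = next(gen)
print(xb.shape, yb.shape)  # (25, 2) (25,)
print(all_ids[:5])         # [0, 1, 2, 3, 4]
```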
Your loader function could look like this:
def load_data(ids):
    X = []
    Y = []
    for i in ids:
        # read one or more samples from your storage, do pre-processing, etc.
        # for example (imread comes from whichever image library you use):
        x = imread(f'image_{i}.jpg')
        ...
        # `targets` is assumed to map each id to its label
        y = targets[i]
        X.append(x)
        Y.append(y)
    return np.array(X), np.array(Y)
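If your samples are not images, the same loader pattern works with any per-sample file format. Here is a minimal sketch, assuming (hypothetically) that each sample `i` is saved as `sample_{i}.npy` in `data_dir`, that labels live in a `targets` dict, and that the pre-processing step is a simple scaling to [0, 1]; it builds a tiny dataset in a temporary directory so it runs end to end:

```python
import os
import tempfile
import numpy as np

def load_data(ids, data_dir, targets):
    # Hypothetical loader: each sample i is stored as <data_dir>/sample_{i}.npy
    X, Y = [], []
    for i in ids:
        x = np.load(os.path.join(data_dir, f"sample_{i}.npy"))
        x = x.astype(np.float32) / 255.0  # example pre-processing (assumed scaling)
        X.append(x)
        Y.append(targets[i])
    return np.array(X), np.array(Y)

# Build a tiny on-disk dataset so the sketch can actually run.
data_dir = tempfile.mkdtemp()
targets = {}
for i in range(6):
    np.save(os.path.join(data_dir, f"sample_{i}.npy"),
            np.full((4, 4), i * 10, dtype=np.uint8))
    targets[i] = i % 2

X, Y = load_data([0, 3, 5], data_dir, targets)
print(X.shape, Y.tolist())  # (3, 4, 4) [0, 1, 1]
```

Only the samples named in `ids` are read from disk, so memory use is bounded by the batch size rather than the dataset size.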