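From your tags, I see that you are using Keras. Keras provides the ImageDataGenerator class, which has a flow_from_directory() method (see the Keras documentation). This method loads the images in your training directory from disk in batches and keeps only the current batch in RAM, which removes the bottleneck you are currently hitting when loading your images. It expects one subdirectory per class inside the training directory.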
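To illustrate why this avoids holding the whole dataset in RAM, here is a minimal stdlib-only sketch of the same idea: index the file paths once, then read files only when their batch is requested. It is not how Keras implements flow_from_directory(). The names index_directory, stream_batches, IMAGE_EXTENSIONS and the load_fn hook are hypothetical. Unlike Keras, it does no shuffling, resizing or image decoding; load_fn stands in for whatever decoder you would use.

```python
import os
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical set of accepted extensions; adjust to your data.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp")


def index_directory(directory: str) -> Tuple[List[str], List[int], Dict[str, int]]:
    """Collect file paths and integer labels from a one-subdirectory-per-class layout."""
    # Class indices are assigned from the sorted subdirectory names.
    class_names = sorted(
        d for d in os.listdir(directory) if os.path.isdir(os.path.join(directory, d))
    )
    class_indices = {name: i for i, name in enumerate(class_names)}
    paths: List[str] = []
    labels: List[int] = []
    for name in class_names:
        class_dir = os.path.join(directory, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(IMAGE_EXTENSIONS):
                paths.append(os.path.join(class_dir, fname))
                labels.append(class_indices[name])
    return paths, labels, class_indices


def stream_batches(
    paths: List[str],
    labels: List[int],
    batch_size: int,
    load_fn: Callable[[str], object],
) -> Iterator[Tuple[List[object], List[int]]]:
    """Yield (images, labels) one batch at a time; only the current batch is loaded."""
    for start in range(0, len(paths), batch_size):
        batch_paths = paths[start:start + batch_size]
        # Files are read here, lazily, only when this batch is requested.
        images = [load_fn(p) for p in batch_paths]
        yield images, labels[start:start + batch_size]
```

Only the list of paths is kept for the whole dataset. The contents of each batch can be discarded as soon as the training step that consumes it has finished.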
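To deal with the class imbalance, the recommended approach is to pass the class_weight parameter when fitting your Keras classifier. This parameter assigns a weight to each class in your data, so you can give images from the minority classes more importance. This answer shows how to compute the class weights.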
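Here is a worked example of the weighting scheme used in the code below, where each class's weight is the size of the largest class divided by the size of that class. The function names are hypothetical. weighted_losses only shows the idea that each sample's loss is scaled by its class weight. It does not reproduce the exact loss reduction Keras applies.

```python
from collections import Counter
from typing import Dict, Iterable, List


def max_ratio_class_weights(labels: Iterable[int]) -> Dict[int, float]:
    """Weight each class by (count of largest class) / (count of this class)."""
    counter = Counter(labels)
    max_val = float(max(counter.values()))
    return {class_id: max_val / num_images for class_id, num_images in counter.items()}


def weighted_losses(
    losses: List[float], labels: List[int], class_weights: Dict[int, float]
) -> List[float]:
    """Scale each sample's loss by the weight of its class."""
    return [loss * class_weights[y] for loss, y in zip(losses, labels)]


# Example: an imbalanced 3-class dataset with 800, 150 and 50 images.
labels = [0] * 800 + [1] * 150 + [2] * 50
weights = max_ratio_class_weights(labels)
```

With these counts the weights are 1.0, about 5.33 and 16.0. An image from the smallest class therefore contributes a loss term 16 times larger than an equally misclassified image from the largest class.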
The code below puts it all together:
```python
from collections import Counter

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define constants - change them according to your requirements
BATCH_SIZE = 128
EPOCHS = 50
IMAGE_SIZE = 224

# Set up the image data generator
train_datagen = ImageDataGenerator(dtype=np.float16)  # data augmentation options can also go here

# Stream images from disk: one subdirectory per class inside `directory`
train_generator = train_datagen.flow_from_directory(
    directory="path/to/your/directory",
    class_mode="categorical",
    target_size=(IMAGE_SIZE, IMAGE_SIZE),  # images are resized to this size when loaded
    batch_size=BATCH_SIZE,
)

# Calculate class weights: the largest class gets 1.0, smaller classes get proportionally more
counter = Counter(train_generator.classes)
max_val = float(max(counter.values()))
class_weights = {class_id: max_val / num_images for class_id, num_images in counter.items()}

#
# Here you set up and compile your model ...
#

# Fit the model on the generator. In TF 2.x, model.fit accepts generators directly;
# older standalone Keras uses model.fit_generator with the same arguments.
model.fit(
    train_generator,
    steps_per_epoch=train_generator.n // BATCH_SIZE,
    epochs=EPOCHS,
    class_weight=class_weights,  # pass the class weights as a method parameter
    verbose=1,
)
```
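The steps_per_epoch value above uses floor division, so each epoch draws at most steps_per_epoch × BATCH_SIZE images and any final partial batch is left out of that epoch. Here is a quick sketch of the arithmetic. The function names are hypothetical.

```python
import math


def steps_per_epoch(num_images: int, batch_size: int, drop_last: bool = True) -> int:
    """Batches per epoch: floor drops the final partial batch, ceil keeps it."""
    if drop_last:
        return num_images // batch_size
    return math.ceil(num_images / batch_size)


def images_seen(num_images: int, batch_size: int, drop_last: bool = True) -> int:
    """Images drawn per epoch for the given step count (the last batch may be smaller)."""
    steps = steps_per_epoch(num_images, batch_size, drop_last)
    return min(steps * batch_size, num_images)
```

With 1000 images and BATCH_SIZE = 128, floor division gives 7 steps (896 images per epoch), while rounding up gives 8 steps with a smaller last batch. For Keras's directory iterator, len(train_generator) is the rounded-up count, so passing it as steps_per_epoch covers every image once per epoch.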