Data mining - Binary classifier predicts only one class after undersampling my data

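I am trying to classify pairs of images into 2 classes. My initial results were poor, which I attributed to my classes being heavily imbalanced (roughly 88-12). I undersampled my data and used a two-branch convolutional network. My model gets about 50% accuracy on the training set at every epoch, and then about 85% accuracy on the test data. I am really confused about how to fix this.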
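To put those two numbers in context, here is a minimal sketch of what a classifier that always predicts the majority class would score. The label lists are made up to match the ratios described above (88-12 for the original data, 50-50 after undersampling); they are not the real data.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and label agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels mirroring the ratios described in the post.
imbalanced_test = [1] * 88 + [0] * 12   # original 88-12 distribution
balanced_train = [1] * 12 + [0] * 12    # after undersampling to 50-50

# A "model" that ignores its input and always answers 1 (the majority class).
test_acc = accuracy(imbalanced_test, [1] * len(imbalanced_test))
train_acc = accuracy(balanced_train, [1] * len(balanced_train))

print(test_acc)   # 0.88
print(train_acc)  # 0.5
```

If the real model behaves anything like this constant predictor, roughly 50% on the balanced training set and 85-88% on an imbalanced test set would be expected without the network having learned anything.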

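# Imports were not shown in the original post; these are assumed.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Two single-channel 100x80 inputs, one per view of the image pair.
XZ_input = keras.Input(shape=(100, 80, 1), name='xz_img')
YZ_input = keras.Input(shape=(100, 80, 1), name='yz_img')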

xz = layers.Conv2D(8, (3,3), activation = 'relu', padding= 'same')(XZ_input)
xz = layers.MaxPooling2D((3,3))(xz)
xz = layers.Flatten()(xz)

yz = layers.Conv2D(8, (3,3), activation = 'relu', padding= 'same')(YZ_input)
yz = layers.MaxPooling2D((3,3))(yz)
yz = layers.Flatten()(yz)

x = layers.concatenate([xz,yz])
x = layers.Dense(20, activation = 'relu')(x)
x = layers.Dropout(0.2)(x)
x = layers.Dense(5, activation = 'relu')(x)

pred = layers.Dense(1, activation = 'sigmoid')(x)

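model = keras.Model(
    inputs=[XZ_input, YZ_input],
    outputs=[pred])
model.summary()

# The compile step was not shown in the original post. The settings below are
# an assumption chosen to match the 'binary_accuracy' metric name in the log.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['binary_accuracy'])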

# Class 1 is treated as the majority class and class 0 as the minority class.
majority_indices = np.where(y_train == 1)[0]
minority_indices = np.where(y_train == 0)[0]
# Randomly keep as many majority samples as there are minority samples.
random_majority_indices = np.random.choice(majority_indices, len(minority_indices), replace=False)
under_sampling = np.concatenate([random_majority_indices, minority_indices])
y_train = y_train[under_sampling]
X_train = X_train[under_sampling]
# Split each sample into its two 100x80 views and add a channel axis.
X__train = X_train.reshape(y_train.shape[0], 2, 100, 80)
X1__train = X__train[:, 0].reshape(y_train.shape[0], 100, 80, 1)
X2__train = X__train[:, 1].reshape(y_train.shape[0], 100, 80, 1)
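The undersampling step above can be sanity-checked by counting the classes that survive it. The following is a standard-library stand-in for the NumPy code, not the original implementation; `undersample_indices` and the toy `labels` list are hypothetical names introduced only for illustration.

```python
import random
from collections import Counter

def undersample_indices(labels, seed=0):
    """Return shuffled indices with every class cut down to the rarest class size."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(ix) for ix in by_class.values())
    chosen = []
    for ix in by_class.values():
        # Sample without replacement, like np.random.choice(..., replace=False).
        chosen.extend(rng.sample(ix, n_min))
    rng.shuffle(chosen)  # avoid leaving the classes in two contiguous runs
    return chosen

labels = [1] * 88 + [0] * 12          # toy labels with an 88-12 split
idx = undersample_indices(labels)
counts = Counter(labels[i] for i in idx)
print(counts[0], counts[1])           # 12 12
```

Running the same kind of count on the real `y_train` after undersampling, and on `y_test`, would show whether the two sets still have different class ratios.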

model.fit({'xz_img': X1__train, 'yz_img': X2__train},y_train, epochs = 10, batch_size = 500, validation_data = ([X1__test, X2__test], y_test))

Output:

Epoch 1/10
24/24 [==============================] - 42s 2s/step - loss: 0.6922 - binary_accuracy: 0.5048 - val_loss: 0.6884 - val_binary_accuracy: 0.8290
Epoch 2/10
24/24 [==============================] - 37s 2s/step - loss: 0.6911 - binary_accuracy: 0.5195 - val_loss: 0.6849 - val_binary_accuracy: 0.8442
Epoch 3/10
24/24 [==============================] - 35s 1s/step - loss: 0.6891 - binary_accuracy: 0.5193 - val_loss: 0.6783 - val_binary_accuracy: 0.8528
Epoch 4/10
24/24 [==============================] - 35s 1s/step - loss: 0.6861 - binary_accuracy: 0.5207 - val_loss: 0.6688 - val_binary_accuracy: 0.8609
Epoch 5/10
24/24 [==============================] - 34s 1s/step - loss: 0.6827 - binary_accuracy: 0.5099 - val_loss: 0.6582 - val_binary_accuracy: 0.8680
Epoch 6/10
24/24 [==============================] - 34s 1s/step - loss: 0.6794 - binary_accuracy: 0.5115 - val_loss: 0.6511 - val_binary_accuracy: 0.8552
Epoch 7/10
24/24 [==============================] - 34s 1s/step - loss: 0.6765 - binary_accuracy: 0.5163 - val_loss: 0.6447 - val_binary_accuracy: 0.8379
Epoch 8/10
24/24 [==============================] - 39s 2s/step - loss: 0.6738 - binary_accuracy: 0.5309 - val_loss: 0.6399 - val_binary_accuracy: 0.8043
Epoch 9/10
24/24 [==============================] - 35s 1s/step - loss: 0.6715 - binary_accuracy: 0.5503 - val_loss: 0.6347 - val_binary_accuracy: 0.7772
Epoch 10/10
24/24 [==============================] - 35s 1s/step - loss: 0.6691 - binary_accuracy: 0.5609 - val_loss: 0.6292 - val_binary_accuracy: 0.7567
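A short worked calculation may help in reading the loss column. Assuming the loss is binary cross-entropy (the compile call is not shown, so this is an assumption), a model that outputs 0.5 for every sample has a loss of ln 2, about 0.6931, whatever the labels are. The label list below is made up for illustration.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1] * 88 + [0] * 12               # hypothetical labels, any ratio works
constant_half = [0.5] * len(labels)        # a model that always outputs 0.5
chance_loss = binary_cross_entropy(labels, constant_half)
print(round(chance_loss, 4))               # 0.6931
```

The training loss in the log moves from 0.6922 to 0.6691 over ten epochs, so it stays close to this chance-level value throughout.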

A bit of analysis:

pred = model.predict([X1__test, X2__test])

pred[:10], y_test[:10]

(array([[0.50534785],
        [0.5246037 ],
        [0.503593  ],
        [0.49585894],
        [0.49691   ],
        [0.49851283],
        [0.509586  ],
        [0.5941074 ],
        [0.63272274],
        [0.5186754 ]], dtype=float32),
 array([1, 1, 1, 1, 1, 1, 0, 1, 1, 1]))

Basically, my model is no better than a coin flip.
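As a rough check, the ten predictions printed above can be thresholded and tallied per class. The 0.5 cut-off is an assumption (it is the usual default for a sigmoid output), and ten samples are far too few to draw conclusions from. This is only a sketch of how per-class figures can differ from plain accuracy.

```python
# The ten probabilities and labels copied from the output above.
probs = [0.50534785, 0.5246037, 0.503593, 0.49585894, 0.49691,
         0.49851283, 0.509586, 0.5941074, 0.63272274, 0.5186754]
labels = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1]

preds = [1 if p > 0.5 else 0 for p in probs]   # assumed 0.5 threshold

# Confusion-matrix cells, treating class 1 as "positive".
tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))

accuracy = (tp + tn) / len(labels)
recall_pos = tp / (tp + fn)                    # recall on class 1
recall_neg = tn / (tn + fp)                    # recall on class 0
balanced_accuracy = (recall_pos + recall_neg) / 2

print(tp, fn, fp, tn)                          # 6 3 1 0
print(accuracy)                                # 0.6
```

On this tiny sample the plain accuracy is 0.6 while the balanced accuracy is about 0.33, because the single class-0 example is misclassified. Computing the same per-class figures on the full test set would give a clearer picture than accuracy alone.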