I am trying to train an autoencoder for dimensionality reduction, which I then want to use for anomaly detection. My data specifications are as follows:
- Unlabeled
- 1 million data points
- 9 features
I am trying to reduce this down to 2 compressed features so that I can better visualize the clusters.
My autoencoder, with input_dim = 9 and latent_dim = 2, is as follows:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

class Autoencoder(tf.keras.Model):
    def __init__(self, latent_dim, input_dim):
        super(Autoencoder, self).__init__()
        self.latent_dim = latent_dim
        self.input_dim = input_dim
        self.dropout_factor = 0.5
        self.encoder = Sequential([
            # Dense(16, activation='relu', input_shape=(self.input_dim,)),
            # Dropout(self.dropout_factor),
            Dense(8, activation='relu'),
            Dropout(self.dropout_factor),
            Dense(4, activation='relu'),
            Dropout(self.dropout_factor),
            Dense(self.latent_dim, activation='relu')
        ])
        self.decoder = Sequential([
            Dense(4, activation='relu', input_shape=(self.latent_dim,)),
            Dropout(self.dropout_factor),
            Dense(8, activation='relu'),
            Dropout(self.dropout_factor),
            # Dense(16, activation='relu'),
            # Dropout(self.dropout_factor),
            Dense(self.input_dim, activation=None)
        ])

    def call(self, inputs):
        encoder_out = self.encoder(inputs)
        return self.decoder(encoder_out)
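To make sure the bottleneck is actually what I think it is, I sanity-checked the layer shapes with a stripped-down version of the same stacks on random data (a sketch, assuming TensorFlow 2.x; dropout layers omitted since they don't change shapes):

```python
# Sanity check: the encoder should map 9 features down to 2,
# and the decoder should map 2 back up to 9.
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

input_dim, latent_dim = 9, 2
encoder = Sequential([
    Dense(8, activation='relu', input_shape=(input_dim,)),
    Dense(4, activation='relu'),
    Dense(latent_dim, activation='relu'),
])
decoder = Sequential([
    Dense(4, activation='relu', input_shape=(latent_dim,)),
    Dense(8, activation='relu'),
    Dense(input_dim, activation=None),
])

x = np.random.rand(5, input_dim).astype('float32')
z = encoder(x)        # compressed 2-D codes, for the cluster plot
x_hat = decoder(z)    # reconstruction back in the original 9-D space
print(z.shape, x_hat.shape)
```

The shapes come out as expected, so the architecture itself does have a 2-unit bottleneck.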
Model compilation:
from sklearn.model_selection import train_test_split

ae_train_x, ae_test_x, ae_train_y, ae_test_y = train_test_split(
    scaled_df[COLUNMS_FOR_AUTOENCODER], scaled_df[COLUNMS_FOR_AUTOENCODER], test_size=0.33)
autoencoder = Autoencoder(latent_dim=2, input_dim=9)
autoencoder.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
And finally the training:
ae_history = autoencoder.fit(ae_train_x, ae_train_y, validation_data=(ae_test_x, ae_test_y), epochs=50)
Training output:
Epoch 1/50
22255/22255 [==============================] - 38s 2ms/step - loss: 0.3330 - accuracy: 0.9646 - val_loss: 0.2816 - val_accuracy: 0.9999
Epoch 2/50
22255/22255 [==============================] - 38s 2ms/step - loss: 0.2664 - accuracy: 0.9999 - val_loss: 0.2818 - val_accuracy: 0.9999
Epoch 3/50
22255/22255 [==============================] - 38s 2ms/step - loss: 0.2649 - accuracy: 0.9999 - val_loss: 0.2845 - val_accuracy: 0.9999
What could the problem be? I suspect the network is just learning to pass the values through (an identity mapping), but with the bottleneck layer and the dropout layers that should not be possible. I have also tried reducing the number of layers, but the result stays the same. How can I fix this?
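For context, the anomaly-detection step I eventually want to feed the reconstructions into looks roughly like this (a sketch with synthetic numpy data standing in for the scaled features and the model output; the 95th-percentile threshold is an arbitrary illustration, not a tuned value):

```python
# Score each data point by its per-sample reconstruction MSE and
# flag the points with the largest errors as anomaly candidates.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 9))                      # stand-in for scaled inputs
x_hat = x + rng.normal(scale=0.1, size=x.shape)     # stand-in for autoencoder reconstructions

errors = np.mean((x - x_hat) ** 2, axis=1)          # one score per data point
threshold = np.percentile(errors, 95)               # arbitrary cutoff for illustration
anomalies = errors > threshold
print(anomalies.sum())
```

So what I ultimately care about is the reconstruction error per sample, which is why the flat loss worries me more than the accuracy numbers.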