Autoencoders for time-series compression

data-mining keras time-series lstm convolution autoencoder
2021-09-25 21:28:42

I am trying to use autoencoders (simple, convolutional, LSTM) to compress time series.

Here are the models I have tried.

Simple autoencoder:

    from keras.layers import Input, Dense
    from keras.models import Model
    import keras

    # this is the size of our encoded representations
    encoding_dim = 50

    # this is our input placeholder
    input_ts = Input(shape=(2100,))
    # "encoded" is the encoded representation of the input
    encoded = Dense(encoding_dim, activation='relu')(input_ts) #, activity_regularizer=regularizers.l2(10e-5)
    # "decoded" is the lossy reconstruction of the input
    decoded = Dense(2100, activation='tanh')(encoded)

    # this model maps an input to its reconstruction
    autoencoder = Model(input_ts, decoded)

    # this model maps an input to its encoded representation
    encoder = Model(input_ts, encoded)

    # create a placeholder for an encoded (%encoding_dim%-dimensional) input
    encoded_input = Input(shape=(encoding_dim,))
    # retrieve the last layer of the autoencoder model
    decoder_layer = autoencoder.layers[-1]
    # create the decoder model
    decoder = Model(encoded_input, decoder_layer(encoded_input))

    autoencoder.summary()

    adamax = keras.optimizers.Adamax(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.01)
    autoencoder.compile(optimizer=adamax, loss='mean_absolute_percentage_error')
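
The post does not show the training call; for completeness, here is a minimal sketch assuming the series are stacked into a 2-D array `X_train` of shape `(n_samples, 2100)`, already scaled into [-1, 1] to match the tanh output (the array name and the preprocessing are my assumptions):

    # train the autoencoder to reconstruct its own input
    autoencoder.fit(X_train, X_train, epochs=100, batch_size=32, validation_split=0.1)

    compressed = encoder.predict(X_train)        # shape (n_samples, 50)
    reconstructed = decoder.predict(compressed)  # shape (n_samples, 2100)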

Convolutional autoencoder:

    from keras.layers import Input, Dense, Conv1D, MaxPooling1D, UpSampling1D
    from keras.models import Model

    window_length = 518

    input_ts = Input(shape=(window_length,1))

    x = Conv1D(32, 3, activation="relu", padding="valid")(input_ts)
    x = MaxPooling1D(2, padding="valid")(x)

    x = Conv1D(1, 3, activation="relu", padding="valid")(x)

    encoded = MaxPooling1D(2, padding="valid")(x)

    encoder = Model(input_ts, encoded)

    x = Conv1D(16, 3, activation="relu", padding="valid")(encoded)
    x = UpSampling1D(2)(x) 

    x = Conv1D(32, 3, activation='relu', padding="valid")(x)
    x = UpSampling1D(2)(x)

    decoded = Conv1D(1, 1, activation='tanh', padding='valid')(x)

    convolutional_autoencoder = Model(input_ts, decoded)

    convolutional_autoencoder.summary()

    optimizer = "nadam"
    loss = "mean_absolute_error"

    convolutional_autoencoder.compile(optimizer=optimizer, loss=loss)
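
Note that with "valid" padding the reconstruction comes out 500 timesteps long, not 518: each Conv1D with kernel size 3 trims 2 steps, and each pooling halves the length (516 → 258 → 256 → 128 on the way down, 126 → 252 → 250 → 500 on the way up). Fitting X to X directly would therefore raise a shape error. One way to train it is to crop 9 steps from each end of the targets; a sketch, assuming `X_train` has shape `(n_samples, 518, 1)`:

    # the decoder output is 500 steps long, so crop the targets to match:
    # 518 - 500 = 18 steps, i.e. 9 from each end
    X_target = X_train[:, 9:-9, :]
    convolutional_autoencoder.fit(X_train, X_target, epochs=100, batch_size=32)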

LSTM autoencoder:

    from keras.layers import Input, LSTM, RepeatVector
    from keras.models import Model

    inputs = Input(shape=(1, 500))
    encoded = LSTM(128)(inputs)

    decoded = RepeatVector(1)(encoded)

    decoded = LSTM(500, return_sequences=True)(decoded)

    sequence_autoencoder = Model(inputs, decoded)
    encoder = Model(inputs, encoded)

    sequence_autoencoder.summary()

    sequence_autoencoder.compile(optimizer='nadam', loss='mean_absolute_error')
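
With `Input(shape=(1, 500))` the whole series is fed to the LSTM as a single timestep of 500 features, so the recurrence never actually unrolls over time. A sketch of the more conventional framing, treating each series as 500 timesteps of one feature (my assumption, not the setup the post reports results for):

    from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
    from keras.models import Model

    timesteps, n_features = 500, 1

    inputs = Input(shape=(timesteps, n_features))
    encoded = LSTM(128)(inputs)                  # (batch, 128) latent code

    decoded = RepeatVector(timesteps)(encoded)   # repeat the code once per timestep
    decoded = LSTM(128, return_sequences=True)(decoded)
    decoded = TimeDistributed(Dense(n_features, activation='tanh'))(decoded)

    sequence_autoencoder = Model(inputs, decoded)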

To check the compression loss, I use the SMAPE formula.
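
For reference, a common SMAPE definition in code (the post does not spell out which variant is used, so this is an assumption):

    import numpy as np

    def smape(actual, predicted):
        # symmetric mean absolute percentage error, in percent
        denom = (np.abs(actual) + np.abs(predicted)) / 2.0
        return 100.0 * np.mean(np.abs(predicted - actual) / denom)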

These are the results I got: an average loss of 14.28% for the simple autoencoder, 8.04% for the convolutional autoencoder, and 9.25% for the LSTM autoencoder.

My questions: is it viable at all to compress time series lossily with neural networks, given that compression time does not matter? Should I perhaps look at other methods? And how can I achieve better compression?

Thanks!

1 Answer

Most importantly, you should pay attention to the time-series nature of the data. That said, AEs are used extensively for time series, especially LSTM+AE. Have you varied the topology? The CAE looks reasonable, but the other models seem to be missing some layers, no? Some general advice: standardize the input, experiment with the activation functions (tanh worked well for me in the output layer), and vary the number of neurons per layer and the number of layers overall. Could you provide the head() of your input data? An AE is expected to fit X to X; maybe you missed that?
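
For instance, one way to preprocess is to min-max scale each series into [-1, 1] so a tanh output layer can actually reach the targets; a sketch, one possible reading of the "standardize the input" advice (assumes no constant series):

    import numpy as np

    def scale_per_series(X):
        # scale each series (row) independently into [-1, 1]
        mn = X.min(axis=1, keepdims=True)
        mx = X.max(axis=1, keepdims=True)
        return 2.0 * (X - mn) / (mx - mn) - 1.0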