数据挖掘 - 有人能发现我的 LSTM 外汇模型有什么问题吗？ - 吾爱随笔录

下面的模型从 csv 文件中读取数据（日期、开盘价、最高价、最低价、收盘价、成交量），整理数据，然后构建 LSTM 模型，尝试根据前几天的收盘价预测下一天的收盘价。

但是，无论我是否... - 更改超参数 - 使其成为深度模型 - 使用更多的功能，而不仅仅是关闭，验证准确度约为 53.8%

为了测试我是否犯了一个简单的错误，我生成了另一个数据源，该数据源是我通过添加正弦、余弦和一点噪声创建的信号，因此我知道模型应该能够被训练，并且确实如此。下面的模型在没有任何调整的情况下得到了大约 94% 的验证。

考虑到这一点。当我尝试在实际数据（欧元 1 分钟数据）上使用它时，它似乎不起作用。

任何看到错误或可以指出我正确方向的人？

import pandas as pd
import numpy as np

fpath = 'Data/'
fname = 'EURUSD_M1_1'
df = pd.read_csv(fpath + fname + '_clean.csv')
# file contains date, open, high, low, close, volume

# "y" is whether the next period's Close value is higher or lower than current Close value
outlook = 1
df['y'] = df['Close']<df['Close'].shift(-outlook)

# Drop all NAN's
df.dropna(how="any",inplace=True)

# Get X and y. To keep it simple, just use Close
X_df = df['Close']

# "y" is whether the next period's Close value is higher or lower than current Close value
outlook = 1
y_df = df['Close']<df['Close'].shift(-outlook)

# Train/test split
def train_test_split(X_df,y_df,train_perc):
    idx = int(train_perc/100*X_df.shape[0])
    X_train_df = X_df.iloc[0:idx]
    X_test_df = X_df.iloc[idx:]
    y_train_df = y_df.iloc[0:idx]
    y_test_df = y_df.iloc[idx:]
    return X_train_df.as_matrix(), X_test_df.as_matrix(), y_train_df.as_matrix(), y_test_df.as_matrix()

X_train_df, X_test_df, y_train, y_test = train_test_split(X_df,y_df,90)

# Scaling
def scale(X):
    Xmax = max(X)
    Xmin = min(X)
    return (X-Xmin)/(Xmax - Xmin)

X_train_scaled = scale(X_train_df)
X_test_scaled = scale(X_test_df)

# Build the model
import tensorflow as tf
from keras.models import Sequential
from keras.layers import LSTM, Dense

# ### Constants
num_time_steps = 5   # Num of steps in batch (also used for prediction steps into the future)
num_features = 1   # Number of features
num_neurons = 97
num_outputs = 1 # Just one output (True/False), predicted time series
learning_rate = 0.0001 # learning rate, 0.0001 default, but you can play with this
nb_epochs = 10 # how many iterations to go through (training steps), you can play with this
batch_size = 32

# Reshaping
X_train_scaled = np.reshape(X_train_scaled,[-1,1])
nb_samples_train = X_train_scaled.shape[0]  - num_time_steps

X_train_scaled_reshaped = np.zeros((nb_samples_train, num_time_steps, num_features))
y_train_reshaped = np.zeros((nb_samples_train))

for i in range(nb_samples_train):
    y_position = i + num_time_steps
    X_train_scaled_reshaped[i] = X_train_scaled[i:y_position]
    y_train_reshaped[i] = y_train[y_position]

model = Sequential()

stacked = False
if stacked == True:
    model.add(LSTM(num_neurons, return_sequences=True, input_shape=(num_time_steps,num_features), activation='relu', dropout=0.5))
    model.add(LSTM(num_neurons, activation='relu', dropout=0.5))
else:
    model.add(LSTM(num_neurons, input_shape=(num_time_steps,num_features), activation='relu', dropout=0.5))

model.add(Dense(units=num_outputs, activation='sigmoid'))

model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics=['accuracy'])

history = model.fit(X_train_scaled_reshaped,
                    y_train_reshaped,
                    batch_size = batch_size,
                    epochs = nb_epochs,
                    validation_split=0.3)