How do I apply the same MinMaxScaler that was fitted on the training data to new data?

data-mining python scikit-learn time-series data feature-scaling
2021-09-23 11:59:01

I'm using a Keras LSTM model for forecasting. The code below scales the data: the input has a shape like (n, 11, 1) and the labels are 1-D.
DailyDemand.py

#scaling data
scaler_x = preprocessing.MinMaxScaler(feature_range =(-1, 1))
x = np.array(x).reshape ((len(x),11 ))
x = scaler_x.fit_transform(x)
scaler_y = preprocessing.MinMaxScaler(feature_range =(-1, 1))
y = np.array(y).reshape ((len(y), 1))
y = scaler_y.fit_transform(y)

# Split train and test data
x_train=x[0: train_end ,]
x_test=x[train_end +1: ,]
y_train=y[0: train_end]
y_test=y[train_end +1:] 
x_train=x_train.reshape(x_train.shape +(1,))
x_test=x_test.reshape(x_test.shape + (1,))
# Train the model (named fit1) and save it to JSON and HDF5 files
[....]
# serialize model to JSON
model_json = fit1.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
fit1.save_weights("model.h5")
print(">>>> Model saved to model.h5 in the disk")

Now I'm trying to use this trained model to predict new values for new data, so I loaded the model from the files:

Prediction.py

from DailyDemand import scaler_y
from DailyDemand import scaler_x
[...]
# load json and create model
with open('model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
loaded_model.load_weights("model.h5")
print("Loaded model from disk")

########################################
# make prediction with the loaded model

FeaturesTest = [267,61200,695,677,70600,116700,130200,768,659,741,419300]
xaa = np.array(FeaturesTest).reshape ((1,11 )).astype(float)
print(xaa)
xaa = scaler_x.fit_transform(xaa) 
xaa = xaa.reshape(xaa.shape +(1,))
print("print FeaturesTest scalled: ")
print(xaa)  # incorrectly scaled: every value comes out as -1
# Output:
# [[[-1.]
#   [-1.]
#   [-1.]
#   [-1.]
#   [-1.]
#   [-1.]
#   [-1.]
#   [-1.]
#   [-1.]
#   [-1.]
#   [-1.]]]
tomorrowDemand = loaded_model.predict(xaa)
print("tomorrowDemand scalled: ", tomorrowDemand)
prediction = scaler_y.inverse_transform(np.array(tomorrowDemand).reshape ((len(tomorrowDemand), 1))).astype(int)
print ("the real demand is 95900 and the prediction is: ", prediction)

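My question is: how do I apply the same scaler that was used during training to new data? Did I make a mistake in this code when trying to reuse the same scaler on the new data?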

1 Answer

You are refitting scaler_x on the test sample, which is not what you want. Change this line:

xaa = scaler_x.fit_transform(xaa)

to:

xaa = scaler_x.transform(xaa)
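To see the difference between the two calls, here is a minimal sketch. The training matrix and the new sample are made-up numbers (3 features instead of 11), not the asker's data; only MinMaxScaler and its fit, transform and fit_transform methods come from scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training matrix: 5 samples, 3 features (illustrative only).
x_train = np.array([
    [100.0, 10.0, 1.0],
    [200.0, 20.0, 2.0],
    [300.0, 30.0, 3.0],
    [400.0, 40.0, 4.0],
    [500.0, 50.0, 5.0],
])

# Fit once, on the training data only.
scaler_x = MinMaxScaler(feature_range=(-1, 1))
scaler_x.fit(x_train)

# A single new sample, shaped (1, n_features).
x_new = np.array([[300.0, 15.0, 5.0]])

# transform() reuses the min/max learned from x_train.
scaled_ok = scaler_x.transform(x_new)

# fit_transform() relearns min/max from the single sample itself.
# A fresh scaler is used here so that scaler_x is not overwritten.
refit = MinMaxScaler(feature_range=(-1, 1)).fit_transform(x_new)

print(scaled_ok)  # approximately [[ 0.   -0.75  1.  ]]
print(refit)      # every entry is -1
```

Note that calling fit_transform on scaler_x itself would also overwrite the training statistics, so scaler_y.inverse_transform and any later transform calls would no longer match what the model was trained on.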

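You get [-1, -1, ..., -1] because, with a single sample, each feature's value is both its minimum and its maximum, so the refitted scaler maps every feature to the lower bound of feature_range, which is -1.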
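A related point: if the training code in DailyDemand.py sits at module top level, then importing scaler_x and scaler_y from it re-executes that code on every prediction run. One common alternative is to save the fitted scalers next to the model and load them in the prediction script. The sketch below assumes joblib (installed as a scikit-learn dependency); the file name scaler_y.joblib, the toy labels and the stand-in model output are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# --- Training side ---------------------------------------------------
# Toy labels (illustrative only), shaped (n_samples, 1).
y_train = np.array([[10.0], [20.0], [30.0]])
scaler_y = MinMaxScaler(feature_range=(-1, 1)).fit(y_train)

# Hypothetical location; in practice, store it alongside model.json / model.h5.
scaler_path = os.path.join(tempfile.gettempdir(), "scaler_y.joblib")
joblib.dump(scaler_y, scaler_path)

# --- Prediction side -------------------------------------------------
# Load the already-fitted scaler instead of importing the training script.
loaded_scaler_y = joblib.load(scaler_path)

# Stand-in for loaded_model.predict(...): a scaled value in [-1, 1].
scaled_prediction = np.array([[0.0]])

# Map the scaled output back to the original units.
prediction = loaded_scaler_y.inverse_transform(scaled_prediction)
print(prediction)  # approximately [[20.]]
```

The same dump/load pair would apply to scaler_x, which is then used with transform (not fit_transform) on the new features.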