I'm having trouble understanding how to use the TSFRESH library (version 0.4.0) to predict the next N values of a particular series. My code is below:
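# import paths assumed for tsfresh 0.4.0 (FeatureExtractionSettings was removed in later releases)
import numpy as np
import pandas as pd
import xgboost as xgb
from tsfresh import extract_features
from tsfresh.feature_extraction import FeatureExtractionSettings
from tsfresh.utilities.dataframe_functions import impute

# load train/test datasets
train, Y, test, YY = prepare_train_test()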
# add series ID
train['TS_ID'] = pd.Categorical(train['QTR_HR_START']).codes
test['TS_ID'] = pd.Categorical(test['QTR_HR_START']).codes
# add an ordering id for each event within a series
for id in sorted(train['TS_ID'].unique()):
    train.loc[train.TS_ID == id, 'TIME_ORDER_ID'] = pd.Categorical(train[train.TS_ID == id]['DATETIME']).codes
for id in sorted(test['TS_ID'].unique()):
    test.loc[test.TS_ID == id, 'TIME_ORDER_ID'] = pd.Categorical(test[test.TS_ID == id]['DATETIME']).codes
# perform feature extraction for my signal
extraction_settings = FeatureExtractionSettings()
extraction_settings.IMPUTE = impute # Fill in Infs and NaNs
X = extract_features(train, column_id='TS_ID', feature_extraction_settings=extraction_settings).values
XT = extract_features(test, column_id='TS_ID', feature_extraction_settings=extraction_settings).values
# the modelling step would then look something like:
# model = xgb.XGBRegressor(missing=np.nan)
# model.fit(X, Y)
# model.predict(XT)
However, after the X = extract_features(...) call I see the following result in the debugger:

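This means that the initial X dataset/features (shape=(722, 10)) was transformed to shape (80, 1899).
Where does the "80" come from? I guess it comes from train.TS_ID. But my XT dataset still contains 722 rows (9 days * 80 different series per day).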
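The shape change described above can be illustrated with a small stand-in. This is a minimal sketch, not tsfresh itself: it assumes that feature extraction keyed on a series id collapses all rows sharing that id into a single feature row, which would explain why the row count equals the number of distinct TS_ID values. The helper `summarize_per_series` and the toy numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Toy long-format data: (series_id, time_index, value).
# 3 series x 4 time steps = 12 input rows (made-up numbers).
rows = [(series_id, t, float(series_id * 10 + t))
        for series_id in range(3)
        for t in range(4)]


def summarize_per_series(rows):
    """Collapse all rows sharing a series id into one feature row."""
    grouped = defaultdict(list)
    for series_id, _t, value in sorted(rows):  # sort by id, then time
        grouped[series_id].append(value)
    return {
        series_id: {
            "mean": mean(values),
            "minimum": min(values),
            "maximum": max(values),
            "length": len(values),
        }
        for series_id, values in grouped.items()
    }


features = summarize_per_series(rows)
print(len(rows), "input rows ->", len(features), "feature rows")
```

Under this assumption, the number of output rows tracks the number of distinct ids, not the number of observations, so a target vector with one entry per observation would no longer line up with the feature matrix.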
So how can I predict 9 days ahead? Or is only a prediction for the next period possible?
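One common way to frame a question like this is as a sliding-window problem: each window of past values becomes its own short series (with its own id), and the value some steps after the window is the target, so there is exactly one feature row per target. The sketch below only shows that framing in plain Python. The function `make_windows`, the window length, and the sample values are illustrative assumptions, not something tsfresh 0.4.0 is known to provide.

```python
def make_windows(values, window, horizon=1):
    """Turn one series into (window_id, past_values, target) samples.

    Each sample holds `window` consecutive past values; its target is the
    value `horizon` steps after the end of that window.
    """
    samples = []
    last_start = len(values) - window - horizon
    for start in range(last_start + 1):
        past = values[start:start + window]
        target = values[start + window + horizon - 1]
        samples.append((start, past, target))
    return samples


series = [10, 11, 13, 16, 20, 25, 31, 38, 46]  # 9 daily values, made up
samples = make_windows(series, window=3, horizon=1)
for window_id, past, target in samples:
    print(window_id, past, "->", target)
```

With this framing, predicting several days ahead could mean either building one set of targets per horizon (horizon=1 up to horizon=9) or feeding each one-step prediction back in as the newest value. Which of the two fits better depends on the data.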