Using the TSFRESH library to forecast values

data-mining python predictive-modeling time-series feature-selection feature-extraction
2021-10-09 20:21:10

I'm having trouble understanding how to use the TSFRESH library (version 0.4.0) to predict the next N values of a particular series. My code is below:

    import pandas as pd
    from tsfresh import extract_features
    # tsfresh 0.4.x API; later releases replaced FeatureExtractionSettings
    from tsfresh.feature_extraction.settings import FeatureExtractionSettings
    from tsfresh.utilities.dataframe_functions import impute

    # load train/test datasets (prepare_train_test is my own helper)
    train, Y, test, YY = prepare_train_test()
    # add series ID
    train['TS_ID'] = pd.Categorical(train['QTR_HR_START']).codes
    test['TS_ID'] = pd.Categorical(test['QTR_HR_START']).codes
    # add an ordered id for each event within a series
    # (.loc replaces the removed .ix indexer)
    for ts_id in sorted(train['TS_ID'].unique()):
        mask = train['TS_ID'] == ts_id
        train.loc[mask, 'TIME_ORDER_ID'] = pd.Categorical(train.loc[mask, 'DATETIME']).codes
    for ts_id in sorted(test['TS_ID'].unique()):
        mask = test['TS_ID'] == ts_id
        test.loc[mask, 'TIME_ORDER_ID'] = pd.Categorical(test.loc[mask, 'DATETIME']).codes
    # perform feature extraction for my signal
    extraction_settings = FeatureExtractionSettings()
    extraction_settings.IMPUTE = impute  # fill in Infs and NaNs
    X = extract_features(train, column_id='TS_ID', feature_extraction_settings=extraction_settings).values
    XT = extract_features(test, column_id='TS_ID', feature_extraction_settings=extraction_settings).values

    # what I want to do next, for example:
    # import xgboost as xgb
    # model = xgb.XGBRegressor()
    # model.fit(X, Y)
    # model.predict(XT)
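The two per-series loops that build `TIME_ORDER_ID` can also be written without an explicit loop. This is a minimal sketch, assuming `DATETIME` holds timestamps with no missing values; the function name `add_series_ids` and the toy frame are hypothetical, not the asker's data. Within each `TS_ID`, a dense rank of `DATETIME` minus one gives the same integer as `pd.Categorical(...).codes`, because both number the sorted distinct timestamps starting from 0.

```python
import pandas as pd


def add_series_ids(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add TS_ID and TIME_ORDER_ID without per-series loops."""
    out = df.copy()
    # One integer id per distinct quarter-hour slot, as in the original code.
    out["TS_ID"] = pd.Categorical(out["QTR_HR_START"]).codes
    # Dense rank of the timestamp inside each series: 0 for the earliest, 1 for the next, ...
    # Assumes DATETIME has no missing values (rank would return NaN for them).
    out["TIME_ORDER_ID"] = (
        out.groupby("TS_ID")["DATETIME"].rank(method="dense").astype(int) - 1
    )
    return out


# Hypothetical toy data: two quarter-hour slots observed on three days.
toy = pd.DataFrame({
    "QTR_HR_START": ["00:15", "00:00", "00:15", "00:00", "00:15", "00:00"],
    "DATETIME": pd.to_datetime([
        "2017-01-02", "2017-01-02", "2017-01-01",
        "2017-01-01", "2017-01-03", "2017-01-03",
    ]),
    "VALUE": [5.0, 1.0, 4.0, 0.0, 6.0, 2.0],
})
print(add_series_ids(toy)[["QTR_HR_START", "TS_ID", "TIME_ORDER_ID"]])
```

A groupby-based rank keeps the whole computation in one vectorized pass, which matters once there are many series.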

However, after X = extract_features(...), the debugger shows that the initial X dataset/features (shape=(722, 10)) has been transformed into shape (80, 1899).

Where does the "80" come from? I guess it comes from train.TS_ID. But my XT dataset still contains 722 rows (9 days * 80 different series per day).
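The shape is expected: `extract_features` returns one row per distinct value of `column_id` and one column per computed feature, so with `column_id='TS_ID'` and 80 distinct ids, every observation of a series is collapsed into a single feature vector. This is also why a label vector with one entry per original row cannot be paired with X row for row; each id needs exactly one target. The toy sketch below imitates that shape with a plain pandas `groupby(...).agg(...)`; it is a stand-in chosen so the example runs without tsfresh, and the three statistics are illustrative, not tsfresh's feature set.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: 4 series (ids), 9 observations each.
rng = np.random.default_rng(0)
n_ids, n_steps = 4, 9
long_df = pd.DataFrame({
    "TS_ID": np.repeat(np.arange(n_ids), n_steps),
    "TIME_ORDER_ID": np.tile(np.arange(n_steps), n_ids),
    "VALUE": rng.normal(size=n_ids * n_steps),
})

# Like extract_features, collapse every id into ONE row of features.
# (mean/std/max stand in for tsfresh's much larger feature set.)
features = (
    long_df.sort_values(["TS_ID", "TIME_ORDER_ID"])
    .groupby("TS_ID")["VALUE"]
    .agg(["mean", "std", "max"])
)

print(long_df.shape)   # (36, 3): one row per observation
print(features.shape)  # (4, 3): one row per id, one column per feature
```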

So how can I forecast 9 days ahead? Or is only a next-period forecast possible?

1 Answer

TSFRESH already supports time series forecasting.

See here and here for details and examples.
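Forecasting with tsfresh generally works by cutting each series into rolling windows, treating every window as its own `column_id`, and using the value `horizon` steps after the window's end as the target; recent tsfresh releases provide helpers for this (`roll_time_series` and `make_forecasting_frame` in `tsfresh.utilities.dataframe_functions`). The sketch below hand-rolls the same windowing in pandas so it runs without tsfresh; the function name `make_windows`, the window length and the horizon are hypothetical choices. For a series with one observation per day, `horizon=9` trains a model that predicts 9 days ahead directly, instead of only the next period.

```python
import numpy as np
import pandas as pd


def make_windows(series: pd.Series, window: int, horizon: int):
    """Hypothetical helper: cut one ordered series into windows for direct h-step forecasting.

    Returns (long_df, y): long_df has columns window_id / time / value and could be
    passed to feature extraction with column_id="window_id", column_sort="time";
    y[window_id] is the value `horizon` steps after that window's last point.
    """
    values = series.to_numpy()
    if len(values) < window + horizon:
        raise ValueError("series too short for this window and horizon")
    frames, targets = [], {}
    # The window ending at index `end` needs `window` points and a target at end + horizon.
    for window_id, end in enumerate(range(window - 1, len(values) - horizon)):
        start = end - window + 1
        frames.append(pd.DataFrame({
            "window_id": window_id,
            "time": np.arange(start, end + 1),
            "value": values[start:end + 1],
        }))
        targets[window_id] = values[end + horizon]
    return pd.concat(frames, ignore_index=True), pd.Series(targets, name="target")


# Hypothetical daily series 0, 1, ..., 19 (one value per day).
daily = pd.Series(np.arange(20, dtype=float))
long_df, y = make_windows(daily, window=5, horizon=9)
print(long_df["window_id"].nunique(), len(y))  # 7 windows, 7 targets
print(y.iloc[0])  # 13.0: window covers days 0-4, target is day 4 + 9
```

After feature extraction each window becomes one row, and `y` lines up with those rows by `window_id`. To forecast from the most recent data, build one more window ending at the last observation (it has no target yet) and pass its features to the model's `predict`.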