Does statsmodels fully support MultiIndex?

data-mining python pandas statsmodels data-indexing-techniques
2022-02-13 16:28:17

The code snippet below shows how statsmodels flattens MultiIndex column tuples by joining their levels with an underscore "_".

import numpy as np
import pandas as pd
from statsmodels.regression.linear_model import OLS

K = 2
N = 10
ERROR_VOL = 1

np.random.seed(0)
X = np.random.rand(N, K)
coefs = np.linspace(0.1, 1, K)
noise = np.random.rand(N)
y = X @ coefs + noise * ERROR_VOL

index_ = pd.MultiIndex.from_tuples([('some_var','feature_0'), ('some_var','feature_1')])
df = pd.DataFrame(X, columns=index_)
ols_fit = OLS(y, df, hasconst=False).fit()
print(ols_fit.params)

The result is

>>> some_var_feature_0    0.230474
some_var_feature_1    1.646789
dtype: float64

Because of this flattening, the following operation, and similar ones that rely on matching index labels, fail:

params_stdzd = ols_fit.params * df.std()
>>> ValueError: cannot join with no level specified and no overlapping names
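The mismatch can be reproduced without fitting a model. The sketch below uses a hand-built stand-in for ols_fit.params (values copied from the output above, so it is not a real fit result) and sidesteps label alignment by multiplying positionally. It assumes the parameter order matches the order of df.columns, which is exactly what question 2 below asks about.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("some_var", "feature_0"), ("some_var", "feature_1")]
)
df = pd.DataFrame(np.random.default_rng(0).random((10, 2)), columns=cols)

# Stand-in for ols_fit.params: same values, flattened string labels.
params = pd.Series(
    [0.230474, 1.646789],
    index=["some_var_feature_0", "some_var_feature_1"],
)

# The string labels and the MultiIndex tuples have nothing in common,
# so pandas has no labels to align on.
shared_labels = set(params.index) & set(df.columns)

# Workaround: drop to NumPy arrays, multiply by position, then attach
# the original MultiIndex. ASSUMES params are in df.columns order.
params_stdzd = pd.Series(
    params.to_numpy() * df.std().to_numpy(),
    index=df.columns,
)
print(params_stdzd)
```

The result is indexed by the original MultiIndex, so later label-based operations against df work again.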

Questions

  1. Is there a way to make statsmodels respect the pandas MultiIndex instead of flattening it?

If not:

  1. Is there a way to set the flattening separator to something other than an underscore?

  2. Can I rely on OLS.params respecting the order of df.columns? If so, I could just reindex OLS.params with df.columns to get a properly indexed params Series.

  3. Are there better ways to get MultiIndex interoperability with statsmodels?
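The idea in item 2 can be sketched as follows. This is only a guarded workaround, not a documented statsmodels feature: the helper name fit_with_multiindex_params is hypothetical, and the check assumes string-valued column levels joined with "_", as observed in the output above. If the parameter names do not match that expectation, it raises instead of silently relabelling.

```python
import numpy as np
import pandas as pd
from statsmodels.regression.linear_model import OLS


def fit_with_multiindex_params(y, df):
    """Fit OLS and return (fit, params re-labelled with df.columns).

    Hypothetical helper; relies on params following df.columns order.
    """
    fit = OLS(y, df, hasconst=False).fit()
    names = list(fit.params.index)
    # Names we expect if the column tuples were flattened with "_".
    flattened = ["_".join(col) for col in df.columns]
    # Accept either flattened names or untouched tuples; anything else
    # means the order assumption cannot be verified.
    if names != flattened and names != list(df.columns):
        raise ValueError("parameter names do not match df.columns order")
    params = fit.params.copy()
    params.index = df.columns  # restore the MultiIndex by position
    return fit, params


np.random.seed(0)
X = np.random.rand(10, 2)
y = X @ np.linspace(0.1, 1, 2) + np.random.rand(10)
df = pd.DataFrame(
    X,
    columns=pd.MultiIndex.from_tuples(
        [("some_var", "feature_0"), ("some_var", "feature_1")]
    ),
)

ols_fit, params = fit_with_multiindex_params(y, df)
params_stdzd = params * df.std()  # labels now align on the MultiIndex
print(params_stdzd)
```

Checking the names before relabelling keeps a positional assignment from hiding a reordering, at the cost of depending on the observed "_" convention.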

0 answers
No answers have been posted yet.