The following code snippet shows how statsmodels flattens MultiIndex column tuples by joining their levels with an underscore ("_").
import numpy as np
import pandas as pd
from statsmodels.regression.linear_model import OLS
K = 2
N = 10
ERROR_VOL = 1
np.random.seed(0)
X = np.random.rand(N, K)
coefs = np.linspace(0.1, 1, K)
noise = np.random.rand(N)
y = X @ coefs + noise * ERROR_VOL
index_ = pd.MultiIndex.from_tuples([('some_var','feature_0'), ('some_var','feature_1')])
df = pd.DataFrame(X, columns=index_)
ols_fit = OLS(y, df, hasconst=False).fit()
print(ols_fit.params)
The result is
>>> some_var_feature_0 0.230474
some_var_feature_1 1.646789
dtype: float64
Because of this flattening, the following operation, and any similar one that relies on label matching, fails:
params_stdzd = ols_fit.params * df.std()
>>> ValueError: cannot join with no level specified and no overlapping names
Questions
- Is there a way to make statsmodels respect the pandas MultiIndex instead of flattening it?
- If not:
  - Is there a way to set the flattening character to something other than an underscore?
  - Can I rely on OLS.params respecting the order of df.columns? If so, I could just reindex OLS.params with df.columns to get a properly indexed params Series.
- Are there better ways to get MultiIndex interoperability with statsmodels?
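For reference, here is a minimal sketch of the reindexing workaround I have in mind. It assumes that the fitted parameters come back in the same positional order as df.columns, which is exactly what I am unsure about. As a sanity check for this one example, it compares the parameters against a plain least-squares solve on the raw array. The names params_mi and beta_check are just placeholders for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.regression.linear_model import OLS

np.random.seed(0)
N, K = 10, 2
X = np.random.rand(N, K)
y = X @ np.linspace(0.1, 1, K) + np.random.rand(N)

index_ = pd.MultiIndex.from_tuples(
    [("some_var", "feature_0"), ("some_var", "feature_1")]
)
df = pd.DataFrame(X, columns=index_)
ols_fit = OLS(y, df, hasconst=False).fit()

# Sanity check for the ASSUMPTION that params follow the column order:
# solve the same least-squares problem positionally and compare values.
beta_check, *_ = np.linalg.lstsq(df.to_numpy(), y, rcond=None)
assert np.allclose(np.asarray(ols_fit.params), beta_check)

# Re-attach the original MultiIndex purely by position (no label matching).
params_mi = pd.Series(np.asarray(ols_fit.params), index=df.columns)

# Label-aligned arithmetic now works, because both indexes are identical.
params_stdzd = params_mi * df.std()
print(params_stdzd)
```

This only swaps the labels after fitting; the fitted results object itself (summary tables and so on) would still show the flattened names.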