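Actually, scikit-learn does provide this functionality, although it can be a bit tricky to implement. Here is a complete working example of an averaging regressor built on top of three models. First, let's import all the required packages: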
from sklearn.base import TransformerMixin
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
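Then, we need to convert our three regressor models into transformers. This will allow us to merge their predictions into a single feature vector using FeatureUnion: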
# Each wrapper exposes the regressor's predictions through transform(),
# reshaped into a single column so that FeatureUnion can stack them side by side.
class RidgeTransformer(Ridge, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X).reshape(len(X), -1)


class RandomForestTransformer(RandomForestRegressor, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X).reshape(len(X), -1)


class KNeighborsTransformer(KNeighborsRegressor, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X).reshape(len(X), -1)
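To make the "single feature vector" idea concrete, here is a minimal sketch that feeds two such wrappers into a FeatureUnion and checks the shape of the result. The sample count, feature count, and `random_state` are arbitrary choices for illustration, not part of the original example.

```python
from sklearn.base import TransformerMixin
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import FeatureUnion


class RidgeTransformer(Ridge, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X).reshape(len(X), -1)


class KNeighborsTransformer(KNeighborsRegressor, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X).reshape(len(X), -1)


# Arbitrary toy data, chosen only for this illustration.
X, y = make_regression(n_samples=50, n_features=5, random_state=0)

union = FeatureUnion(transformer_list=[
    ('ridge', RidgeTransformer()),
    ('knn', KNeighborsTransformer()),
])

# fit_transform() fits every wrapped regressor on (X, y) and then
# concatenates their predictions column-wise.
stacked = union.fit_transform(X, y)
print(stacked.shape)  # (50, 2): one row per sample, one column per model
```

Each column of `stacked` holds the predictions of one model, which is exactly what the final regressor receives as its input features.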
Now, let's define a builder function for our Frankenstein model:
def build_model():
    ridge_transformer = Pipeline(steps=[
        ('scaler', StandardScaler()),
        ('poly_feats', PolynomialFeatures()),
        ('ridge', RidgeTransformer())
    ])

    pred_union = FeatureUnion(
        transformer_list=[
            ('ridge', ridge_transformer),
            ('rand_forest', RandomForestTransformer()),
            ('knn', KNeighborsTransformer())
        ],
        n_jobs=2
    )

    model = Pipeline(steps=[
        ('pred_union', pred_union),
        ('lin_regr', LinearRegression())
    ])

    return model
Finally, let's fit the model:
print('Build and fit a model...')
model = build_model()
X, y = make_regression(n_features=10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print('Done. Score:', score)
Output:
Build and fit a model...
Done. Score: 0.9600413867438636
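Note that the final LinearRegression step learns one weight per base model, so the result is a weighted blend rather than a plain arithmetic mean. The sketch below shows one way to inspect those weights. It uses a reduced two-model version of the pipeline and fixed toy data, both of which are simplifications made for this illustration.

```python
from sklearn.base import TransformerMixin
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import FeatureUnion, Pipeline


class RidgeTransformer(Ridge, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X).reshape(len(X), -1)


class KNeighborsTransformer(KNeighborsRegressor, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X).reshape(len(X), -1)


# Arbitrary toy data, chosen only for this illustration.
X, y = make_regression(n_samples=200, n_features=10, random_state=0)

model = Pipeline(steps=[
    ('pred_union', FeatureUnion(transformer_list=[
        ('ridge', RidgeTransformer()),
        ('knn', KNeighborsTransformer()),
    ])),
    ('lin_regr', LinearRegression()),
])
model.fit(X, y)

# The final step holds one coefficient per column of the union output,
# in the same order as the transformers were listed.
blender = model.named_steps['lin_regr']
names = [name for name, _ in model.named_steps['pred_union'].transformer_list]
for name, weight in zip(names, blender.coef_):
    print(name, weight)
print('intercept', blender.intercept_)
```

Because the base models are fitted and then asked to predict on the same training data, the learned weights may favor models that overfit. Evaluating on held-out data, as the example above does, remains important.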
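Why bother complicating things this way? Well, this approach allows us to optimize the model hyperparameters using standard scikit-learn modules such as GridSearchCV or RandomizedSearchCV. Also, it is now easy to save a pre-trained model to disk and load it back.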
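As a rough sketch of the hyperparameter search, the snippet below tunes a reduced two-model version of the pipeline with GridSearchCV. Nested parameters are addressed by joining step names with double underscores. The grid values, `cv=3`, and the toy data are assumptions made for illustration only.

```python
from sklearn.base import TransformerMixin
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import FeatureUnion, Pipeline


class RidgeTransformer(Ridge, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X).reshape(len(X), -1)


class KNeighborsTransformer(KNeighborsRegressor, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X).reshape(len(X), -1)


def build_small_model():
    # Reduced variant of the model above: two base models, no n_jobs.
    pred_union = FeatureUnion(transformer_list=[
        ('ridge', RidgeTransformer()),
        ('knn', KNeighborsTransformer()),
    ])
    return Pipeline(steps=[
        ('pred_union', pred_union),
        ('lin_regr', LinearRegression()),
    ])


# Arbitrary toy data and grid values, chosen only for this illustration.
X, y = make_regression(n_samples=200, n_features=10, random_state=0)

param_grid = {
    # <pipeline step>__<union member>__<estimator parameter>
    'pred_union__ridge__alpha': [0.1, 1.0, 10.0],
    'pred_union__knn__n_neighbors': [3, 5],
}

search = GridSearchCV(build_small_model(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

In the full model, where the ridge branch is itself a Pipeline, the corresponding key would gain one more level, for example `pred_union__ridge__ridge__alpha`. A fitted `search.best_estimator_` can then be persisted like any other estimator (for instance with `joblib.dump`), with the caveat that the custom transformer classes must be importable wherever the model is loaded.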