How do I fit KNN and then run a linear regression on those neighbors?

data-mining scikit-learn linear-regression k-nn
2022-02-24 12:43:58

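How can I fit a KNN to get the k nearest neighbors and then, in Scikit-Learn, aggregate those neighbors into a fit using linear regression (rather than a weighted average)?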

I tried using KNeighborsTransformer and then building a pipeline with LinearRegression, but that doesn't seem to do the right thing.

1 Answer

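KNeighborsTransformer only gives you the indices of the nearest neighbors and the distances to them. You need to do a bit more work to retrieve the points themselves and fit your linear regression on them.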
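To make this concrete, here is a small sketch of what KNeighborsTransformer actually returns. The random data and variable names are made up for illustration. The output is a sparse neighbor graph with one row per sample and one column per training point, so a LinearRegression placed after it in a pipeline would regress on distances, not on the neighbors' features and targets.

```python
import numpy as np
from sklearn.neighbors import KNeighborsTransformer

# Made-up data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

transformer = KNeighborsTransformer(n_neighbors=4, mode="distance")
graph = transformer.fit_transform(X)

# The result is a sparse (n_samples, n_samples_fit) matrix, not a feature matrix.
print(graph.shape)

# Stored entries for the first sample: column indices are the neighbors'
# positions in the training set, values are the distances to them.
start, end = graph.indptr[0], graph.indptr[1]
neighbor_indices = graph.indices[start:end]
neighbor_distances = graph.data[start:end]
print(neighbor_indices)
print(neighbor_distances)
```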

Here's a draft that seems to work:

from sklearn.neighbors import NearestNeighbors
from sklearn.base import RegressorMixin, BaseEstimator, clone
from sklearn.linear_model import Lasso
from sklearn.utils import check_X_y
import numpy as np

class LocalLinearRegressor(RegressorMixin, BaseEstimator):
    def __init__(self, n_neighbors=10, linear_model=Lasso()):
        self.n_neighbors = n_neighbors
        self.linear_model = linear_model

    def fit(self, X, y):
        "Stores the training data and fits the neighbors search."
        X, y = check_X_y(X, y)
        self._fit_X = X
        self._fit_y = y
        self.neighbor_search = NearestNeighbors(n_neighbors=self.n_neighbors)
        self.neighbor_search.fit(X)
        self.local_regressors_ = {}
        return self

    def predict(self, X):
        """Fits linear regressions on the k nearest training points to predict new values.
        
        We don't fit these linear regressions at fit time because there would be so many.
        However, we do save the regressions as we see them to speed up predictions.
        """
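        X = np.asarray(X)  # rows are indexed positionally below
        neighbors = self.neighbor_search.kneighbors(X, return_distance=False)
        # Sort each row so the same neighbor set always maps to the same cache key,
        # regardless of the distance ordering returned for a particular query.
        neighbors = np.sort(neighbors, axis=1)
        ksets, mapper = np.unique(neighbors, return_inverse=True, axis=0)
        mapper = np.ravel(mapper)  # some NumPy versions return a 2-D inverse when axis is given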
        for kset in ksets:
            if tuple(kset) in self.local_regressors_:
                continue
            local_X = self._fit_X[kset, :]
            local_y = self._fit_y[kset]
            self.local_regressors_[tuple(kset)] = clone(self.linear_model).fit(local_X, local_y)
        return np.array([
            self.local_regressors_[tuple(ksets[mapper[i]])].predict(X[i, :].reshape(1, -1))[0]
            for i in range(X.shape[0])
        ])

Here is a Colab notebook showing it in action.
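For a quick sanity check of the idea without the class, the sketch below does the same thing in a single function: find the k nearest training points for each query, fit a linear model on just those points, and predict. The function name `local_linear_predict` is hypothetical, it uses plain LinearRegression rather than Lasso, and the data is synthetic. On a noiseless plane, every local fit should recover the plane, so the predictions should match the true values up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors


def local_linear_predict(X_train, y_train, X_query, n_neighbors=10):
    """Hypothetical helper: for each query point, fit an ordinary least-squares
    model on its k nearest training points and predict with that model."""
    X_train = np.asarray(X_train)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query)

    search = NearestNeighbors(n_neighbors=n_neighbors).fit(X_train)
    neighbor_idx = search.kneighbors(X_query, return_distance=False)

    preds = np.empty(X_query.shape[0])
    for i, idx in enumerate(neighbor_idx):
        # One small regression per query point, trained only on its neighbors.
        model = LinearRegression().fit(X_train[idx], y_train[idx])
        preds[i] = model.predict(X_query[i].reshape(1, -1))[0]
    return preds


# Synthetic data (made up for illustration): a noiseless plane y = 2*x0 - x1 + 3.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(200, 2))
y_train = 2 * X_train[:, 0] - X_train[:, 1] + 3
X_query = rng.uniform(-0.5, 0.5, size=(5, 2))

preds = local_linear_predict(X_train, y_train, X_query, n_neighbors=10)
expected = 2 * X_query[:, 0] - X_query[:, 1] + 3
print(np.max(np.abs(preds - expected)))  # largest absolute error over the queries
```

Unlike the class above, this version refits a model for every query and caches nothing, so it is only meant for small experiments.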