机器算法验证 - 套索交叉验证 - 吾爱随笔录

我想执行交叉验证以找到 Lasso 的正则化参数。我在 python 中使用 scikit-learn 库。我首先生成数据集，然后执行 k 折交叉验证。这是我的代码（大部分来自 scikit-learn 网站上的示例）：

# generate some sparse data to play with
import numpy as np
n_samples, n_features = 5000, 200
X = np.random.randn(n_samples, n_features)
coef = 3 * np.random.randn(n_features)
coef[10:] = 0  # sparsify coef
y = np.dot(X, coef)

# add noise
y += 0.01 * np.random.normal((n_samples,))

# Split data in train set and test set
n_samples = X.shape[0]
X_train, y_train = X[:n_samples / 2], y[:n_samples / 2]
X_test, y_test = X[n_samples / 2:], y[n_samples / 2:]

###############################################################################
# Lasso
from sklearn.linear_model import Lasso
from sklearn.cross_validation import KFold
from matplotlib import pyplot as plt

kf = KFold(X_train.shape[0], n_folds = 10,)


alphas = np.logspace(-16, 3, num = 50, base = 2)

e_alphas = list()
e_alphas_r = list()  #holds average r2 error
for alpha in alphas:
    lasso = Lasso(alpha=alpha)
    err = list()
    err_2 = list()
    for tr_idx, tt_idx in kf:
        X_tr , X_tt = X_train[tr_idx], X_test[tt_idx]
        y_tr, y_tt = y_train[tr_idx], y_test[tt_idx]
        lasso.fit(X_tr, y_tr)
        y_hat = lasso.predict(X_tt)
        err_2.append(lasso.score(X_tt,y_tt))
        err.append(np.average((y_hat - y_tt)**2))
    e_alphas.append(np.average(err))
    e_alphas_r.append(np.average(err_2))

plt.figsize = (15,10)
fig = plt.figure()     
ax = fig.add_subplot(111)
ax.plot(alphas, e_alphas, 'b-')
ax.plot(alphas, e_alphas_r, 'g--')
ax.set_xlabel("alpha")
plt.show()

误差曲线如下图所示：

我知道在 scikit-learn 中还有其他方法可以进行 lassoCV，但我只想知道在给定我得到的那种图表的情况下如何选择参数。感谢您的回复。