Data mining - How to do k-folds in Python while splitting into 3 sets?

Consider the following data:

    import pandas as pd
    names = ["Cultivator", "Alchol", "Malic_Acid", "Ash", "Alcalinity_of_Ash",
             "Magnesium", "Total_phenols", "Falvanoids", "Nonflavanoid_phenols",
             "Proanthocyanins", "Color_intensity", "Hue", "OD280", "Proline"]
    wine = pd.read_csv(r'wine_data.csv', names=names)
    X = wine.drop('Cultivator', axis=1)  # input features
    y = wine['Cultivator']  # target to predict

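y is what I want to predict and X is the input; I will be using some kind of MLP classifier. What I want to do is split this data into training, validation, and test sets, and then apply k-fold cross-validation. I am struggling to see how this is done.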

I know I can get the training, validation, and test sets like this:

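    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)  # hold out the test set
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)  # split off validation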
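
To make the resulting proportions concrete, here is a small sketch on synthetic data. `X_demo` and `y_demo` are made-up stand-ins (100 rows), not the wine data. Because the second call takes 25% of the remaining 80%, the result is a 60/20/20 split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 samples, 2 features, 3 class labels.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.arange(100) % 3

# First split: hold out 20% of all rows as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1
)
# Second split: 25% of the remaining 80% becomes the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=1
)

sizes = (len(X_train), len(X_val), len(X_test))
print(sizes)  # (60, 20, 20)
```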

But what I want to do now is apply k-fold so that each fold has 3 sets (training, validation, and test) rather than just 2.

I know I can use the following for KFold:

    import numpy as np
    from sklearn.model_selection import KFold
    kf = KFold(n_splits=5, shuffle=True, random_state=2)
    X_np = np.array(X)
    y_np = np.array(y)

After converting to NumPy arrays, I can do this:

    for train_index, test_index in kf.split(X_np):
        print("TRAIN:", train_index, "TEST:", test_index)
        X_train, X_test = X_np[train_index], X_np[test_index]
        y_train, y_test = y_np[train_index], y_np[test_index]

But how do I get a 'validation_index'? More generally, the question is: how do I use k-fold when I have 3 sets instead of 2?
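
One possible way to get three index sets per fold, sketched here under assumptions rather than offered as the definitive approach: let `KFold` pick the test indices, then carve a validation subset out of the remaining indices with `train_test_split`. The synthetic arrays, the 25% validation share, and the names `train_val_index`, `val_index`, and `folds` are illustrative choices, not part of the original code.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Hypothetical stand-in data: 100 samples, 13 features, labels 1..3.
rng = np.random.default_rng(0)
X_np = rng.normal(size=(100, 13))
y_np = rng.integers(1, 4, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=2)

folds = []
for fold, (train_val_index, test_index) in enumerate(kf.split(X_np)):
    # Split the non-test indices of this fold into training and validation.
    train_index, val_index = train_test_split(
        train_val_index, test_size=0.25, random_state=fold
    )
    folds.append((train_index, val_index, test_index))

    X_train, y_train = X_np[train_index], y_np[train_index]
    X_val, y_val = X_np[val_index], y_np[val_index]
    X_test, y_test = X_np[test_index], y_np[test_index]
    print(f"fold {fold}: train={len(train_index)} "
          f"val={len(val_index)} test={len(test_index)}")
```

With 100 rows and 5 folds, each fold ends up with 60 training, 20 validation, and 20 test rows. The test sets of the 5 folds are disjoint and together cover all rows, while the validation sets are drawn independently per fold and may overlap across folds. If hyperparameters are to be tuned with cross-validation as well, a nested scheme (an inner `KFold` over `train_val_index`) is another option.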

Also, when should I normalize the data: after splitting into X_train and X_test as above, or before splitting?
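
A common practice is to split first and then fit the scaler on the training rows only, so that no statistics from the validation or test rows leak into training. The sketch below assumes `StandardScaler` is the intended kind of normalization and uses synthetic stand-in data; both are assumptions, not something stated in the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 100 samples, 13 features, labels 1..3.
rng = np.random.default_rng(0)
X_np = rng.normal(loc=5.0, scale=3.0, size=(100, 13))
y_np = rng.integers(1, 4, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X_np, y_np, test_size=0.2, random_state=1
)

scaler = StandardScaler()
# Learn mean and standard deviation from the training rows only.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics on the held-out rows; do not refit here.
X_test_scaled = scaler.transform(X_test)
```

Inside a k-fold loop the same pattern would be repeated per fold: fit the scaler on that fold's training rows, then apply `transform` to its validation and test rows.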

Any help is appreciated.