Found input variables with inconsistent numbers of samples

data-mining python scikit-learn cross-validation
2021-09-22 11:09:51

I would appreciate it if someone could tell me how to fix this error.

Code:

X = np.array(pd.read_csv('my_X_table1-1c.csv',header=None).values)
y = np.array(pd.read_csv('my_y_table1-1c.csv',header=None).values.ravel())
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

def Ridgecv(alpha):
    return cross_val_score(Ridge(alpha=float(alpha), random_state=2),
                           X_train, y_train, 'mae', cv=5).mean()

The error is raised for X_train and y_train:

ValueError: Found input variables with inconsistent numbers of samples: [1052, 1052, 3]
2 Answers

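It turns out I had left out the keyword 'scoring'. Passed positionally, 'mae' was bound to the parameter that follows y (groups) and treated as an array-like whose length was checked against X_train and y_train. The extra 3 in the error message is simply the number of characters in 'mae'.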

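def Ridgecv(alpha):
    # scoring must be passed by keyword. 'mae' is not a registered scorer name,
    # so the built-in 'neg_mean_absolute_error' is used (scores are <= 0).
    return cross_val_score(Ridge(alpha=float(alpha), random_state=2),
                           X_train, y_train,
                           scoring='neg_mean_absolute_error', cv=5).mean()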
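To make the origin of the 3 concrete, here is a minimal sketch on synthetic data. The names X_demo, y_demo and ridge_cv are made up for illustration and are not from the question. It uses scikit-learn's check_consistent_length helper to show that a 3-character string is counted as 3 "samples", and then runs the keyword-argument version. The positional call from the question is not repeated here because its behaviour depends on the installed scikit-learn version.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.utils import check_consistent_length

# Synthetic stand-in for the CSV data in the question (assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=60)

# A string has a length, so it is measured like any other array-like:
# len('mae') == 3, which does not match the 60 rows of X_demo and y_demo.
error_message = None
try:
    check_consistent_length(X_demo, y_demo, 'mae')
except ValueError as exc:
    error_message = str(exc)

def ridge_cv(alpha):
    # Passing scoring by keyword keeps it away from the array arguments.
    scores = cross_val_score(Ridge(alpha=float(alpha)), X_demo, y_demo,
                             scoring='neg_mean_absolute_error', cv=5)
    return scores.mean()

mean_score = ridge_cv(1.0)
print(error_message)
print(mean_score)
```

Because the scorer is the negated mean absolute error, values closer to zero are better. This matters if the function is handed to an optimizer that maximizes its objective.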

The values returned by train_test_split should be unpacked in this order:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test=train_test_split(X,Y,random_state=101,test_size=0.3)

Then call the fit method on the training pair: fit(X_train, y_train).
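A short sketch of that order, using a small made-up dataset (X_all, y_all and the shortened variable names are placeholders, not from the question): train_test_split returns the two X pieces first and the two y pieces second, and the estimator is fitted on the matching training pair.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Toy data for illustration only (assumption): 20 samples, 2 features.
X_all = np.arange(40, dtype=float).reshape(20, 2)
y_all = X_all[:, 0] * 2.0 + 1.0

# Unpack in the order: X train, X test, y train, y test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=101)

# Fit on the training pair; both must have the same number of rows.
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
predictions = model.predict(X_te)
print(X_tr.shape, y_tr.shape, X_te.shape, y_te.shape)
```

Unpacking in a different order (for example swapping X_test and y_train) is a common way to end up with the same "inconsistent numbers of samples" error at fit time.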