sklearn Pipeline ValueError: Found input variables with inconsistent numbers of samples

data-mining machine-learning scikit-learn
2022-02-24 17:17:42

I am getting the error shown below. I checked the shapes of X and y and could not find a mismatch.

from sklearn.model_selection import train_test_split
from sklearn.utils import check_consistent_length

labels = ['non-role','role']
X = df[["POS", "NER", "DEF", "SYN"]]
y = df["Label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2, shuffle=True)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

print(check_consistent_length(X_train, y_train))

This is the output:

(25238, 4)
(25238,)
(6310, 4)
(6310,)
None

I then tried to fit the model:

NB_pipeline = Pipeline([('tfidf-vect', TfidfVectorizer()),('clf', RandomForestClassifier())])
NB_pipeline.fit(X_train, y_train)

But I get the following error:

ValueError: Found input variables with inconsistent numbers of samples: [4, 25238]
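
A likely cause, sketched here under assumptions: TfidfVectorizer expects an iterable of strings, and iterating over a pandas DataFrame yields its column names rather than its rows. The vectorizer would therefore see 4 "documents" (POS, NER, DEF, SYN), which matches the 4 in [4, 25238]. The toy df below is a hypothetical stand-in for the real data, and it assumes all four columns hold strings. One possible fix is to vectorize each column separately with a ColumnTransformer. Passing the column as a plain string (not a list) hands the vectorizer a 1-D Series.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the question's df (assumption: all text columns).
df = pd.DataFrame({
    "POS": ["NN VB", "JJ NN", "VB NN", "NN NN", "JJ VB", "VB JJ"],
    "NER": ["PER MISC", "MISC ORG", "MISC MISC", "LOC MISC", "MISC PER", "ORG MISC"],
    "DEF": ["a person who acts", "a large company", "to move quickly",
            "a place name", "a kind person", "to form a group"],
    "SYN": ["actor player", "firm business", "run dash",
            "site spot", "nice gentle", "assemble gather"],
    "Label": ["role", "non-role", "non-role", "non-role", "role", "non-role"],
})
text_columns = ["POS", "NER", "DEF", "SYN"]
X = df[text_columns]
y = df["Label"]

# Reproduce the symptom: the vectorizer iterates over column names, not rows.
n_docs_seen = TfidfVectorizer().fit_transform(X).shape[0]
print(n_docs_seen, len(y))  # number of columns vs. number of rows

# Possible fix: one TfidfVectorizer per column, each receiving a 1-D Series.
preprocess = ColumnTransformer(
    [(f"tfidf_{col}", TfidfVectorizer(), col) for col in text_columns]
)
pipeline = Pipeline([
    ("features", preprocess),
    ("clf", RandomForestClassifier(random_state=0)),
])
pipeline.fit(X, y)
preds = pipeline.predict(X)
```

If the columns are closer to categorical tags than free text, a OneHotEncoder for those columns may be a better fit than TF-IDF. That choice depends on data not shown in the question.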