Data mining - sklearn pipeline ValueError: Found input variables with inconsistent numbers of samples

I am getting the error below. I checked the shapes of X and y and found nothing wrong.

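from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.utils import check_consistent_length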

labels = ['non-role','role']
X = df[["POS", "NER", "DEF", "SYN"]]
y = df["Label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2, shuffle=True)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

print(check_consistent_length(X_train, y_train))

This is the output:

(25238, 4)
(25238,)
(6310, 4)
(6310,)
None

I then tried to fit the model:

NB_pipeline = Pipeline([('tfidf-vect', TfidfVectorizer()),('clf', RandomForestClassifier())])
NB_pipeline.fit(X_train, y_train)

But I get the following error:

ValueError: Found input variables with inconsistent numbers of samples: [4, 25238]
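
A likely cause, sketched on made-up data: TfidfVectorizer expects a one-dimensional iterable of strings, and iterating over a DataFrame yields its column names. The vectorizer therefore sees 4 "documents" (one per column), while y has 25238 labels, which matches the [4, 25238] in the message. The sketch below assumes the four columns hold text. The `toy` frame, its contents, and the `tfidf_*` step names are hypothetical and only stand in for the real `df`. One possible fix is a ColumnTransformer that applies a separate TfidfVectorizer to each column.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for the real df (assumption: every feature column is text).
toy = pd.DataFrame({
    "POS": ["noun verb", "verb noun", "adj noun", "noun noun", "verb adv", "adj adj"],
    "NER": ["person org", "org", "person", "location org", "org person", "location"],
    "DEF": ["leader of the group", "to run quickly", "large city", "small team",
            "to speak loudly", "very bright color"],
    "SYN": ["chief head", "sprint dash", "town metropolis", "crew squad",
            "shout yell", "vivid vibrant"],
    "Label": ["role", "non-role", "non-role", "role", "non-role", "non-role"],
})
text_columns = ["POS", "NER", "DEF", "SYN"]
X_toy, y_toy = toy[text_columns], toy["Label"]

# Reproduce the cause: iterating a DataFrame yields its column names,
# so the vectorizer builds one row per column instead of one per sample.
n_docs_seen = TfidfVectorizer().fit_transform(X_toy).shape[0]
print(n_docs_seen, "rows from the vectorizer vs", len(y_toy), "labels")

# One possible fix: vectorize each text column separately.
# A plain string selector (not a list) passes a 1-D Series to the vectorizer.
preprocess = ColumnTransformer(
    [(f"tfidf_{col.lower()}", TfidfVectorizer(), col) for col in text_columns]
)
pipeline = Pipeline([
    ("tfidf-vect", preprocess),
    ("clf", RandomForestClassifier(random_state=0)),
])
pipeline.fit(X_toy, y_toy)
n_predictions = len(pipeline.predict(X_toy))
```

Joining the four columns into a single string column before vectorizing would be another option. Which one suits better depends on whether the columns should share a vocabulary.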