ValueError about incompatible row dimensions while training a model

data-mining machine-learning classification scikit-learn decision-trees numpy
2022-02-25 15:13:24

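I am implementing a decision tree on a dataset. Before fitting it, I want to transform one specific column with CountVectorizer, and I am using a pipeline to keep this simple.

However, fitting fails with an error about incompatible row dimensions.

Code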

# Imported the libraries....
from sklearn.feature_extraction.text import CountVectorizer as cv
from sklearn.preprocessing import OneHotEncoder as ohe
from sklearn.compose import ColumnTransformer as ct
from sklearn.pipeline import make_pipeline as mp
from sklearn.tree import DecisionTreeClassifier as dtc
from sklearn.model_selection import train_test_split as tts

transformer=ct(transformers=[('review_counts',cv(),['verified_reviews']),
                             ('variation_dummies', ohe(),['variation'])
                            ],remainder='passthrough')

pipe= mp(transformer,dtc(random_state=42))

x= data[['rating','variation','verified_reviews']].copy()
y= data.feedback

x_train,x_test,y_train,y_test= tts(x,y,test_size=0.3,random_state=42,stratify=y)
print(x_train.shape,y_train.shape)             # ((2205, 3), (2205,))

pipe.fit(x_train,y_train)                       # Error on this line

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-79-a981c354b190> in <module>()
----> 1 pipe.fit(x_train,y_train)

7 frames
/usr/local/lib/python3.6/dist-packages/scipy/sparse/construct.py in bmat(blocks, format, dtype)
    584                                                     exp=brow_lengths[i],
    585                                                     got=A.shape[0]))
--> 586                     raise ValueError(msg)
    587 
    588                 if bcol_lengths[j] == 0:

ValueError: blocks[0,:] has incompatible row dimensions. Got blocks[0,1].shape[0] == 2205, expected 1.


Questions

  1. What causes this incompatible row dimensions error?
  2. How can I fix it?

1 Answer

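According to the ColumnTransformer documentation, when a transformer expects a 1D array as input, the column should be specified as a string ("xxx"). For transformers that expect 2D data, the columns should be specified as a list of strings (["xxx"]).

CountVectorizer expects a 1D iterable of text documents. With ['verified_reviews'] it receives a one-column 2D DataFrame instead, and iterating over a DataFrame yields its column names, so it sees a single document and returns a matrix with 1 row. OneHotEncoder returns 2205 rows, so stacking the two blocks side by side fails with the row-dimension error shown in the question.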
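To see the difference between the two selectors in isolation, here is a minimal sketch that calls CountVectorizer directly on a 2D and a 1D selection. The three-row `toy` DataFrame is a made-up stand-in, not the question's data.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the question's `data`
toy = pd.DataFrame(
    {"verified_reviews": ["love it", "works great", "stopped working"]}
)

# 2D selection (list of columns): iterating a DataFrame yields column names,
# so CountVectorizer sees one "document", the string "verified_reviews"
as_2d = CountVectorizer().fit_transform(toy[["verified_reviews"]])

# 1D selection (single column name): iterating a Series yields one review per row
as_1d = CountVectorizer().fit_transform(toy["verified_reviews"])

print(as_2d.shape[0])  # number of rows produced from the 2D input
print(as_1d.shape[0])  # number of rows produced from the 1D input
```

The 2D call produces a single row no matter how many reviews there are, which matches the "expected 1" in the error message, while the 1D call produces one row per review.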

So the following code will work.

## Important: the column is passed as a string to CountVectorizer and as a list of columns to OneHotEncoder

transformer=ct(transformers=[('review_counts',cv(),'verified_reviews'), 
                             ('variation_dummies', ohe(),['variation'])
                            ],remainder='passthrough')
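For a quick check that the corrected selector fits end to end, the following is a sketch on a made-up six-row DataFrame. The column names match the question, but the values are hypothetical and the full class names are used instead of the question's aliases.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data using the question's column names
toy = pd.DataFrame({
    "rating": [5, 4, 1, 2, 5, 1],
    "variation": ["Black", "White", "Black", "White", "Black", "White"],
    "verified_reviews": ["love it", "works great", "stopped working",
                         "poor sound", "great speaker", "broke quickly"],
    "feedback": [1, 1, 0, 0, 1, 0],
})

transformer = ColumnTransformer(
    transformers=[
        # string selector -> CountVectorizer receives a 1D Series
        ("review_counts", CountVectorizer(), "verified_reviews"),
        # list selector -> OneHotEncoder receives a 2D DataFrame
        ("variation_dummies", OneHotEncoder(), ["variation"]),
    ],
    remainder="passthrough",
)
pipe = make_pipeline(transformer, DecisionTreeClassifier(random_state=42))

X = toy[["rating", "variation", "verified_reviews"]]
y = toy["feedback"]
pipe.fit(X, y)  # no row-dimension error with the string selector

# make_pipeline names each step after its lowercased class name
features = pipe.named_steps["columntransformer"].transform(X)
print(features.shape[0] == len(X))
```

Every block now has one row per sample, so the ColumnTransformer can stack them horizontally before handing the result to the tree.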

Credit goes to another person who helped me with this.