I am trying to resample my dataset with SMOTE after splitting it into training and test partitions. Here is my code:
smote_X = df[cols]
smote_Y = df[target_col]
#Split train and test data
smote_train_X,smote_test_X,smote_train_Y,smote_test_Y = train_test_split(smote_X,smote_Y,test_size = .25,random_state = 111)
smote_train_Y_series = smote_train_Y.iloc[:,0]
#oversampling minority class using smote
os = SMOTE(random_state = 0)
os_smote_X,os_smote_Y = os.fit_sample(smote_train_X,smote_train_Y_series)
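I added line 5 (the smote_train_Y.iloc[:,0] assignment) to convert the DataFrame returned by train_test_split into a Series, because the newer version of SMOTE's fit_sample (docs) expects that data type, but it now raises the error below.
Any ideas on how to fix it?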
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
     16 #oversampling minority class using smote
     17 os = SMOTE(random_state = 0)
---> 18 os_smote_X,os_smote_Y = os.fit_sample(smote_train_X,smote_train_Y_series)
     19 os_smote_X = pd.DataFrame(data = os_smote_X,columns=cols)
     20 os_smote_Y = pd.DataFrame(data = os_smote_Y,columns=target_col)

/opt/conda/lib/python3.6/site-packages/imblearn/base.py in fit_resample(self, X, y)
     86         if self._X_columns is not None:
     87             X_ = pd.DataFrame(output[0], columns=self._X_columns)
---> 88             X_ = X_.astype(self._X_dtypes)
     89         else:
     90             X_ = output[0]

/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors, **kwargs)
   5863                     results.append(
   5864                         col.astype(
-> 5865                             dtype=dtype[col_name], copy=copy, errors=errors, **kwargs
   5866                         )
   5867                     )

/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors, **kwargs)
   5846                 if len(dtype) > 1 or self.name not in dtype:
   5847                     raise KeyError(
-> 5848                         "Only the Series name can be used for "
   5849                         "the key in Series dtype mappings."
   5850                     )

KeyError: 'Only the Series name can be used for the key in Series dtype mappings.'
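For context, the final message in this traceback comes from pandas' Series.astype, which accepts a dtype mapping only when its single key is the Series' own name. The following is a minimal pandas-only sketch of that behavior, not a reproduction of the imblearn call path; the Series `s` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data, used only to trigger the same pandas error message.
s = pd.Series([1, 2, 3], name="a")

# A mapping whose only key is the Series name is accepted.
as_float = s.astype({"a": "float64"})

# A mapping with more than one key is rejected with a KeyError.
try:
    s.astype({"a": "float64", "b": "float64"})
    raised = False
    message = ""
except KeyError as exc:
    raised = True
    message = str(exc)
```

If the frame passed to SMOTE had repeated column labels, a per-column dtype lookup could return several entries instead of one, which is one way to land in this branch. That is an assumption about this dataset, not something the traceback alone proves.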
Update, 28 January 2020: I tried two other options, but no luck so far. Still looking for help.
A. Passing the raw output of train_test_split:
#oversampling minority class using smote
os = SMOTE(random_state = 0)
os_smote_X,os_smote_Y = os.fit_sample(smote_train_X,smote_train_Y)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
      1 #oversampling minority class using smote
      2 os = SMOTE(random_state = 0)
----> 3 os_smote_X,os_smote_Y = os.fit_resample(smote_train_X,smote_train_Y)
      4 os_smote_X = pd.DataFrame(data = os_smote_X,columns=cols)
      5 os_smote_Y = pd.DataFrame(data = os_smote_Y,columns=target_col)

/opt/conda/lib/python3.6/site-packages/imblearn/base.py in fit_resample(self, X, y)
     73         """
     74         check_classification_targets(y)
---> 75         X, y, binarize_y = self._check_X_y(X, y)
     76
     77         self.sampling_strategy_ = check_sampling_strategy(

/opt/conda/lib/python3.6/site-packages/imblearn/base.py in _check_X_y(self, X, y, accept_sparse)
    148         if hasattr(y, "loc"):
    149             # store information to build a series
--> 150             self._y_name = y.name
    151             self._y_dtype = y.dtype
    152         else:

/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5177             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5178                 return self[name]
-> 5179             return object.__getattribute__(self, name)
   5180
   5181     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'name'
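The AttributeError above reflects a general pandas distinction: selecting columns with a list yields a DataFrame, which has no name attribute, while a single column is a Series, which does. A small sketch, assuming a one-column target; the frame `df_demo` and the column name "Churn" are hypothetical stand-ins, not taken from the actual data.

```python
import pandas as pd

# Hypothetical stand-in for the real data.
df_demo = pd.DataFrame({"feature": [0.1, 0.2, 0.3], "Churn": [0, 1, 0]})
target_col_demo = ["Churn"]

# Selecting with a list of names returns a DataFrame, even for one column.
y_frame = df_demo[target_col_demo]

# Two ways to get a Series out of a one-column DataFrame.
y_series = y_frame.iloc[:, 0]
y_squeezed = y_frame.squeeze(axis="columns")
```

Both conversions keep the column label as the Series name, which is the attribute the imblearn code in the traceback tries to read.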
B. Converting smote_train_X to a matrix before passing it along with smote_train_Y converted to a Series:
smote_train_X_matrix = smote_train_X.as_matrix()
smote_train_Y_series = smote_train_Y.iloc[:,0]
#oversampling minority class using smote
os = SMOTE(random_state = 0)
os_smote_X,os_smote_Y = os.fit_resample(smote_train_X_matrix,smote_train_Y_series)
Note that the resulting matrix and Series have shapes (4633, 46) and (4633,), respectively.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/core/internals/managers.py in create_block_manager_from_blocks(blocks, axes)
   1677                 blocks = [
-> 1678                     make_block(values=blocks[0], placement=slice(0, len(axes[0])))
   1679                 ]

/opt/conda/lib/python3.6/site-packages/pandas/core/internals/blocks.py in make_block(values, placement, klass, ndim, dtype, fastpath)
   3283
-> 3284     return klass(values, ndim=ndim, placement=placement)
   3285

/opt/conda/lib/python3.6/site-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
    127                 "Wrong number of items passed {val}, placement implies "
--> 128                 "{mgr}".format(val=len(self.values), mgr=len(self.mgr_locs))
    129             )

ValueError: Wrong number of items passed 46, placement implies 44

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
      2 os = SMOTE(random_state = 0)
      3 os_smote_X,os_smote_Y = os.fit_resample(smote_train_X_matrix,smote_train_Y_series)
----> 4 os_smote_X = pd.DataFrame(data = os_smote_X,columns=cols)
      5 os_smote_Y = pd.DataFrame(data = os_smote_Y,columns=target_col)
      6 ###

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    438                 mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
    439             else:
--> 440                 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
    441
    442         # For data is list-like, or Iterable (will consume into list)

/opt/conda/lib/python3.6/site-packages/pandas/core/internals/construction.py in init_ndarray(values, index, columns, dtype, copy)
    211         block_values = [values]
    212
--> 213     return create_block_manager_from_blocks(block_values, [columns, index])
    214
    215

/opt/conda/lib/python3.6/site-packages/pandas/core/internals/managers.py in create_block_manager_from_blocks(blocks, axes)
   1686         blocks = [getattr(b, "values", b) for b in blocks]
   1687         tot_items = sum(b.shape[0] for b in blocks)
-> 1688         construction_error(tot_items, blocks[0].shape[1:], axes, e)
   1689
   1690

/opt/conda/lib/python3.6/site-packages/pandas/core/internals/managers.py in construction_error(tot_items, block_shape, axes, e)
   1717         raise ValueError("Empty data passed with indices specified.")
   1718     raise ValueError(
-> 1719         "Shape of passed values is {0}, indices imply {1}".format(passed, implied)
   1720     )
   1721

ValueError: Shape of passed values is (8410, 46), indices imply (8410, 44)
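The 46-versus-44 mismatch in the final error says the resampled array has more columns than cols has names. One situation that produces this (an assumption here, since the contents of df and cols are not shown) is a DataFrame with repeated column labels: selecting by a list of unique names then returns every matching column. The sketch below uses made-up data, hypothetical names (`df_demo`, `cols_demo`), and to_numpy() in place of as_matrix(), which was removed in pandas 1.0.

```python
import pandas as pd

# Hypothetical frame in which the label "a" appears twice.
df_demo = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])
cols_demo = ["a", "b"]

# Selecting by unique names still returns both columns labelled "a".
selected = df_demo[cols_demo]
n_names, n_columns = len(cols_demo), selected.shape[1]

# Rebuilding a frame from the raw values with the shorter name list fails.
try:
    pd.DataFrame(data=selected.to_numpy(), columns=cols_demo)
    rebuild_failed = False
except ValueError:
    rebuild_failed = True

# Possible checks: list the repeated labels, or keep the first of each.
repeated = df_demo.columns[df_demo.columns.duplicated()].tolist()
deduplicated = df_demo.loc[:, ~df_demo.columns.duplicated()]
```

Comparing len(cols) with df[cols].shape[1] on the real data would show quickly whether this explanation applies.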