SimpleImputer can't impute column-wise

data-mining machine-learning scikit-learn
2022-02-14 06:32:39

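I have X_train with shape (14599, 13) and I'm trying to impute NaNs with the column medians, but for some reason it seems to impute across rows instead, and it fails because one row contains a date rather than integer values. I looked for an axis parameter on SimpleImputer but couldn't find one. How can I fix this?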
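A quick way to check which axis SimpleImputer works along: on purely numeric data it computes one statistic per column (feature), following scikit-learn's rows-are-samples convention. The toy array below is made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data: 3 rows (samples) x 2 columns (features), with one NaN per column
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])

imp = SimpleImputer(strategy="median")
X_filled = imp.fit_transform(X)

# statistics_ holds one median per column: 2.0 for column 0, 15.0 for column 1
print(imp.statistics_)
# Each NaN is replaced by its own column's median
print(X_filled)
```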

```python
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split


plt.close('all')
avo_sales = pd.read_csv('avocados.csv')
# Give the PLU code columns readable names
avo_sales.rename(columns = {'4046':'small PLU sold',
                            '4225':'large PLU sold',
                            '4770':'xlarge PLU sold'},
                 inplace= True)
# Strip spaces from column names (e.g. 'Total Bags' -> 'TotalBags')
avo_sales.columns = avo_sales.columns.str.replace(' ','')

plt.scatter(avo_sales.Date,avo_sales.TotalBags)

# Features: every column except the target (Date is still a string column here)
x = np.array(avo_sales.drop(columns=['TotalBags']))
y = np.array(avo_sales.TotalBags)

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

imp = SimpleImputer(strategy='median')
X_train = imp.fit_transform(X_train)
```

Output

```
ValueError: Cannot use median strategy with non-numeric data:
could not convert string to float: '12/31/2017'
```
1 Answer

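I don't think it's trying to impute across rows; rather, it's trying to impute within the date column. The median strategy only works on numeric data, and that column holds strings such as '12/31/2017'. You probably want to use a ColumnTransformer to select which columns to impute.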
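A minimal sketch of the ColumnTransformer approach, assuming the features are kept as a pandas DataFrame so columns can be picked by name. The four-row frame and its values are hypothetical stand-ins for `avo_sales`; only the numeric columns go through the median imputer, and the rest are passed through unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical toy frame standing in for avo_sales (values are made up)
df = pd.DataFrame({
    "Date": ["12/31/2017", "12/24/2017", "12/17/2017", "12/10/2017"],
    "AveragePrice": [1.33, np.nan, 0.93, 1.08],
    "TotalVolume": [64236.62, 54876.98, np.nan, 78992.15],
    "type": ["conventional", "organic", "conventional", "organic"],
})

# Split columns into numeric (to impute) and everything else (to keep as-is)
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = [c for c in df.columns if c not in numeric_cols]

ct = ColumnTransformer(
    transformers=[("median", SimpleImputer(strategy="median"), numeric_cols)],
    remainder="passthrough",  # Date and type are passed through untouched
)
out = ct.fit_transform(df)

# Transformed columns come first, followed by the passthrough columns
result = pd.DataFrame(out, columns=numeric_cols + other_cols)
print(result)
```

In the question's code this would mean passing the DataFrame itself to `train_test_split` (it accepts DataFrames) instead of wrapping it in `np.array`, then calling `ct.fit_transform(X_train)` and `ct.transform(X_test)` so the medians come from the training split only. If the data must stay a NumPy array, ColumnTransformer also accepts integer column indices.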