Pandas indexing error

data-mining machine-learning python scikit-learn pandas
2022-03-06 19:55:42

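I am trying to split my data using train_test_split, but I get an index error. I have pasted part of the error message below. I am using Python 3.5 and sklearn 0.18.1. The same code worked on a different dataset before. Here, features is a pandas DataFrame and labels is a pandas Series.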

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.4, random_state=1)

KeyError                                  Traceback (most recent call last)
/apps/anaconda/anaconda-3.5/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4443)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4289)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13733)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13687)()

KeyError: 0

1 Answer

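Pandas indexing works differently from NumPy indexing. Square brackets on a pandas object look up labels rather than positions, which is why a missing label 0 raises KeyError: 0: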

X[some_slice]   # in NumPy is NOT equal to
df[some_slice]  # in pandas, but is instead equal to
df.iloc[some_slice]
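
A minimal sketch of that difference, assuming a Series whose index does not contain the label 0 (for instance, after rows were filtered out of a larger dataset). The data and variable names are hypothetical and not taken from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical labels whose index starts at 10, so the label 0 does not exist.
labels = pd.Series([1, 0, 1, 0], index=[10, 11, 12, 13])

# NumPy arrays are indexed by position.
arr = np.array([1, 0, 1, 0])
first_from_array = arr[0]

# On a Series with an integer index, [] looks up the label 0, which is missing.
try:
    labels[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# .iloc indexes by position, like NumPy.
first_from_series = labels.iloc[0]
```

If the index happened to be the default 0..n-1 range, the label lookup would succeed, which may be why the same code worked on an earlier dataset.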

You can convert the DataFrame and the Series to NumPy arrays by calling .values on them before splitting:

X_train, X_test, y_train, y_test = \
    train_test_split(features.values, labels.values, test_size=0.4, random_state=1)
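
To illustrate what the conversion does, here is a small sketch with hypothetical data in which the index has gaps. It also shows reset_index(drop=True) as a possible alternative that keeps the pandas objects; that alternative is an assumption on my part and not part of the answer above:

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows were filtered earlier, so the index has gaps and no label 0.
features = pd.DataFrame({"f1": [0.1, 0.2, 0.3], "f2": [1.0, 2.0, 3.0]}, index=[5, 8, 9])
labels = pd.Series([1, 0, 1], index=[5, 8, 9])

# .values returns plain NumPy arrays, which only support positional indexing.
X = features.values
y = labels.values
first_row = X[0]      # first row by position
first_label = y[0]    # first label by position

# Alternative (an assumption, not from the answer): keep the pandas objects
# but replace the index with the default 0..n-1 range, so label 0 exists again.
labels_reset = labels.reset_index(drop=True)
first_label_reset = labels_reset[0]
```

Note that converting to arrays discards the index and column names, so the split results can no longer be traced back to the original row labels.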