ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

data-mining machine-learning python scikit-learn pandas numpy
2021-10-08 19:52:52

I am trying to fit my data to a model that takes NumPy arrays as input, so I pass the DataFrame's values to the model:

stacked_averaged_models.fit(train.values, y_train1)

I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-145-9ba69af8df05> in <module>()
      1 X_traintrain = train.as_matrix().astype(np.float)
      2 from sklearn.metrics import r2_score
----> 3 stacked_averaged_models.fit(train.values, y_train1)
      4 stacked_train_pred = stacked_averaged_models.predict(train.values)
      5 stacked_pred = np.expm1(stacked_averaged_models.predict(test.values))

<ipython-input-140-dfca4af6e9d1> in fit(self, X, y)
     18                 instance = clone(model)
     19                 self.base_models_[i].append(instance)
---> 20                 instance.fit(X[train_index], y[train_index])
     21                 y_pred = instance.predict(X[holdout_index])
     22                 out_of_fold_predictions[holdout_index, i] = y_pred

~\Anaconda3\envs\deeplearning\lib\site-packages\sklearn\pipeline.py in fit(self, X, y, **fit_params)
    248         Xt, fit_params = self._fit(X, y, **fit_params)
    249         if self._final_estimator is not None:
--> 250             self._final_estimator.fit(Xt, y, **fit_params)
    251         return self
    252 

~\Anaconda3\envs\deeplearning\lib\site-packages\sklearn\linear_model\coordinate_descent.py in fit(self, X, y, check_input)
    705                              order='F', dtype=[np.float64, np.float32],
    706                              copy=self.copy_X and self.fit_intercept,
--> 707                              multi_output=True, y_numeric=True)
    708             y = check_array(y, order='F', copy=False, dtype=X.dtype.type,
    709                             ensure_2d=False)

~\Anaconda3\envs\deeplearning\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    574     if multi_output:
    575         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
--> 576                         dtype=None)
    577     else:
    578         y = column_or_1d(y, warn=True)

~\Anaconda3\envs\deeplearning\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    451                              % (array.ndim, estimator_name))
    452         if force_all_finite:
--> 453             _assert_all_finite(array)
    454 
    455     shape_repr = _shape_repr(array.shape)

~\Anaconda3\envs\deeplearning\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X)
     42             and not np.isfinite(X).all()):
     43         raise ValueError("Input contains NaN, infinity"
---> 44                          " or a value too large for %r." % X.dtype)
     45 
     46 

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I checked for NaN and infinity, and the data passes both tests:

X_traintrain = train.as_matrix().astype(np.float)
print(np.any(np.isnan(X_traintrain)))
print(np.all(np.isfinite(X_traintrain)))

Output:

False
True

How else can I fix this, or at least debug it?

X1      X2       X3     X4       X5    X6   X7     X8   Y1      Y2
0.64    784.00  343.00  220.50  3.50    5   0.00    0   10.56   16.67
0.62    808.50  367.50  220.50  3.50    2   0.00    0   8.60    12.07
0.62    808.50  367.50  220.50  3.50    5   0.00    0   8.50    12.04
0.98    514.50  294.00  110.25  7.00    2   0.10    1   24.58   26.47

These are a few rows of my dataset.

2 Answers

I tried many of the suggested solutions, but this is the one that solved the problem for me:

data = data[~data.isin([np.nan, np.inf, -np.inf]).any(axis=1)]
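The filter above cleans a single DataFrame. Note that the traceback in the question raises inside the check on y (the multi_output branch of check_X_y), while the question's own checks only cover train, so the target may be worth inspecting as well. Below is a minimal sketch of locating non-finite rows in both features and target and dropping them together so the two stay aligned. The toy values and the helper non_finite_rows are hypothetical, made up for illustration; only the names train and y_train1 come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the question's `train` and `y_train1`.
train = pd.DataFrame({"X1": [0.64, 0.62, 0.98, 0.90],
                      "X2": [784.0, 808.5, np.inf, 514.5]})
y_train1 = pd.Series([10.56, np.nan, 24.58, 8.50], name="Y1")

def non_finite_rows(obj):
    """Return a boolean mask, one entry per row, marking rows with NaN or +/-inf."""
    values = np.asarray(obj, dtype=np.float64)
    bad = ~np.isfinite(values)
    # 1-D input (a target vector) is already one flag per row.
    return bad if bad.ndim == 1 else bad.any(axis=1)

bad_X = non_finite_rows(train)
bad_y = non_finite_rows(y_train1)
print("bad feature rows:", np.flatnonzero(bad_X))
print("bad target rows:", np.flatnonzero(bad_y))

# Drop a row if either the features or the target are non-finite,
# so X and y keep the same length and order.
keep = ~(bad_X | bad_y)
train_clean = train[keep]
y_clean = y_train1[keep]
```

Printing the offending row positions before dropping anything makes it easier to see where the bad values come from, rather than silently discarding them.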

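See whether one of the answers in this discussion helps. In my case, the error was caused by df = df.reindex(index=my_index): the DataFrame's index started at 1, but my_index contained a 0, so pandas silently inserted a row full of NaN...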
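The reindex behaviour described here can be reproduced with a small sketch. The frame contents and the exact labels in my_index are assumptions chosen for illustration; the point is only that reindex adds a NaN row for any label missing from the original index.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index starts at 1, as in the case described above.
df = pd.DataFrame({"X1": [0.64, 0.62, 0.98]}, index=[1, 2, 3])
my_index = [0, 1, 2, 3]  # hypothetical; label 0 does not exist in df

# reindex does not raise for unknown labels; it fills them with NaN.
reindexed = df.reindex(index=my_index)

# Labels present after reindexing but absent from the original index.
missing = reindexed.index.difference(df.index)
print(reindexed)
print("labels introduced by reindex:", list(missing))
```

Comparing the new index with the old one before fitting is a cheap way to catch rows that reindex created.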