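I have already referred to this post here, but it has no answer.
I am using a random forest classifier for binary classification. My dataset has shape (977, 8), with a class ratio of 77:23. My system has 4 cores and 8 logical processors.
Because my dataset is imbalanced, I used BalancedBaggingClassifier (with a random forest as the estimator).
So I am using GridSearchCV to find the best parameters for the BalancedBaggingClassifier model, fit the model with them, and then predict.
My code is shown below: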
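from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hyperparameter grid for the bagging classifier
n_estimators = [100, 300, 500, 800, 1200]
max_samples = [5, 10, 25, 50, 100]
max_features = [1, 2, 5, 10, 13]
hyperbag = dict(n_estimators=n_estimators, max_samples=max_samples,
                max_features=max_features)
skf = StratifiedKFold(n_splits=10, shuffle=False)
# rf_boruta (presumably the BalancedBaggingClassifier described above),
# ord_train_t and y_train are defined elsewhere
gridbag = GridSearchCV(rf_boruta, hyperbag, cv=skf, scoring='f1', verbose=3, n_jobs=-1)
gridbag.fit(ord_train_t, y_train)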
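To get a feel for how much work this search requests, here is a rough count using only the grid values and the 10-fold split from the code above. Two things in it are assumptions and not stated in the post: that each bagging member is a random forest with 100 trees (`trees_per_forest`), and that the training matrix has 8 feature columns (`n_features`, taken from the reported shape, which may or may not include the target).

```python
from itertools import product

# Grid values and fold count copied from the question.
n_estimators = [100, 300, 500, 800, 1200]
max_samples = [5, 10, 25, 50, 100]
max_features = [1, 2, 5, 10, 13]
n_splits = 10

# Every combination of the three lists is one candidate.
candidates = list(product(n_estimators, max_samples, max_features))
total_fits = len(candidates) * n_splits

# Each fit builds n_estimators bagging members.
total_members = sum(n for n, _, _ in candidates) * n_splits

# ASSUMPTION: each member is a random forest with 100 trees.
trees_per_forest = 100
total_trees = total_members * trees_per_forest

# ASSUMPTION: the training matrix has 8 feature columns.
n_features = 8
too_wide = [c for c in candidates if c[2] > n_features]

print("candidates:", len(candidates))
print("cross-validation fits:", total_fits)
print("bagging members built:", total_members)
print("trees built (assumed 100 per forest):", total_trees)
print("candidates with max_features > n_features:", len(too_wide))
```

The grid has 125 candidates, so 10 folds give 1250 fits, which on its own goes some way toward explaining a run of well over half an hour on 4 cores. If the 8-column assumption holds, the candidates with max_features of 10 or 13 ask for more features than exist, so those fits may fail for that reason too.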
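However, the log generated in the Jupyter console contains the messages below, in which the GridSearchCV score is nan for some CV runs.
You can see that for some CV runs the grid score is nan. Can you help me? Also, it has been running for more than half an hour and still has not produced a result.
Why does GridSearchCV return nan?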
[CV 10/10] END max_features=1, max_samples=25, n_estimators=500;, score=nan total time= 4.5min
[CV 4/10] END max_features=1, max_samples=25, n_estimators=500;, score=0.596 total time=10.4min
[CV 5/10] END max_features=1, max_samples=25, n_estimators=500;, score=0.622 total time=10.4min
[CV 6/10] END max_features=1, max_samples=25, n_estimators=500;, score=0.456 total time=10.5min
[CV 9/10] END max_features=1, max_samples=25, n_estimators=500;, score=0.519 total time=10.5min
[CV 5/10] END max_features=1, max_samples=25, n_estimators=800;, score=nan total time= 3.3min
[CV 4/10] END max_features=1, max_samples=25, n_estimators=800;, score=nan total time= 9.9min
[CV 8/10] END max_features=1, max_samples=25, n_estimators=800;, score=nan total time= 7.0min
[CV 6/10] END max_features=1, max_samples=25, n_estimators=800;, score=nan total time=10.7min
[CV 1/10] END max_features=1, max_samples=25, n_estimators=800;, score=0.652 total time=16.4min
[CV 9/10] END max_features=1, max_samples=25, n_estimators=800;, score=nan total time= 7.6min
[CV 2/10] END max_features=1, max_samples=25, n_estimators=800;, score=0.528 total time=16.6min
[CV 3/10] END max_features=1, max_samples=25, n_estimators=800;, score=0.571 total time=16.4min
[CV 7/10] END max_features=1, max_samples=25, n_estimators=800;, score=0.553 total time=16.1min
[CV 4/10] END max_features=1, max_samples=25, n_estimators=1200;, score=nan total time= 6.7min
[CV 8/10] END max_features=1, max_samples=25, n_estimators=1200;, score=nan total time= 1.7min
[CV 10/10] END max_features=1, max_samples=25, n_estimators=800;, score=0.489 total time=16.0min
[CV 3/10] END max_features=1, max_samples=25, n_estimators=1200;, score=nan total time=18.6min
[CV 1/10] END max_features=1, max_samples=50, n_estimators=100;, score=0.652 total time= 2.4min
Update: error traceback showing why the fit failed
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<timed exec> in <module>
~\AppData\Roaming\Python\Python39\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
889 return results
890
--> 891 self._run_search(evaluate_candidates)
892
893 # multimetric is determined here because in the case of a callable
~\AppData\Roaming\Python\Python39\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
1390 def _run_search(self, evaluate_candidates):
1391 """Search all candidates in param_grid"""
-> 1392 evaluate_candidates(ParameterGrid(self.param_grid))
1393
1394
~\AppData\Roaming\Python\Python39\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params, cv, more_results)
836 )
837
--> 838 out = parallel(
839 delayed(_fit_and_score)(
840 clone(base_estimator),
~\AppData\Roaming\Python\Python39\site-packages\joblib\parallel.py in __call__(self, iterable)
1052
1053 with self._backend.retrieval_context():
-> 1054 self.retrieve()
1055 # Make sure that we get a last message telling us we are done
1056 elapsed_time = time.time() - self._start_time
~\AppData\Roaming\Python\Python39\site-packages\joblib\parallel.py in retrieve(self)
931 try:
932 if getattr(self._backend, 'supports_timeout', False):
--> 933 self._output.extend(job.get(timeout=self.timeout))
934 else:
935 self._output.extend(job.get())
~\AppData\Roaming\Python\Python39\site-packages\joblib\_parallel_backends.py in wrap_future_result(future, timeout)
540 AsyncResults.get from multiprocessing."""
541 try:
--> 542 return future.result(timeout=timeout)
543 except CfTimeoutError as e:
544 raise TimeoutError from e
~\Anaconda3\lib\concurrent\futures\_base.py in result(self, timeout)
443 raise CancelledError()
444 elif self._state == FINISHED:
--> 445 return self.__get_result()
446 else:
447 raise TimeoutError()
~\Anaconda3\lib\concurrent\futures\_base.py in __get_result(self)
388 if self._exception:
389 try:
--> 390 raise self._exception
391 finally:
392 # Break a reference cycle with the exception in self._exception
ValueError: The target 'y' needs to have more than 1 class. Got 1 class instead
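One plausible reading of this ValueError is that a bagging member occasionally receives a subsample containing only one class. The sketch below estimates how likely that is. It rests on assumptions the post does not confirm: that each member draws `max_samples` rows independently with replacement before any balancing step, that every training fold keeps the 77:23 ratio, and that a single one-class draw is enough to fail the whole fit. The function names are hypothetical.

```python
def p_single_class(max_samples, p_majority=0.77):
    """Chance that max_samples independent draws all land in one class."""
    p_minority = 1.0 - p_majority
    return p_majority ** max_samples + p_minority ** max_samples


def p_fit_fails(n_estimators, max_samples, p_majority=0.77):
    """Chance that at least one of n_estimators members gets a one-class draw.

    ASSUMPTION: members draw independently and one bad draw fails the fit.
    """
    p_one = p_single_class(max_samples, p_majority)
    return 1.0 - (1.0 - p_one) ** n_estimators


# Grid values taken from the question.
for max_samples in (5, 10, 25, 50, 100):
    for n_estimators in (100, 300, 500, 800, 1200):
        p = p_fit_fails(n_estimators, max_samples)
        print(f"max_samples={max_samples:>3} n_estimators={n_estimators:>4} "
              f"P(fit fails) ~ {p:.3f}")
```

Under these assumptions the failure chance grows with n_estimators and falls sharply as max_samples increases, which is at least consistent with the log: nan scores appear for max_samples=25 and become more frequent at 800 and 1200 estimators. If this is indeed the cause, dropping the smallest max_samples values from the grid would be the first thing to try.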