
Does the party package in R provide out-of-bag error estimates for random forest models?

Cross Validated · r · machine-learning · random-forest

2022-03-29 01:56:05

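I'm a new R user and also new to random forest modeling. I can't figure out how to get the out-of-bag (OOB) error estimate for a cforest model built with the party package in R. In the randomForest package, the OOB error estimate is displayed if you simply print the model object, but the party package doesn't work the same way.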

Running a random forest model with the randomForest package:

> SBrf<- randomForest(formula = factor(SB_Pres) ~ SST + Chla + Dist2Shr + DaylightHours + Bathy + Slope + MoonPhase + factor(Region), data = SBrfImpute, ntree = 500, replace = FALSE, importance = TRUE)
> print(SBrf)

Call:
 randomForest(formula = factor(SB_Pres) ~ SST + Chla + Dist2Shr + DaylightHours + Bathy + Slope + MoonPhase + factor(Region),      data = SBrfImpute, ntree = 500, replace = FALSE, importance = TRUE) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 23.67%
Confusion matrix:
    0   1 class.error
0 823 127   0.1336842
1 211 267   0.4414226

Running a random forest model with the party package:

> SBcf<- cforest(formula = factor(SB_Pres) ~ SST + Chla + Dist2Shr+ DaylightHours + Bathy + Slope + MoonPhase + factor(Region), data = bll_SB_noNA, control = cforest_unbiased())
> print(SBcf)

Random Forest using Conditional Inference Trees
Number of trees:  500 

Response:  factor(SB_Pres) 
Inputs:  SST, Chla, Dist2Shr, DaylightHours, Bathy, Slope, MoonPhase, factor(Region) 
Number of observations:  534

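I've read through the manual and the vignettes but can't find an answer. Does anyone know how to retrieve an OOB error estimate once you've fit a random forest model with the party package? Or am I missing some important difference between the two packages that means random forest models built with party have no OOB error estimate?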

2 Answers

The caret package has a way to get it. You can use train as the interface. For example:

> mod1 <- train(Species ~ ., 
+               data = iris, 
+               method = "cforest", 
+               tuneGrid = data.frame(mtry = 2),
+               trControl = trainControl(method = "oob"))
> mod1
150 samples
  4 predictors
  3 classes: 'setosa', 'versicolor', 'virginica' 

No pre-processing
Resampling: 

Summary of sample sizes:  

Resampling results

  Accuracy  Kappa
  0.967     0.95 

Tuning parameter 'mtry' was held constant at a value of 2
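If you need the number itself rather than the printed summary (for example, to compare it with randomForest's "OOB estimate of error rate"), the OOB accuracy should be stored in the results component of the train object. A minimal sketch, assuming a recent caret release in which the tuning-grid column is named mtry without a leading dot; the error rate is taken as 1 minus the OOB accuracy, and the name oob_error is just illustrative:

```r
library(caret)  # train() fits party::cforest when method = "cforest"

set.seed(1)  # OOB estimates depend on the random subsamples drawn per tree
mod1 <- train(Species ~ ., data = iris,
              method = "cforest",
              tuneGrid = data.frame(mtry = 2),
              trControl = trainControl(method = "oob"))

# One row per tuning-parameter value; with method = "oob" the
# Accuracy and Kappa columns are out-of-bag estimates
print(mod1$results)

# Convert OOB accuracy into an OOB misclassification rate
oob_error <- 1 - mod1$results$Accuracy
cat(sprintf("OOB estimate of error rate: %.2f%%\n", 100 * oob_error))
```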

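Alternatively, if you want to get at it directly, caret has an internal function for cforest models, but you have to call it with the ::: namespace operator: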

> mod2 <- cforest(Species ~ ., data = iris,
+                 controls = cforest_unbiased(mtry = 2))
> caret:::cforestStats(mod2)
 Accuracy     Kappa 
0.9666667 0.9500000

HTH,

Max

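To extend Momo's answer a little: according to the manual, party does not provide an OOB estimate by default, but computing one isn't hard. In the predict function you can set the argument OOB=TRUE and leave newdata at its default of NULL (i.e., predict on the training data).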

Something like this should work (slightly adapted from the party manual):

### honest (i.e., out-of-bag) cross-classification of
### true vs. predicted classes
library(party)
set.seed(290875)
data("mammoexp", package = "TH.data")
forestmodel <- cforest(ME ~ ., data = mammoexp)
### OOB = TRUE only has an effect when newdata is NULL (the training data)
oobPredicted <- predict(forestmodel, OOB = TRUE)
table(mammoexp$ME, oobPredicted)
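To get output closer to what randomForest prints, the OOB confusion matrix can be reduced to an overall error rate plus a per-class error, mirroring the "class.error" column. A minimal sketch built on the manual example above; the names cm, oob_error and class_error are illustrative and not part of party:

```r
library(party)

set.seed(290875)
data("mammoexp", package = "TH.data")
forestmodel <- cforest(ME ~ ., data = mammoexp)

# Out-of-bag predictions: each observation is predicted only by the
# trees whose learning sample did not contain it
oobPredicted <- predict(forestmodel, OOB = TRUE)

# Force the same level set as the response so the table is square
oobPredicted <- factor(oobPredicted, levels = levels(mammoexp$ME))

# Rows = observed class, columns = OOB prediction (as in randomForest)
cm <- table(observed = mammoexp$ME, predicted = oobPredicted)

# Overall OOB error rate: share of observations off the diagonal
oob_error <- 1 - sum(diag(cm)) / sum(cm)

# Per-class error: misclassified share within each observed class
class_error <- 1 - diag(cm) / rowSums(cm)

print(cbind(cm, class.error = class_error))
cat(sprintf("OOB estimate of error rate: %.2f%%\n", 100 * oob_error))
```

For the model in the question, the same steps should carry over with predict(SBcf, OOB = TRUE) as the predictions and factor(bll_SB_noNA$SB_Pres) as the observed classes, assuming the rows of bll_SB_noNA are exactly the ones used to fit SBcf.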
