Cross Validated - How to plot an ROC curve for a three-class response variable? - 吾爱随笔录


Cross Validated · r · classification · 鹏

2022-03-31 11:58:37

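Below is a toy example in which the response variable has three possible classes. I am trying to create an ROC curve, but I don't know how to handle it when there are three classes. Any help would be greatly appreciated. Thanks.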

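library(ipred)   # bagging()
library(rpart)   # rpart.control()
set.seed(1)      # make the simulated data and the train/test split reproducible
control = rpart.control(maxdepth = 20, minsplit = 20, cp = 0.01, maxsurrogate=2, surrogatestyle = 0, xval=25)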
n <- 500; p <- 10
f <- function(x,a,b,d) return( a*(x-b)^2+d )
x1 <- runif(n/2,0,4)
y1 <- f(x1,-1,2,1.7)+runif(n/2,-1,1)
x2 <- runif(n/2,2,6)
y2 <- f(x2,1,4,-1.7)+runif(n/2,-1,1)
y <- c(rep(-1,floor(n/3)),rep(0,ceiling(n/3)), rep(1,ceiling(n/3)))
dat <- data.frame(y=factor(y),x1=c(x1,x2),x2=c(y1,y2), matrix(rnorm(n*(p-2)),ncol=(p-2)))
names(dat)<-c("y",paste("x",1:p,sep=""))
dat

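# one plotting symbol and colour per class; indexing by the factor dat$y uses its codes 1, 2, 3
# (indexing by the numeric y = -1/0/1 mixes negative and positive subscripts and fails)
plot(dat$x1, dat$x2, pch=as.integer(dat$y), col=c(1, 2, 8)[dat$y],
     xlab=names(dat)[2], ylab=names(dat)[3])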
indtrain<-sample(1:n,300,replace=FALSE)
train<-dat[indtrain,]; dim(train) 
test<-dat[setdiff(1:n,indtrain),]; dim(test) 
test

mod <- bagging(y~.,  data=train, control=control, coob=TRUE, nbagg=25, keepX = TRUE)
mod
pred<-predict(mod, newdata=test[,-1],type="prob", aggregation= "average"); pred

For the two-class case I used the following, but it no longer works with three classes:

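library(pROC)       # plot.roc()
yhat <- pred[, 2]   # predicted probability of the second class level
y <- test$y         # true labels (test[, -1] would select the predictors, not the response)
plot.roc(y, yhat)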

2 Answers

You may want to look at the volume under the ROC surface (VUS), as defined in the following papers:

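Ferri C, Hernández-Orallo J, Salido MA. Volume under the ROC surface for multi-class problems: exact computation and evaluation of approximations. Proceedings of the 14th European Conference on Machine Learning. 2003;108-120.
He X, Frey EC. The meaning and use of the volume under a three-class ROC surface (VUS). IEEE Transactions on Medical Imaging. 2008;27(5):577-588.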
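To make the VUS idea concrete for the question's setting: when a single score is expected to rank the classes in a fixed order (for example -1 < 0 < 1), a common special case of the three-class VUS is the probability that a randomly chosen triple, one observation from each class, is ranked in the correct order; random scores give 1/6. The sketch below is a brute-force estimate under that ordinal assumption only, not the general construction in the papers above. The function name `vus_ordinal` is hypothetical, and tied scores are simply counted as misordered.

```r
# Ordinal three-class VUS (sketch, assumes one score that should increase
# from class 1 to class 2 to class 3): fraction of triples (a, b, c), one
# score from each class, with a < b < c. Ties count as misordered.
vus_ordinal <- function(s1, s2, s3) {
  # For a fixed middle-class score b, the number of correctly ordered triples
  # is (#class-1 scores below b) * (#class-3 scores above b).
  counts <- vapply(s2, function(b) as.numeric(sum(s1 < b)) * sum(s3 > b), numeric(1))
  sum(counts) / (length(s1) * length(s2) * length(s3))
}

# Usage with simulated scores whose means increase across the three classes
set.seed(42)
s_neg  <- rnorm(50, mean = 0)
s_zero <- rnorm(50, mean = 1)
s_pos  <- rnorm(50, mean = 2)
vus_ordinal(s_neg, s_zero, s_pos)   # expected to be well above the chance level 1/6
```

Counting per middle observation avoids enumerating all n1 * n2 * n3 triples explicitly while giving the same result.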

ROC analysis is designed for exactly two classes (e.g. signal vs. noise), so applying it directly to three or more classes makes little sense.

However, for any multiclass problem you can use a set of binary classifiers and perform so-called One-vs-All classification.

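For example, consider the iris dataset, which has 3 classes: setosa, versicolor and virginica. We can build 3 binary classifiers (e.g. naive Bayes), one per class, each separating that species from the other two: setosa vs. rest, versicolor vs. rest and virginica vs. rest. Then we plot an ROC curve for each model and tune each model's threshold separately. A single summary AUC can then be taken as the unweighted average of the per-class AUCs.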
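The one-vs-all recipe does not require fitting separate models if you already have a class-probability matrix, such as the `pred` returned by `predict(..., type="prob")` in the question: each column can serve as the score for "this class vs. the rest". Below is a minimal base-R sketch (no ROCR), assuming the matrix columns are named after the class labels. It uses the rank (Mann-Whitney) form of the AUC, i.e. the probability that a random positive scores higher than a random negative, with ties counted as 1/2. The names `auc_binary` and `ovr_auc` are hypothetical helpers, not functions from any package.

```r
# Rank-based (Mann-Whitney) AUC for one binary problem;
# average ranks for ties mean tied pairs count as 1/2.
auc_binary <- function(score, actual) {
  r <- rank(score)
  n_pos <- sum(actual)
  n_neg <- sum(!actual)
  (sum(r[actual]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# One-vs-all AUCs: column k of `prob` scores "class k vs. the rest".
# Assumes colnames(prob) are the class labels used in `labels`.
ovr_auc <- function(prob, labels) {
  classes <- colnames(prob)
  vapply(classes, function(k) auc_binary(prob[, k], labels == k), numeric(1))
}

# Toy probability matrix: 6 observations, 3 classes
prob <- rbind(c(0.7, 0.2, 0.1),
              c(0.5, 0.3, 0.2),
              c(0.2, 0.6, 0.2),
              c(0.4, 0.4, 0.2),
              c(0.1, 0.2, 0.7),
              c(0.3, 0.5, 0.2))
colnames(prob) <- c("a", "b", "c")
labels <- c("a", "a", "b", "b", "c", "c")

res <- ovr_auc(prob, labels)   # per-class AUCs: a = 1, b = 0.875, c = 0.8125
mean(res)                      # macro-averaged AUC
```

With the question's objects, `ovr_auc(pred, test$y)` should give one AUC per class, assuming the columns of `pred` are named after the factor levels of `test$y`.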

Here are the ROC curves for the iris dataset: [figure not preserved]

In this case the AUC is $\frac{1 + 0.98 + 0.97}{3} \approx 0.98$.
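The averaging step is plain arithmetic on the three rounded per-class AUCs quoted above:

```r
# rounded one-vs-all AUCs as quoted in the answer
aucs <- c(1, 0.98, 0.97)
mean(aucs)   # (1 + 0.98 + 0.97) / 3 = 0.98333..., i.e. about 0.98
```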

Code:

library(ROCR)   # prediction(), performance()
library(klaR)   # NaiveBayes()

data(iris)

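lvls = levels(iris$Species)   # "setosa" "versicolor" "virginica"
# every 5th row (30 rows) is used for training, the remaining 120 rows for testing
trainidx = which(seq_len(nrow(iris)) %% 5 == 0)
iris.train = iris[trainidx, ]
iris.test = iris[-trainidx, ]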

aucs = numeric(length(lvls))   # one AUC per one-vs-all classifier
# empty canvas for the ROC curves
plot(x=NA, y=NA, xlim=c(0,1), ylim=c(0,1),
     ylab='True Positive Rate',
     xlab='False Positive Rate',
     bty='n')

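for (type.id in 1:3) {
  # one-vs-all target: TRUE for the current species, FALSE for the other two
  type = as.factor(iris.train$Species == lvls[type.id])

  # naive Bayes on the four measurements (column 5 is Species)
  nbmodel = NaiveBayes(type ~ ., data=iris.train[, -5])
  nbprediction = predict(nbmodel, iris.test[,-5], type='raw')

  # posterior probability of the current species is the ROC score
  score = nbprediction$posterior[, 'TRUE']
  actual.class = iris.test$Species == lvls[type.id]

  # ROC curve: true positive rate vs. false positive rate over all cutoffs
  pred = prediction(score, actual.class)
  nbperf = performance(pred, "tpr", "fpr")

  roc.x = unlist(nbperf@x.values)
  roc.y = unlist(nbperf@y.values)
  lines(roc.y ~ roc.x, col=type.id+1, lwd=2)

  # area under this one-vs-all curve
  nbauc = performance(pred, "auc")
  nbauc = unlist(slot(nbauc, "y.values"))
  aucs[type.id] = nbauc
}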

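# diagonal: a classifier that guesses at random
lines(x=c(0,1), c(0,1))
legend("bottomright", legend=lvls, col=2:4, lwd=2, bty='n')

# overall AUC: unweighted mean of the three one-vs-all AUCs
mean(aucs)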
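For readers without ROCR, roughly what `performance(pred, "tpr", "fpr")` produces for one class can be computed in base R by sweeping the threshold over the distinct scores, and the trapezoidal area under those points gives the AUC (equal to the rank-based AUC, with ties counted as 1/2). This is a sketch: `roc_points` and `trapezoid_auc` are hypothetical helper names, and the exact point set ROCR returns may differ slightly (for example in how it handles its infinite cutoff).

```r
# ROC points for one binary problem: predict "positive" when score >= t,
# for every distinct score t from highest to lowest, starting at (0, 0).
roc_points <- function(score, actual) {
  thr <- sort(unique(score), decreasing = TRUE)
  n_pos <- sum(actual)
  n_neg <- sum(!actual)
  tpr <- vapply(thr, function(t) sum(score >= t & actual) / n_pos, numeric(1))
  fpr <- vapply(thr, function(t) sum(score >= t & !actual) / n_neg, numeric(1))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Area under the piecewise-linear ROC curve (trapezoidal rule)
trapezoid_auc <- function(pts) {
  widths  <- diff(pts$fpr)
  heights <- (head(pts$tpr, -1) + tail(pts$tpr, -1)) / 2
  sum(widths * heights)
}

score  <- c(0.9, 0.8, 0.7, 0.6)
actual <- c(TRUE, FALSE, TRUE, FALSE)
pts <- roc_points(score, actual)
trapezoid_auc(pts)   # 0.75
```

Inside the loop above, `pts <- roc_points(score, actual.class)` followed by `lines(pts$fpr, pts$tpr, col=type.id+1, lwd=2)` would draw the same kind of curve without the two ROCR calls.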

Inspired by: http://karchinlab.org/fcbb2_spr14/Lectures/Machine_Learning_R.pdf
