机器算法验证 - 凸包重叠概率的蒙特卡罗估计 - 吾爱随笔录

凸包重叠概率的蒙特卡罗估计

机器算法验证蒙特卡洛

2022-04-02 14:49:12

这是我的Math.SE帖子的统计版本。

给定自然数和，均匀随机选择单位正方形内的将点称为蓝点，将点称为红点。令为（红点的凸包）与（蓝点的凸包）重叠的概率。如何通过蒙特卡洛（或者更好的）方法 $b$ $r$ $b+r$ $b$ $r$ $p(b,r)$ $H_r$ $H_b$ $p(b,r)$

我可以考虑对大量随机选择的测试用例的概率进行平均，但是如何以给我错误界限的方式对搜索空间进行采样？

1个回答

我知道@Ganesh 发布这个问题已经有一段时间了，但希望你仍然对回复感兴趣。

我已经编写了一些 R 代码，可以满足您的要求，我认为：

library(ggplot2)
library(MASS)

#################################################
#################################################
## Steps:                                      ##
##  1. draw b 'blue' points and r 'red' points ##
##  2. perform LDA                             ##
##  3. check error rate                        ##
#################################################
#################################################

##################################################
# function: drawPoints                           #
#                                                #
# description: draws b + r points uniformly from #
#              a unit square                     #
#                                                #
# inputs: b - number of 'blue' points to draw    #
#         r - number of 'red' points to draw     #
#                                                #
# outputs: data frame containing points          #
##################################################

drawPoints <- function(b,r) {
    x <- runif(b+r)
    y <- runif(b+r)
    class <- c(rep('b',b),rep('r',r))
    return(data.frame(x = x, y = y, class = class))
}

##################################################
# function: checkOverlap                         #
#                                                #
# description: if the data is linearly separable #
#              there is no overlap in the convex #
#              hulls of the different classes    #
#                                                #
# inputs: df - data frame containing classified  #
#              points                            #
#                                                #
# outputs: FALSE if 0 error rate                 #
#          TRUE otherwise                        #
##################################################

checkOverlap <- function(df) {
    disc.anal <- lda(class ~ x + y, data = df)
    return(!identical(predict(disc.anal)$class,df$class))
}

######################################################
######################################################
## Simulate many trials to estimate rate of overlap ##
######################################################
######################################################

trials <- 1000
performance <- rep(as.numeric(NA),10)
for(i in 1:10) {
    results <- replicate(n = trials, expr = checkOverlap(drawPoints(10,i)))
    performance[i] <- prop.table(table(results))['TRUE']
}
qplot(x = 1:length(performance), y = performance,
  xlab = 'Number of Red Points',
  ylab = 'Proportion of Simulations with Overlap',
  main = 'Proportion of Simulations with Overlap, 10 Blue Points')

其它你可能感兴趣的问题

上一篇研究项目时使用什么研究工具？下一篇为什么股票和指数收益回归的累积残差均值回归