Naive Bayes fails with a perfect predictor

Suppose I have a variable that perfectly predicts one of the classes in my dataset:

set.seed(668130)
dat <- iris
dat$X <- sample(1:3, nrow(iris), replace=TRUE)
dat$X <- ifelse(dat$Species=='setosa', 1, dat$X)
> table(dat$X, dat$Species)

    setosa versicolor virginica
  1     50         12        15
  2      0         18        15
  3      0         20        20

Why does the NaiveBayes algorithm fail on this dataset?

library(klaR)
> NaiveBayes(Species ~ ., dat)
Error in NaiveBayes.default(X, Y, ...) : 
  Zero variances for at least one class in variables: X
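
The error message points at the per-class variance of X. A minimal base-R sketch, assuming the classifier fits one normal density per class for each numeric predictor (which is what the message suggests; this is not klaR's own code), shows what happens when X is constant within a class. The names class_sd, dens_at_mean and dens_off_mean are illustrative only.

```r
# Rebuild the example data from the question
set.seed(668130)
dat <- iris
dat$X <- sample(1:3, nrow(iris), replace = TRUE)
dat$X <- ifelse(dat$Species == "setosa", 1, dat$X)

# Per-class standard deviation of X: the parameter a Gaussian
# class-conditional density would need for each species
class_sd <- tapply(dat$X, dat$Species, sd)
print(class_sd)  # setosa is exactly 0 because X is always 1 there

# A normal density with sd = 0 is degenerate:
dens_at_mean  <- dnorm(1, mean = 1, sd = 0)  # infinite at the mean
dens_off_mean <- dnorm(2, mean = 1, sd = 0)  # zero everywhere else
print(c(dens_at_mean, dens_off_mean))
```

Under that assumption, the setosa likelihood for X is either infinite or zero, so it cannot be multiplied with the other likelihoods in any useful way.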

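It seems to me that if X=1, outputting a classification of "setosa" 100% of the time would be reasonable. Other algorithms (e.g. randomForest) fit this data without complaint: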

library(randomForest)
> randomForest(Species ~ ., dat)

Call:
 randomForest(formula = Species ~ ., data = dat) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4.67%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          4        46        0.08

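Is the NaiveBayes algorithm mathematically undefined in this case? I know this particular dataset is a bit contrived, but I occasionally run into the same problem when cross-validating NaiveBayes models.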
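
One possible way around the problem, sketched under the assumption that X is really a categorical code rather than a measurement, is to estimate P(X | Species) from counts with add-one (Laplace) smoothing instead of a per-class normal density. This is a base-R illustration, not klaR's implementation; alpha, counts and cond_prob are hypothetical names.

```r
# Rebuild the example data from the question
set.seed(668130)
dat <- iris
dat$X <- sample(1:3, nrow(iris), replace = TRUE)
dat$X <- ifelse(dat$Species == "setosa", 1, dat$X)

# Treat X as categorical and count its levels within each species
counts <- table(dat$Species, factor(dat$X))

# Add-one smoothing (assumed alpha = 1): no estimated probability is
# exactly 0 or 1, even when a level never occurs within a class
alpha <- 1
cond_prob <- (counts + alpha) / (rowSums(counts) + alpha * ncol(counts))
print(round(cond_prob, 3))
```

Each row of cond_prob sums to 1, and the setosa row stays strictly between 0 and 1, so the class-conditional term for X remains well defined in every cross-validation fold.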