XGBoost error: contrasts can be applied only to factors with 2 or more levels

data-mining r xgboost error-handling
2022-02-19 03:50:34
Error in XGBoost: Error in contrasts<-(tmp, value = contr.funs[1 + isOF[nn]]): 
  contrasts can be applied only to factors with 2 or more levels

param <- list(  objective           = "binary:logistic", 
                booster             = "gbtree",
                eval_metric         = "auc",
                eta                 = 0.01,
                max_depth           = 5,
                subsample           = 0.6,
                colsample_bytree    = 0.6
)


t1.y <- t1$target   

t1 <- sparse.model.matrix(target ~ ., data = t1)
# The error occurs after the command above. If I change it to the following (replacing . with -1 to drop collinear columns):
t1 <- sparse.model.matrix(target ~ -1, data = t1)

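After changing to -1 the error no longer occurs, but the resulting dgCMatrix looks suspicious: it contains only one column and many rows. I expected the columns of the original data to be one-hot encoded, not a single output column. If I skip inspecting the resulting sparse matrix and carry on, running the xgboost call below produces this error: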
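For what it's worth, this contrasts error is commonly triggered by a factor (or character) column that has only one level. The sketch below is one way to check for that, not a confirmed diagnosis of this data set: `toy`, `bad_cols`, and `toy_clean` are hypothetical names, and the toy data frame merely stands in for `t1`. Note also that `target ~ -1` lists no predictors at all (the `-1` only removes the intercept), which would be consistent with the near-empty matrix described above, whereas `target ~ . - 1` keeps every predictor.

```r
library(Matrix)  # provides sparse.model.matrix()

# Toy stand-in for t1 (hypothetical data): `const` has a single level.
toy <- data.frame(
  target = c(0, 1, 0, 1),
  grp    = factor(c("a", "b", "a", "b")),
  const  = factor(c("x", "x", "x", "x")),
  num    = c(1.5, 2.5, 3.5, 4.5)
)

# Count distinct non-missing values in each factor/character column.
is_cat   <- vapply(toy, function(col) is.factor(col) || is.character(col), logical(1))
n_levels <- vapply(toy[is_cat], function(col) length(unique(col[!is.na(col)])), integer(1))

# Columns with fewer than 2 observed values carry no information;
# single-level factors among them are what the contrasts error complains about.
bad_cols <- names(n_levels)[n_levels < 2]
print(bad_cols)

# Drop them, then one-hot encode the rest; `- 1` here only removes the intercept.
toy_clean <- toy[, setdiff(names(toy), bad_cols), drop = FALSE]
X <- sparse.model.matrix(target ~ . - 1, data = toy_clean)
print(dim(X))
```

Checking `dim(X)` before building the `xgb.DMatrix` is a cheap way to confirm that the encoded matrix really has one column per factor level plus the numeric columns.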

Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj) : 
  [05:14:55] amalgamation/../src/objective/regression_obj.cc:108: label must be in [0,1] for logistic regression


dtrain <- xgb.DMatrix(data=t1, label=t1.y)
watchlist <- list(train=dtrain)
bst1 <- xgboost(   params              = param, 
                   data                = dtrain, 
                   nrounds             = 1750,
                   #watchlist           = watchlist,
                   verbose             = 1,
                   maximize            = FALSE
)

I tried converting the target column to a factor and to numeric (mapping it to 0 and 1, because as.numeric converts it to 1 and 2). But that didn't help :'(
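As a minimal sketch of the 0/1 mapping described above, assuming the target is a two-level factor (the `target` vector and its "no"/"yes" levels are made up for illustration and are not taken from `t1`):

```r
# Hypothetical two-level target, standing in for t1$target.
target <- factor(c("no", "yes", "yes", "no"))

# as.numeric() on a factor returns the internal codes 1 and 2,
# so subtract 1 to obtain the 0/1 labels that binary:logistic expects.
label <- as.numeric(target) - 1
print(label)

# Sanity check before passing the vector as `label` to xgb.DMatrix().
stopifnot(all(label %in% c(0, 1)))
```

If the factor levels are themselves the strings "0" and "1", `as.numeric(as.character(target))` recovers the values directly. Either way, the label has to be taken from the data frame before `t1` is overwritten by the sparse matrix, as the code above already does with `t1.y`.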

When I convert the target to numeric and then run XGBoost, I get the following error: Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj) : [05:20:12] amalgamation/../src/tree/updater_colmaker.cc:162: Check failed: (n) > (0) colsample_bytree=0.6 is too small that no feature can be included

*t1 is a data.frame that originally contained factor, numeric, and integer columns, which I converted to a sparse matrix.

Thanks for your help!

0 answers