I would like to know whether there are any R packages that allow the user to define a custom loss function?
For example, a random forest package like ranger has a loss function that minimizes the OOB MSE. Is it possible to customize it somehow so that the model minimizes the negative log-likelihood instead?
If anyone knows of an example/code that does this on a toy dataset, it would be much appreciated.
Calling a custom loss function with XGBoost is relatively easy, and many objectives are already implemented out of the box.
It looks like this:
library(ISLR)
library(xgboost)
library(tidyverse)
library(Metrics)
# Data
df = ISLR::Hitters %>% select(Salary,AtBat,Hits,HmRun,Runs,RBI,Walks,Years,CAtBat,CHits,CHmRun,CRuns,CRBI,CWalks,PutOuts,Assists,Errors)
df = df[complete.cases(df),]
train = df[1:150,]
test = df[151:nrow(df),]
# XGBoost Matrix
dtrain <- xgb.DMatrix(data=as.matrix(train[,-1]),label=as.matrix(train[,1]))
dtest <- xgb.DMatrix(data=as.matrix(test[,-1]),label=as.matrix(test[,1]))
watchlist <- list(eval = dtest)
# Custom objective function (pseudo-Huber loss)
# Reference for gradients: https://stackoverflow.com/questions/45006341/xgboost-how-to-use-mae-as-objective-function
myobjective <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  d = preds - labels
  h = 5                          # delta parameter of the pseudo-Huber loss
  scale = 1 + (d / h)^2
  scale_sqrt = sqrt(scale)
  grad = d / scale_sqrt          # first derivative of the loss w.r.t. the prediction
  hess = 1 / scale / scale_sqrt  # second derivative
  return(list(grad = grad, hess = hess))
}
# Custom evaluation metric (RMSE)
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  u = (preds - labels)^2
  err <- sqrt(sum(u) / length(u))
  return(list(metric = "MyError", value = err))
}
# Model Parameter
param <- list(booster = 'gbtree'
, learning_rate = 0.1
, objective = myobjective
, eval_metric = evalerror
, set.seed = 2020)
# Train model
xgb <- xgb.train(params = param
                 , data = dtrain
                 , nrounds = 500
                 , watchlist = watchlist
                 , maximize = FALSE
                 , early_stopping_rounds = 5
                 , verbose = 1)
# Predict and evaluate on the test set
pred = predict(xgb, dtest)
mae_test = mae(test$Salary, pred)
print(mae_test)
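To tie this back to the negative log-likelihood part of the question, here is a minimal sketch of a custom NLL objective, assuming a Gaussian likelihood with a fixed standard deviation; `sigma` and `nll_objective` are illustrative names I introduce here, not anything XGBoost provides. For fixed sigma the per-observation NLL is (label - pred)^2 / (2 * sigma^2) up to a constant, so the gradient is (pred - label) / sigma^2 and the Hessian is 1 / sigma^2:

# Sketch only: Gaussian negative log-likelihood with a fixed, assumed sigma
sigma = sd(train$Salary)   # illustrative choice of a fixed scale
nll_objective <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  grad = (preds - labels) / sigma^2          # derivative of the NLL w.r.t. the prediction
  hess = rep(1 / sigma^2, length(preds))     # second derivative is constant
  return(list(grad = grad, hess = hess))
}
# Reuse the parameter list from above, swapping in the NLL objective
param_nll <- modifyList(param, list(objective = nll_objective))
xgb_nll <- xgb.train(params = param_nll
                     , data = dtrain
                     , nrounds = 500
                     , watchlist = watchlist
                     , maximize = FALSE
                     , early_stopping_rounds = 5
                     , verbose = 1)

With a constant sigma this is essentially a rescaled squared-error loss, so it mainly illustrates the pattern: any likelihood whose gradient and Hessian you can write down per observation can be plugged in the same way.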