data mining - xgboost predictions from R and Python don't match

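Thought maybe someone here could help us solve a mystery (https://github.com/dmlc/xgboost/issues/1623):

We are trying to build an xgboost prediction function in R for a model that was trained in Python, and the results don't match. See below for an example of how to reproduce.

We are pretty new to xgboost, and particularly to using it across languages, so we may be missing something obvious.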

Steps to reproduce:

(1) Download this model file: http://ml.stat.purdue.edu/hafen/WTKG.model

(2) Run this script in R:

library(xgboost)
mod <- xgb.load("WTKG.model")
x <- c(91, 9, 9, NA, NA, 273, 20, 170, NA, NA, 14, 14, 0,
  2, 0.94289404091, 0.94289404091, 0.93087973569, 0.0120143052199997, 0.95490834613,
  0.95490834613, 1, 90, 0.95490834613, 1, 90,
  0.93087973569, 357, -266, 0.93087973569, 357, -266,
  0.95490834613, NA, 0.93087973569, NA, NA, NA)
d <- xgb.DMatrix(matrix(x, nrow = 1), missing = NA)
predict(mod, d)
# [1] 0.6483372

(3) Run this script in Python:

import numpy as np
import xgboost as xgb

bst = xgb.Booster({'nthread': 4})
bst.load_model('WTKG.model')
x = [91, 9, 9, np.nan, np.nan, 273, 20, 170, np.nan, np.nan, 14, 14, 0,
  2, 0.94289404091, 0.94289404091, 0.93087973569, 0.0120143052199997, 0.95490834613,
  0.95490834613, 1, 90, 0.95490834613, 1, 90,
  0.93087973569, 357, -266, 0.93087973569, 357, -266,
  0.95490834613, np.nan, 0.93087973569, np.nan, np.nan, np.nan]
# Wrap the row in a 2-D NumPy array (1 row x 37 features); np.nan marks missing values
d = xgb.DMatrix(np.array([x]), missing=np.nan)
bst.predict(d)[0]
# 1.3775804
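
One thing that might be worth ruling out before digging deeper is whether one side is returning a raw margin while the other returns a transformed prediction (xgboost exposes this as `output_margin=True` in Python's `Booster.predict` and `outputmargin = TRUE` in R's `predict`). The sketch below is only a rough diagnostic, not a fix: it checks whether the two numbers reported above are related by a sigmoid or exp link in either direction. The helper name `find_links`, the choice of candidate links, and the tolerance are our own assumptions for illustration and are not part of xgboost.

import math

# Predictions reported above for the same input row.
R_PRED = 0.6483372
PY_PRED = 1.3775804


def sigmoid(z):
    """Logistic function, the link used by logistic objectives."""
    return 1.0 / (1.0 + math.exp(-z))


def find_links(a, b, tol=1e-4):
    """Return descriptions of candidate links f where f(a) ~ b or f(b) ~ a.

    Checking both directions also covers the inverse links (logit, log).
    Hypothetical helper for illustration only.
    """
    candidates = {"sigmoid": sigmoid, "exp": math.exp}
    matches = []
    for name, f in candidates.items():
        if abs(f(a) - b) <= tol:
            matches.append(f"{name}(a) ~ b")
        if abs(f(b) - a) <= tol:
            matches.append(f"{name}(b) ~ a")
    return matches


if __name__ == "__main__":
    # An empty list means none of the candidate links explains the gap.
    print(find_links(R_PRED, PY_PRED))

If the list comes back empty, a margin-versus-probability mix-up is probably not the explanation, and the next things to compare would be the xgboost versions on each side and how each runtime interprets the missing values.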