Data mining - How should nested cross-validation results be evaluated against non-nested results? - 吾爱随笔录

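I have a nonlinear regression model that scores genes from 0 to 1 according to how likely they are to cause disease. The training data consists of about 700 gene samples with 53 features.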

Using xgboost, I currently get the following results:

r2 Nested CV Average: 0.807
MSE Nested CV Average: -0.016

Non-nested Results:
XGBR Train r2: 0.949 Test r2: 0.805
XGBR Train MSE: 0.002 Test MSE: 0.018
r2: 0.895
Predicted r2: 0.871

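Should I be worried that the gap between the non-nested training and test results indicates overfitting, or should I rely only on my 5-fold nested cross-validation to conclude that overfitting has been minimized?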

For reference, the XGBoost model I am using is tuned as follows:

import xgboost

# `seed` is assumed to be defined earlier in the script.
xgbr = xgboost.XGBRegressor(random_state=seed, objective='reg:squarederror')

# Search space: (lower bound, upper bound) for each hyperparameter.
xgbr_params = {
    'max_depth': (1, 10),
    'learning_rate': (0.01, 0.5),
    'n_estimators': (20, 50),
    'reg_alpha': (1, 10),
    'reg_lambda': (1, 10),
    'gamma': (0, 0.5),
    'min_child_weight': (1, 5),
    'subsample': (0.1, 1),
    'colsample_bytree': (0.1, 1)}

# Best parameter output:
xgbr = xgboost.XGBRegressor(
    random_state=seed, objective='reg:squarederror',
    subsample=0.8258568992489053, colsample_bytree=1.0,
    learning_rate=0.3987519903467713, n_estimators=50,
    max_depth=4, min_child_weight=1, gamma=0.0,
    reg_alpha=1, reg_lambda=10)
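
To make the nested versus non-nested distinction in the question concrete, here is a minimal, dependency-free sketch. It is not my actual pipeline: a toy one-feature ridge model stands in for XGBRegressor, the data are synthetic, and the helper names (fit_slope, select_lam, nested_cv_mse) are hypothetical. The point is only the structure: in nested CV the hyperparameter is chosen using the outer training rows alone, so each outer test fold plays no part in tuning, whereas the non-nested figure is the best tuning score itself and tends to be optimistic.

```python
import random

def fit_slope(xs, ys, lam):
    """Toy ridge fit through the origin: slope = sum(xy) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(xs, ys, slope):
    """Mean squared error of the prediction slope * x."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_folds(indices, k):
    """Split a list of row indices into k (train, test) pairs."""
    folds = [indices[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]

def cv_mse(xs, ys, indices, lam, k):
    """Mean validation MSE of one candidate lam over k folds of `indices`."""
    scores = []
    for train, val in k_folds(indices, k):
        slope = fit_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        scores.append(mse([xs[i] for i in val], [ys[i] for i in val], slope))
    return sum(scores) / k

def select_lam(xs, ys, indices, grid, k):
    """Return (best lam, its CV MSE), looking only at the rows in `indices`."""
    scored = [(lam, cv_mse(xs, ys, indices, lam, k)) for lam in grid]
    return min(scored, key=lambda pair: pair[1])

def nested_cv_mse(xs, ys, grid, outer_k=5, inner_k=3):
    """Outer loop estimates error; inner loop tunes lam on outer-train rows."""
    outer_scores = []
    for train, test in k_folds(list(range(len(xs))), outer_k):
        best_lam, _ = select_lam(xs, ys, train, grid, inner_k)
        slope = fit_slope([xs[i] for i in train], [ys[i] for i in train], best_lam)
        outer_scores.append(mse([xs[i] for i in test], [ys[i] for i in test], slope))
    return sum(outer_scores) / outer_k

# Synthetic data, for illustration only.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(60)]
ys = [0.5 * x + rng.gauss(0, 0.3) for x in xs]
grid = [0.0, 0.1, 1.0, 10.0]  # hypothetical candidate penalties

nested = nested_cv_mse(xs, ys, grid)
# Non-nested: tune on all rows and report the winning CV score directly.
_, non_nested = select_lam(xs, ys, list(range(len(xs))), grid, 5)
print(f"nested CV MSE:     {nested:.4f}")
print(f"non-nested CV MSE: {non_nested:.4f}")
```

With scikit-learn, the same structure is commonly written by passing a search object such as GridSearchCV as the estimator to cross_val_score, so that the search is re-run inside every outer fold.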