Error when trying glmnet() in R: "Error in storage.mode(xd) <- "double" : 'list' object cannot be coerced to type 'double'"

data-mining machine-learning r glm
2022-02-19 04:08:40

I'm trying to build a logistic regression model with Ridge regularization. Here is the code:

glmnet(X_Train, Y_Train, family='binomial', alpha=0, type.measure='auc')

This is the error message I get:

Error in storage.mode(xd) <- "double" : 'list' object cannot be coerced to type 'double'

I tried converting all variables to numeric, but it still doesn't work.

Here is the code for both datasets so you can reproduce the problem:

Libraries:

library(dplyr)
library(fastDummies)
library(missForest)
library(glmnet)

Data:

url <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data'
crx <- read.csv(url, sep = ",", header = F)

Getting rid of missing values:

crx[crx == "?"] <- NA
crx <- type.convert(crx, as.is=FALSE)
crx.i <- missForest(as.data.frame(crx))
crx <- crx.i$ximp

Data transformation:

crx <- crx %>%
  rename(Gender = V1,
         Age = V2,
         Debt = V3,
         Married = V4,
         BankCustomer = V5,
         EducationLevel = V6,
         Ethnicity = V7,
         YearsEmployed = V8,
         PriorDefault = V9,
         Employed = V10,
         CreditScore = V11,
         DriversLicense = V12,
         Citizen = V13,
         ZipCode = V14,
         Income = V15,
         ApprovalStatus = V16)

crx <- subset(crx, select = -ZipCode)

crx <- crx %>%
  mutate(ApprovalStatus = recode(ApprovalStatus,
                                 "+" = "1",
                                 "-" = "0"))

# Normalizing numeric variables:
crx$Age <- scale(crx$Age)
crx$Debt <- scale(crx$Debt)
crx$YearsEmployed <- scale(crx$YearsEmployed)
crx$CreditScore <- scale(crx$CreditScore)
crx$Income <- scale(crx$Income)

crx$Gender <- NULL
crx$DriversLicense <- NULL

Creating dummy variables:

df <- dummy_cols(crx, remove_selected_columns = T)

df$ApprovalStatus_0 <- NULL
df$ApprovalStatus_1 <- NULL
df$Married_l <- NULL
df$BankCustomer_gg <- NULL

df$ApprovalStatus <- crx$ApprovalStatus

Creating the training and test datasets:

X <- df %>% dplyr::select(-ApprovalStatus)
Y <- df$ApprovalStatus

# R indices start at 1, so rows 1 to 590 form the training set
X_Train <- X[1:590, ]
Y_Train <- Y[1:590]

X_Test <- X[591:nrow(X), ]
Y_Test <- Y[591:length(Y)]

And then trying to use glmnet:

glmnet(X_Train, Y_Train, family='binomial', alpha=0, type.measure='auc')

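I did some research and found an article saying that everything has to be converted to the numeric class, so I tried converting all variables to numeric like this: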

Y_Train <- as.numeric(Y_Train)
X_Train <- as.data.frame(apply(X_Train, 2, as.numeric))

It still doesn't work. What exactly am I doing wrong?

1 Answer

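glmnet expects the predictors x as a numeric matrix (for family='binomial', y can stay a vector or a two-level factor). X_Train is a data.frame, which R stores as a list of columns, and that list is what storage.mode(xd) <- "double" fails to coerce. Your as.data.frame(apply(X_Train, 2, as.numeric)) attempt doesn't help because as.data.frame() turns the result back into a data frame. Convert the predictors with as.matrix() instead, e.g. glmnet(as.matrix(X_Train), Y_Train, family='binomial', alpha=0).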
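The error message itself comes from base R, not from glmnet, so it can be reproduced without fitting anything. A minimal sketch, using a small hypothetical data frame df_x as a stand-in for X_Train: coercing the storage mode of a multi-row data frame to "double" fails, while the same coercion on as.matrix() output succeeds.

```r
# df_x is a hypothetical stand-in for X_Train: a data frame of numeric columns
df_x <- data.frame(a = c(1.5, 2.0, 3.2), b = c(0, 1, 1))

is.list(df_x)  # TRUE: a data.frame is a list of column vectors

# Try the same coercion glmnet performs internally; capture the error text
err <- tryCatch({
  xd <- df_x
  storage.mode(xd) <- "double"
  "no error"
}, error = function(e) conditionMessage(e))
print(err)

# After as.matrix() the object is an atomic numeric matrix, so coercion works
xm <- as.matrix(df_x)
storage.mode(xm) <- "double"
is.matrix(xm)  # TRUE
```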

For more examples, see the Glmnet Vignette by Trevor Hastie and Junyang Qian.
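Applied to the question's setup, the fix might look like the sketch below. It assumes the glmnet package is installed and uses randomly generated, hypothetical X_df and Y_fac in place of the real X_Train and Y_Train. Note also that type.measure is an argument of cv.glmnet(), which chooses lambda by cross-validation, rather than of glmnet() itself.

```r
library(glmnet)

set.seed(1)
# Hypothetical stand-ins for X_Train / Y_Train: 200 rows, 5 numeric predictors
X_df  <- as.data.frame(matrix(rnorm(200 * 5), ncol = 5))
Y_fac <- factor(sample(c("0", "1"), 200, replace = TRUE))

# Convert the predictors to a numeric matrix before calling glmnet
X_mat <- as.matrix(X_df)

# Ridge logistic regression (alpha = 0) fitted over a path of lambda values
fit <- glmnet(X_mat, Y_fac, family = "binomial", alpha = 0)

# Cross-validated fit that selects lambda using AUC
cvfit <- cv.glmnet(X_mat, Y_fac, family = "binomial", alpha = 0,
                   type.measure = "auc", nfolds = 5)
cvfit$lambda.min
```

With the real data, predict(cvfit, newx = as.matrix(X_Test), s = "lambda.min", type = "response") would then give predicted probabilities for the test rows; X_Test needs the same as.matrix() conversion.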