I am trying to fit a logistic regression model with a Ridge penalty. This is the code:
glmnet(X_Train, Y_Train, family='binomial', alpha=0, type.measure='auc')
This is the error message I get:
Error in storage.mode(xd) <- "double" : 'list' object cannot be coerced to type 'double'
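For reference, the same message can be reproduced without glmnet on a toy data frame. This suggests (my assumption, not something I have confirmed in the glmnet source) that the error appears whenever the input is a list-like object such as a data frame rather than a matrix. The names `toy` and `toy_mat` are made up for illustration.

```r
# Hypothetical toy data frame; internally a data frame is a list of columns.
toy <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
is.list(toy)

# Setting the storage mode of a data frame fails; capture the message.
res <- tryCatch({
  storage.mode(toy) <- "double"
  "no error"
}, error = function(e) conditionMessage(e))
print(res)

# The same data as a matrix can be coerced to double without an error.
toy_mat <- as.matrix(toy)
storage.mode(toy_mat) <- "double"
is.matrix(toy_mat)
```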
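I tried converting all of the variables to numeric, but it still does not work.
Here is the code for both datasets so that you can reproduce it:
Libraries: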
library(dplyr)
library(fastDummies)
library(missForest)
library(glmnet)
Data:
url <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data'
crx <- read.csv(url, sep = ",", header = FALSE)
Getting rid of missing values:
crx[crx == "?"] <- NA
crx <- type.convert(crx, as.is=FALSE)
crx.i <- missForest(as.data.frame(crx))
crx <- crx.i$ximp
Data transformation:
crx <- crx %>%
  rename(Gender = V1,
         Age = V2,
         Debt = V3,
         Married = V4,
         BankCustomer = V5,
         EducationLevel = V6,
         Ethnicity = V7,
         YearsEmployed = V8,
         PriorDefault = V9,
         Employed = V10,
         CreditScore = V11,
         DriversLicense = V12,
         Citizen = V13,
         ZipCode = V14,
         Income = V15,
         ApprovalStatus = V16)
crx <- subset(crx, select = -ZipCode)
crx <- crx %>%
  mutate(ApprovalStatus = recode(ApprovalStatus,
                                 "+" = "1",
                                 "-" = "0"))
# Normalizing numeric variables:
crx$Age <- scale(crx$Age)
crx$Debt <- scale(crx$Debt)
crx$YearsEmployed <- scale(crx$YearsEmployed)
crx$CreditScore <- scale(crx$CreditScore)
crx$Income <- scale(crx$Income)
crx$Gender <- NULL
crx$DriversLicense <- NULL
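One detail about the normalization step that may or may not matter here: `scale()` returns a one-column matrix (with centering and scaling attributes) rather than a plain vector, so each assignment above stores a matrix inside the data frame. A minimal sketch of flattening the result, using a made-up vector `v` instead of the real columns:

```r
# Hypothetical values standing in for a numeric column such as crx$Age.
v <- c(10, 20, 30, 40)

# scale() centers and scales, but returns a matrix with extra attributes.
scaled <- scale(v)
is.matrix(scaled)

# as.numeric() drops the dimensions and attributes, leaving a plain vector.
scaled_vec <- as.numeric(scaled)
is.matrix(scaled_vec)
```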
Creating dummy variables:
df <- dummy_cols(crx, remove_selected_columns = TRUE)
df$ApprovalStatus_0 <- NULL
df$ApprovalStatus_1 <- NULL
df$Married_l <- NULL
df$BankCustomer_gg <- NULL
df$ApprovalStatus <- crx$ApprovalStatus
Creating the training and test datasets:
X <- df %>% dplyr::select(-ApprovalStatus)
Y <- df$ApprovalStatus
X_Train <- X[1:590, ]
Y_Train <- Y[1:590]
X_Test <- X[591:nrow(X), ]
Y_Test <- Y[591:length(Y)]
And trying to use glmnet:
glmnet(X_Train, Y_Train, family='binomial', alpha=0, type.measure='auc')
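I did some research and found a post saying that everything has to be converted to numeric, so I tried converting everything to numeric variables like this: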
Y_Train <- as.numeric(Y_Train)
X_Train <- as.data.frame(apply(X_Train, 2, as.numeric))
And it still does not work. What exactly am I doing wrong?
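One possible direction, sketched on made-up data rather than the credit data: `as.data.frame(apply(...))` still returns a data frame, so the conversion attempt above does not change the class of the input. Passing a numeric matrix instead may be worth trying. As far as I know, `type.measure` is an argument of `cv.glmnet()` rather than `glmnet()`, so the sketch moves it there. The names `x_df`, `x_mat`, and `y` are hypothetical stand-ins for `X_Train` and `Y_Train`, and the fit is skipped if glmnet is not installed.

```r
set.seed(1)

# Hypothetical stand-ins for X_Train (data frame) and Y_Train (two-level factor).
n <- 200
x_df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rbinom(n, 1, 0.5))
y <- factor(rbinom(n, 1, 0.5), levels = c(0, 1))

# The attempted conversion still yields a data frame, i.e. a list of columns.
still_df <- as.data.frame(apply(x_df, 2, as.numeric))
is.data.frame(still_df)

# Convert the predictors to a numeric matrix instead.
x_mat <- as.matrix(x_df)
is.matrix(x_mat) && is.numeric(x_mat)

# Fit only when glmnet is available; alpha = 0 selects the ridge penalty.
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x_mat, y, family = "binomial", alpha = 0)
  cv_fit <- glmnet::cv.glmnet(x_mat, y, family = "binomial",
                              alpha = 0, type.measure = "auc")
}
```

With the real data, the analogous step would be something like `as.matrix(X_Train)`, assuming every column of `X_Train` is already numeric after the dummy encoding.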