I think you need an unsupervised imputation method, i.e. one that imputes missing values without using the target. If you only have a few feature vectors to predict, it can be hard to discover the structure of the data from them alone. Instead, you can mix the feature vectors you want to predict with the already-imputed training feature vectors and use that combined structure to impute again. Note that this procedure may violate independence assumptions, so wrap the whole pipeline in an outer cross-validation to check for serious overfitting.
I just learned about missForest from a comment on this question. missForest seems to do the trick. I simulated your problem on the iris data (without the outer cross-validation).
rm(list=ls())
data("iris")
set.seed(1234)
n.train = 100
train.index = sample(nrow(iris),n.train)
feature.train = as.matrix(iris[ train.index,1:4])
feature.test = as.matrix(iris[-train.index,1:4])
#simulate 40 NAs in train
n.NAs = 40
NA.index = sample(length(feature.train),n.NAs)
NA.feature.train = feature.train; NA.feature.train[NA.index] = NA
#imputing 40 NAs unsupervised
library(missForest)
imp.feature.train = missForest(NA.feature.train)$ximp
#check how well imputation went; seems promising for this data set
plot(feature.train[NA.index], imp.feature.train[NA.index],
     xlab = "true value", ylab = "imputed value")
#simulate random NAs in feature test
feature.test[sample(length(feature.test),20)] = NA
#mix feature.test with imp.feature.train
nrow.test = nrow(feature.test)
mix.feature = rbind(feature.test,imp.feature.train)
imp.feature.test = missForest(mix.feature)$ximp[1:nrow.test,]
#train RF and predict
library(randomForest)
rf = randomForest(imp.feature.train,iris$Species[train.index])
pred.test = predict(rf,imp.feature.test)
table(pred.test, iris$Species[-train.index])
Output:
-----------------
pred.test    setosa versicolor virginica
  setosa         12          0         0
  versicolor      0         20         2
  virginica       0          1        15