I am trying to apply the hashing trick and then train a random forest in Scala. I have the following code:
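```scala
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.rdd.RDD

// Each CSV row becomes one "document": a sequence of string tokens
val documents: RDD[Seq[String]] = sc.textFile("hdfs:///tmp/new_cromosoma12v2.csv").map(_.split(",").toSeq)
// Hashing trick: every document is hashed into one sparse term-frequency vector
val hashingTF = new HashingTF()
val tf: RDD[Vector] = hashingTF.transform(documents)
val splits = tf.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))
val numClasses = 3
val categoricalFeaturesInfo = Map[Int, Int]()
val numTrees = 10
val featureSubsetStrategy = "auto"
val impurity = "gini"
val maxDepth = 8
val maxBins = 32
val trainingData2 = LabeledPoint(1.0, trainingData.collect()) // <-- compile error reported here
val model = RandomForest.trainClassifier(trainingData2, numClasses, categoricalFeaturesInfo, numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins)
```

But I get the following compile error on the marked line (`val trainingData2 = ...`):

```
found   : Array[org.apache.spark.mllib.linalg.Vector]
required: org.apache.spark.mllib.linalg.Vector
```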
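For reference, the mismatch comes from two type expectations: `LabeledPoint(label, features)` wraps a single `Vector` (one row), whereas `trainingData.collect()` returns an `Array[Vector]` holding every row; in addition, `RandomForest.trainClassifier` expects an `RDD[LabeledPoint]`, not one `LabeledPoint`. A minimal sketch of the usual pattern is to build one `LabeledPoint` per row inside `map`, and only then split. It assumes, hypothetically, since the post doesn't say where the labels live, that the first CSV column holds the class label (`0.0`, `1.0` or `2.0`) and the remaining columns are the tokens to hash; `HashingRandomForestSketch`, `toLabeledPoints` and `train` are illustrative names, not part of Spark.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel
import org.apache.spark.rdd.RDD

object HashingRandomForestSketch {
  // Hypothetical row layout: column 0 = class label, columns 1..n = tokens to hash.
  def toLabeledPoints(lines: RDD[String], hashingTF: HashingTF): RDD[LabeledPoint] =
    lines.map { line =>
      val cols = line.split(",")
      val label = cols.head.toDouble
      // Hash this row's tokens into ONE sparse Vector and pair it with its own label.
      LabeledPoint(label, hashingTF.transform(cols.tail.toSeq))
    }

  // Same hyperparameters as in the question, but the input is an RDD[LabeledPoint].
  def train(points: RDD[LabeledPoint]): RandomForestModel =
    RandomForest.trainClassifier(
      points,
      3,               // numClasses: labels must be 0.0, 1.0 or 2.0
      Map[Int, Int](), // categoricalFeaturesInfo: all features treated as continuous
      10,              // numTrees
      "auto",          // featureSubsetStrategy
      "gini",          // impurity
      8,               // maxDepth
      32)              // maxBins

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hashing-rf-sketch"))
    val points = toLabeledPoints(sc.textFile("hdfs:///tmp/new_cromosoma12v2.csv"), new HashingTF())
    // Split AFTER labeling so each label stays attached to its feature vector.
    val Array(trainingData, testData) = points.randomSplit(Array(0.7, 0.3))
    val model = train(trainingData)
    // Fraction of held-out rows whose predicted class differs from the label.
    val testErr = testData.filter(p => model.predict(p.features) != p.label).count().toDouble / testData.count()
    println(s"Test error: $testErr")
    sc.stop()
  }
}
```

The design choice here is that labeling happens per row on the cluster, so nothing is `collect()`ed to the driver; if the real labels are stored elsewhere, only `toLabeledPoints` would need to change.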
Do you know how I can fix this?

Thanks,
Leia