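In Spark, there is RowMatrix.columnSimilarities(), which returns "an n x n sparse upper-triangular matrix of cosine similarities between columns of this matrix".
How should I read it? If I try to implement the example from https://stackoverflow.com/a/1750187, like this: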
JavaRDD<Vector> rows = sc.parallelize(Arrays.asList(
    new DenseVector(new double[]{2, 1, 0, 2, 0, 1, 1, 1}),
    new DenseVector(new double[]{2, 1, 1, 1, 1, 0, 1, 1})
));
RowMatrix mat = new RowMatrix(rows.rdd());
List<Vector> sims = mat.columnSimilarities().toRowMatrix().rows().toJavaRDD().collect();
for (Vector v : sims) {
    System.out.println(v);
}
I get:
(8,[6,7],[0.7071067811865475,0.7071067811865475])
(8,[1,2,3,4,5,6,7],[0.9999999999999998,0.7071067811865475,0.9486832980505137,0.7071067811865475,0.7071067811865475,0.9999999999999998,0.9999999999999998])
(8,[2,3,4,5,6,7],[0.7071067811865475,0.9486832980505137,0.7071067811865475,0.7071067811865475,0.9999999999999998,0.9999999999999998])
(8,[7],[0.9999999999999998])
(8,[4,5,6,7],[0.4472135954999579,0.8944271909999159,0.9486832980505137,0.9486832980505137])
(8,[6,7],[0.7071067811865475,0.7071067811865475])
(8,[3,4,6,7],[0.4472135954999579,1.0,0.7071067811865475,0.7071067811865475])
How should I interpret this output? And how do I get from it the cosine of 0.822 described in the referenced StackOverflow post?
Thanks!
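For reference, here is a minimal plain-Java sketch (no Spark involved; the class name CosineCheck and the helpers cosine and column are made up for illustration) that computes the cosine by hand for the two vectors above. It computes both the similarity between the two rows, which is what the linked answer describes, and the similarity between two columns, which is what the quoted documentation says columnSimilarities() reports. Comparing the two numbers may help show which of them the printed output contains.

```java
public class CosineCheck {
    // Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
    public static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Extracts column j from a matrix stored as an array of rows.
    public static double[] column(double[][] m, int j) {
        double[] col = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            col[i] = m[i][j];
        }
        return col;
    }

    public static void main(String[] args) {
        // The same two vectors that are passed to sc.parallelize(...) in the question.
        double[][] rows = {
            {2, 1, 0, 2, 0, 1, 1, 1},
            {2, 1, 1, 1, 1, 0, 1, 1}
        };
        // Row-to-row similarity: the quantity the linked answer computes.
        System.out.println("rows 0 and 1:    " + cosine(rows[0], rows[1]));
        // Column-to-column similarity: one entry of the kind columnSimilarities() describes.
        System.out.println("columns 0 and 3: " + cosine(column(rows, 0), column(rows, 3)));
    }
}
```

The row-to-row value is 9 / sqrt(12 * 10), roughly 0.822, while the column 0 to column 3 value is 6 / sqrt(8 * 5), roughly 0.9487, which matches the 0.9486832980505137 entries in the printed output. This suggests the 0.822 figure would only show up if the two documents were laid out as columns of the matrix rather than as rows.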