What method is used in Google Correlate?

Here is a recent Google Correlate query:
http://www.google.com/trends/correlate/search?e=internet+usage&t=weekly#

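As you can see in the search box at that link, I entered "internet usage" and Google did the rest. It reports a value of 0.9298 as the "correlation" with the query "data mining". However, page 2 of the Google white paper [PDF] says: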

The objective of Google Correlate is to surface the queries in
the database whose spatial or temporal pattern is most highly correlated
with a target pattern. Google Correlate employs a novel approximate nearest
neighbor (ANN) algorithm over millions of candidate queries in an online
search tree to produce results similar to the batch-based approach employed
by Google Flu Trends but in a fraction of a second. For additional details,
please see the Methods section below....

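So, my question is:
Is Google using the ordinary Pearson or Spearman correlation to find these results, or is it using something else? If it is something else, can you explain the general technique?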
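
To make the question concrete, here is a minimal sketch of the two measures I am asking about. The weekly values and all function names are made up for illustration; nothing here is taken from Google's implementation. The last part shows a standard identity: once two series are mean-centered and scaled to unit length, their Pearson correlation r and their squared Euclidean distance d^2 satisfy r = 1 - d^2 / 2. That is one way a nearest-neighbor search could end up ranking by correlation, but the excerpt above does not confirm that this is what Correlate does.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def unit_normalize(x):
    """Subtract the mean, then scale to Euclidean length 1."""
    mean = sum(x) / len(x)
    centered = [a - mean for a in x]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical weekly search volumes (not real Google data).
target = [10, 12, 15, 11, 8, 9, 14, 16]
candidate = [20, 25, 33, 22, 15, 19, 27, 30]

r = pearson(target, candidate)
rho = spearman(target, candidate)
d2 = squared_distance(unit_normalize(target), unit_normalize(candidate))

print("Pearson r:   ", r)
print("Spearman rho:", rho)
print("1 - d^2 / 2: ", 1 - d2 / 2)  # agrees with Pearson r up to rounding
```

The two measures can disagree: for a monotone but non-linear pair such as [1, 2, 3, 4] and [1, 4, 9, 16], Spearman is exactly 1 while Pearson is slightly below 1, so knowing which one Google reports matters when reading a value like 0.9298.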

===================

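Also, note that in the plot the search volume for "internet usage" (and "data mining") drops during the summer months and again around Christmas. I suspect kids and their homework have something to do with it.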