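Here is a recent Google Correlate query:
http://www.google.com/trends/correlate/search?e=internet+usage&t=weekly#
As you can see in the search box at that link, I typed "internet usage" and Google did the rest. It reports a value of 0.9298 as the "correlation" with the query "data mining". However, page 2 of the Google white paper [PDF] says: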
The objective of Google Correlate is to surface the queries in the database whose spatial or temporal pattern is most highly correlated with a target pattern. Google Correlate employs a novel approximate nearest neighbor (ANN) algorithm over millions of candidate queries in an online search tree to produce results similar to the batch-based approach employed by Google Flu Trends but in a fraction of a second. For additional details, please see the Methods section below....
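So, my question is:
Does Google use an ordinary Pearson or Spearman correlation to find these matches, or do they use something else? If it is something else, can you explain the general technique?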
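To make the distinction in my question concrete, here is a minimal sketch of the two coefficients I am asking about, written in plain Python. The two series `a` and `b` are made-up numbers, not Google data, and the helper names (`pearson`, `ranks`, `spearman`) are my own. This says nothing about which measure Google Correlate actually uses; it only shows how the two can differ on the same data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical series: b is a monotone but nonlinear function of a.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [v ** 2 for v in a]

print("Pearson: ", pearson(a, b))
print("Spearman:", spearman(a, b))
```

Because `b` always rises when `a` rises, the Spearman value is exactly 1, while the Pearson value falls slightly below 1 because the relationship is not a straight line. A single reported number like 0.9298 does not by itself tell me which of these was computed.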
===================
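Also, note that in the graph the search volume for "internet usage" (and "data mining") drops during the summer months and again around Christmas. I guess kids and their homework have something to do with that.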