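I have collected a large amount of census data covering 2012 through 2018. I want to apply a clustering algorithm to compare metropolitan statistical areas (MSAs). Ideally, once the clustering has run, I would like to see which MSAs are comparable to one another.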
The features I have chosen to drive the clustering are as follows:
'Bachelors+',
'Estimate Total $10,000 to $14,999',
'Estimate Total $100,000 to $124,999',
'Estimate Total $125,000 to $149,999',
'Estimate Total $15,000 to $19,999',
'Estimate Total $150,000 to $199,999',
'Estimate Total $20,000 to $24,999',
'Estimate Total $200,000 or more',
'Estimate Total $25,000 to $29,999',
'Estimate Total $30,000 to $34,999',
'Estimate Total $75,000 to $99,999',
'Median Age',
'Median Gross rent as % of household inc',
'Number of educational and health service workers',
'Number of finance and real estate workers',
'Number of people in management, business, science, and arts',
'Number of service workers',
'Number of tech workers',
'Pct Asian',
'Pct Black',
'Pct Other Race',
'Pct White',
'Total Population',
'Total Population over 25'
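My one problem is that the data I have is at the area level for every MSA in the US, for 2012 through 2018. Do I first need to aggregate the data so that I have the features above for each MSA, and then run the clustering algorithm on the aggregated data?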
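To make the aggregation step concrete, here is a minimal sketch of what I think it would look like in pandas. The column names `msa` and `year`, the toy values, and the helper `aggregate_to_msa` are all hypothetical and not taken from my real data. It assumes that count features can be summed and that percentage features should be averaged with population weights. Medians such as 'Median Age' cannot be recovered exactly from area-level medians, so they are left out of the sketch.

```python
import pandas as pd

# Toy area-level rows; column names and values are made up for illustration.
areas = pd.DataFrame({
    "msa": ["A", "A", "B", "B"],
    "year": [2018, 2018, 2018, 2018],
    "Total Population": [1000, 3000, 2000, 2000],
    "Number of tech workers": [50, 250, 100, 60],
    "Pct Asian": [10.0, 20.0, 5.0, 15.0],
})

count_cols = ["Total Population", "Number of tech workers"]
pct_cols = ["Pct Asian"]


def aggregate_to_msa(df, count_cols, pct_cols, weight_col="Total Population"):
    """Roll area-level rows up to one row per (msa, year). Hypothetical helper."""
    keys = [df["msa"], df["year"]]
    # Counts are additive, so they are summed within each MSA and year.
    sums = df.groupby(keys)[count_cols].sum()
    # Percentages are weighted by population before summing, then divided
    # by the MSA's total population to get a population-weighted mean.
    weighted = df[pct_cols].mul(df[weight_col], axis=0)
    pct = weighted.groupby(keys).sum().div(sums[weight_col], axis=0)
    return sums.join(pct).reset_index()


msa_level = aggregate_to_msa(areas, count_cols, pct_cols)
print(msa_level)
```

The result has one row per MSA and year, which is the shape I assume a clustering algorithm would need.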
Once that is done, how do I identify which MSAs fall into which cluster?
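For reference, this is roughly what I imagine the clustering and labelling would look like, assuming one row per MSA for a single year. k-means from scikit-learn and the choice of two clusters are placeholders, not a decision I have made, and the table below is toy data with made-up values. The idea is that the cluster labels come back in the same row order as the input, so they can be attached to the MSA names directly.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy MSA-level table, one row per MSA; names and values are made up.
msa_level = pd.DataFrame({
    "msa": ["A", "B", "C", "D"],
    "Median Age": [30.0, 31.0, 45.0, 44.0],
    "Pct Asian": [20.0, 19.0, 3.0, 4.0],
})
feature_cols = ["Median Age", "Pct Asian"]

# Standardize so that no feature dominates the distance because of its scale.
X = StandardScaler().fit_transform(msa_level[feature_cols])

# k-means is only one possible algorithm; n_clusters=2 is arbitrary here.
km = KMeans(n_clusters=2, n_init=10, random_state=0)

# Labels are returned in input row order, so they line up with the MSA names.
msa_level["cluster"] = km.fit_predict(X)

# List the MSAs that ended up in each cluster.
members = msa_level.groupby("cluster")["msa"].apply(list)
print(members)
```

MSAs that share a cluster label would then be the ones I treat as comparable.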