Wikipedia gives the following example when describing feature hashing, but the mapping seems inconsistent with the dictionary it defines.
For example, according to the dictionary "to" should be converted to 3, but it is encoded as 1 instead.
Is the description wrong? How does feature hashing work?
The text:
John likes to watch movies. Mary likes too. John also likes to watch football games.
can be converted, using the dictionary
{"John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5, "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10}
to the matrix
[[1 2 1 1 1 0 0 0 1 1] [1 1 1 1 0 1 1 1 0 0]]
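For reference, here is a minimal Python sketch of both encodings, assuming a simple letters-only tokenizer (the helpers tokenize, dict_vectorize and hash_vectorize are hypothetical names). It suggests the example may be consistent after all: in the dictionary version each dictionary value selects a column (1-based), and the cell holds how many times the word occurs in that sentence group, so "to" lands in column 3 with a count of 1. The hashing version drops the dictionary and picks the column as hash(word) mod n_features; zlib.crc32 and n_features=16 are stand-in choices, not what Wikipedia or any particular library uses.

```python
import re
import zlib
from collections import Counter

# Dictionary from the Wikipedia example: each value is a 1-based column index.
vocab = {"John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5,
         "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10}

docs = ["John likes to watch movies. Mary likes too.",
        "John also likes to watch football games."]

def tokenize(text):
    # Assumption: split on anything that is not a letter, keep original case.
    return re.findall(r"[A-Za-z]+", text)

def dict_vectorize(doc, vocab):
    # Column position comes from the dictionary; the cell value is the word count.
    row = [0] * len(vocab)
    for tok, count in Counter(tokenize(doc)).items():
        row[vocab[tok] - 1] = count
    return row

def hash_vectorize(doc, n_features=16):
    # Hashing trick (unsigned, simplified): no dictionary,
    # column = crc32(token) mod n_features; colliding words share a column.
    row = [0] * n_features
    for tok in tokenize(doc):
        row[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
    return row

matrix = [dict_vectorize(d, vocab) for d in docs]
print(matrix)  # [[1, 2, 1, 1, 1, 0, 0, 0, 1, 1], [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]]
print([hash_vectorize(d) for d in docs])
```

Because the hashed columns come from a hash rather than a lookup, two different words can collide in the same column; that trade-off is what lets the hashing version skip building and storing the dictionary.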