Data Mining - The best frequent itemset package in Python - 吾爱随笔录

Tags: data mining, Python, association rules

2021-10-02 03:59:53

Can anyone recommend a good Python package for frequent itemsets? I only need to find the frequent itemsets, not the association rules. Thanks!

3 answers

I also recommend the MLXtend library for frequent itemsets.

Usage example:

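import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

# One-hot encode the transactions: one boolean column per distinct item
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Keep every itemset contained in at least 10% of the transactions;
# use_colnames=True reports item names instead of column indices
frequent_itemsets = apriori(df, min_support=0.1, use_colnames=True)

print(frequent_itemsets)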
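To make clear what `min_support` means here (the fraction of transactions that contain an itemset), below is a minimal pure-Python sketch of the level-wise Apriori search: count single items, then repeatedly join frequent (k-1)-itemsets into k-item candidates, prune any candidate with an infrequent subset, and keep those that meet the threshold. It is not mlxtend's implementation, and `apriori_itemsets` is a hypothetical name used only for illustration. It uses `min_support=0.6` so the output stays short.

```python
from itertools import combinations


def apriori_itemsets(transactions, min_support):
    """Hypothetical helper: return {frozenset(itemset): relative support}
    for every itemset whose support is >= min_support."""
    # Duplicate items inside one transaction count once, as in a one-hot table
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item of `itemset`
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items
    current = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    result = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge two frequent (k-1)-itemsets into a k-itemset
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

result = apriori_itemsets(dataset, min_support=0.6)
# Print highest support first
for itemset, s in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(round(s, 2), sorted(itemset))
```

Scanning all baskets for every candidate keeps the sketch short but is slow on large data; that is the reason to use a library for real workloads.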

The Orange3-Associate package provides a frequent_itemsets() function based on the FP-growth algorithm.
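FP-growth avoids Apriori's candidate generation: it compresses the transactions into a prefix tree (FP-tree) ordered by item frequency, then for each item mines the "conditional pattern base" (the tree paths leading to that item) recursively. Below is a minimal pure-Python sketch of that idea. It is not Orange3-Associate's code; `fp_growth`, `_Node`, `_build_tree` and `_mine` are hypothetical names, and it assumes items are hashable and mutually sortable (ties in frequency are broken by item order).

```python
from collections import defaultdict


class _Node:
    """FP-tree node: an item, its weighted count, its parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def _build_tree(weighted_transactions, is_frequent):
    """Insert (items, weight) pairs into an FP-tree; return the header table
    {item: [nodes holding item]} and the weighted counts of frequent items."""
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in items:
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if is_frequent(c)}

    root = _Node(None, None)
    header = defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep only frequent items, most frequent first, so paths share prefixes
        path = sorted((i for i in items if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = _Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def _mine(weighted_transactions, is_frequent, suffix, out):
    header, frequent = _build_tree(weighted_transactions, is_frequent)
    for item, count in frequent.items():
        itemset = suffix | {item}
        out[itemset] = count
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count
        base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.parent is not None:  # stop before the root
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        if base:
            _mine(base, is_frequent, itemset, out)


def fp_growth(transactions, min_support):
    """Hypothetical helper: {frozenset(itemset): relative support} via FP-growth."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    out = {}
    _mine([(b, 1) for b in baskets], lambda c: c / n >= min_support,
          frozenset(), out)
    return {itemset: count / n for itemset, count in out.items()}


transactions = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
for itemset, s in fp_growth(transactions, min_support=0.5).items():
    print(sorted(itemset), s)
```

Each itemset is emitted exactly once, because the conditional base for an item only holds items that precede it in the tree's frequency order.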

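The MLXtend library has been really useful for me. Its documentation includes an Apriori implementation that outputs frequent itemsets.

See the first example at http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/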
