可视化经常一起购买的物品
数据挖掘
Python
r
数据挖掘
可视化
关联规则
2021-09-17 07:10:24
4个回答
我认为您可能想要的是热图的离散版本。例如,见下文。红色表示最常一起购买,而绿色单元格从不一起购买。
这实际上很容易与 Pandas DataFrames 和 matplotlib 放在一起。
import numpy as np
from pandas import DataFrame
import matplotlib
matplotlib.use('agg') # Write figure to disk instead of displaying (for Windows Subsystem for Linux)
import matplotlib.pyplot as plt
####
# Get data into a data frame
####
data = [
['Banana', 'Water', 'Rice'],
['Rice', 'Water'],
['Bread', 'Banana', 'Juice'],
]
# Convert the input into a 2D dictionary
freqMap = {}
for line in data:
for item in line:
if not item in freqMap:
freqMap[item] = {}
for other_item in line:
if not other_item in freqMap:
freqMap[other_item] = {}
freqMap[item][other_item] = freqMap[item].get(other_item, 0) + 1
freqMap[other_item][item] = freqMap[other_item].get(item, 0) + 1
df = DataFrame(freqMap).T.fillna(0)
print (df)
#####
# Create the plot
#####
plt.pcolormesh(df, edgecolors='black')
plt.yticks(np.arange(0.5, len(df.index), 1), df.index)
plt.xticks(np.arange(0.5, len(df.columns), 1), df.columns)
plt.savefig('plot.png')
data = {{"Banana", "Water", "Rice"},
{"Rice", "Water"},
{"Bread", "Banana", "Juice"}};
获得成对计数。
counts = Sort /@ Flatten[Subsets[#, {2}] & /@ data, 1] // Tally
{{{"Banana", "Water"}, 1}, {{"Banana", "Rice"}, 1}, {{"Rice", "Water"}, 2}, {{"Banana", "Bread"}, 1}, {{"Bread", "Juice"}, 1}, {{"Banana", "Juice"}, 1}}
获取命名刻度的索引。
indices = Thread[# -> Range[Length@#]] &@Sort@DeleteDuplicates@Flatten[data]
{"Banana" -> 1, "Bread" -> 2, "Juice" -> 3, "Rice" -> 4, "Water" -> 5}
MatrixPlot
使用绘图SparseArray
。也可以使用ArrayPlot
。
MatrixPlot[
SparseArray[Rule @@@ counts /. indices, ConstantArray[Length@indices, 2]],
FrameTicks -> With[{t = {#2, #1} & @@@ indices}, {{t, None}, {t, None}}],
PlotLegends -> Automatic
]
请注意,它是上三角形。
希望这可以帮助。
您可以在 python 中使用 seaborn 可视化库(构建在 matplotlib 之上)来执行此操作。
data = [
['Banana', 'Water', 'Rice'],
['Rice', 'Water'],
['Bread', 'Banana', 'Juice'],
]
# Pull out combinations
from itertools import combinations
data_pairs = []
for d in data:
data_pairs += [list(sorted(x)) + [1] for x in combinations(d, 2)]
# Add reverse as well (this will mirror the heatmap)
data_pairs += [list(sorted(x))[::-1] + [1] for x in combinations(d, 2)]
# Shape into dataframe
import pandas as pd
df = pd.DataFrame(data_pairs)
df_zeros = pd.DataFrame([list(x) + [0] for x in combinations(df[[0, 1]].values.flatten(), 2)])
df = pd.concat((df, df_zeros))
df = df.groupby([0, 1])[2].sum().reset_index().pivot(0, 1, 2).fillna(0)
import seaborn as sns
from matplotlib.pyplot import plt
sns.heatmap(df, cmap='YlGnBu')
plt.show()
最终的数据框df
如下所示:
结果可视化是:
其它你可能感兴趣的问题