Data Mining - Unable to generate error bars using seaborn - 吾爱随笔录

Unable to generate error bars using seaborn

Data mining, Python, plotting, seaborn

2022-02-18 13:05:05

Can you help me add error bars to my plot?

Here is the CSV:

run,testcase,algorithm,group,avg_weightedcost,std_weight
1,1,QI,0,20007037.36,0
2,1,Q2,0,60000000,3.76E-09
3,1,Q4,0,181801581.2,13353630.74
4,1,Q3,0,585605657.3,54852458.59
6,1,QI,1,10003518.68,0
7,1,Q2,1,292802828.7,2.00E+01
8,1,Q4,1,90900790.6,13353630.74
9,1,Q3,1,292802828.7,27426229.3

Here is the code and the plot it generates:

g = sns.barplot(y="algorithm", x="avg_weightedcost", hue="group", 
capsize=.2, data=df)

This is how I tried to add the error bars:

g = sns.barplot(y="algorithm", x="avg_weightedcost", hue="group", 
xerr="std_weight", capsize=.2, data=df)

This is the error:

ValueError: err must be [ scalar | N, Nx1 or 2xN array-like ]

Edit:

The .csv file:

run,testcase,algorithm,group,avg_weightedcost,std_weight,avg,err
1,1,QI,0.00,20007037.36,0.00,100.00,5.00
2,1,Q2,0.00,60000000.00,0.00,50.00,20.00
3,1,Q4,0.00,181801581.20,13353630.74,50.00,10.00
4,1,Q3,0.00,585605657.30,54852458.59,20.00,1.00
6,1,QI,1.00,10003518.68,0.00,20.00,20.00
7,1,Q2,1.00,292802828.65,20.00,30.00,10.00
8,1,Q4,1.00,90900790.60,13353630.74,10.00,10.00
9,1,Q3,1.00,292802828.65,27426229.30,50.00,20.00

Code:

g = sns.barplot(x=data2['avg_weightedcost'], y=data2['algorithm'], 
hue=data2['group']) 

g.errorbar(x=data2['avg_weightedcost'], y=data2['algorithm'], 
xerr=data2['std_weight'], ecolor='red', linewidth=0, capsize=15)

Error:

ValueError: could not convert string to float: 'Q3'

4 Answers

Try this:

g = sns.barplot(y="algorithm", x="avg_weightedcost", hue="group", 
    xerr=df["std_weight"]*1, capsize=.2, data=df)

Hope this helps.
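
Both errors in the question come from matplotlib, which expects numeric error arrays and numeric bar positions rather than column names or category labels. As a point of comparison, here is a minimal sketch that skips seaborn and draws the grouped horizontal bars with plain matplotlib, passing the precomputed standard deviations through xerr. The function name grouped_barh_with_errors, the bar layout (a total band of 0.8 per algorithm) and the capsize value are assumptions made for illustration; the data is the CSV from the question.

```python
import io

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The CSV from the question, embedded so the sketch is self-contained.
CSV_TEXT = """run,testcase,algorithm,group,avg_weightedcost,std_weight
1,1,QI,0,20007037.36,0
2,1,Q2,0,60000000,3.76E-09
3,1,Q4,0,181801581.2,13353630.74
4,1,Q3,0,585605657.3,54852458.59
6,1,QI,1,10003518.68,0
7,1,Q2,1,292802828.7,2.00E+01
8,1,Q4,1,90900790.6,13353630.74
9,1,Q3,1,292802828.7,27426229.3
"""


def grouped_barh_with_errors(df, band=0.8):
    """Draw one horizontal bar per (algorithm, group) with xerr = std_weight.

    Hypothetical helper: `band` is the total height shared by the bars
    of one algorithm.
    """
    algorithms = list(dict.fromkeys(df["algorithm"]))  # keep first-seen order
    groups = sorted(df["group"].unique())
    height = band / len(groups)
    base = np.arange(len(algorithms))

    fig, ax = plt.subplots()
    for i, group in enumerate(groups):
        # Align this group's rows with the common algorithm order.
        sub = df[df["group"] == group].set_index("algorithm").reindex(algorithms)
        ax.barh(
            base + i * height,                      # numeric bar positions
            sub["avg_weightedcost"].to_numpy(),     # bar lengths
            height=height,
            xerr=sub["std_weight"].to_numpy(),      # precomputed error per bar
            capsize=4,
            label=str(group),
        )

    # Put one tick per algorithm in the middle of its band of bars.
    ax.set_yticks(base + height * (len(groups) - 1) / 2)
    ax.set_yticklabels(algorithms)
    ax.set_xlabel("avg_weightedcost")
    ax.legend(title="group")
    return ax


df = pd.read_csv(io.StringIO(CSV_TEXT))
ax = grouped_barh_with_errors(df)
```

Calling plt.show() afterwards displays the figure in an interactive session. Because the error values are handed to matplotlib directly, no resampling or row duplication is needed.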

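I found this question while looking for a way to add my own precomputed error bars (standard deviations) to a Seaborn bar plot with grouped values. I eventually found a workaround: replicate each row many times, replacing the value with draws from a normal distribution that has the desired mean and standard deviation. The "ci" option built into sns.barplot then does the rest. It is not the cleanest solution, but it does the job (at least for small datasets). An example: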

import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

dfBarPlot = pd.read_csv('my_precalculated_values.csv')
# duplicate observations to get good std bars
duplicates = 30 # increase this number to increase precision
new_rows = []
for index, row in dfBarPlot.iterrows():
    for times in range(duplicates):
        new_row = row.copy()
        # replace the precomputed mean with one random draw around it
        new_row['Y'] = np.random.normal(row['Y'], row['precomputed_std'])
        new_rows.append(new_row)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
dfCopy = pd.concat([dfBarPlot, pd.DataFrame(new_rows).infer_objects()],
                   ignore_index=True)

# Now Seaborn does the rest
sns.set_style("whitegrid")
fig = sns.barplot(x='X',
                  y='Y',
                  hue='Cases',
                  ci='sd',  # on seaborn >= 0.12, use errorbar='sd' instead
                  data=dfCopy)

plt.legend(loc='upper right')
sns.set(rc={'figure.figsize':(8,5)})
plt.show()

Update 2020-01-15

The barplot_err function is now available as part of the hhpy package.

Disclaimer: I am the creator of hhpy.

import hhpy.plotting as hpt
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('test.csv')

fig,ax = plt.subplots(figsize=(9,9),nrows=2)
hpt.barplot_err(y="algorithm", x="avg_weightedcost", xerr="std_weight", hue="group", 
capsize=.2, data=df, ax=ax[0])
hpt.barplot_err(x="algorithm", y="avg_weightedcost", yerr="std_weight", hue="group", 
capsize=.2, data=df, ax=ax[1])
plt.show()

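This answer is based on 1gnaci0 7's. However, you do not need to create 30 duplicates by drawing from a normal distribution. If your only goal is to show error bars, three duplicates per row are enough (x-xerr, x, x+xerr, or the equivalent for y), and the resulting error bars are mostly accurate.

So I turned this into a function that works in both the x and y directions and "only" triples the size of the initial dataframe.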

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def barplot_err(x, y, xerr=None, yerr=None, data=None, **kwargs):

    _data = []
    for _i in data.index:

        _data_i = pd.concat([data.loc[_i:_i]]*3, ignore_index=True, sort=False)
        _row = data.loc[_i]
        if xerr is not None:
            _data_i[x] = [_row[x]-_row[xerr], _row[x], _row[x]+_row[xerr]]
        if yerr is not None:
            _data_i[y] = [_row[y]-_row[yerr], _row[y], _row[y]+_row[yerr]]
        _data.append(_data_i)

    _data = pd.concat(_data, ignore_index=True, sort=False)

    _ax = sns.barplot(x=x,y=y,data=_data,ci='sd',**kwargs)

    return _ax

df = pd.read_csv('test.csv')

fig,ax = plt.subplots(figsize=(9,9),nrows=2)
barplot_err(y="algorithm", x="avg_weightedcost", xerr="std_weight", hue="group", 
capsize=.2, data=df, ax=ax[0])
barplot_err(x="algorithm", y="avg_weightedcost", yerr="std_weight", hue="group", 
capsize=.2, data=df, ax=ax[1])
plt.show()
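
How closely the three-point trick reproduces the original error depends on how the standard deviation is normalised. The following quick check with NumPy, using made-up numbers, shows that the sample standard deviation (ddof=1) of (x-err, x, x+err) equals err exactly, while the population standard deviation (ddof=0) is smaller by a factor of sqrt(2/3). Which convention a given seaborn version applies for 'sd' is not stated in this thread, so treat that as something to verify for your installation.

```python
import numpy as np


def three_point_std(center, err, ddof):
    """Standard deviation of the three duplicated values used by barplot_err."""
    return np.std([center - err, center, center + err], ddof=ddof)


# Illustrative values only; any center and positive err behave the same way.
x, err = 50.0, 10.0

sample_sd = three_point_std(x, err, ddof=1)      # divides by n - 1 = 2
population_sd = three_point_std(x, err, ddof=0)  # divides by n = 3

print(sample_sd)      # matches err
print(population_sd)  # err * sqrt(2/3), i.e. about 18% too short
```

The mean of the three values is always x, so the bar length itself is unaffected either way.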

Same idea as @1gnaci0 7, but with much faster row duplication:

import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

duplicates=1000
dfBarPlot = pd.read_csv('my_precalculated_values.csv')
#duplicate observations to get good std bars
dfCopy = dfBarPlot.loc[dfBarPlot.index.repeat(duplicates)].copy()
dfCopy['Y'] = np.random.normal(dfCopy['Y'].values,dfCopy['precomputed_std'].values)

fig = sns.barplot(x='X', y='Y', hue='Cases', ci='sd', data=dfCopy)
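
Since this approach resamples the data, the plotted bar lengths and error bars are estimates rather than the precomputed values themselves. Below is a small sketch of how one might check how close they are, without plotting. The table contents are invented for illustration (only the column names X, Y, Cases and precomputed_std come from the answer), and a seeded generator is used so the result is repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical precomputed summary table; the numbers are made up.
summary = pd.DataFrame({
    "X": ["a", "a", "b", "b"],
    "Cases": ["c1", "c2", "c1", "c2"],
    "Y": [10.0, 20.0, 30.0, 40.0],
    "precomputed_std": [1.0, 2.0, 3.0, 4.0],
})

duplicates = 1000
rng = np.random.default_rng(0)  # fixed seed for a repeatable check

# Repeat every row `duplicates` times, then replace Y with random draws
# centred on the precomputed mean with the precomputed spread.
expanded = summary.loc[summary.index.repeat(duplicates)].reset_index(drop=True)
expanded["Y"] = rng.normal(
    expanded["Y"].to_numpy(), expanded["precomputed_std"].to_numpy()
)

# Recover mean and standard deviation per bar and compare with the targets.
recovered = (
    expanded.groupby(["X", "Cases"])["Y"].agg(["mean", "std"]).reset_index()
)
check = summary.merge(recovered, on=["X", "Cases"])
print(check)
```

The recovered columns should sit close to Y and precomputed_std, and the gap shrinks as duplicates grows, at the cost of a larger dataframe.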
