How to sum values grouped by two columns in pandas

data-mining python pandas dataframe

2021-10-11 22:30:56

I have a pandas DataFrame like this:

import pandas as pd

df = pd.DataFrame({
    'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'],
    'Groups': ['one', 'one', 'one', 'two', 'two'],
    'data': range(1, 6)})

       Date Groups  data
0  2017-1-1    one     1
1  2017-1-1    one     2
2  2017-1-2    one     3
3  2017-1-2    two     4
4  2017-1-3    two     5

How can I generate a new DataFrame like this, with one column per group and the summed data for each date:

       Date  one  two
0  2017-1-1    3    0
1  2017-1-2    3    4
2  2017-1-3    0    5

3 Answers

pivot_table was made for this:

df.pivot_table(index='Date', columns='Groups', aggfunc='sum')

The result is

         data
Groups    one  two
Date
2017-1-1  3.0  NaN
2017-1-2  3.0  4.0
2017-1-3  NaN  5.0

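Personally, I find this approach much easier to understand, and certainly more Pythonic than a convoluted groupby operation. Then, if you want the format you specified, you can just tidy it up: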

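# Assign the pivoted table back to df before tidying it up
df = df.pivot_table(index='Date', columns='Groups', aggfunc='sum')
df = df.fillna(0)                      # replace missing combinations with 0
df.columns = df.columns.droplevel()    # drop the outer 'data' column level
df.columns.name = None                 # remove the 'Groups' label
df = df.reset_index()                  # turn 'Date' back into a column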

This gives you

       Date  one  two
0  2017-1-1  3.0  0.0
1  2017-1-2  3.0  4.0
2  2017-1-3  0.0  5.0
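
As a possible shortcut, the tidy-up steps can be folded into the pivot_table call itself. The following is a minimal sketch, assuming the same sample data as in the question; passing values='data' keeps the columns single-level, and fill_value=0 fills the Date/Groups combinations that have no rows. The variable name wide is only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'],
    'Groups': ['one', 'one', 'one', 'two', 'two'],
    'data': range(1, 6)})

# Sum 'data' per (Date, Groups) pair and fill missing pairs with 0
wide = df.pivot_table(index='Date', columns='Groups', values='data',
                      aggfunc='sum', fill_value=0)
wide.columns.name = None   # remove the 'Groups' label from the column axis
wide = wide.reset_index()  # turn 'Date' back into an ordinary column
print(wide)
```

Because the zeros are filled in by pivot_table rather than by a later fillna, this variant should also avoid the detour through NaN values.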

Pandas black magic:

# groupby(...).sum() already yields one row per (Date, Groups) pair,
# so it can be unstacked directly
df = df.groupby(['Date', 'Groups']).sum().unstack(
    'Groups').fillna(0).reset_index()

# Fix the column names
df.columns = ['Date', 'one', 'two']

Resulting df:

       Date  one  two
0  2017-1-1  3.0  0.0
1  2017-1-2  3.0  4.0
2  2017-1-3  0.0  5.0

A (perhaps more idiomatic) alternative to @tuomastik's answer:

df.groupby(['Date', 'Groups']).sum().unstack('Groups', fill_value=0).reset_index()
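
Note that the one-liner above still leaves a two-level column index (data over one/two). One way to avoid that, sketched here under the assumption that only the data column needs summing, is to select that column before aggregating so that unstack operates on a Series. The variable name out is only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'],
    'Groups': ['one', 'one', 'one', 'two', 'two'],
    'data': range(1, 6)})

# Selecting 'data' first gives a Series, so unstack produces flat columns
out = (df.groupby(['Date', 'Groups'])['data']
         .sum()
         .unstack('Groups', fill_value=0))
out.columns.name = None   # remove the 'Groups' label from the column axis
out = out.reset_index()   # turn 'Date' back into an ordinary column
print(out)
```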
