How do I generate output like this?

Data mining, Python, data science model
2022-03-06 03:53:56

Can you explain how to do this?

2 Answers

Using the sample dataframe provided by @JahKnows:

import pandas as pd

df = pd.DataFrame(data={'shopify_customer_id': [1,1,2,3,4,5], 
                        'financial_status': ['paid', 'refunded', 'paid', 'refunded','paid', 'paid'],
                        'count': [8, 1, 13, 1, 1, 1], 
                        'order_amt': [19, 19, 0, 19, 0, 24]})   

You can also do the following:

df=df.groupby(['shopify_customer_id','financial_status']).sum().unstack()
df.columns = ['_'.join(col).strip() for col in df.columns.values]
df.reset_index(inplace=True)
df.fillna(0,inplace=True)

This gives you the following (if this is what you want!):

   shopify_customer_id  count_paid  count_refunded  order_amt_paid  order_amt_refunded
0                    1         8.0             1.0            19.0                19.0
1                    2        13.0             0.0             0.0                 0.0
2                    3         0.0             1.0             0.0                19.0
3                    4         1.0             0.0             0.0                 0.0
4                    5         1.0             0.0            24.0                 0.0
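
The float values in the output above (8.0, 13.0, and so on) appear because unstack() first inserts NaN for customer/status combinations that have no rows, and fillna(0) only replaces them afterwards. A minimal sketch, assuming integer output is preferred: passing fill_value=0 to unstack() fills those gaps directly, which should leave the integer columns as integers. The variable name `wide` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(data={'shopify_customer_id': [1, 1, 2, 3, 4, 5],
                        'financial_status': ['paid', 'refunded', 'paid', 'refunded', 'paid', 'paid'],
                        'count': [8, 1, 13, 1, 1, 1],
                        'order_amt': [19, 19, 0, 19, 0, 24]})

# Sum per (customer, status), then move the status level into the columns.
# fill_value=0 fills missing combinations without an intermediate NaN.
wide = (df.groupby(['shopify_customer_id', 'financial_status'])
          .sum()
          .unstack(fill_value=0))

# Flatten the two-level column index: ('count', 'paid') -> 'count_paid'
wide.columns = ['_'.join(col) for col in wide.columns]
wide = wide.reset_index()

print(wide)
```

The values are the same as in the table above; only the handling of missing combinations changes.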

The dataframe can be defined as follows:

import pandas as pd

df = pd.DataFrame(data={'shopify_customer_id': [1,1,2,3,4,5], 
                        'financial_status': ['paid', 'refunded', 'paid', 'refunded', 'paid', 'paid'],
                        'count': [8, 1, 13, 1, 1, 1], 
                        'order_amt': [19, 19, 0, 19, 0, 24]})

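This gives the table you described above. Now we will perform the transformation. We will use a dictionary to keep track of every customer in the data. The first if statement in the code below adds any customer that is not yet in the dictionary. We then check whether the transaction is a sale ('paid') or a refund ('refunded') and update that customer's dictionary entry accordingly.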

shopify_customers = {}
for _, row in df.iterrows():
    if row['shopify_customer_id'] not in shopify_customers:
        shopify_customers.update({row['shopify_customer_id']: 
                                  {'shopify_customer_id': row['shopify_customer_id'],
                                   'order_count_paid': 0,
                                   'order_count_refunded': 0,
                                   'order_amt_paid': 0,
                                   'order_amt_refunded': 0}})

    if row['financial_status'] == 'paid':
        shopify_customers[row['shopify_customer_id']]['order_count_paid'] += row['count']
        shopify_customers[row['shopify_customer_id']]['order_amt_paid'] += row['order_amt']

    elif row['financial_status'] == 'refunded':
        shopify_customers[row['shopify_customer_id']]['order_count_refunded'] += row['count']
        shopify_customers[row['shopify_customer_id']]['order_amt_refunded'] += row['order_amt']

Finally, we put the data into a pandas DataFrame:

pd.DataFrame(data = [shopify_customers[i] for i in shopify_customers])
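
As a quick sanity check, the sketch below wraps the dictionary loop in a small function and compares its result with the groupby/unstack approach from the other answer. The function name `summarize_customers` and the variables `looped`, `grouped`, and `same` are hypothetical, introduced only for this example. The two results differ in column names (`order_count_paid` here versus `count_paid` there), so the comparison is made on the underlying values, which appear in the same column order.

```python
import pandas as pd

df = pd.DataFrame(data={'shopify_customer_id': [1, 1, 2, 3, 4, 5],
                        'financial_status': ['paid', 'refunded', 'paid', 'refunded', 'paid', 'paid'],
                        'count': [8, 1, 13, 1, 1, 1],
                        'order_amt': [19, 19, 0, 19, 0, 24]})


def summarize_customers(frame):
    """Hypothetical helper: build one summary row per customer with a dict."""
    customers = {}
    for _, row in frame.iterrows():
        cid = row['shopify_customer_id']
        # Create a zeroed entry the first time a customer is seen.
        entry = customers.setdefault(cid, {'shopify_customer_id': cid,
                                           'order_count_paid': 0,
                                           'order_count_refunded': 0,
                                           'order_amt_paid': 0,
                                           'order_amt_refunded': 0})
        status = row['financial_status']
        if status in ('paid', 'refunded'):
            entry['order_count_' + status] += row['count']
            entry['order_amt_' + status] += row['order_amt']
    return pd.DataFrame(list(customers.values()))


looped = summarize_customers(df)

# The groupby/unstack approach, for comparison.
grouped = df.groupby(['shopify_customer_id', 'financial_status']).sum().unstack().fillna(0)
grouped.columns = ['_'.join(col) for col in grouped.columns]
grouped = grouped.reset_index()

# Compare values only, since the column names differ between the two.
same = bool((looped.to_numpy() == grouped.to_numpy()).all())
print(same)
```

On larger data the groupby version is likely the faster of the two, since iterrows() processes one row at a time in Python.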