How do I generate output like this?
Data mining
Python
Data science models
2022-03-06 03:53:56
2 answers
Using the example data frame provided by @JahKnows:
import pandas as pd
df = pd.DataFrame(data={'shopify_customer_id': [1, 1, 2, 3, 4, 5],
                        'financial_status': ['paid', 'refunded', 'paid', 'refunded', 'paid', 'paid'],
                        'count': [8, 1, 13, 1, 1, 1],
                        'order_amt': [19, 19, 0, 19, 0, 24]})
You can also do the following:
df=df.groupby(['shopify_customer_id','financial_status']).sum().unstack()
df.columns = ['_'.join(col).strip() for col in df.columns.values]
df.reset_index(inplace=True)
df.fillna(0,inplace=True)
This gives you the following (if that is what you want!):
   shopify_customer_id  count_paid  count_refunded  order_amt_paid  order_amt_refunded
0                    1         8.0             1.0            19.0                19.0
1                    2        13.0             0.0             0.0                 0.0
2                    3         0.0             1.0             0.0                19.0
3                    4         1.0             0.0             0.0                 0.0
4                    5         1.0             0.0            24.0                 0.0
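The groupby/unstack steps above can also be written with `pivot_table`. This is only an alternative sketch, not the answerer's method; the variable name `wide` is made up for illustration. Passing `fill_value=0` fills the missing customer/status combinations during the pivot, so no separate `fillna` call is needed.

```python
import pandas as pd

df = pd.DataFrame(data={'shopify_customer_id': [1, 1, 2, 3, 4, 5],
                        'financial_status': ['paid', 'refunded', 'paid', 'refunded', 'paid', 'paid'],
                        'count': [8, 1, 13, 1, 1, 1],
                        'order_amt': [19, 19, 0, 19, 0, 24]})

# Sum both value columns per customer, with one output column per status.
# fill_value=0 replaces the missing customer/status combinations.
wide = df.pivot_table(index='shopify_customer_id',
                      columns='financial_status',
                      values=['count', 'order_amt'],
                      aggfunc='sum',
                      fill_value=0)

# Flatten the two-level column index, e.g. ('count', 'paid') -> 'count_paid'.
wide.columns = ['_'.join(col) for col in wide.columns]
wide = wide.reset_index()
print(wide)
```

The resulting values should match the table above; whether the columns come out as integers or floats can depend on the pandas version.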
The data frame can be defined as follows:
import pandas as pd
df = pd.DataFrame(data={'shopify_customer_id': [1, 1, 2, 3, 4, 5],
                        'financial_status': ['paid', 'refunded', 'paid', 'refunded', 'paid', 'paid'],
                        'count': [8, 1, 13, 1, 1, 1],
                        'order_amt': [19, 19, 0, 19, 0, 24]})
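This gives the table you described above. Now we perform the transformation. We use a dictionary to keep track of every customer in the data. The first if statement in the code below adds any customer that is not yet in the dictionary. We then check whether the row's financial_status is 'paid' or 'refunded' and update that customer's entry accordingly.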
shopify_customers = {}
for _, row in df.iterrows():
    customer_id = row['shopify_customer_id']
    # Add a zeroed record the first time a customer is seen.
    if customer_id not in shopify_customers:
        shopify_customers[customer_id] = {
            'shopify_customer_id': customer_id,
            'order_count_paid': 0,
            'order_count_refunded': 0,
            'order_amt_paid': 0,
            'order_amt_refunded': 0}
    # Add this row's count and amount to the totals for its status.
    if row['financial_status'] == 'paid':
        shopify_customers[customer_id]['order_count_paid'] += row['count']
        shopify_customers[customer_id]['order_amt_paid'] += row['order_amt']
    elif row['financial_status'] == 'refunded':
        shopify_customers[customer_id]['order_count_refunded'] += row['count']
        shopify_customers[customer_id]['order_amt_refunded'] += row['order_amt']
Finally, we put the data into a pandas DataFrame:
pd.DataFrame(data = [shopify_customers[i] for i in shopify_customers])
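The same dictionary-based accumulation can be wrapped in a reusable function. The sketch below is one possible variant, assuming the column names from the example data; the function name `summarize_customers` is hypothetical. It iterates over the columns with `zip` instead of `iterrows` and uses `dict.setdefault` to create the zeroed record, but the logic is otherwise the same.

```python
import pandas as pd

def summarize_customers(df):
    """Accumulate per-customer totals for 'paid' and 'refunded' rows."""
    totals = {}
    rows = zip(df['shopify_customer_id'], df['financial_status'],
               df['count'], df['order_amt'])
    for customer_id, status, count, amount in rows:
        # setdefault inserts the zeroed record only for unseen customers.
        record = totals.setdefault(customer_id, {
            'shopify_customer_id': customer_id,
            'order_count_paid': 0,
            'order_count_refunded': 0,
            'order_amt_paid': 0,
            'order_amt_refunded': 0})
        # Rows with any other status are ignored, as in the loop above.
        if status in ('paid', 'refunded'):
            record['order_count_' + status] += count
            record['order_amt_' + status] += amount
    return pd.DataFrame(list(totals.values()))

df = pd.DataFrame(data={'shopify_customer_id': [1, 1, 2, 3, 4, 5],
                        'financial_status': ['paid', 'refunded', 'paid', 'refunded', 'paid', 'paid'],
                        'count': [8, 1, 13, 1, 1, 1],
                        'order_amt': [19, 19, 0, 19, 0, 24]})
result = summarize_customers(df)
print(result)
```

For large tables a vectorized groupby is likely to be faster than any row-by-row loop; the function form is mainly convenient when the per-row logic needs custom branching.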