Filtering a pandas DataFrame on multiple columns based on values in another DataFrame

data-mining python pandas
2022-02-03 21:44:30

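I have a pandas DataFrame df1:

[image: df1]

Now I want to filter rows of df1 based on the unique (Campaign, Merchant) combinations from another DataFrame, df2, which looks like this:

[image: df2]

What I tried was using .isin, with code similar to the following: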

df1.loc[df1['Campaign'].isin(df2['Campaign']) &
        df1['Merchant'].isin(df2['Merchant'])]

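The problem here is that the two conditions are independent. For example, I want to check whether the pair (A, 1) from df2 is in df1, but because each condition checks a whole column rather than comparing row by row, the code above returns every row of df1 whose Campaign appears anywhere in df2 and whose Merchant appears anywhere in df2, even when that exact pair never occurs together in df2.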
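To make the issue concrete, here is a minimal sketch with made-up data (the frames `demo_df1` and `demo_df2` are hypothetical and are not the ones from the question). The pair (A, 2) does not exist in `demo_df2`, yet the independent checks let it through:

```python
import pandas as pd

# Hypothetical toy data, chosen so that one row is a false positive.
demo_df1 = pd.DataFrame({"Campaign": ["A", "A", "B"], "Merchant": [1, 2, 2]})
demo_df2 = pd.DataFrame({"Campaign": ["A", "B"], "Merchant": [1, 2]})

# Each condition looks at one column in isolation.
independent = (demo_df1["Campaign"].isin(demo_df2["Campaign"])
               & demo_df1["Merchant"].isin(demo_df2["Merchant"]))

# Row 1 is (A, 2): "A" occurs in demo_df2 and 2 occurs in demo_df2,
# but never in the same row, so this row should not have been selected.
print(demo_df1.loc[independent])
```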

Do you have any suggestions for this kind of multi-column filtering in pandas?

3 Answers

A bit late, but my preferred solution is

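# data setup from @tuomastik's answer, with np.random.randn in place of
# pd.np.random.randn (pd.np was removed in pandas 2.0)

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Random numbers 1": np.random.randn(6),
                    "Campaign": ["A"] * 5 + ["B"],
                    "Merchant": [1, 1, 1, 2, 3, 1]})

df2 = pd.DataFrame({"Random numbers 2": np.random.randn(6),
                    "Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
                    "Merchant": [1, 2, 1, 2, 1, 2]})

# modification

def pair_columns(df, col1, col2):
    # Cast both columns to str so a text column and an integer column can be
    # joined; the "_" separator keeps ("A1", 2) distinct from ("A", 12).
    return df[col1].astype(str) + "_" + df[col2].astype(str)

def paired_mask(df1, df2, col1, col2):
    # True for each row of df1 whose (col1, col2) pair also occurs in df2.
    return pair_columns(df1, col1, col2).isin(pair_columns(df2, col1, col2))

identical = df1.loc[paired_mask(df1, df2, "Campaign", "Merchant")]
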
import numpy as np
import pandas as pd

# np.random.randn replaces pd.np.random.randn, which was removed in pandas 2.0
df1 = pd.DataFrame({"Random numbers 1": np.random.randn(6),
                    "Campaign": ["A"] * 5 + ["B"],
                    "Merchant": [1, 1, 1, 2, 3, 1]})

df2 = pd.DataFrame({"Random numbers 2": np.random.randn(6),
                    "Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
                    "Merchant": [1, 2, 1, 2, 1, 2]})

columns_consider = ["Campaign", "Merchant"]
combined = pd.concat((df1[columns_consider].drop_duplicates(),
                      df2[columns_consider].drop_duplicates()), ignore_index=True)

identical = combined[combined.duplicated()]

print(identical)

Output:

  Campaign  Merchant
4        A         1
5        A         2
6        B         1
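The output above lists the shared (Campaign, Merchant) pairs, not the matching rows of df1. One possible way to get the rows themselves, sketched here as an assumption and not as part of the original answer, is an inner merge of df1 against the de-duplicated key columns of df2. The `Value` column and the names `key_columns`, `pairs`, and `filtered` are illustrative only:

```python
import pandas as pd

# Same key data as in the answer; "Value" is a stand-in for the other columns.
df1 = pd.DataFrame({"Value": range(6),
                    "Campaign": ["A"] * 5 + ["B"],
                    "Merchant": [1, 1, 1, 2, 3, 1]})

df2 = pd.DataFrame({"Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
                    "Merchant": [1, 2, 1, 2, 1, 2]})

key_columns = ["Campaign", "Merchant"]

# De-duplicate first so the merge cannot multiply rows of df1.
pairs = df2[key_columns].drop_duplicates()

# An inner merge keeps only rows of df1 whose full key pair exists in df2.
filtered = df1.merge(pairs, on=key_columns, how="inner")
print(filtered)
```

Note that merge returns a new frame with a fresh integer index, so the original row labels of df1 are not preserved.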

The way I always do this is to create a lookup column:

df1['lookup'] = df1['Campaign'] + "_" + df1['Merchant'].astype(str)
df2['lookup'] = df2['Campaign'] + "_" + df2['Merchant'].astype(str)

Then filter with loc and drop the lookup column:

filtered = df1.loc[df1['lookup'].isin(df2['lookup'])].drop(columns='lookup')
df1.drop(columns='lookup', inplace=True)

I'm still looking for a better solution.
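One candidate that avoids the temporary column and the string concatenation, offered as a sketch and not as something proposed in this thread, is to compare the key columns as a MultiIndex. The `Value` column and the names `key_columns`, `df1_keys`, `df2_keys`, and `mask` are illustrative:

```python
import pandas as pd

# Same key data as above; "Value" is a stand-in for the other columns.
df1 = pd.DataFrame({"Value": range(6),
                    "Campaign": ["A"] * 5 + ["B"],
                    "Merchant": [1, 1, 1, 2, 3, 1]})

df2 = pd.DataFrame({"Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
                    "Merchant": [1, 2, 1, 2, 1, 2]})

key_columns = ["Campaign", "Merchant"]

# Build one MultiIndex per frame so each row is compared as a whole tuple.
df1_keys = pd.MultiIndex.from_frame(df1[key_columns])
df2_keys = pd.MultiIndex.from_frame(df2[key_columns])

# Boolean array: True where the (Campaign, Merchant) pair of df1 occurs in df2.
mask = df1_keys.isin(df2_keys)

filtered = df1.loc[mask]
print(filtered)
```

Because the pairs are compared as tuples, no separator is needed and the original index of df1 is kept.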