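I am trying to find the ids that have the same number of occurrences in two DataFrames. This is a follow-up to my previous question.
I have 2 DataFrames: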
df1=pd.DataFrame([[1,None],[1,None,],[1,None],[1,'item_a'],[2,'item_a'],[2,'item_b'],[2,'item_f'],[3,'item_e'],[3,'item_e'],[3,'item_g'],[3,'item_h']],columns=['id','A'])
df2=pd.DataFrame([[1,'item_a'],[1,'item_b'],[1,'item_c'],[1,'item_d'],[2,'item_a'],[2,'item_b'],[2,'item_c'],[2,'item_d'],[3,'item_e'],[3,'item_f'],[3,'item_g'],[3,'item_h']],columns=['id','A'])
df1
id A
0 1 None
1 1 None
2 1 None
3 1 item_a # id 1 has 1 non-null occurrence in total in df1
4 2 item_a
5 2 item_b
6 2 item_f # id 2 has 3 occurrences in total in df1
7 3 item_e
8 3 item_e
9 3 item_g
10 3 item_h # id 3 has 4 occurrences in total in df1
df2
id A
0 1 item_a
1 1 item_b
2 1 item_c
3 1 item_d
4 2 item_a
5 2 item_b
6 2 item_c
7 2 item_d
8 3 item_e
9 3 item_f
10 3 item_g
11 3 item_h
I got an answer on how to find the rows common to both DataFrames by using a merge.
Previous result:
d=pd.merge(df1,df2,how='inner')
id A
3 1 item_a # id 1 has 1 occurrence in total in d
4 2 item_a
5 2 item_b # id 2 has 2 occurrences in total in d, which does not match its 3 occurrences in df1
7 3 item_e
8 3 item_e
9 3 item_g
10 3 item_h #id 3 has 4 occurrences in total in d
I tried to find the ids with the same number of occurrences in both DataFrames:
d[d['id'].value_counts()==df1['id'].value_counts()]
This gave me an error: Can only compare identically-labeled Series objects
I also tried a different approach, using rename to give the value_counts results column names and then merging them, but that failed too.
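For context, here is a minimal sketch of where that error comes from. The two Series `a` and `b` are hypothetical stand-ins for the two `value_counts()` results, not the real data: `value_counts()` orders its index by count, so the two results can carry the same id labels in a different order, and `==` between two Series expects identical indexes. Sorting both indexes first, or using the label-aligning `Series.eq`, avoids the exception.

```python
import pandas as pd

# Hypothetical stand-ins for two value_counts() results:
# same id labels, but stored in a different order.
a = pd.Series([4, 2, 1], index=[3, 2, 1])
b = pd.Series([1, 3, 4], index=[1, 2, 3])

message = None
try:
    a == b  # `==` requires both indexes to be identical, order included
except ValueError as exc:
    message = str(exc)
print(message)

# Option 1: put both indexes in the same order before comparing.
aligned = a.sort_index() == b.sort_index()

# Option 2: Series.eq aligns on the index labels before comparing.
flexible = a.eq(b).sort_index()

print(aligned)
print(flexible)
```

Even once the comparison runs, its result is indexed by id rather than by row position, so it cannot be passed directly to `d[...]` as a row mask.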
Matching: the count of each id's occurrences in df1 against the count of that id's occurrences in the resulting DataFrame d
cnt_in_df1 | cnt_in_d
for id 1: 1 | 1 count # match => id 1 should be in the desired output
for id 2: 3 | 2 count # mismatch => id 2 should not be in the desired output
for id 3: 4 | 4 count # match => id 3 should be in the desired output
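The comparison table above could be built along the lines of the rename-and-merge attempt. This is only a sketch under assumptions: the column names `cnt_in_df1`, `cnt_in_d` and `match` are chosen here for illustration, and rows where `A` is None are dropped before counting so that id 1 counts as 1 in df1.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, None], [1, None], [1, None], [1, 'item_a'],
     [2, 'item_a'], [2, 'item_b'], [2, 'item_f'],
     [3, 'item_e'], [3, 'item_e'], [3, 'item_g'], [3, 'item_h']],
    columns=['id', 'A'])
df2 = pd.DataFrame(
    [[1, 'item_a'], [1, 'item_b'], [1, 'item_c'], [1, 'item_d'],
     [2, 'item_a'], [2, 'item_b'], [2, 'item_c'], [2, 'item_d'],
     [3, 'item_e'], [3, 'item_f'], [3, 'item_g'], [3, 'item_h']],
    columns=['id', 'A'])
d = pd.merge(df1, df2, how='inner')

# Turn each value_counts() result into a two-column frame keyed by 'id'.
cnt_df1 = (df1.dropna(subset=['A'])['id'].value_counts()
           .rename_axis('id').reset_index(name='cnt_in_df1'))
cnt_d = (d['id'].value_counts()
         .rename_axis('id').reset_index(name='cnt_in_d'))

# Left merge keeps every id seen in df1; ids missing from d get a count of 0.
comparison = (cnt_df1.merge(cnt_d, on='id', how='left')
              .fillna({'cnt_in_d': 0})
              .sort_values('id')
              .reset_index(drop=True))
comparison['match'] = comparison['cnt_in_df1'] == comparison['cnt_in_d']
print(comparison)
```

Merging on the `id` column sidesteps the index-label comparison entirely, because the counts end up side by side in one frame.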
My desired output for this question:
id count
0 1 1
1 3 4
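One possible way to arrive at this output, offered as a sketch rather than a definitive answer: count the non-null `A` values per id in both df1 and d with `groupby(...).count()`, align the two results on the same id index, and keep the ids whose counts are equal. The intermediate names `cnt_df1`, `cnt_d`, `matched` and `result` are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, None], [1, None], [1, None], [1, 'item_a'],
     [2, 'item_a'], [2, 'item_b'], [2, 'item_f'],
     [3, 'item_e'], [3, 'item_e'], [3, 'item_g'], [3, 'item_h']],
    columns=['id', 'A'])
df2 = pd.DataFrame(
    [[1, 'item_a'], [1, 'item_b'], [1, 'item_c'], [1, 'item_d'],
     [2, 'item_a'], [2, 'item_b'], [2, 'item_c'], [2, 'item_d'],
     [3, 'item_e'], [3, 'item_f'], [3, 'item_g'], [3, 'item_h']],
    columns=['id', 'A'])
d = pd.merge(df1, df2, how='inner')

# count() skips None/NaN, so id 1 contributes 1 in df1, not 4.
cnt_df1 = df1.groupby('id')['A'].count()
cnt_d = d.groupby('id')['A'].count()

# Give both Series the same index; ids absent from d count as 0.
cnt_d = cnt_d.reindex(cnt_df1.index, fill_value=0)

# Keep only the ids whose counts agree, then shape as id/count columns.
matched = cnt_df1[cnt_df1 == cnt_d]
result = matched.rename('count').reset_index()
print(result)
```

Reindexing `cnt_d` onto `cnt_df1.index` is what makes the `==` comparison safe here, since both Series then share identical labels in the same order.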