Replacing column values in pandas

Data Mining  Python  pandas  DataFrame
2022-02-27 12:55:08

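I have a dataframe with three columns, as shown below. It has about 10,000 entries, and some Hospital_ID / District_ID combinations appear more than once.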

Hospital_ID   District_ID  Employee
Hospital 1    District 19   5 
Hospital 1    District 19   10
Hospital 1    District 19   6
Hospital 2    District 10   50
Hospital 2    District 10   51

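Now I want to remove the duplicates, but I want to replace the values in the original dataframe with their mean, so that it looks like this: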

Hospital 1    District 19   7.0000
Hospital 2    District 10   50.5000

2 Answers

As Emre already mentioned, you can use the groupby function. After that, you should apply reset_index to move the MultiIndex back into columns:

import pandas as pd

# Sample data matching the question
df = pd.DataFrame(
    [['Hospital 1', 'District 19', 5],
     ['Hospital 1', 'District 19', 10],
     ['Hospital 1', 'District 19', 6],
     ['Hospital 2', 'District 10', 50],
     ['Hospital 2', 'District 10', 51]],
    columns=['Hospital_ID', 'District_ID', 'Employee'])

# Average Employee per (hospital, district) pair, then turn the group keys back into columns
df = df.groupby(['Hospital_ID', 'District_ID']).mean().reset_index()

This gives you:

  Hospital_ID  District_ID  Employee
0  Hospital 1  District 19       7.0
1  Hospital 2  District 10      50.5
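
As a side note, the same flat result can probably be obtained in one step by passing as_index=False to groupby, which keeps the grouping keys as ordinary columns instead of building a MultiIndex. The following is only a minimal sketch on the sample data from the question; the variable name result is chosen here for illustration:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "Hospital_ID": ["Hospital 1", "Hospital 1", "Hospital 1", "Hospital 2", "Hospital 2"],
        "District_ID": ["District 19", "District 19", "District 19", "District 10", "District 10"],
        "Employee": [5, 10, 6, 50, 51],
    }
)

# as_index=False keeps the grouping keys as regular columns,
# so no reset_index call is needed afterwards
result = df.groupby(["Hospital_ID", "District_ID"], as_index=False)["Employee"].mean()
print(result)
```

Selecting the Employee column explicitly also avoids averaging any other numeric columns the real dataframe might contain.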

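What you want to do is called aggregation; deduplication, or removing duplicates, is something different. I think the code is self-explanatory: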

df.groupby(['Hospital_ID', 'District_ID']).mean()
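
To make the difference between the two operations concrete, here is a small sketch on the sample data from the question. It assumes that "deduplication" would mean calling drop_duplicates on the two ID columns; the names keys, deduplicated and aggregated are illustrative only:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "Hospital_ID": ["Hospital 1", "Hospital 1", "Hospital 1", "Hospital 2", "Hospital 2"],
        "District_ID": ["District 19", "District 19", "District 19", "District 10", "District 10"],
        "Employee": [5, 10, 6, 50, 51],
    }
)
keys = ["Hospital_ID", "District_ID"]

# Deduplication: keeps the first row of each key pair and discards the others
deduplicated = df.drop_duplicates(subset=keys)

# Aggregation: combines all rows of each key pair into a single value (here the mean)
aggregated = df.groupby(keys, as_index=False)["Employee"].mean()

print(deduplicated["Employee"].tolist())
print(aggregated["Employee"].tolist())
```

With drop_duplicates the Employee values of the discarded rows are simply lost, whereas the groupby version uses every row when computing the mean.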