How can I add a column to my data frame in Python if some observations contain target words?

data-mining python dataset
2022-02-25 06:32:58

This is what my data frame looks like:

Number  Age      Famous_for                            
1       35       "businessman chairman of IBM (1973–1981)"
2       42       "musician (House of Freaks Gutterball)"
3       87       "baseball player (Oakland Athletics)"

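I want to create an additional column containing a dummy variable that indicates whether the person works in the entertainment business. Something like this: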

Number  Age  Famous_for                               Entertaining
1       35   businessman chairman of IBM (1973–1981)  0
2       42   musician (House of Freaks Gutterball)    1
3       87   baseball player (Oakland Athletics)      0

How can I create a column based on certain words in the Famous_for column (e.g. "musician", "club", "actor")? I tried the following:

df['entertaining'] = np.where(df['famous_for']>="musician", 1, 0)

But this doesn't work. How can I do this in Python?

1 Answer

You can use str.contains on the DataFrame column:

import pandas as pd

# Rebuild the example data from the question
df = pd.DataFrame({'Age': pd.Series([35, 42, 87], index=[1, 2, 3]),
                   'Famous_for': pd.Series(['businessman chairman of IBM (1973–1981)',
                                            'musician (House of Freaks Gutterball)',
                                            'baseball player (Oakland Athletics)'],
                                           index=[1, 2, 3])})

# True where the text contains the substring 'musician'
df['entertaining'] = df['Famous_for'].str.contains('musician')
print(df)

   Age                               Famous_for entertaining
1   35  businessman chairman of IBM (1973–1981)        False
2   42    musician (House of Freaks Gutterball)         True
3   87      baseball player (Oakland Athletics)        False
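
The question asks for a 0/1 dummy rather than True/False. One way to get that, as a minimal sketch using the same example data, is to cast the boolean result to integers with astype(int):

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame(
    {
        "Age": [35, 42, 87],
        "Famous_for": [
            "businessman chairman of IBM (1973–1981)",
            "musician (House of Freaks Gutterball)",
            "baseball player (Oakland Athletics)",
        ],
    },
    index=[1, 2, 3],
)

# str.contains yields booleans; astype(int) maps False -> 0 and True -> 1
df["entertaining"] = df["Famous_for"].str.contains("musician").astype(int)
print(df)
```

This should also work with np.where(mask, 1, 0), which is closer to the original attempt; the cast is just the shorter form.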

Note that str.contains also accepts regular expressions, so if you want to search for any of several words, you can use:

df['entertaining'] = df['Famous_for'].str.contains('musician|businessman')
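
For a longer word list, the pattern can be built from a Python list instead of typed by hand. The sketch below assumes the keywords from the question ("musician", "club", "actor"); the names keywords and pattern are illustrative only. It escapes each word with re.escape so that any regex metacharacters are matched literally, ignores case, and treats missing values as non-matches:

```python
import re

import pandas as pd

# Example data copied from the question
df = pd.DataFrame(
    {
        "Age": [35, 42, 87],
        "Famous_for": [
            "businessman chairman of IBM (1973–1981)",
            "musician (House of Freaks Gutterball)",
            "baseball player (Oakland Athletics)",
        ],
    },
    index=[1, 2, 3],
)

# Hypothetical keyword list; adjust it to whatever counts as "entertainment"
keywords = ["musician", "club", "actor"]

# Join the escaped words with "|" so the regex matches any one of them
pattern = "|".join(re.escape(word) for word in keywords)

# case=False ignores capitalisation; na=False counts missing text as no match
df["entertaining"] = (
    df["Famous_for"].str.contains(pattern, case=False, na=False).astype(int)
)
print(df)
```

Keep in mind that this is substring matching, so "actor" would also match a word such as "contractor"; adding word boundaries (\b) around the pattern is one option if that matters.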