How to replace part of a column's string value using another column

data-mining python pandas data-cleaning
2022-02-18 13:27:33

How can I replace part of the string value in one column using the value of another column?

My dataset is:

ID          Product Name                            Size ID    Size Name
1   24 Mantra Ancient Grains Foxtail Millet 500 gm      1       500 gm
2   24 Mantra Ancient Grains Little Millet 500 gm       2       500 gm
3   24 Mantra Naturals Almonds 100 gm                   3       100 gm
4   24 Mantra Naturals Kismis 100 gm                    4       100 gm
5   24 Mantra Organic Ajwain 100 gm                     5       100 gm
6   24 Mantra Organic Apple Blast Drink 250 ml          6       250 ml
7   24 Mantra Organic Apple Juice 1 Ltr Tetra Pack      7       1000 ml
8   24 Mantra Organic Apple Juice 200 ml                8       200 ml
9   24 Mantra Organic Assam Tea 100 gm                  9       100 gm

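The requirement: when the Product Name value is 24 Mantra Ancient Grains Foxtail Millet 500 gm and the Size Name column contains 500 Gm, the output should be 24 Mantra Ancient Grains Foxtail Millet. In general, if the Size Name value appears in the Product Name string (ignoring case), remove that size text from the product name; otherwise leave the value unchanged.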
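A minimal sketch of that requirement, assuming data is a pandas DataFrame with columns named exactly Product Name and Size Name; remove_size is a hypothetical helper name, and the three rows are taken from the table above (with 500 Gm written as in the question text). Each product name is paired with the size from its own row, the size is removed case-insensitively, and rows where the size does not appear are returned untouched.

```python
import re

import pandas as pd

data = pd.DataFrame({
    "Product Name": [
        "24 Mantra Ancient Grains Foxtail Millet 500 gm",
        "24 Mantra Organic Apple Juice 1 Ltr Tetra Pack",
        "24 Mantra Organic Assam Tea 100 gm",
    ],
    "Size Name": ["500 Gm", "1000 ml", "100 gm"],
})


def remove_size(name: str, size: str) -> str:
    """Hypothetical helper: delete `size` from `name`, ignoring case."""
    # re.escape makes the size text match literally; subn also reports how many hits
    stripped, count = re.subn(re.escape(size), "", name, flags=re.IGNORECASE)
    if count == 0:
        return name  # size not present: no action, as the question requires
    # Collapse the extra/trailing spaces left behind by the deletion
    return " ".join(stripped.split())


# zip pairs each product name with the Size Name from the same row
data["Product Name"] = [
    remove_size(name, size)
    for name, size in zip(data["Product Name"], data["Size Name"])
]
print(data["Product Name"].tolist())
```

The "1 Ltr" row keeps its name because its Size Name (1000 ml) does not occur in it.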

3 Answers
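# regex=True is needed in pandas >= 2.0, where str.replace matches literally by default
data['Product Name'] = data['Product Name'].str.replace(r'\d+', '', regex=True)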

If that's what you're looking for, this strips every run of digits from the product name. I'm not sure what you mean by "chomped".

This should help.
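To see exactly what the \d+ pattern does, here is a quick check on one value from the question (a sketch; the one-element Series is only for illustration). It removes every digit run, including the leading 24, and leaves the unit and the extra spaces behind, so it only approximates the requirement.

```python
import pandas as pd

names = pd.Series(["24 Mantra Ancient Grains Foxtail Millet 500 gm"])

# Remove every run of digits; regex=True is required in pandas >= 2.0
digits_removed = names.str.replace(r"\d+", "", regex=True)
print(repr(digits_removed[0]))  # ' Mantra Ancient Grains Foxtail Millet  gm'
```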

import pandas as pd
Product_Name = ["24 Mantra Ancient Grains Foxtail Millet 500 gm",
                "24 Mantra Ancient Grains Little Millet 500 gm",
                "24 Mantra Naturals Almonds 100 gm",
                "24 Mantra Naturals Kismis 100 gm",
                "24 Mantra Organic Ajwain 100 gm"]

Size_Name = ["500 gm", "500 gm", "100 gm", "100 gm", "100 gm"]

data = pd.DataFrame(
        {'Product_Name': Product_Name,
         'Size_Name': Size_Name 
        })

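# Remove characters from one column based on string of another column.
# A plain list is passed because Series.replace rejects a Series (dict-like) to_replace with a scalar value;
# note that every size pattern is tried against every row, not only the row it came from.
data['Product_Name'] = data['Product_Name'].replace(data['Size_Name'].tolist(), '', regex=True).str.strip()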
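If that behaviour (every size tried on every row) is acceptable, the sizes can also be folded into a single alternation pattern for Series.str.replace, whose case=False option adds the case-insensitivity the question asks for. This is a sketch reusing this answer's column names; escaping each size with re.escape and sorting longest-first (so a longer size wins over a shorter one matching at the same position) are my own choices, not part of the answer.

```python
import re

import pandas as pd

data = pd.DataFrame({
    "Product_Name": ["24 Mantra Ancient Grains Foxtail Millet 500 GM",
                     "24 Mantra Naturals Almonds 100 gm"],
    "Size_Name": ["500 gm", "100 gm"],
})

# One regex matching any distinct size, longest first, each escaped literally
sizes = sorted(set(data["Size_Name"]), key=len, reverse=True)
pattern = "|".join(re.escape(s) for s in sizes)

# case=False makes the match case-insensitive; strip() drops the leftover trailing space
data["Product_Name"] = (
    data["Product_Name"].str.replace(pattern, "", case=False, regex=True).str.strip()
)
print(data["Product_Name"].tolist())
```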

Try this:

import re
data['Product Name'] = data['Product Name'].apply(lambda x: re.sub(re.escape(data.loc[data['Product Name'] == x, 'Size Name'].values[0]), '', x, flags=re.IGNORECASE).strip())
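The .loc lookup in that answer rescans the whole frame once per row, and if two rows share a product name, .values[0] always picks the first row's size. A sketch that reads the size from the same row instead, using DataFrame.apply with axis=1 (assuming the question's column names; the two rows come from the table above):

```python
import re

import pandas as pd

data = pd.DataFrame({
    "Product Name": ["24 Mantra Organic Apple Blast Drink 250 ml",
                     "24 Mantra Organic Apple Juice 200 ml"],
    "Size Name": ["250 ml", "200 ml"],
})

# axis=1 passes each row as a Series, so row["Size Name"] is that row's own size
data["Product Name"] = data.apply(
    lambda row: re.sub(re.escape(row["Size Name"]), "", row["Product Name"],
                       flags=re.IGNORECASE).strip(),
    axis=1,
)
print(data["Product Name"].tolist())
```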