数据挖掘 - 处理具有 2 行数据的行 - 吾爱随笔录

处理具有 2 行数据的行

数据挖掘 Python 熊猫缺失数据

2022-01-19 19:27:24

我的数据框如下所示：

有一些行（例如：297），其中“价格”列有两个值（插头和夸脱），我用前一行填充了 Nans，因为它属于相同的拉丁名称。但是，我正在考虑将价格列进一步拆分为名称为“Quarts”和“Plugs”的两列并填写金额，如果没有找到 Plugs，则为 0，并且与 Quarts 相同。

例子：

Plugs | Quarts
0 | 2
2 | 3
4 | 0

有人可以帮我弄这个吗？

1个回答

下面的代码应该达到预期的结果：

import pandas as pd
import numpy as np
import re

df2 = pd.DataFrame([[1, 'plugs: $3.00'], 
                    [4, np.NaN], 
                    [7, 'quarts: $3.00']], 
                    columns=['name', 'price'])

df2

    name          price
 0     1   plugs: $3.00
 1     4            NaN
 2     7  quarts: $3.00

def price(x):
    rprice = re.search('(plugs:|quarts:)\s*\$([\d\.]*)', x)
    if rprice == None:
        return ('','0')
    else:
        return rprice.groups()

df2.fillna("", inplace=True)
df2['price'].map(lambda x: price(x))
df2['Plugs'] = df2['price'].map(lambda  x: float(price(x)[1]) if price(x)[0] == 'plugs:' else 0)
df2['Quart'] = df2['price'].map(lambda  x: float(price(x)[1]) if price(x)[0] == 'quarts:' else 0)

df2（下面有新列）

   name          price  Plugs  Quart
0     1   plugs: $3.00    3.0    0.0
1     4                   0.0    0.0
2     7  quarts: $3.00    0.0    3.0

一些注意事项：我使用正则表达式来提取类型和成本，并将 NA 替换为空白文本。

其它你可能感兴趣的问题

上一篇如何在 pandas 数据框中的列内应用 .replace() 来清理数据下一篇如何在 Sklearn 中使用 Pipeline 时应用数据集平衡技术？