Data mining - How to apply .replace() inside a column of a pandas DataFrame to clean data

I have a pandas DataFrame in which I want to replace all non-numeric characters with "".

The error I get in my code is the following:

ValueError: could not convert string to float: '690,276.00'

It happens because I am trying to convert all the values to floats so that I can do calculations with them.

Part of my DataFrame cleaning code looks like this:

# Cleaning:
df_clean = df_read[~(df_read['Ratio of Similarity (Gray)'] <= .2)]
print(df_clean, 'clean 1: Eliminate Ratio of similarity less than 0.2')
df_clean_2 = df_clean.dropna(subset=['buybox_price'])  
print(df_clean_2, 'clean 2: Eliminate Nan Buybox Prices')
df_clean_2 = df_clean_2.replace(",", "").replace('', '').astype({'product_ranking':'float64'})
df_clean_3 = df_clean_2[~(df_clean_2['product_ranking'] >= 5000000)]
print(df_clean_3, 'clean 3: Eliminate Product Ranking + than 5.000.000')
df_clean_4 = df_clean_3[~(df_clean_3['buybox_price'] <= 6)]
print(df_clean_4, 'clean 4: Eliminate Buybox Price less than 6$')
# Save Cleaned File
path_file = os.path.join(BASE_DIR, 'csv/amazon_product_comparator.csv')
df_hc = df_clean_4.to_csv(path_file)

The error shows up on the following lines:

df_clean_3 = df_clean_2[~(df_clean_2['product_ranking'] >= 5000000)]
print(df_clean_3, 'clean 3: Eliminate Product Ranking + than 5.000.000')
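
A note on why the `.replace(",", "")` call in the cleaning code has no visible effect: without `regex=True`, `DataFrame.replace` only matches cells whose entire value equals `","`, so commas embedded in a string such as `'690,276.00'` are left in place and the later `astype` call fails. The sketch below illustrates the difference on a small made-up frame (the data is hypothetical; only the column name comes from the question).

```python
import pandas as pd

# Hypothetical two-row frame; the column name mirrors the question.
df = pd.DataFrame({"product_ranking": ["690,276.00", "1,000"]})

# Without regex=True, replace() only matches cells equal to "," in full,
# so the embedded commas are left untouched.
unchanged = df.replace(",", "")

# With regex=True the pattern is applied inside each string.
stripped = df.replace(",", "", regex=True)

# The cleaned strings can now be converted to floats.
as_float = stripped.astype({"product_ranking": "float64"})

print(unchanged["product_ranking"].tolist())
print(stripped["product_ranking"].tolist())
print(as_float["product_ranking"].tolist())
```

Note that `regex=True` on the whole frame touches every string column, so restricting the replacement to the one column that needs it is usually the safer choice.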

import pandas as pd
import numpy as np
import re

# Example data frame
df2 = pd.DataFrame([['1', '2000000000'], ['2', '1,002,*8320'], ['3', '1,000,000']], columns = ['idx','product_ranking'])

# Remove commas
df2['product_ranking'] = df2['product_ranking'].map(lambda x: re.sub('[,]*' , '', x))

# Convert strings that are numeric into floats; everything else becomes NaN
df2['product_ranking'] = df2['product_ranking'].map(lambda x: float(x) if x.isnumeric() else np.nan)

# The comparison now works
df2['product_ranking'] > 100000000
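
As a possible alternative to the `map`-based approach, the same cleanup can be sketched with pandas' vectorized string methods and `pd.to_numeric`. One reason to consider it: `str.isnumeric()` returns `False` for strings containing a decimal point, so a value like `'690276.00'` would become NaN, whereas `pd.to_numeric(..., errors="coerce")` parses it. The fourth row below is an assumption added for illustration, taken from the value in the question's error message.

```python
import pandas as pd

# Example frame from the answer, plus one hypothetical row that reuses
# the value from the question's ValueError.
df2 = pd.DataFrame(
    [["1", "2000000000"], ["2", "1,002,*8320"], ["3", "1,000,000"], ["4", "690,276.00"]],
    columns=["idx", "product_ranking"],
)

# Strip thousands separators as a literal (non-regex) substring replacement.
no_commas = df2["product_ranking"].str.replace(",", "", regex=False)

# Convert to numbers; anything that still is not numeric becomes NaN.
df2["product_ranking"] = pd.to_numeric(no_commas, errors="coerce")

# Same filter as in the question: drop rankings of 5,000,000 or more.
# Comparisons with NaN are False, so NaN rows are kept by this filter.
mask = df2["product_ranking"] >= 5000000
print(df2[~mask])
```

If rows that failed to parse should also be removed, a `dropna(subset=["product_ranking"])` call after the conversion would do that.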