How do I remove the for loop and use pure Pandas?

data-mining python pandas
2022-03-05 08:02:19

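I am working with huge data tables and have started learning Pandas, but I have run into a challenge: I have a loop and am trying to move everything inside it into Pandas, and I cannot find a way to do that for all of it.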

panda_dataframe = pd.read_sql(sql=sql, con=mysql_cnx, index_col='UUID')

logging.debug('__setupProducts() - after mysql query : run time {time}'.format(time=datetime.datetime.now() - start_time))
logging.debug('__setupProducts() - Product found to handle: {count}'.format(count=len(panda_dataframe)))

panda_dataframe['Price'] = panda_dataframe['Price'].apply(lambda x:float(x/100))
panda_dataframe['PriceNext'] = panda_dataframe['PriceNext'].apply(lambda x:float(x/100))
panda_dataframe['CostPrice'] = panda_dataframe['CostPrice'].apply(lambda x:float(x/100))
panda_dataframe['CostPriceReal'] = panda_dataframe['CostPriceReal'].apply(lambda x:float(x/100))
panda_dataframe['InStoreStock'] = panda_dataframe['InStoreStock'].apply(lambda x:int(x))

logging.debug('__setupProducts() - restructer dataframe to right types : run time {time}'.format(time=datetime.datetime.now() - start_time))

for product_uuid, product in panda_dataframe.iterrows():
    logging.info('Product: {title} loading and prepare...'.format(title=product['Title']))

    try:
        product_data = store_remote_stock_dataframe.get_group(product_uuid)

        product_data_onstock = product_data.loc[product_data['Stock'] > 0, ['Stock', 'CostPriceReal', 'CostPrice', 'Expected', 'DistributorUUID', 'Country']]
        product_data_outstock = product_data.loc[product_data['Stock'] <= 0, ['Stock', 'CostPriceReal', 'CostPrice', 'Expected', 'DistributorUUID', 'Country']]
        product_data = None

        if len(product_data_onstock) > 0:
            stock_cost_price = product_data_onstock.sort_values(by=['CostPrice'], ascending=True).iloc[0,:]
            stock_cost_real_price = product_data_onstock.sort_values(by=['CostPriceReal'], ascending=True).iloc[0,:]
        elif len(product_data_outstock) > 0:
            stock_cost_price = product_data_outstock.sort_values(by=['CostPrice'], ascending=True).iloc[0,:]
            stock_cost_real_price = product_data_outstock.sort_values(by=['CostPriceReal'], ascending=True).iloc[0,:]
        else:
            stock_cost_price = None
            stock_cost_real_price = None

        stock_cost_price = stock_cost_price.drop(['CostPriceReal','CostPrice']) if stock_cost_price is not None else None
        stock_cost_real_price = stock_cost_real_price.drop(['CostPriceReal','CostPrice']) if stock_cost_real_price is not None else None
    except:
        stock_cost_price = None
        stock_cost_real_price = None




    products.append({
        'uuid' : product_uuid,
        'title' : product['Title'],
        'price' : product['Price'],
        'price-next' : product['PriceNext'],
        'price-cost' : product['CostPrice'],
        'price-cost-real' : product['CostPriceReal'],
        'overwrites' : product['Overwrites'],
        'distributor-stock' : {
            'cost-price' : {
                'distributor' : stock_cost_price['DistributorUUID'] if stock_cost_price is not None else None,
                'stock' : stock_cost_price['Stock'] if stock_cost_price is not None else 0,
                'expected' : stock_cost_price['Expected']  if stock_cost_price is not None else -1,
                'country' : stock_cost_price['Country']  if stock_cost_price is not None else None,
            },
            'cost-price-real' : {
                'distributor' : stock_cost_real_price['DistributorUUID'] if stock_cost_real_price is not None else None,
                'stock' : stock_cost_real_price['Stock'] if stock_cost_real_price is not None else 0,
                'expected' : stock_cost_real_price['Expected'] if stock_cost_real_price is not None else -1,
                'country' : stock_cost_real_price['Country']  if stock_cost_real_price is not None else None
            }
        },
        'stock' : {
            'store' : int(product['InStoreStock']) if product['InStoreStock'] is not None else 0,
        },
        'manufacturer' : manufacturers[product['ManufacturerUUID']]['_id'] if product['ManufacturerUUID'] in manufacturers else None,
        'category' : categorys[product['CategoryUUID']]['_id'] if product['CategoryUUID'] in categorys else None,
    })

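What I want to do is move the code in my "try" block out in front of the for loop, so that I can then delete the for loop and carry on without using a for loop at all.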

I hope someone can help me get better at Pandas so I can use the power of Pandas :)

1 Answer

When operating on pandas rows, I always avoid for loops; they are slow and inefficient. If possible, try to write a function (def func(x): ...) and apply it to a column (df['col1'].apply(func)).
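For the "try" block in the question, one possible direction is to pick the cheapest distributor row for every product in a single pass before any loop runs, and then join the result onto the product table. The following is only a minimal sketch under assumptions: the question shows an already grouped object (store_remote_stock_dataframe), so the ungrouped table remote_stock, its ProductUUID column, the sample values, and the helper cheapest_per_product are all hypothetical. It uses sort_values and drop_duplicates rather than apply(func), because the per-product choice is a "first row after sorting" problem.

```python
import pandas as pd

# Hypothetical ungrouped distributor stock table (column "ProductUUID" is assumed).
remote_stock = pd.DataFrame({
    "ProductUUID": ["p1", "p1", "p1", "p2", "p2"],
    "DistributorUUID": ["d1", "d2", "d3", "d1", "d2"],
    "Stock": [0, 5, 2, 0, 0],
    "CostPrice": [900, 1200, 1000, 700, 650],
    "CostPriceReal": [950, 1100, 1150, 720, 640],
    "Expected": [3, -1, -1, 7, 14],
    "Country": ["DK", "DE", "SE", "DK", "DE"],
})


def cheapest_per_product(stock: pd.DataFrame, price_col: str) -> pd.DataFrame:
    """One row per product: cheapest in-stock offer, else cheapest out-of-stock offer."""
    ranked = stock.assign(_in_stock=stock["Stock"] > 0).sort_values(
        by=["_in_stock", price_col], ascending=[False, True]
    )
    # After sorting, the first row of each product is the preferred offer.
    best = ranked.drop_duplicates(subset="ProductUUID", keep="first")
    return best.set_index("ProductUUID").drop(
        columns=["_in_stock", "CostPrice", "CostPriceReal"]
    )


by_cost = cheapest_per_product(remote_stock, "CostPrice")
by_real = cheapest_per_product(remote_stock, "CostPriceReal")

# Hypothetical product table indexed by UUID, as in the question.
products = pd.DataFrame(
    {"Title": ["A", "B", "C"], "Price": [1999, 2999, 499]},
    index=pd.Index(["p1", "p2", "p3"], name="UUID"),
)
# Vectorized division instead of apply(lambda x: float(x/100)).
products["Price"] = products["Price"] / 100

# Left joins keep every product; products without distributor rows get NaN.
merged = products.join(by_cost.add_prefix("cost_")).join(by_real.add_prefix("real_"))
# Same defaults as the loop in the question: stock 0, expected -1.
merged = merged.fillna(
    {"cost_Stock": 0, "cost_Expected": -1, "real_Stock": 0, "real_Expected": -1}
)
print(merged)
```

Products that have no distributor rows (p3 here) keep NaN for the distributor and country, which plays the role of None in the original loop. If the nested dictionaries are still needed at the end, they could be built from merged in one final pass, for example starting from merged.to_dict('index'), instead of doing the lookups and sorting inside the loop.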