pandas – apply some calculations using multiple columns

January 10, 2022

I am trying to do calculate price per unit based on price and pack_count columns from a dataframe.

import pandas as pd
# assign data of lists.
data = {'price': ['23', '', '211', '100'], 'pack_count': [23, 10, 1, 1]}

# Create DataFrame.
df = pd.DataFrame(data)
df["price_per_unit"] = (df["price"].str.replace('', '0' ).astype(float) \
                                / df["pack_count"].astype(float))\
                                    .apply(lambda x: '' if x == 0 else round(x, 2))
                                    
print(df)

This is the result I get:

   
  price  pack_count price_per_unit
0    23          23          88.26
1                10               
2   211           1         201010
3   100           1         100000

I used replace to handle empty strings but something weird is happening as you can see in the output above.

Why is this happening?
How can I modify it such that if price column is an empty string, then we skip the calculation for that column and assign a default value, otherwise do the required calculation.

Any help would be appreciated.
Thank you!

>Solution :

Instead of the str.replace method, use replace method.

str.replace replaces each occurrence of the pattern in the Series, i.e. matches in any part of the strings. So for example, '23' is changed to '02030', which becomes 2030.00 when changed to float, which divided by 23 becomes 88.26.

Meanwhile replace replaces values given in to_replace with value, i.e. looks for exact matches.

df["price_per_unit"] = (df["price"].replace('', '0').astype(float)) \
                                / df["pack_count"].astype(float) \
                                    .apply(lambda x: '' if x == 0 else round(x, 2))

Output:

  price  pack_count  price_per_unit
0    23          23             1.0
1                10             0.0
2   211           1           211.0
3   100           1           100.0