I am trying to do calculate price per unit based on price and pack_count columns from a dataframe.
import pandas as pd
# assign data of lists.
data = {'price': ['23', '', '211', '100'], 'pack_count': [23, 10, 1, 1]}
# Create DataFrame.
df = pd.DataFrame(data)
df["price_per_unit"] = (df["price"].str.replace('', '0' ).astype(float) \
/ df["pack_count"].astype(float))\
.apply(lambda x: '' if x == 0 else round(x, 2))
print(df)
This is the result I get:
 
price pack_count price_per_unit
0 23 23 88.26
1 10
2 211 1 201010
3 100 1 100000
I used replace to handle empty strings but something weird is happening as you can see in the output above.
-
Why is this happening?
-
How can I modify it such that if
pricecolumn is an empty string, then we skip the calculation for that column and assign a default value, otherwise do the required calculation.
Any help would be appreciated.
Thank you!
>Solution :
Instead of the str.replace method, use replace method.
str.replace replaces each occurrence of the pattern in the Series, i.e. matches in any part of the strings. So for example, '23' is changed to '02030', which becomes 2030.00 when changed to float, which divided by 23 becomes 88.26.
Meanwhile replace replaces values given in to_replace with value, i.e. looks for exact matches.
df["price_per_unit"] = (df["price"].replace('', '0').astype(float)) \
/ df["pack_count"].astype(float) \
.apply(lambda x: '' if x == 0 else round(x, 2))
Output:
price pack_count price_per_unit
0 23 23 1.0
1 10 0.0
2 211 1 211.0
3 100 1 100.0