I have a pandas DataFrame `df`:
| ID | COST | 1F | 2F | 3F | 4G |
|---|---|---|---|---|---|
| 1 | 362 | 0 | 1 | 1 | 1 |
| 2 | 269 | 0 | 1 | 0 | 0 |
| 3 | 346 | 1 | 1 | 1 | 1 |
| 4 | 342 | 0 | 0 | 0 | 0 |
I have a `total_cost` dictionary:
total_cost = {'1F': 0.047, '2F': 0.03, '3F': 0.023, '4G': 0.025}
I want to add a TOTAL_COST column: for each row, wherever a column contains a 1, multiply COST by that column's value from the `total_cost` dictionary, and sum the products.
The dataframe has around a million records, so what would be the most efficient way to do this?
Expected `df`:
| ID | COST | 1F | 2F | 3F | 4G | TOTAL_COST |
|---|---|---|---|---|---|---|
| 1 | 362 | 0 | 1 | 1 | 1 | 28.236 |
| 2 | 269 | 0 | 1 | 0 | 0 | 8.07 |
| 3 | 346 | 1 | 1 | 1 | 1 | 43.25 |
| 4 | 342 | 0 | 0 | 0 | 0 | 0 |
>Solution :
Use slicing to select only the columns present as keys in the dictionary, multiply by it, aggregate with a per-row sum, then multiply by COST:
df['TOTAL_COST'] = df[list(total_cost)].mul(total_cost).sum(axis=1).mul(df['COST'])
Or, convert the dictionary to a Series (the non-matching columns will become NaN and are ignored by the sum):
df['TOTAL_COST'] = df.mul(pd.Series(total_cost)).sum(axis=1).mul(df['COST'])
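Since the question stresses performance at around a million rows: the per-row weighted sum is just a dot product of the 0/1 indicator columns with the weight vector, so it can also be written as a single NumPy matmul. A sketch on the sample data (the `weights` array is simply the dictionary's values in column order, an assumption of this sketch rather than something from the answer above):

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'COST': [362, 269, 346, 342],
    '1F': [0, 0, 1, 0],
    '2F': [1, 1, 1, 0],
    '3F': [1, 0, 1, 0],
    '4G': [1, 0, 1, 0],
})
total_cost = {'1F': 0.047, '2F': 0.03, '3F': 0.023, '4G': 0.025}

# Weight vector in the same order as the selected columns,
# so the matmul lines up column-for-column.
weights = np.array([total_cost[c] for c in total_cost])

# One dot product per row, then scale by COST.
df['TOTAL_COST'] = df['COST'] * (df[list(total_cost)].to_numpy() @ weights)
```

This avoids the intermediate per-column multiplied DataFrame, which can matter at this scale; profiling both versions on your real data is the only reliable way to pick a winner.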
Output:
ID COST 1F 2F 3F 4G TOTAL_COST
0 1 362 0 1 1 1 28.236
1 2 269 0 1 0 0 8.070
2 3 346 1 1 1 1 43.250
3 4 342 0 0 0 0 0.000