How to use a vectorized operation based on column names?

February 4, 2022

Let’s say I have a set-up like this

import pandas as pd 

def dummy(val1, val2):
    return val1 * val2 / 10


df = pd.DataFrame({'a': range(1, 3), 'b': range(2, 4), 'c': range(3, 5)})
d = {'a': 3, 'b': 10}

   a  b  c
0  1  2  3
1  2  3  4

Now I would like to apply dummy to every column of df whose name is a key in d, and store each result in a new column. Here val1 is the column's values and val2 is the value in d for that key.

I could do it like this

for k, v in d.items():
    # d[k] is of course just v; it's just to show that k is required for both input values
    df[f'{k}_calc'] = dummy(df[k], d[k])

which gives me the desired outcome

   a  b  c  a_calc  b_calc
0  1  2  3     0.3     2.0
1  2  3  4     0.6     3.0

Is there a more straightforward implementation available that avoids the loop?

Solution:

You can try something like this:

# Take the columns from the dictionary keys (idea credited to @richardec)
# instead of hardcoding them as pd.Index(['a', 'b'])
cols = pd.Index(d.keys())
df[cols + '_calc'] = df[cols].apply(lambda x: dummy(x, d[x.name]))
df

Output:

   a  b  c  a_calc  b_calc
0  1  2  3     0.3     2.0
1  2  3  4     0.6     3.0

Details:

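pd.DataFrame.apply passes each column of df[cols] to the lambda as a Series.
x.name is that column's header, so d[x.name] looks up the matching value in d.
x and d[x.name] are then passed to dummy as val1 and val2.
cols + '_calc' appends the suffix to each name, giving the labels of the new columns.
Note that apply still iterates over the columns internally, so this is more compact than the explicit loop rather than truly vectorized.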
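
If dummy only uses arithmetic that pandas can broadcast, as it does here, the per-column apply can probably be dropped as well. The following is a minimal sketch under that assumption, not part of the original answer: the names factors, calc and result are made up for illustration. Turning d into a Series lets pandas align its index with the column labels, so each column is multiplied by its own factor in one operation.

```python
import pandas as pd


def dummy(val1, val2):
    return val1 * val2 / 10


df = pd.DataFrame({'a': range(1, 3), 'b': range(2, 4), 'c': range(3, 5)})
d = {'a': 3, 'b': 10}

# Series indexed by column name: 'a' -> 3, 'b' -> 10
factors = pd.Series(d)

# DataFrame * Series aligns the Series index with the column labels,
# so column 'a' is paired with d['a'] and column 'b' with d['b'].
calc = dummy(df[factors.index], factors).add_suffix('_calc')

# Attach the new columns to the original frame
result = df.join(calc)
print(result)
```

This only works when dummy is built from operations that accept a DataFrame and a Series; a function with Python-level branching on scalar values would still need apply or the loop.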