Given:
import numpy as np
import pandas as pd
df = pd.DataFrame(data={'user_ip': ["u1", "u2", "u3", "u4", "u5"],
                        'a': [1, np.nan, 8, 2, 0],
                        'b': [2, 5, 1, np.nan, 0],
                        'c': [3, 0, np.nan, 0, 7],
                        'd': [0, 2, 1, 2, 9]})
  user_ip    a    b    c  d
0      u1  1.0  2.0  3.0  0
1      u2  NaN  5.0  0.0  2
2      u3  8.0  1.0  NaN  1
3      u4  2.0  NaN  0.0  2
4      u5  0.0  0.0  7.0  9
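The printed frame shows columns a, b and c as floats (1.0, 2.0, ...) while d stays an integer column. That is ordinary dtype promotion: NaN is a floating-point value, so any numeric column containing it is stored as float64. A minimal check, rebuilding the same frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'user_ip': ["u1", "u2", "u3", "u4", "u5"],
                        'a': [1, np.nan, 8, 2, 0],
                        'b': [2, 5, 1, np.nan, 0],
                        'c': [3, 0, np.nan, 0, 7],
                        'd': [0, 2, 1, 2, 9]})

# Columns holding NaN are promoted to float64; 'd' has no missing values
# and keeps its integer dtype.
print(df.dtypes)
```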
Goal:
I'd like to loop through each row and compute a new column with a custom function that takes several arguments (including the DataFrame and one of its column names), as follows:
def fcn(df, col, x, y):
    return x*df[col] + y
df["new_col_apply"] = df.apply(lambda inp_df: fcn(inp_df, col="b", x=2, y=10), axis=1)
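To see where the time goes, the sketch below wraps fcn in a hypothetical helper `traced` (not part of the original code) that records what apply() passes in, using a hypothetical list `seen_types`. With axis=1, apply() makes one Python-level call per row and hands it that row as a Series, so inside fcn the expression df[col] is just a single scalar.

```python
import numpy as np
import pandas as pd

def fcn(df, col, x, y):
    return x * df[col] + y

df = pd.DataFrame(data={'user_ip': ["u1", "u2", "u3", "u4", "u5"],
                        'a': [1, np.nan, 8, 2, 0],
                        'b': [2, 5, 1, np.nan, 0],
                        'c': [3, 0, np.nan, 0, 7],
                        'd': [0, 2, 1, 2, 9]})

seen_types = []  # hypothetical: records the type of each argument apply() passes

def traced(row):
    # `row` is one row of df as a Series indexed by column name,
    # so row["b"] inside fcn is a single scalar value.
    seen_types.append(type(row))
    return fcn(row, col="b", x=2, y=10)

df["new_col_apply"] = df.apply(traced, axis=1)
print(len(seen_types))                # one Python call per row
print(df["new_col_apply"].tolist())
```

With 900K rows this means roughly 900K Python function calls, each paying the cost of building a row Series, which explains the slowness.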
My solution works, but apply() is quite slow on my original DataFrame, which has more than 900K rows.
I am aware of map(), but DataFrame doesn't have a map() method in my pandas version, and I specifically need to pass my DataFrame and its column (col) to my function fcn, so the following snippet:
df["new_col_map"] = df.map(lambda inp_df: fcn(inp_df, col="b", x=2, y=10), na_action="ignore")
raises an AttributeError, shown below:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-14-dfe11c4bff87> in <cell line: 1>()
----> 1 df["new_col_map"] = df.map(lambda inp_df: fcn(inp_df, col="b", x=2, y=10), na_action="ignore")
2 df
/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py in __getattr__(self, name)
5900 ):
5901 return self[name]
-> 5902 return object.__getattribute__(self, name)
5903
5904 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'map'
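The AttributeError means this pandas version predates DataFrame.map, which was added in pandas 2.1 as the new name for DataFrame.applymap. Even where it exists, DataFrame.map is element-wise: the callable receives individual cell values, not the DataFrame, so fcn(inp_df, col="b", ...) could not index a column inside it. An element-wise version that does work on older pandas maps over the single Series df["b"], sketched here with only that column. It avoids building a row Series per call, but it still invokes a Python function once per element.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'b': [2, 5, 1, np.nan, 0]})

# Series.map calls the lambda once per element; na_action="ignore"
# skips NaN entries and leaves them as NaN in the result.
df["new_col_map"] = df["b"].map(lambda v: 2 * v + 10, na_action="ignore")
print(df["new_col_map"].tolist())
```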
Is there a better, faster alternative to apply() for looping through a large pandas DataFrame with a custom function that takes several arguments?
Cheers,
>Solution :
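Call the function directly on the whole DataFrame instead of row by row. Inside fcn, df[col] is then an entire column (a Series), so x*df[col] + y is computed in one vectorized operation rather than one Python call per row, which is typically much faster than apply() with axis=1 on large frames: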
df["new_col"] = fcn(df, col="b", x=2, y=10)
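A quick sanity check, assuming the sample frame from the question: the direct call produces the same values as the apply() version, including NaN for row u4 where b is missing.

```python
import numpy as np
import pandas as pd

def fcn(df, col, x, y):
    # When df is the whole DataFrame, df[col] is a Series and this
    # arithmetic runs element-wise over the entire column at once.
    return x * df[col] + y

df = pd.DataFrame(data={'user_ip': ["u1", "u2", "u3", "u4", "u5"],
                        'a': [1, np.nan, 8, 2, 0],
                        'b': [2, 5, 1, np.nan, 0],
                        'c': [3, 0, np.nan, 0, 7],
                        'd': [0, 2, 1, 2, 9]})

df["new_col"] = fcn(df, col="b", x=2, y=10)  # vectorized: one call total
df["new_col_apply"] = df.apply(lambda row: fcn(row, col="b", x=2, y=10), axis=1)

# Values match; dtype check is relaxed in case apply() infers a different dtype.
pd.testing.assert_series_equal(df["new_col"], df["new_col_apply"],
                               check_names=False, check_dtype=False)
print(df[["user_ip", "b", "new_col"]])
```

This works because fcn only uses operations pandas already vectorizes (arithmetic on a Series). If the real function branches on individual values (for example an if/else), that logic would first need rewriting with vectorized tools such as np.where for the direct call to apply.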