How can a single column .apply() be made faster in Python Pandas?

I learned how to run a profiler on code that needs many iterations, in the hope of making the run times more sustainable. It turns out this line takes up 55-58% of the run time:

data['CDA_Factor_Avg'] = data.apply(lambda row : data['CDA_Factor'].loc[ starting_date : row.name ].mean(), axis=1)

The result is a pandas DataFrame ‘data’ with columns ‘CDA_Factor’ and ‘CDA_Factor_Avg’ like:

CDA_Factor  CDA_Factor_Avg
1           1
4           2.5
9           4.67

The mean is only ever taken from the starting date up to the current row. The index is a datetime index. Does anyone see a better alternative?

Thank you!

>Solution :

You can use an expanding mean, which computes the running average in one pass instead of re-slicing the column for every row:

>>> df["CDA_Factor"].expanding().mean()
0    1.000000
1    2.500000
2    4.666667
Name: CDA_Factor, dtype: float64
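
To tie this back to the question's setup, here is a minimal sketch with a datetime index. The sample values, the dates, and the choice of starting_date as the first index entry are assumptions made for illustration; the column names come from the question. It runs both versions so they can be compared.

```python
import pandas as pd

# Hypothetical sample data: the values and dates are made up for illustration.
data = pd.DataFrame(
    {"CDA_Factor": [1.0, 4.0, 9.0, 16.0]},
    index=pd.date_range("2023-01-01", periods=4, freq="D"),
)
# Assumption: the averaging window starts at the first row.
starting_date = data.index[0]

# Approach from the question: slices and averages again for every row.
slow = data.apply(
    lambda row: data["CDA_Factor"].loc[starting_date:row.name].mean(),
    axis=1,
)

# Expanding mean: cumulative average over the column from starting_date on.
fast = data["CDA_Factor"].loc[starting_date:].expanding().mean()

# Assignment aligns on the index, so rows outside the slice would become NaN.
data["CDA_Factor_Avg"] = fast
print(data)
```

If starting_date is later than the first row, the .loc[starting_date:] slice drops the earlier rows before the expanding mean is taken, and assigning the result back should leave those rows as NaN through index alignment. That appears to match the original lambda, which takes the mean of an empty slice for those rows.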
