How can a single column .apply() be made faster in Python Pandas?

I learned how to run a profiler on code that needs many iterations, in the hope of making the run times more sustainable. It turns out this line takes up 55-58% of the run time:

data['CDA_Factor_Avg'] = data.apply(lambda row : data['CDA_Factor'].loc[ starting_date : row.name ].mean(), axis=1)

It produces a Pandas DataFrame 'data' with columns 'CDA_Factor' and 'CDA_Factor_Avg' like:

CDA_Factor  CDA_Factor_Avg
1           1
4           2.5
9           4.67

The mean is only ever taken from starting_date up to the current row. The index is a datetime index. Does anyone see a better alternative?

Thank you!

Solution:

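You can use an expanding mean, which computes the mean of all values up to and including the current row in one vectorized pass instead of re-slicing the column for every row: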

>>> df["CDA_Factor"].expanding().mean()
0    1.000000
1    2.500000
2    4.666667
Name: CDA_Factor, dtype: float64
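
Applied to the question's setup, this might look like the sketch below. The sample values and dates are made up for illustration, and it assumes the datetime index is sorted and that `starting_date` is a label-sliceable timestamp (here simply the first index value). Rows before `starting_date`, if any, come out as NaN through index alignment, which matches what the original `.apply()` returns for an empty slice.

```python
import pandas as pd

# Hypothetical sample data with a sorted datetime index, mirroring the question.
data = pd.DataFrame(
    {"CDA_Factor": [1.0, 4.0, 9.0]},
    index=pd.date_range("2023-01-01", periods=3, freq="D"),
)

# Assumption: starting_date is the first index value; any timestamp works.
starting_date = data.index[0]

# Running mean from starting_date up to each row, computed in one pass.
data["CDA_Factor_Avg"] = (
    data.loc[starting_date:, "CDA_Factor"].expanding().mean()
)

print(data)
```

The row-wise version recomputes a slice and a mean for every row, so its cost grows roughly quadratically with the number of rows, while the expanding mean does a single pass over the column.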