Running a function on multiple columns of a pandas dataframe in parallel

Assume we have a pandas DataFrame with 100 columns named S1 to S100.
There might be other columns in the DataFrame, but we are only interested in these.
For each of these columns, we need the number of rows satisfying the condition below.

num_of_rows = len(df[df['S1'] >= float(cutoff)])

Is there a way to do this in parallel for all 100 columns and get an array of the num_of_rows values, one per column?

Solution:

Going parallel would most likely be more expensive than the computation itself. I would use vectorized code instead:

df.ge(float(cutoff)).sum()

If you only want the Sx columns (note that like='S' matches any column whose name contains an "S"):

df.filter(like='S').ge(float(cutoff)).sum()
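
As a minimal sketch of the vectorized approach, the example below builds a made-up DataFrame and checks the result against the per-column expression from the question. The data, the extra "Sample" column, and the cutoff value are assumptions for illustration only. It also swaps like='S' for a regex so that only names of the form S followed by digits are selected, which is a stricter choice than the answer's and may or may not suit your column names.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 numeric columns S1..S100 plus one unrelated column.
rng = np.random.default_rng(0)
cols = [f"S{i}" for i in range(1, 101)]
df = pd.DataFrame(rng.random((1000, 100)), columns=cols)
df["Sample"] = "x"  # non-numeric column whose name also contains "S"

# Assumed cutoff; the question casts it with float(), so it may arrive as a string.
cutoff = "0.5"

# Keep only columns named S<digits>; like='S' would also pick up "Sample".
s_cols = df.filter(regex=r"^S\d+$")

# Boolean frame, then a per-column count of True values.
counts = s_cols.ge(float(cutoff)).sum()

# Cross-check against the one-column-at-a-time expression from the question.
expected = [len(df[df[c] >= float(cutoff)]) for c in cols]
assert counts.tolist() == expected

# Plain NumPy array with one count per column, in column order.
num_of_rows = counts.to_numpy()
```

The result of sum() is a Series indexed by column name, so counts['S1'] gives the count for a single column, while to_numpy() gives the array the question asks for.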