Loop to delete rows based on condition Pandas

I have large data samples (1.6 million rows each) from which I want to delete all rows that do not meet certain conditions.

I have over 1400 different conditions. Each one is first tested to see whether it should be applied, and once it applies I use the following code to delete the rows (shown here with a random example data sample):

import pandas as pd
import numpy as np

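# Random example data: 1.6 million rows, 13 integer columns with values from 1 to 99
df = pd.DataFrame(np.random.randint(1,100,size=(1600000, 13)), columns=list('ABCDEFGHIJKLM'))

cols = ['A','B','C','D','E','F','G','H','I','J','K','L','M']

# Count, per row, how many values fall between 30 and 50 (inclusive)
df['Conditions'] = df[(df[cols] >= 30) & (df[cols] <= 50)].count(axis=1)
# Keep only the rows where that count is between 2 and 6
df = df[(df["Conditions"] >= 2) & (df["Conditions"] <= 6)]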

So in this example, values between 30 and 50 should occur at least 2 and at most 6 times per row (all conditions are similar, but with different values).
My problem is that this takes a very long time, and since I have 1200 different data samples I would like to find a way to speed up the process. Do you have any suggestions for a faster method? I have also tried df.drop, but the approach above seems faster to me. I appreciate all suggestions.


Solution:

I just noticed that you use count, which is slower because indexing df with the boolean mask first builds a masked copy of your data, and only then counts the non-null entries. I'd suggest summing the boolean mask directly instead:

mask = ((df[cols] >= 30) & (df[cols] <= 50)).sum(axis=1)
df = df[mask.between(2,6)]

This takes about 400 ms on my system, whereas your approach takes about 1 s (with the suggestion from my comment applied; without it, about 2 s).
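Since the question mentions many similar conditions with different values, the same idea can be wrapped in a small helper and applied in a loop. The following is only a sketch under assumptions: the helper name `filter_by_range_count`, the tuple layout of `conditions`, and the second condition's values are hypothetical and not from the question. It also drops to a NumPy array for the comparison, which may or may not be faster on your data, and it checks the result against the count-based approach on a smaller random sample.

```python
import numpy as np
import pandas as pd


def filter_by_range_count(df, cols, low, high, min_hits, max_hits):
    """Keep rows where the number of values in [low, high] across `cols`
    lies in [min_hits, max_hits]. Hypothetical helper for illustration."""
    values = df[cols].to_numpy()
    # Boolean array of "value is in range", summed per row
    hits = ((values >= low) & (values <= high)).sum(axis=1)
    keep = (hits >= min_hits) & (hits <= max_hits)
    return df[keep]


rng = np.random.default_rng(0)
cols = list("ABCDEFGHIJKLM")
# Smaller random sample than in the question, to keep the check quick
df = pd.DataFrame(rng.integers(1, 100, size=(10_000, 13)), columns=cols)

# Reference result using the count-based approach from the question
counts = df[(df[cols] >= 30) & (df[cols] <= 50)].count(axis=1)
expected = df[(counts >= 2) & (counts <= 6)]

result = filter_by_range_count(df, cols, 30, 50, 2, 6)
same_rows = result.index.equals(expected.index)
print(same_rows)

# Assumed layout for a list of conditions: (low, high, min_hits, max_hits).
# The second tuple is made up for the example.
conditions = [(30, 50, 2, 6), (60, 80, 1, 5)]
filtered = df
for low, high, min_hits, max_hits in conditions:
    filtered = filter_by_range_count(filtered, cols, low, high, min_hits, max_hits)
print(len(df), len(filtered))
```

Because each pass works on the already reduced frame, later conditions have fewer rows to scan, and no helper column such as Conditions has to be added to the data.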
