Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to filter and drop rows in a pandas dataframe from selected columns

I’ve got a pandas dataframe that has 100 columns. First col is a string and the rest are ints. I’d like to drop any row where an int column contains a value that is out of range (less than 0 or greater than 500)

I’ve seen examples of doing this operation on single columns but not multiple. I’ve tried the following where I’ve filled a list with all 99 column names that are ints but the index list returned contains the index of every row so they all get dropped.

drop_list = my_data[(my_data[column_name_list] < 0) | (my_data[column_name_list] > 500)].index
my_data = my_data.drop(drop_list)

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

If I understand correctly, you should aggregate with any/all to perform boolean indexing:

out = my_data[~(my_data[column_name_list].lt(0).any(axis=1)
               |my_data[column_name_list].gt(500).any(axis=1)
               )]

Example:

# input
   A  0   1    2    3
0  a -1  10  100   10
1  b  2  20  200   20
2  c  3  30  300  600

# output
   A  0   1    2   3
1  b  2  20  200  20

# intermediates
   A  0   1    2    3  lt(0).any(axis=1)  gt(500).any(axis=1)     OR      ~
0  a -1  10  100   10               True                False   True  False
1  b  2  20  200   20              False                False  False   True
2  c  3  30  300  600              False                 True   True  False

Note that following De Morgan’s law this is equivalent to:

out = my_data[(my_data[column_name_list].ge(0).all(axis=1)
             & my_data[column_name_list].le(500).all(axis=1)
              )]

Intermediates:

   A  0   1    2    3  ge(0).all(axis=1)  le(500).all(axis=1)    AND
0  a -1  10  100   10              False                 True  False
1  b  2  20  200   20               True                 True   True
2  c  3  30  300  600               True                False  False
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading