Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Select certain columns based on multiple criteria in pandas

I have the following dataset:

my_df = pd.DataFrame({'id':[1,2,3,4,5],
                      'type':['corp','smb','smb','corp','mid'],
                      'sales':[34567,2190,1870,22000,10000],
                      'sales_roi':[.10,.21,.22,.15,.16],
                      'sales_pct':[.38,.05,.08,.30,.20],
                      'sales_ln':[4.2,2.1,2.0,4.1,4],
                      'cost_pct':[22000,1000,900,14000,5000],
                      'flag':[0,1,0,1,1],
                      'gibberish':['bla','ble','bla','ble','bla'],
                      'tech':['lnx','mst','mst','lnx','mc']})
my_df['type'] = pd.Categorical(my_df.type)
my_df
    id  type    sales   sales_roi   sales_pct   sales_ln    cost_pct    flag    gibberish   tech
0   1   corp    34567   0.10        0.38        4.2         22000       0       bla         lnx
1   2   smb     2190    0.21        0.05        2.1         1000        1       ble         mst
2   3   smb     1870    0.22        0.08        2.0         900         0       bla         mst
3   4   corp    22000   0.15        0.30        4.1         14000       1       ble         lnx
4   5   mid     10000   0.16        0.20        4.0         5000        1       bla         mc

And I want to filter out all variables who end in "_pct" or "_ln" or are equal to "gibberish" or "tech". This is what I have tried:

df_selected = df.loc[:, ~my_df.columns.str.endswith('_pct') &
~my_df.columns.str.endswith('_ln') &
~my_df.columns.str.contains('gibberish','tech')]

But it returns me an unwanted column ("tech"):

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    id  type    sales   sales_roi   flag    tech
0   1   corp    34567   0.10        0       lnx
1   2   smb     2190    0.21        1       mst
2   3   smb     1870    0.22        0       mst
3   4   corp    22000   0.15        1       lnx
4   5   mid     10000   0.16        1       mc

This is the expected result:

    id  type    sales   sales_roi   flag
0   1   corp    34567   0.10        0   
1   2   smb     2190    0.21        1   
2   3   smb     1870    0.22        0    
3   4   corp    22000   0.15        1   
4   5   mid     10000   0.16        1    

Please consider that I have to deal with hundreds of variables and this is just an example of what I need. Any help will be greatly appreciated.

>Solution :

Currently, what you are doing will return every column because of how the conditions are written. endswith will accept tuples so just put all the columns you are looking for in a single tuple and then filter

my_df[my_df.columns[~my_df.columns.str.endswith(('_pct','_ln','gibberish','tech'))]]

   id  type  sales  sales_roi  flag
0   1  corp  34567       0.10     0
1   2   smb   2190       0.21     1
2   3   smb   1870       0.22     0
3   4  corp  22000       0.15     1
4   5   mid  10000       0.16     1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading