Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas drop a column according to one header value

I have this dataframe

name,01100MS,02200MS,02500MS,03100MS,06400MS
lat,626323,616720,616288,611860,622375
long,5188431,5181393,5173583,5165895,5152605
alt,915,1499,1310,1235,190
1920-01-01,1,4.1,4.41,4.441,4.4441
1920-01-02,2,4.2,4.42,4.442,4.4442
1920-01-03,3,4.3,4.43,4.443,4.4443
1920-01-04,4,4.4,4.44,4.444,4.4444
1920-01-05,5,4.5,4.45,4.445,4.4445
1920-01-06,6,4.6,4.46,4.446,4.4446
1920-01-07,7,4.7,4.47,4.447,4.4447
1920-01-08,8,4.8,4.48,4.448,4.4448
1920-01-09,9,4.9,4.49,4.449,4.4449
1920-01-10,10,5,4.5,4.45,4.445
1920-01-11,11,5.1,4.51,4.451,4.4451

I read it as:

 dfr     =  pd.read_csv(f_name,
                        parse_dates           = True,
                        index_col             = 0,
                        header                = [0,1,2,3],
                        infer_datetime_format = True,
                        cache_dates=True)

I would like to drop some columns according to the a threshold value in the 4th row, that is one of the heads due to the the fact that I use multiple index.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I would like to do something like:

for column in dfr:
    if dfr[column][2] <= 1300.:
        dfr = dfr.drop(column,axis=1) 

The problem is that I am not able to select the right "head" inside the multi heads. I would like also to do it in a smart way, avoiding in other word the cycle.

>Solution :

You can select forth level by Index.get_level_values and select columns with invert mask – greater like 1300 in DataFrame.loc:

df = df.loc[:,df.columns.get_level_values(3).astype(int) > 1300]

Or if dont need always converting to integers is possible set values before solution:

df = df.rename(columns=int, level=3)
print (df.columns)
MultiIndex([('01100MS', '626323', '5188431',  915),
            ('02200MS', '616720', '5181393', 1499),
            ('02500MS', '616288', '5173583', 1310),
            ('03100MS', '611860', '5165895', 1235),
            ('06400MS', '622375', '5152605',  190)],
           names=['name', 'lat', 'long', 'alt'])

df = df.loc[:,df.columns.get_level_values(3) > 1300]
print (df)
name       02200MS 02500MS
lat         616720  616288
long       5181393 5173583
alt           1499    1310
1920-01-01     4.1    4.41
1920-01-02     4.2    4.42
1920-01-03     4.3    4.43
1920-01-04     4.4    4.44
1920-01-05     4.5    4.45
1920-01-06     4.6    4.46
1920-01-07     4.7    4.47
1920-01-08     4.8    4.48
1920-01-09     4.9    4.49
1920-01-10     5.0    4.50
1920-01-11     5.1    4.51
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading