Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Drop columns if all their values in a specific date range are NaNs using Pandas

Given a data sample as follows:

          date  value1    value2    value3
0   2021-10-12   1.015  1.115668  1.015000
1   2021-10-13     NaN  1.104622  1.030225
2   2021-10-14     NaN  1.093685       NaN
3   2021-10-15   1.015  1.082857       NaN
4   2021-10-16   1.015  1.072135  1.077284
5   2021-10-29   1.015  1.061520  1.093443
6   2021-10-30   1.015  1.051010  1.109845
7   2021-10-31   1.015       NaN  1.126493
8    2021-11-1   1.015       NaN       NaN
9    2021-11-2   1.015  1.020100       NaN
10   2021-11-3     NaN  1.010000       NaN
11  2021-11-30   1.015  1.000000       NaN

Let’s say I want to drop columns whose values all are NaNs in the November of 2021, which means range of 2021-11-01 to 2021-11-30 (including the starting and ending date).

Under this requirement, vlue3 will be drop since all its values in 2021-11 are NaNs. Other columns have NaNs in 2021-11 but not all, so those columns will be kept.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

How could I achieve that in Pandas? Thanks.

EDIT:

df['date'] = pd.to_datetime(df['date'])  
mask = (df['date'] >= '2021-11-01') & (df['date'] <= '2021-11-30')
df.loc[mask]

Out:

         date  value1  value2  value3
8  2021-11-01   1.015     NaN     NaN
9  2021-11-02   1.015  1.0201     NaN
10 2021-11-03     NaN  1.0100     NaN
11 2021-11-30   1.015  1.0000     NaN

>Solution :

You can filter rows by November of 2021 and test if all rows has NaNs by conditions:

df['date'] = pd.to_datetime(df['date'])

df = df.loc[:, ~df[df['date'].dt.to_period('m') == pd.Period('2021-11')].isna().all()]

Or:

df['date'] = pd.to_datetime(df['date'])

df = df.loc[:, df[df['date'].dt.to_period('m') == pd.Period('2021-11')].notna().any()]

EDIT:

mask = (df['date'] >= '2021-11-01') & (df['date'] <= '2021-11-30')
df = df.loc[:, df.loc[mask].notna().any()]

Out:

         date  value1    value2
0  2021-10-12   1.015  1.115668
1  2021-10-13     NaN  1.104622
2  2021-10-14     NaN  1.093685
3  2021-10-15   1.015  1.082857
4  2021-10-16   1.015  1.072135
5  2021-10-29   1.015  1.061520
6  2021-10-30   1.015  1.051010
7  2021-10-31   1.015       NaN
8  2021-11-01   1.015       NaN
9  2021-11-02   1.015  1.020100
10 2021-11-03     NaN  1.010000
11 2021-11-30   1.015  1.000000
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading