Given a data sample as follows:
date value1 value2 value3
0 2021-10-12 1.015 1.115668 1.015000
1 2021-10-13 NaN 1.104622 1.030225
2 2021-10-14 NaN 1.093685 NaN
3 2021-10-15 1.015 1.082857 NaN
4 2021-10-16 1.015 1.072135 1.077284
5 2021-10-29 1.015 1.061520 1.093443
6 2021-10-30 1.015 1.051010 1.109845
7 2021-10-31 1.015 NaN 1.126493
8 2021-11-1 1.015 NaN NaN
9 2021-11-2 1.015 1.020100 NaN
10 2021-11-3 NaN 1.010000 NaN
11 2021-11-30 1.015 1.000000 NaN
Let’s say I want to drop columns whose values all are NaNs in the November of 2021, which means range of 2021-11-01 to 2021-11-30 (including the starting and ending date).
Under this requirement, vlue3 will be drop since all its values in 2021-11 are NaNs. Other columns have NaNs in 2021-11 but not all, so those columns will be kept.
How could I achieve that in Pandas? Thanks.
EDIT:
df['date'] = pd.to_datetime(df['date'])
mask = (df['date'] >= '2021-11-01') & (df['date'] <= '2021-11-30')
df.loc[mask]
Out:
date value1 value2 value3
8 2021-11-01 1.015 NaN NaN
9 2021-11-02 1.015 1.0201 NaN
10 2021-11-03 NaN 1.0100 NaN
11 2021-11-30 1.015 1.0000 NaN
>Solution :
You can filter rows by November of 2021 and test if all rows has NaNs by conditions:
df['date'] = pd.to_datetime(df['date'])
df = df.loc[:, ~df[df['date'].dt.to_period('m') == pd.Period('2021-11')].isna().all()]
Or:
df['date'] = pd.to_datetime(df['date'])
df = df.loc[:, df[df['date'].dt.to_period('m') == pd.Period('2021-11')].notna().any()]
EDIT:
mask = (df['date'] >= '2021-11-01') & (df['date'] <= '2021-11-30')
df = df.loc[:, df.loc[mask].notna().any()]
Out:
date value1 value2
0 2021-10-12 1.015 1.115668
1 2021-10-13 NaN 1.104622
2 2021-10-14 NaN 1.093685
3 2021-10-15 1.015 1.082857
4 2021-10-16 1.015 1.072135
5 2021-10-29 1.015 1.061520
6 2021-10-30 1.015 1.051010
7 2021-10-31 1.015 NaN
8 2021-11-01 1.015 NaN
9 2021-11-02 1.015 1.020100
10 2021-11-03 NaN 1.010000
11 2021-11-30 1.015 1.000000