I have the following dataframe:
import pandas as pd
from numpy import nan
df = pd.DataFrame({'Date': ['2014-09-30', '2014-10-01',
'2014-10-31', '2014-11-01'],
'X1': [20, nan, 19, nan],
'X2': [nan,2,nan,4],
'X3': [5,nan,9,nan],
})
Date X1 X2 X3
0 2014-09-30 20 nan 5
1 2014-10-01 nan 2 nan
2 2014-10-31 19 nan 9
3 2014-11-01 nan 4 nan
As you can see, the issue is that the columns have different release dates. I want to create a single dataframe with one row per end-of-month date that holds every variable. The outcome should therefore be:
Date X1 X2 X3
0 2014-09-30 20 2 5
1 2014-10-31 19 4 9
Can anyone help me get this result?
Thanks!
>Solution :
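You can use pandas.tseries.offsets.MonthEnd to map each date to a month end, then group by that key and take the first non-null value of each column with groupby + first: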
from pandas.tseries.offsets import MonthEnd
g = pd.to_datetime(df['Date']) + MonthEnd(1) - MonthEnd()
df.groupby(g, as_index=False).first().convert_dtypes()
output:
Date X1 X2 X3
0 2014-09-30 20 2 5
1 2014-10-31 19 4 9
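To see why this works, it can help to build the grouping key as a separate step. The following is a minimal, self-contained sketch of the same idea, not the exact code above: the names `dates`, `month_end` and `result` are illustrative, and dropping the original string `Date` column before grouping (then restoring the key with `reset_index`) is a variation chosen here so that the resulting `Date` column holds the computed month-end timestamps.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

df = pd.DataFrame({
    'Date': ['2014-09-30', '2014-10-01', '2014-10-31', '2014-11-01'],
    'X1': [20, np.nan, 19, np.nan],
    'X2': [np.nan, 2, np.nan, 4],
    'X3': [5, np.nan, 9, np.nan],
})

dates = pd.to_datetime(df['Date'])

# Adding MonthEnd(1) moves each date forward to a month end (a date that is
# already a month end jumps to the next one). Subtracting MonthEnd(1) then
# steps back one month end. Net effect: a month-end date maps to itself, and
# any other date maps to the previous month end.
month_end = dates + MonthEnd(1) - MonthEnd(1)

# Group the value columns by the computed key; first() keeps the first
# non-null value of each column within each group.
result = (
    df.drop(columns='Date')
      .groupby(month_end.rename('Date'))
      .first()
      .reset_index()
)

print(result)
```

With the sample data, `2014-10-01` is assigned to `2014-09-30` and `2014-11-01` to `2014-10-31`, so each group contains one value per column. The value columns remain floats here because the input contains NaN; appending `.convert_dtypes()` as in the answer converts them to nullable integers.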