How to adjust this dataframe?

I have the following dataframe:

import pandas as pd
from numpy import nan

df = pd.DataFrame({'Date': ['2014-09-30', '2014-10-01',
                            '2014-10-31', '2014-11-01'],
                   'X1': [20, nan, 19, nan],
                   'X2': [nan, 2, nan, 4],
                   'X3': [5, nan, 9, nan],
                   })

         Date   X1   X2   X3
0  2014-09-30   20  nan    5
1  2014-10-01  nan    2  nan
2  2014-10-31   19  nan    9
3  2014-11-01  nan    4  nan

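As you can see, the issue is that the columns have different release dates: X1 and X3 are reported on the last day of the month, while X2 is reported on the first day of the following month. I want to build a single dataframe with one row per end-of-month date that holds the value of every variable. The outcome should therefore be: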


         Date   X1   X2   X3
0  2014-09-30   20    2    5
1  2014-10-31   19    4    9

Can anyone help me get it?

Thanks!

Solution:

You can build a month-end grouping key with pandas.tseries.offsets.MonthEnd, then use groupby + first to take the first non-null value of each column within each group:

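from pandas.tseries.offsets import MonthEnd

# Grouping key: a month-end date maps to itself, any other date
# maps back to the previous month end (2014-10-01 -> 2014-09-30)
g = pd.to_datetime(df['Date']) + MonthEnd(1) - MonthEnd()
# first() skips NaN, so each column keeps its first non-null value per group
df.groupby(g, as_index=False).first().convert_dtypes()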

Output:

         Date  X1  X2  X3
0  2014-09-30  20   2   5
1  2014-10-31  19   4   9
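
For reference, here is a self-contained sketch of the same idea that can be run as is. The names month_end and result are only illustrative, and dropping the original Date column before grouping is an assumption of this sketch rather than part of the answer above. It makes the Date column of the result the computed month-end key itself, stored as a datetime.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

df = pd.DataFrame({
    "Date": ["2014-09-30", "2014-10-01", "2014-10-31", "2014-11-01"],
    "X1": [20, np.nan, 19, np.nan],
    "X2": [np.nan, 2, np.nan, 4],
    "X3": [5, np.nan, 9, np.nan],
})

# Parse the strings so date offsets can be applied.
dates = pd.to_datetime(df["Date"])

# Month-end key: roll forward to a month end, then step back one month end.
# A month-end date ends up unchanged; any other date lands on the
# previous month end.
month_end = (dates + MonthEnd(1) - MonthEnd(1)).rename("Date")

result = (
    df.drop(columns="Date")   # avoid a clash with the key's name
      .groupby(month_end)     # group rows by the computed key
      .first()                # first non-null value per column
      .reset_index()          # turn the key back into a Date column
      .convert_dtypes()       # optional: floats with whole values become Int64
)

print(result)
```

Note that with this key any date that is not a month end, including one in the middle of a month, is assigned to the previous month end. That fits data released shortly after the period it describes, as in the question.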