Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

df.groupby(df.index.to_period('M')).last() generates a new date index exceeding the range of the initial date index

Given a sample data as follows:

import pandas as pd
import numpy as np
np.random.seed(2021)
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(np.random.uniform(0, 10, size=(90, 4)), index=dates, 
                  columns=['A_values', 'B_values', 'C_values', 'target'])
df

Out:

            A_values  B_values  C_values    target
2013-02-26  6.059783  7.333694  1.389472  3.126731
2013-02-27  9.972433  1.281624  1.789931  7.529254
2013-02-28  6.621605  7.843101  0.968944  0.585713
2013-03-01  9.623960  6.165574  0.866300  5.612724
2013-03-02  6.165247  9.638430  5.743043  3.711608
             ...       ...       ...       ...
2013-05-22  0.589729  6.479978  3.531450  6.872059
2013-05-23  6.279065  3.837670  8.853146  8.209883
2013-05-24  5.533017  5.241127  1.388056  5.355926
2013-05-25  1.596038  4.665995  2.406251  1.971875
2013-05-26  3.269001  1.787529  6.659690  7.545569

With code below, we can see the last row’s index is outside of range of the initial date index (maximum date is 2013-05-26):

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df.groupby(pd.Grouper(freq='M')).last()
Out[177]: 
            A_values  B_values  C_values    target
2013-02-28  6.621605  7.843101  0.968944  0.585713
2013-03-31  5.906967  8.545341  6.326550  8.684117
2013-04-30  5.358775  1.473809  5.231534  0.604810
2013-05-31  3.269001  1.787529  6.659690  7.545569

and:

df.groupby(df.index.to_period('M')).apply(lambda x: x.index.max())
Out[178]: 
2013-02   2013-02-28
2013-03   2013-03-31
2013-04   2013-04-30
2013-05   2013-05-26
Freq: M, dtype: datetime64[ns]

But I hope to get an expected result as follows, how could I do that? Thanks.

            A_values  B_values  C_values    target
2013-02-28  6.621605  7.843101  0.968944  0.585713
2013-03-31  5.906967  8.545341  6.326550  8.684117
2013-04-30  5.358775  1.473809  5.231534  0.604810
2013-05-26  3.269001  1.787529  6.659690  7.545569  # date should be `2013-05-26` based on the original data

>Solution :

Idea is create helper column from DatetimeIndex and after last convert column to index:

df = (df.assign(new=df.index)
        .groupby(pd.Grouper(freq='M'))
        .last()
        .set_index('new')
        .rename_axis(None))
print (df)
            A_values  B_values  C_values    target
2013-02-28  6.621605  7.843101  0.968944  0.585713
2013-03-31  5.906967  8.545341  6.326550  8.684117
2013-04-30  5.358775  1.473809  5.231534  0.604810
2013-05-26  3.269001  1.787529  6.659690  7.545569
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading