Get the value corresponding to the start of a given category created with datetime64[ns]

I have a DataFrame holding a time series sampled every minute (running up to today).
All timestamps are datetime64[ns].

Input:

timestamp                   A       
2021-01-01 00:00:00.000     2000   
2021-01-01 00:01:00.000     2005   
2021-01-01 00:02:00.000     2010      
...

I create a new column of categorical data that splits the series into 40-day periods, using pd.cut. The output is:

timestamp                   A        period 
2021-01-01 00:00:00.000     2000     (2021-12-31 23:59:59.999999999, 2022-02-10]
2021-01-01 00:01:00.000     2005     (2021-12-31 23:59:59.999999999, 2022-02-10]
2021-01-01 00:02:00.000     2010     (2021-12-31 23:59:59.999999999, 2022-02-10]
...

The set of periods is as follows:

[(2021-12-31 23:59:59.999999999, 2022-02-10] < (2022-02-10, 2022-03-22] < (2022-03-22, 2022-05-01]]
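
For reference, a period column like this can be reproduced with a minimal sketch along the following lines. The sample frame and the name `edges` are assumptions for illustration, and the timestamps start on 2022-01-01 (also an assumption) so that the rows fall inside the bins listed above.

```python
import pandas as pd

# Hypothetical sample: one row per minute, starting 2022-01-01 so that the
# rows fall inside the bins listed above.
df = pd.DataFrame({"timestamp": pd.date_range("2022-01-01", periods=3, freq="min")})
df["A"] = [2000, 2005, 2010]

# Four edges spaced 40 days apart define three periods.
edges = pd.date_range("2022-01-01", periods=4, freq="40D")

# include_lowest=True moves the first left edge slightly earlier, so the very
# first timestamp (which sits exactly on that edge) is still assigned a period.
df["period"] = pd.cut(df["timestamp"], bins=edges, include_lowest=True)
```

The intervals are closed on the right, so a timestamp that sits exactly on an inner edge belongs to the earlier period.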

I want to create a new column that holds, for every row, the first value of A within that row's period.
For example:

timestamp                   A        period                                       A_ref
2021-01-01 00:00:00.000     2000     (2021-12-31 23:59:59.999999999, 2022-02-10]  2000
2021-01-01 00:01:00.000     2005     (2021-12-31 23:59:59.999999999, 2022-02-10]  2000
2021-01-01 00:02:00.000     2010     (2021-12-31 23:59:59.999999999, 2022-02-10]  2000
...

The start of a period may not coincide exactly with a timestamp in the data, so an exact lookup on the period boundary is not enough.

What I have tried: df.resample('40D') with 'first' as the aggregator. This gives me the value I am interested in, but only one row per period; what I really want is to fill this value across all of the rows.
Maybe the approach of using a category is not necessary?

Thank you

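Solution:

Group by period, take the first A in each group, and merge the result back onto the original frame. The empty left suffix keeps the original column name A, and the right-hand copy becomes A_ref:

df.merge(df.groupby('period')['A'].first().reset_index(), on='period', suffixes=('', '_ref'))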

                 timestamp  ...  A_ref
0  2021-01-01 00:00:00.000  ...   2000
1  2021-01-01 00:01:00.000  ...   2000
2  2021-01-01 00:02:00.000  ...   2000
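
As a possible alternative to the merge, `transform('first')` broadcasts each group's first value back onto its rows in one step. The sketch below is only an illustration on a hypothetical five-row sample (2022 timestamps are assumed so that the rows fall inside the bins from the question). It also shows a variant using `pd.Grouper` that skips the categorical column, which speaks to the question of whether the category is needed at all. The column name `A_ref_grouper` is made up for the example.

```python
import pandas as pd

# Hypothetical sample spanning two 40-day periods.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-01-01 00:00", "2022-01-01 00:01", "2022-01-01 00:02",
        "2022-02-15 00:00", "2022-02-15 00:01",
    ]),
    "A": [2000, 2005, 2010, 3000, 3005],
})
edges = pd.date_range("2022-01-01", periods=4, freq="40D")
df["period"] = pd.cut(df["timestamp"], bins=edges, include_lowest=True)

# Variant 1: broadcast the first A of each period back onto its rows.
# observed=True restricts the grouping to periods that actually contain rows.
df["A_ref"] = df.groupby("period", observed=True)["A"].transform("first")

# Variant 2: no categorical column; bin the timestamps directly in 40-day steps.
df["A_ref_grouper"] = (
    df.groupby(pd.Grouper(key="timestamp", freq="40D"))["A"].transform("first")
)
```

Two caveats apply. The `'first'` aggregation returns the first non-null value in each group, so a missing A at the start of a period is skipped. The `pd.Grouper` bins are closed on the left and start at midnight of the first timestamp's day, whereas the `pd.cut` intervals are closed on the right, so a row sitting exactly on a boundary can land in different groups under the two variants.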