Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Minimum date of each continuous block of data of another column

I have a dataframe with time-series data as follows:

      Date      Value
0  2021-12-01     A
1  2021-12-02     A
2  2021-12-03     A
3  2021-12-04     B
4  2021-12-05     B
5  2021-12-06     A
6  2021-12-07     A
7  2021-12-08     C

I’m trying to reduce this to only have the first date of each continuous block for the Value column. So the result would look like:

      Date      Value
0  2021-12-01     A
1  2021-12-04     B
2  2021-12-06     A
3  2021-12-08     C

I’ve tried a bunch of different ways of masking, dropping duplicates based on the mask, etc. but cannot do it. Any help is appreciated!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can use ne (not equals) + shift​ to create a mask where the first value of each consecutive group is True, and then cumsum to create a unique for each group that’s shared by all its items.

Then, drop_duplicates based on that, and use index of the returned rows to index the dataframe:

subset = df.loc[df['Value'].ne(df['Value'].shift(1)).cumsum().drop_duplicates().index]

Output:

>>> subset
         Date Value
0  2021-12-01     A
3  2021-12-04     B
5  2021-12-06     A
7  2021-12-08     C
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading