Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Drop rows before a specific observation in a MultiIndex DataFrame

I have a series of observations (status) at different timepoints (dt) for a number of subjects (id) in a MultiIndex DataFrame:

       dt  status
id  n
01  1  xx     abc
    2  xx     xxx
02  1  xx     xxx
    2  xx     abc
    3  xx     xxx
    4  xx     xxx
03  1  xx     xxx
04  1  xx     xxx
    2  xx     xxx
    3  xx     abc

I want to drop every observation before status=='abc' to make a new MultiIndex DataFrame like this:

       dt  status
id  n
01  1  xx     abc
    2  xx     xxx
02  2  xx     abc
    3  xx     xxx
    4  xx     xxx
04  3  xx     abc

What is the most efficient way to do that?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Create a boolean mask to identify the rows where status is abc, then group the mask by id and use cummax to propagate the True values in forward direction then use the resulting mask to filter the rows

df[df['status'].eq('abc').groupby('id').cummax()]

      dt status
id n           
01 1  xx    abc
   2  xx    xxx
02 2  xx    abc
   3  xx    xxx
   4  xx    xxx
04 3  xx    abc
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading