Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to get the number of max consecutive values per month?

I have this df:

            DATE  CODE  YEAR_MONTH  PP  CONDITION
10958 1991-01-01  100015  1991-01  1.2      True
10959 1991-01-02  100015  1991-01  0.0      False
10960 1991-01-03  100015  1991-01  0.7      False
10961 1991-01-04  100015  1991-01  1.0      True
10962 1991-01-05  100015  1991-01  1.0      True
10962 1991-01-06  100015  1991-01  2.0      True
10962 1991-01-07  100015  1991-01  3.0      True
10962 1991-01-08  100015  1991-01  2.0      True
10962 1991-01-09  100015  1991-01  0.5      False
10962 1991-01-10  100015  1991-01  0.1      False
10962 1991-01-11  100015  1991-01  1.2      True
10962 1991-01-12  100015  1991-01  1.0      True
         ...     ...      ...  ...        ...
21911 1991-01-01  100016  1991-01  0.0      False
21912 1991-01-02  100016  1991-01  0.0      False
21913 1991-01-03  100016  1991-01  1.0      True
21914 1991-01-04  100016  1991-01  2.0      True
21915 1991-01-05  100016  1991-01  1.4      True
         ...     ...      ...  ...        ...

I want to obtain the maximum group of consecutive True values per YEAR_MONTH and CODE.

For example if i have 3 groups of consecutive True values in a specific YEAR_MONTH and CODE with 2, 3 and 5 consecutive True values, i want only the max consecutive group of True values. In this case 5.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

PostData: df[‘CONDITION’] == True when df[‘PP’] >= 1

Expected result:

CODE  YEAR_MONTH MAX_CONSEC_VALUE  
100015  1991-01       5
100015  1991-02       3
100015  1991-03       4
100016  1991-01       3
100016  1991-02       2
100016  1991-03       2
...     ...          ...

I tried this code:

    #Generate the CONDITION column 
    df['CONDITION']=(df['PP']>=1)
    #Cumulative consecutive values
    group=df.groupby(['CODE','YEAR_MONTH'])['CONDITION'].cumsum()
    
    #Creating the group dataframe
    group=pd.DataFrame({'index':group.index, 'PP':group.values})
    
    
    #Merging the CODE and YEAR_MONTH
    df.reset_index(drop=False,inplace=True)
consecutives=pd.merge(group,df[['index','CODE','YEAR_MONTH']],on='index', how='left')    
    
#Getting the max value per CODE and YEAR_MONTH    
consecutives_final = consecutives.groupby(['CODE','YEAR_MONTH']).max()

But this code gets the total True consecutive values by CODE and YEAR_MONTH.

Would you mind to help me?

Thanks in advance.

>Solution :

Try groupby on the key along with the cumsum of the negate condition (for the consecutive True blocks):

# this assumes the data is sorted by `CODE, YEAR_MONTH` already
(df.groupby(['CODE','YEAR_MONTH', (-df['CONDITION']).cumsum()])['CONDITION'].sum()
   .groupby(['CODE','YEAR_MONTH']).max()
)

Output (from the sample data):

CODE    YEAR_MONTH
100015  1991-01       5
100016  1991-01       3
Name: CONDITION, dtype: int64
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading