Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas rank values within groupby, starting a new rank if diff is greater than 1

I have a sample dataframe as follows:

data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
      'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73]}
df=pd.DataFrame.from_dict(data)
df['WeekDiff'] = df.groupby('Store')['Week'].diff().fillna(1)

I added a difference column to find the gaps in the Week column within my data.
I have been trying to groupby Store and somehow use the differences column to achieve the below output but with no success. I need the ranks to start from each occurence of a value greater than one until the next such value. Please see a sample output I’d like to achieve.

result_data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
      'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73],
      'Rank':[1,1,1,1,1,1,2,2,2,2,1,1,2,2,2,3,3,3,4,4,4,4]}

I am new to python and pandas and I’ve been trying to google this all day, but couldn’t find a solution. Could you please help me how to do this?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Thank you in advance!

>Solution :

You could try as follows:

import pandas as pd

data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
      'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73]}

df = pd.DataFrame(data)

df['Rank'] = df.groupby('Store')['Week'].diff()>1
df['Rank'] = df.groupby('Store')['Rank'].cumsum().add(1)

# check with expected output:
result_data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
      'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73],
      'Rank':[1,1,1,1,1,1,2,2,2,2,1,1,2,2,2,3,3,3,4,4,4,4]}

result_df = pd.DataFrame(result_data)

df.equals(result_df)
# True

Or as a (lengthy) one-liner:

df['Rank'] = df.set_index('Store').groupby(level=0)\
    .agg(Rank=('Week','diff')).gt(1).groupby(level=0)\
        .cumsum().add(1).reset_index(drop=True)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading