Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create a column ranking order of records by date

Imagine I have the following dataframe with employees, their contract type (values could be Employee, Contractor and Agency). Also, one person can have more than 1 contract as you could see in the dataframe example below:

ID      Name    Contract    Date
10000   John    Employee    2021-01-01
10000   John    Employee    2021-01-01
10000   John    Employee    2020-03-06
10000   John    Contractor  2021-01-03
10000   John    Agency      2021-01-01
10000   John    Contractor  2021-02-01
10001   Carmen  Employee    1988-06-03
10001   Carmen  Employee    2021-02-03
10001   Carmen  Contractor  2021-02-03
10002   Peter   Contractor  2021-02-03
10003   Fred    Employee    2020-01-05
10003   Fred    Employee    1988-06-03

I need to find a way that, per each unique ID, and each unique Contract Type, it created a column named "Order" that would rank, starting with 1 on the oldest contract, each of the contract types each ID have.If the date is the same, the rank order does not matter. This would result on the following dataframe:

ID      Name    Contract    Date        Order
10000   John    Employee    2021-01-01  1
10000   John    Employee    2021-01-01  2
10000   John    Employee    2020-03-06  3
10000   John    Contractor  2021-01-03  2
10000   John    Agency      2021-01-01  1
10000   John    Contractor  2021-02-01  1
10001   Carmen  Employee    1988-06-03  1
10001   Carmen  Employee    2021-02-03  2
10001   Carmen  Contractor  2021-02-03  1
10002   Peter   Contractor  2021-02-03  1
10003   Fred    Employee    2020-01-05  2
10003   Fred    Employee    1988-06-03  1

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Here is an approach: You first sort df using df.sort_values() and finally use df.groupby().cumcount()

df['Data'] = pd.to_datetime(df['Date'])
df = df.sort_values(['ID', 'Contract', 'Date'])
df['Order'] = df.groupby(['ID', 'Contract']).cumcount() + 1
print(df)

       ID    Name    Contract        Date       Data  Order
4   10000    John      Agency  2021-01-01 2021-01-01      1
3   10000    John  Contractor  2021-01-03 2021-01-03      1
5   10000    John  Contractor  2021-02-01 2021-02-01      2
2   10000    John    Employee  2020-03-06 2020-03-06      1
0   10000    John    Employee  2021-01-01 2021-01-01      2
1   10000    John    Employee  2021-01-01 2021-01-01      3
8   10001  Carmen  Contractor  2021-02-03 2021-02-03      1
6   10001  Carmen    Employee  1988-06-03 1988-06-03      1
7   10001  Carmen    Employee  2021-02-03 2021-02-03      2
9   10002   Peter  Contractor  2021-02-03 2021-02-03      1
11  10003    Fred    Employee  1988-06-03 1988-06-03      1
10  10003    Fred    Employee  2020-01-05 2020-01-05      2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading