Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Enumerate sub groups based on occurence

I am looking for a simple solution to enumerate subgroups in a pandas dataframe based on their occurance. Here is a sample dataframe and the results I want to achieve. Its a bit like ngroup(), but counter resets every major group. Data is grouped by ‘LOT’ and ‘TDATE’.

import pandas as pd

df = pd.DataFrame({

       "LOT": ['A', 'A', 'A', 'A', 'A', 'A','B', 'B', 'B','B', 'B', 'B'],

       "TDATE": ['01.01.2022', '01.01.2022', '02.01.2022', '02.01.2022', '03.01.2022', '03.01.2022', '02.01.2022', '02.01.2022', '03.01.2022','03.01.2022', '06.01.2022', '06.01.2022'],

       "MOD": ['TP1', 'TP2', 'TP1', 'TP2', 'TP1', 'TP2','TP1', 'TP2','TP1', 'TP2','TP1', 'TP2'],

       "MWERT": [1.2, 2.3, 3.5, 4.6, 5.1, 6.4, 7.3, 2.8, 1.9, 1.10, 1.1, 1.2]})

Results of df['ENUM'] = df.groupby(['LOT','TDATE']).ngroup()

   LOT       TDATE  MOD  MWERT  ENUM
0    A  01.01.2022  TP1    1.2     0
1    A  01.01.2022  TP2    2.3     0
2    A  02.01.2022  TP1    3.5     1
3    A  02.01.2022  TP2    4.6     1
4    A  03.01.2022  TP1    5.1     2
5    A  03.01.2022  TP2    6.4     2
6    B  02.01.2022  TP1    7.3     3
7    B  02.01.2022  TP2    2.8     3
8    B  03.01.2022  TP1    1.9     4
9    B  03.01.2022  TP2    1.1     4
10   B  06.01.2022  TP1    1.1     5
11   B  06.01.2022  TP2    1.2     5

but this is what I need:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

   LOT       TDATE  MOD  MWERT  ENUM
0    A  01.01.2022  TP1    1.2     0
1    A  01.01.2022  TP2    2.3     0
2    A  02.01.2022  TP1    3.5     1
3    A  02.01.2022  TP2    4.6     1
4    A  03.01.2022  TP1    5.1     2
5    A  03.01.2022  TP2    6.4     2
6    B  02.01.2022  TP1    7.3     0
7    B  02.01.2022  TP2    2.8     0
8    B  03.01.2022  TP1    1.9     1
9    B  03.01.2022  TP2    1.1     1
10   B  06.01.2022  TP1    1.1     2
11   B  06.01.2022  TP2    1.2     2

As you can see, the ngroup() "resets" for any new ‘LOT’.

>Solution :

You can use a nested groupby:

df['ENUM'] = (df.groupby('LOT')
                .apply(lambda g: g.groupby('TDATE').ngroup())
                .droplevel(0)
              )

output:

   LOT       TDATE  MOD  MWERT  ENUM
0    A  01.01.2022  TP1    1.2     0
1    A  01.01.2022  TP2    2.3     0
2    A  02.01.2022  TP1    3.5     1
3    A  02.01.2022  TP2    4.6     1
4    A  03.01.2022  TP1    5.1     2
5    A  03.01.2022  TP2    6.4     2
6    B  02.01.2022  TP1    7.3     0
7    B  02.01.2022  TP2    2.8     0
8    B  03.01.2022  TP1    1.9     1
9    B  03.01.2022  TP2    1.1     1
10   B  06.01.2022  TP1    1.1     2
11   B  06.01.2022  TP2    1.2     2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading