Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to identify groups based on two columns and ignore duplicates?

My dataframe is this :

df = pd.DataFrame({'col1': ['A', 'A', 'A', 'A', 'B', 'C', 'C'],
 'col2': ['a1', 'a1', 'a2', 'a2', 'b1', 'c1', 'c2']})

print(df)

  col1 col2
0    A   a1
1    A   a1
2    A   a2
3    A   a2
4    B   b1
5    C   c1
6    C   c2

I want to add a column that will tell the rank of the column col2 in each value of col1.

I tried three codes but none of them gives the correct result :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df['col3_1'] = df.groupby(['col1', 'col2']).cumcount() + 1
df['col3_2'] = df.groupby(['col1'])['col2'].cumcount() + 1
df['col3_3'] = df.groupby(['col1', 'col2']).ngroup() + 1

print(df)

  col1 col2  col3_1  col3_2  col3_3
0    A   a1       1       1       1
1    A   a1       2       2       1
2    A   a2       1       3       2
3    A   a2       2       4       2
4    B   b1       1       1       3
5    C   c1       1       1       4
6    C   c2       1       2       5

PS : the real values of the column col2 doesn’t have a pattern letter-number.

My expected output is this :

  col1 col2  col3
0    A   a1     1
1    A   a1     1
2    A   a2     2
3    A   a2     2
4    B   b1     1
5    C   c1     1
6    C   c2     2

Can you guys show me how to do it ?

>Solution :

I think this does what you want:

df['col3'] = df.groupby(['col1'])['col2'].transform(lambda x: pd.factorize(x)[0] + 1)

pd.factorize assigns an integer to each unique value, it returns a tuple of the integer codes and the values. We add 1 because it starts indexing from 0.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading