Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas – Identify non-unique rows, grouping any pairs

I am trying to figure out a non-looping way to identify (auto-incrementing int would be ideal) the non-unique groups of rows (a group can contain 1 or more rows) within each GroupID.

Here is an example DataFrame that looks like

Index Cents SD_YF GroupID
10 182.5 2.1 0
11 182.5 2.1 0
12 153.5 1.05 1
13 153.5 1.05 1
14 43 11 2
15 43 11 2
4 152 21 2
5 152 21 2

My ideal output would be:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Index Cents SD_YF GroupID UniID
10 182.5 2.1 0 1
11 182.5 2.1 0 2
12 153.5 1.05 1 3
13 153.5 1.05 1 4
14 43 11 2 5
15 43 11 2 6
4 152 21 2 5
5 152 21 2 6

I have bolded #5 to draw attention to how index 14, 4 are paired together. Similar with #6. I hope that makes sense!

>Solution :

IIUC you need to add the group number + the cumcount per duplicate + 1:

df['UniID'] = (df['GroupID']
 +df.groupby('GroupID').ngroup().add(1)
 +df.groupby(['GroupID', 'Cents', 'SD_YF']).cumcount()
)

output:

   Index  Cents  SD_YF  GroupID  UniID
0     10  182.5   2.10        0      1
1     11  182.5   2.10        0      2
2     12  153.5   1.05        1      3
3     13  153.5   1.05        1      4
4     14   43.0  11.00        2      5
5     15   43.0  11.00        2      6
6      4  152.0  21.00        2      5
7      5  152.0  21.00        2      6
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading