Pandas – Identify non-unique rows, grouping any pairs

July 7, 2022

I am trying to figure out a non-looping way to identify (auto-incrementing int would be ideal) the non-unique groups of rows (a group can contain 1 or more rows) within each GroupID.

Here is an example DataFrame that looks like

Index	Cents	SD_YF	GroupID
10	182.5	2.1	0
11	182.5	2.1	0
12	153.5	1.05	1
13	153.5	1.05	1
14	43	11	2
15	43	11	2
4	152	21	2
5	152	21	2

My ideal output would be:

Index	Cents	SD_YF	GroupID	UniID
10	182.5	2.1	0	1
11	182.5	2.1	0	2
12	153.5	1.05	1	3
13	153.5	1.05	1	4
14	43	11	2	5
15	43	11	2	6
4	152	21	2	5
5	152	21	2	6

I have bolded #5 to draw attention to how index 14, 4 are paired together. Similar with #6. I hope that makes sense!

>Solution :

IIUC you need to add the group number + the cumcount per duplicate + 1:

df['UniID'] = (df['GroupID']
 +df.groupby('GroupID').ngroup().add(1)
 +df.groupby(['GroupID', 'Cents', 'SD_YF']).cumcount()
)

output:

   Index  Cents  SD_YF  GroupID  UniID
0     10  182.5   2.10        0      1
1     11  182.5   2.10        0      2
2     12  153.5   1.05        1      3
3     13  153.5   1.05        1      4
4     14   43.0  11.00        2      5
5     15   43.0  11.00        2      6
6      4  152.0  21.00        2      5
7      5  152.0  21.00        2      6