Suppose I have a sample data set that can be generated with the code below:
import pandas as pd
# Sample DataFrame with duplicate rows
data = {'A': [1, 2, 1, 3, 1, 2, 3, 2],
        'B': [4, 5, 4, 6, 4, 5, 6, 5],
        'C': [1, 2, 3, 4, 5, 6, 7, 8]}
df = pd.DataFrame(data)
In the DataFrame above, I want duplicate rows to share the same index. For example, index 0 would be assigned to rows 0, 2 and 4, and index 1 to rows 1, 5 and 7. Duplicates should be identified using only columns A and B.
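To see which rows count as duplicates when only A and B are compared, here is a minimal sketch, assuming pandas is installed, that lists the row labels in each (A, B) group. The name rows_by_key is hypothetical, chosen only for illustration:

```python
import pandas as pd

data = {'A': [1, 2, 1, 3, 1, 2, 3, 2],
        'B': [4, 5, 4, 6, 4, 5, 6, 5],
        'C': [1, 2, 3, 4, 5, 6, 7, 8]}
df = pd.DataFrame(data)

# Group on A and B only, so column C plays no part in deciding what is a duplicate,
# then collect the row labels that belong to each (A, B) key.
rows_by_key = {(int(a), int(b)): group.index.tolist()
               for (a, b), group in df.groupby(['A', 'B'])}

print(rows_by_key)
# {(1, 4): [0, 2, 4], (2, 5): [1, 5, 7], (3, 6): [3, 6]}
```

Rows 3 and 6 form a third group, so three distinct index values are needed in total.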
Solution:
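Use groupby(...).ngroup(), which gives each row the number of its (A, B) group (from 0 to the number of groups minus 1), and assign the result as the index: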
df.index = df.groupby(['A','B']).ngroup()
Output:
   A  B  C
0  1  4  1
1  2  5  2
0  1  4  3
2  3  6  4
0  1  4  5
1  2  5  6
2  3  6  7
1  2  5  8
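A self-contained sketch of the same approach, assuming a recent pandas version. It also compares the default numbering with sort=False, which only matters when the keys do not first appear in sorted order. The names sorted_ids, first_seen_ids and the frame other are illustrative, not from the original answer:

```python
import pandas as pd

data = {'A': [1, 2, 1, 3, 1, 2, 3, 2],
        'B': [4, 5, 4, 6, 4, 5, 6, 5],
        'C': [1, 2, 3, 4, 5, 6, 7, 8]}
df = pd.DataFrame(data)

# ngroup() labels every row with the number of its (A, B) group.
# With the default sort=True, numbers follow the sorted order of the keys.
sorted_ids = df.groupby(['A', 'B']).ngroup()

# With sort=False, numbers follow the order in which each key first appears.
first_seen_ids = df.groupby(['A', 'B'], sort=False).ngroup()

# Here the keys first appear in sorted order, so both numberings agree.
print(sorted_ids.tolist())      # [0, 1, 0, 2, 0, 1, 2, 1]
print(first_seen_ids.tolist())  # [0, 1, 0, 2, 0, 1, 2, 1]

# A hypothetical frame where the two numberings differ.
other = pd.DataFrame({'A': [3, 1, 3], 'B': [6, 4, 6]})
print(other.groupby(['A', 'B']).ngroup().tolist())              # [1, 0, 1]
print(other.groupby(['A', 'B'], sort=False).ngroup().tolist())  # [0, 1, 0]

# Use the group numbers as the (non-unique) index, as in the answer.
df.index = sorted_ids
print(df.loc[0])  # the three rows whose (A, B) is (1, 4)
```

Because the group numbers become repeated index labels, df.loc[0] returns every row in that group rather than a single row.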