Create new ID based on two columns

December 6, 2023

I have a dataframe that looks like this:

Name	ID
A	1
B	2
A	1
C	3
B	3
D	3
E	1
F	2

As you can see for some IDs there are multiple names, I would like to change the ID so that there is a unique ID for each new instance of a name, the resulting ID column would ideally look like this:

Unfortunately I cannot use ngroup() because there are over 35,000 IDs.

Name	ID	ID_new
A	1	1_1
B	2	2_1
A	1	1_1
C	3	3_1
B	2	2_1
D	3	3_2
E	1	1_2
F	2	2_2

All help is appreciated!

I have used .ngroup() + 1 but as I said there are too many IDs, as well I have used cumcount() + 1 but this makes the number after the ‘_’ go up by one each time resulting in non-unique IDs.

>Solution :

Use factorize per groups in lambda function in GroupBy.transform and join with ID by Series.str.cat:

f = lambda x: pd.factorize(x)[0] + 1
s = df.groupby('ID')['Name'].transform(f).astype(str)
df['ID_new'] = df['ID'].astype(str).str.cat(s, sep='_')
print (df)
  Name  ID ID_new
0    A   1    1_1
1    B   2    2_1
2    A   1    1_1
3    C   3    3_1
4    B   2    2_1
5    D   3    3_2
6    E   1    1_2
7    F   2    2_2