I have a DataFrame like this:
df = pd.DataFrame({
'idA': [11,25,11,45,11,25,35],
'idB': [30,40,45,20,30,40,90]
})
I want to create a new column includes a unique value with respect to the idA and idB by each row. In other words, whenever a pair appears in any other place, the value should be exactly the same. But any other pair – including the reverse one – must be different. To overcome this, I thought to apply a made up math function like this:
df['NewValue'] = df.min(axis=1)*(df['idA']*10 + 15*df['idB'])
# idA idB NewValue
# 11 30 6160
# 25 40 21250
# 11 45 8635
# 45 20 15000
# 11 30 6160
# 25 40 21250
# 35 90 59500
It seems working. But by following this, NewValue can take any number. I desire to have a simple incrementing integer instead. For this example :
NewValue = [1,2,3,4,1,2,5]
In reality, I have thousands of rows. So I am also seeking for an efficient solution. Any way to handle this?
>Solution :
IIUC, you could use groupby + ngroup:
df['NewValue'] = df.groupby(['idA','idB'], sort=False).ngroup().add(1)
Output:
idA idB NewValue
0 11 30 1
1 25 40 2
2 11 45 3
3 45 20 4
4 11 30 1
5 25 40 2
6 35 90 5