My dataframe is this :
df = pd.DataFrame({'col1': ['A', 'A', 'A', 'A', 'B', 'C', 'C'],
'col2': ['a1', 'a1', 'a2', 'a2', 'b1', 'c1', 'c2']})
print(df)
col1 col2
0 A a1
1 A a1
2 A a2
3 A a2
4 B b1
5 C c1
6 C c2
I want to add a column that will tell the rank of the column col2 in each value of col1.
I tried three codes but none of them gives the correct result :
df['col3_1'] = df.groupby(['col1', 'col2']).cumcount() + 1
df['col3_2'] = df.groupby(['col1'])['col2'].cumcount() + 1
df['col3_3'] = df.groupby(['col1', 'col2']).ngroup() + 1
print(df)
col1 col2 col3_1 col3_2 col3_3
0 A a1 1 1 1
1 A a1 2 2 1
2 A a2 1 3 2
3 A a2 2 4 2
4 B b1 1 1 3
5 C c1 1 1 4
6 C c2 1 2 5
PS : the real values of the column col2 doesn’t have a pattern letter-number.
My expected output is this :
col1 col2 col3
0 A a1 1
1 A a1 1
2 A a2 2
3 A a2 2
4 B b1 1
5 C c1 1
6 C c2 2
Can you guys show me how to do it ?
>Solution :
I think this does what you want:
df['col3'] = df.groupby(['col1'])['col2'].transform(lambda x: pd.factorize(x)[0] + 1)
pd.factorize assigns an integer to each unique value, it returns a tuple of the integer codes and the values. We add 1 because it starts indexing from 0.