I have a table with data:
Col1
0 1.0
1 1.0
2 1.0
3 2.0
4 3.0
5 4.0
6 NaN
How can I rank only the duplicated values, leaving unique values and NaN unranked?
Unfortunately, my current output ranks the unique values as well:
Col1 Rn
0 1.0 1.0
1 1.0 2.0
2 1.0 3.0
3 2.0 1.0
4 3.0 1.0
5 4.0 1.0
6 NaN NaN
The output I need is:
Col1 Rn
0 1.0 1.0
1 1.0 2.0
2 1.0 3.0
3 2.0 NaN
4 3.0 NaN
5 4.0 NaN
6 NaN NaN
Example of the code:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1],
[1],
[1],
[2],
[3],
[4],
[np.nan]], columns=['Col1'])
print(df)
# Add a row number within each group of equal values:
df['Rn'] = df[df['Col1'].notnull()].groupby('Col1')['Col1'].rank(method="first", ascending=True)
print(df)
# I managed to select only the rows I need, but how can I apply this as a mask together with groupby?
m = df.dropna().loc[df['Col1'].duplicated(keep=False)]
print(m)
Thank you!
>Solution :
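Build a boolean mask with duplicated(keep=False), which flags every occurrence of a value that appears more than once, and rank only the masked rows. Assigning the result back aligns on the index, so the rows outside the mask (unique values and the single NaN) get NaN: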
m = df['Col1'].duplicated(keep=False)
df['Rn'] = df[m].groupby('Col1')['Col1'].rank(method="first", ascending=True)
print(df)
Prints:
Col1 Rn
0 1.0 1.0
1 1.0 2.0
2 1.0 3.0
3 2.0 NaN
4 3.0 NaN
5 4.0 NaN
6 NaN NaN
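One caveat: duplicated() treats repeated NaN values as duplicates of each other, so with more than one NaN in the column the mask alone would select them too. Below is a minimal sketch that excludes NaN explicitly. The helper name rank_duplicates and the extra NaN row in the sample data are assumptions made for illustration; they are not part of the original question.

```python
import numpy as np
import pandas as pd


def rank_duplicates(df, col="Col1"):
    """Rank rows within each group of repeated values; others stay NaN.

    Hypothetical helper, not a pandas API.
    """
    s = df[col]
    # True only for non-missing values that occur more than once
    mask = s.duplicated(keep=False) & s.notna()
    # Rank by order of appearance inside each group of equal values
    ranks = df.loc[mask].groupby(col)[col].rank(method="first", ascending=True)
    # Restore the full index; rows outside the mask become NaN
    return ranks.reindex(df.index)


# Sample data with two NaN rows (the second one is an added assumption)
df = pd.DataFrame({"Col1": [1, 1, 1, 2, 3, 4, np.nan, np.nan]})
df["Rn"] = rank_duplicates(df)
print(df)
```

With this data the first three rows get ranks 1.0, 2.0 and 3.0, and every other row, including both NaN rows, stays NaN.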