How to auto-increment a counter by repeated values in a column

February 7, 2022

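I have a data frame with the column name, and I need to create the column seq, which identifies the different times a name appears in the data frame. Consecutive repeats of a name share one number, and the number increases each time the name appears again after a different name. It's important to preserve the order. The seq column in the example below shows the expected result.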

import pandas as pd

data = {'name': ['Tom', 'Joseph', 'Joseph', 'Joseph', 'Tom', 'Tom', 'John', 'Tom', 'Tom', 'John', 'Joseph'],
        'seq': ['Tom 0', 'Joseph 0', 'Joseph 0', 'Joseph 0', 'Tom 1', 'Tom 1', 'John 0', 'Tom 2', 'Tom 2', 'John 1', 'Joseph 1']}

df = pd.DataFrame(data)

print(df)

      name       seq
0      Tom     Tom 0
1   Joseph  Joseph 0
2   Joseph  Joseph 0
3   Joseph  Joseph 0
4      Tom     Tom 1
5      Tom     Tom 1
6     John    John 0
7      Tom     Tom 2
8      Tom     Tom 2
9     John    John 1
10  Joseph  Joseph 1

Solution:

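Create a boolean mask that is True wherever the name differs from the previous row, i.e. at the first row of each run. Keep only those first rows before grouping by name, so that cumcount numbers the runs of each name (0, 1, 2, …). Then reindex back to the full frame, forward-fill each number into the remaining rows of its run, and finally concatenate the name and the sequence number.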

# Boolean mask: True where a new run of a name starts
m = df['name'].ne(df['name'].shift())

# Number the runs of each name, then forward-fill into the rest of each run
seq = df.loc[m].groupby('name').cumcount().astype(str) \
        .reindex(df.index, fill_value=pd.NA).ffill()

# Concatenate name and seq
df['seq'] = df['name'] + ' ' + seq

Output:

>>> df
      name       seq
0      Tom     Tom 0
1   Joseph  Joseph 0
2   Joseph  Joseph 0
3   Joseph  Joseph 0
4      Tom     Tom 1
5      Tom     Tom 1
6     John    John 0
7      Tom     Tom 2
8      Tom     Tom 2
9     John    John 1
10  Joseph  Joseph 1

>>> m
0      True
1      True
2     False
3     False
4      True
5     False
6      True
7      True
8     False
9      True
10     True
Name: name, dtype: bool
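An alternative sketch (not part of the original answer) reuses the same mask but skips the filter, reindex and forward-fill steps: cumsum on the mask gives every consecutive run a global id, and a dense rank of those ids within each name turns them into 0, 1, 2, … per name. The helper name run_counter is hypothetical; it only uses standard pandas calls (Series.ne, shift, cumsum, and groupby rank with method="dense").

```python
import pandas as pd


def run_counter(names: pd.Series) -> pd.Series:
    """Hypothetical helper: 0-based count of the runs of each value, in order."""
    # True where a new run starts (value differs from the previous row)
    starts = names.ne(names.shift())
    # Global run id for every consecutive block: 1, 2, 2, 2, 3, ...
    run_id = starts.cumsum()
    # Dense-rank the run ids within each name: first run -> 1, second -> 2, ...
    # rank() returns floats, so cast to int and shift to start at 0
    return run_id.groupby(names).rank(method="dense").astype(int) - 1


df = pd.DataFrame({'name': ['Tom', 'Joseph', 'Joseph', 'Joseph', 'Tom', 'Tom',
                            'John', 'Tom', 'Tom', 'John', 'Joseph']})
df['seq'] = df['name'] + ' ' + run_counter(df['name']).astype(str)
print(df)
```

Because the counter stays numeric until the final astype(str), it can also be kept as its own integer column if the concatenated label is not needed.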