Suppose I have the following table:
| id | name | date | |
|---|---|---|---|
| 1 | Sta | sta@example.com | 11.11.22 |
| 2 | Danny | dany@example.com | 11.11.22 |
| 3 | Elle | elle@example.com | 11.11.22 |
| 4 | Elle | falsemail@example.com | 11.11.22 |
| 5 | Elle | elle@example.com | 12.11.22 |
What is the best way to create an incremental counter for repeating observations for the feature subset [name, date]?
Desired output:
| id | name | date | counter | |
|---|---|---|---|---|
| 1 | Sta | sta@example.com | 11.11.22 | 1 |
| 2 | Danny | dany@example.com | 11.11.22 | 1 |
| 3 | Elle | elle@example.com | 11.11.22 | 1 |
| 4 | Elle | falsemail@example.com | 11.11.22 | 2 |
| 5 | Elle | elle@example.com | 12.11.22 | 1 |
Edit: The table itself is sorted correctly and the duplicates appear after each other.
>Solution :
df['counter'] = df.groupby(['name', 'date']).cumcount() + 1
df
id name mail date counter
0 1 Sta sta@example.com 11.11.22 1
1 2 Danny dany@example.com 11.11.22 1
2 3 Elle elle@example.com 11.11.22 1
3 4 Elle falsemail@example.com 11.11.22 2
4 5 Elle elle@example.com 12.11.22 1