Say for example that we have the data set that looks like
| client | client case | client payment |
|---|---|---|
| foo | 1 | 200 |
| foo | 1 | 250 |
| foo | 1 | 225 |
| foo | 2 | 200 |
| bar | 1 | 500 |
| bar | 1 | 500 |
Within Pandas, I can set client and client case to be indexes, but I would like to number these indexes within groups to have an additional column that becomes something like below.
| client | client case | client payment | payment number |
|---|---|---|---|
| foo | 1 | 200 | 1 |
| foo | 1 | 250 | 2 |
| foo | 1 | 225 | 3 |
| foo | 2 | 200 | 1 |
| bar | 1 | 500 | 1 |
| bar | 1 | 500 | 2 |
Essentially just creating some kind of payment numbering system within each client case.
I could theoretically do this by creating some kind of list by iterating over the rows but I know that this isn’t the optimal or suggested way of using Pandas.
>Solution :
this would do
df['payment number'] = 1 + df.groupby(['client', 'client case']).cumcount()
output
client client case client payment payment number
0 foo 1 200 1
1 foo 1 250 2
2 foo 1 225 3
3 foo 2 200 1
4 bar 1 500 1
5 bar 1 500 2