My df looks as follows:
import pandas as pd
d = {'col1': [1, 2, 3, 3, 1, 2, 2, 3, 4, 1, 1, 2]}
df = pd.DataFrame(data=d)
Now I want to add a new column that follows this pattern:
| col1 | new_col |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 3 | 3 |
| 1 | 4 |
| 2 | 5 |
| 2 | 5 |
| 3 | 6 |
| 4 | 7 |
| 1 | 8 |
| 1 | 8 |
| 2 | 9 |
The counter should go up by one every time the value in col1 changes, and once col1 starts again at 1 it should just keep counting.
So far I have only added a column with the row-to-row difference:
df['diff'] = df['col1'].diff()
How can I extend this approach?
>Solution :
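Try the following. diff() is nonzero wherever the value changes (and NaN in the first row, which also counts as not equal to 0), so the cumulative sum of ne(0) goes up by one at every change: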
df.col1.diff().ne(0).cumsum()
Out[94]:
0 1
1 2
2 3
3 3
4 4
5 5
6 5
7 6
8 7
9 8
10 8
11 9
Name: col1, dtype: int32
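To store the result as the new column, something along these lines should work. This is a minimal sketch using the data from the question; the second column name, `new_col_shift`, is made up for illustration and shows a variant that compares each row with the previous one via `shift()` instead of `diff()`.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 3, 1, 2, 2, 3, 4, 1, 1, 2]})

# diff() is NaN in the first row and nonzero wherever the value changes;
# ne(0) turns both cases into True, and cumsum() counts the True values.
df['new_col'] = df['col1'].diff().ne(0).cumsum()

# Variant (hypothetical column name): compare each value with the previous
# row directly. This avoids subtraction, so it can also be used on
# non-numeric columns such as strings.
df['new_col_shift'] = df['col1'].ne(df['col1'].shift()).cumsum()

print(df)
```

Both columns should contain the same run counter for this data. The `shift()` version is mainly useful when the column cannot be subtracted.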