Pandas count past identical occurrences

March 1, 2022

I have a pandas dataframe that looks like this.

   col1  col2  Col3  target
    1     3     7      0
    2     4     8      1
    1     3     7      1
    2     4     8      0
    1     3     7      1
    2     4     8      0
    1     3     7      1 
    2     4     8      1
    1     3     7      1
    2     4     8      1
    2     4     8      0
    1     3     7      0
    2     4     8      0
    1     3     7      0

I would like to count the past identical target occurrences.

For example, if the current row target is 1 and the last 3 row target 1 as well, then I would like to have the value 3 in the count column.

More examples:

Current row target is 1; Last row target is 0; Count is 0.

Current row target is 1; Last row target is 1; Count is 1.

Current row target is 1; Last 2 row target is 1; Count is 2.

Current row target is 0; Last 2 row target is 0; Count is 2.

This is how the modified df looks like.

   col1  col2  Col3  target  count
    1     3     7      0      0
    2     4     8      1      0
    1     3     7      1      1
    2     4     8      0      0
    1     3     7      1      0
    2     4     8      0      0
    1     3     7      1      0
    2     4     8      1      1
    1     3     7      1      2
    2     4     8      1      3
    2     4     8      0      0
    1     3     7      0      1
    2     4     8      0      2
    1     3     7      0      3

Is there any easier way to do this in pandas?

>Solution :

Use GroupBy.cumcount with consecutive groups created by compared shifted values:

df['count'] = df.groupby(df['target'].ne(df['target'].shift()).cumsum()).cumcount()

print (df)
    col1  col2  Col3  target  count
0      1     3     7       0      0
1      2     4     8       1      0
2      1     3     7       1      1
3      2     4     8       0      0
4      1     3     7       1      0
5      2     4     8       0      0
6      1     3     7       1      0
7      2     4     8       1      1
8      1     3     7       1      2
9      2     4     8       1      3
10     2     4     8       0      0
11     1     3     7       0      1
12     2     4     8       0      2
13     1     3     7       0      3

Details:

print (df['target'].ne(df['target'].shift()).cumsum())
0     1
1     2
2     2
3     3
4     4
5     5
6     6
7     6
8     6
9     6
10    7
11    7
12    7
13    7
Name: target, dtype: int32