Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas count past identical occurrences

I have a pandas dataframe that looks like this.

   col1  col2  Col3  target
    1     3     7      0
    2     4     8      1
    1     3     7      1
    2     4     8      0
    1     3     7      1
    2     4     8      0
    1     3     7      1 
    2     4     8      1
    1     3     7      1
    2     4     8      1
    2     4     8      0
    1     3     7      0
    2     4     8      0
    1     3     7      0 

I would like to count the past identical target occurrences.

For example, if the current row target is 1 and the last 3 row target 1 as well, then I would like to have the value 3 in the count column.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

More examples:

Current row target is 1; Last row target is 0; Count is 0.

Current row target is 1; Last row target is 1; Count is 1.

Current row target is 1; Last 2 row target is 1; Count is 2.

Current row target is 0; Last 2 row target is 0; Count is 2.

This is how the modified df looks like.

   col1  col2  Col3  target  count
    1     3     7      0      0
    2     4     8      1      0
    1     3     7      1      1
    2     4     8      0      0
    1     3     7      1      0
    2     4     8      0      0
    1     3     7      1      0
    2     4     8      1      1
    1     3     7      1      2
    2     4     8      1      3
    2     4     8      0      0
    1     3     7      0      1
    2     4     8      0      2
    1     3     7      0      3

Is there any easier way to do this in pandas?

>Solution :

Use GroupBy.cumcount with consecutive groups created by compared shifted values:

df['count'] = df.groupby(df['target'].ne(df['target'].shift()).cumsum()).cumcount()

print (df)
    col1  col2  Col3  target  count
0      1     3     7       0      0
1      2     4     8       1      0
2      1     3     7       1      1
3      2     4     8       0      0
4      1     3     7       1      0
5      2     4     8       0      0
6      1     3     7       1      0
7      2     4     8       1      1
8      1     3     7       1      2
9      2     4     8       1      3
10     2     4     8       0      0
11     1     3     7       0      1
12     2     4     8       0      2
13     1     3     7       0      3

Details:

print (df['target'].ne(df['target'].shift()).cumsum())
0     1
1     2
2     2
3     3
4     4
5     5
6     6
7     6
8     6
9     6
10    7
11    7
12    7
13    7
Name: target, dtype: int32
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading