I have a pandas dataframe that looks like this.
col1 col2 Col3 target
1 3 7 0
2 4 8 1
1 3 7 1
2 4 8 0
1 3 7 1
2 4 8 0
1 3 7 1
2 4 8 1
1 3 7 1
2 4 8 1
2 4 8 0
1 3 7 0
2 4 8 0
1 3 7 0
I would like to count the past identical target occurrences.
For example, if the current row target is 1 and the last 3 row target 1 as well, then I would like to have the value 3 in the count column.
More examples:
Current row target is 1; Last row target is 0; Count is 0.
Current row target is 1; Last row target is 1; Count is 1.
Current row target is 1; Last 2 row target is 1; Count is 2.
Current row target is 0; Last 2 row target is 0; Count is 2.
This is how the modified df looks like.
col1 col2 Col3 target count
1 3 7 0 0
2 4 8 1 0
1 3 7 1 1
2 4 8 0 0
1 3 7 1 0
2 4 8 0 0
1 3 7 1 0
2 4 8 1 1
1 3 7 1 2
2 4 8 1 3
2 4 8 0 0
1 3 7 0 1
2 4 8 0 2
1 3 7 0 3
Is there any easier way to do this in pandas?
>Solution :
Use GroupBy.cumcount with consecutive groups created by compared shifted values:
df['count'] = df.groupby(df['target'].ne(df['target'].shift()).cumsum()).cumcount()
print (df)
col1 col2 Col3 target count
0 1 3 7 0 0
1 2 4 8 1 0
2 1 3 7 1 1
3 2 4 8 0 0
4 1 3 7 1 0
5 2 4 8 0 0
6 1 3 7 1 0
7 2 4 8 1 1
8 1 3 7 1 2
9 2 4 8 1 3
10 2 4 8 0 0
11 1 3 7 0 1
12 2 4 8 0 2
13 1 3 7 0 3
Details:
print (df['target'].ne(df['target'].shift()).cumsum())
0 1
1 2
2 2
3 3
4 4
5 5
6 6
7 6
8 6
9 6
10 7
11 7
12 7
13 7
Name: target, dtype: int32