I am trying to create a column that counts incrementally as it goes down the rows group by one column and based on yet another column.
So I have this data
car status
0 audi False
1 audi False
2 audi False
3 audi True
4 bmw False
5 bmw False
6 bmw False
7 bmw True
8 bmw False
9 lexus True
10 lexus True
11 lexus True
I would like to add a colum that counts how many consecutive False are in status as it goes counting for each car and reset when there is a True to start again if there is another False or another car
car status counter
0 audi False 1
1 audi False 2
2 audi False 3
3 audi True 0
4 bmw False 1
5 bmw False 2
5 bmw False 3
6 bmw True 0
7 bmw False 1
9 lexus True 0
10 lexus True 0
11 lexus False 1
I am trying this but it increments by car
import pandas as pd
data = [['audi', False],
['audi', False],
['audi', False],
['audi', True],
['bmw', False],
['bmw', False],
['bmw', False],
['bmw', True],
['bmw', False],
['lexus', True],
['lexus', True],
['lexus', False]]
df = pd.DataFrame(data=data, columns=['car', 'status'])
df['count'] = df.groupby('car')['status'].transform(lambda x: x.ne(x.shift()).cumsum())
print(df)
>Solution :
You can use GroupBy.cumcount with grouping consecutive values, last set 0 for status=True values by Series.mask:
df['count'] = (df.groupby(['car', df['status'].ne(df['status'].shift()).cumsum()])
.cumcount()
.add(1)
.mask(df['status'], 0))
print(df)
car status count
0 audi False 1
1 audi False 2
2 audi False 3
3 audi True 0
4 bmw False 1
5 bmw False 2
6 bmw False 3
7 bmw True 0
8 bmw False 1
9 lexus True 0
10 lexus True 0
11 lexus False 1
Alternative solution, for working with boolean column only:
df['count'] = (df.groupby(['car', (~df['status'] & df['status'].shift(fill_value=False)).cumsum()])
.cumcount()
.add(1)
.mask(df['status'], 0))