Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Column increment counter and reset based on other columns

I am trying to create a column that counts incrementally as it goes down the rows group by one column and based on yet another column.

So I have this data

      car  status  
0    audi   False      
1    audi   False      
2    audi   False      
3    audi    True      
4     bmw   False     
5     bmw   False      
6     bmw   False      
7     bmw    True      
8     bmw   False      
9   lexus    True      
10  lexus    True      
11  lexus    True

I would like to add a colum that counts how many consecutive False are in status as it goes counting for each car and reset when there is a True to start again if there is another False or another car

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

      car  status  counter
0    audi   False  1
1    audi   False  2   
2    audi   False  3   
3    audi    True  0   
4     bmw   False  1   
5     bmw   False  2 
5     bmw   False  3    
6     bmw    True  0    
7     bmw   False  1         
9   lexus    True  0
10  lexus    True  0   
11  lexus   False  1

I am trying this but it increments by car

import pandas as pd

data = [['audi', False],
        ['audi', False],
        ['audi', False],
        ['audi', True],
        ['bmw', False],
        ['bmw', False],
        ['bmw', False],
        ['bmw', True],
        ['bmw', False],
        ['lexus', True],
        ['lexus', True],
        ['lexus', False]]

df = pd.DataFrame(data=data, columns=['car', 'status'])

df['count'] = df.groupby('car')['status'].transform(lambda x: x.ne(x.shift()).cumsum())

print(df)

>Solution :

You can use GroupBy.cumcount with grouping consecutive values, last set 0 for status=True values by Series.mask:

df['count'] = (df.groupby(['car', df['status'].ne(df['status'].shift()).cumsum()])
                 .cumcount()
                 .add(1)
                 .mask(df['status'], 0))

print(df)
      car  status  count
0    audi   False      1
1    audi   False      2
2    audi   False      3
3    audi    True      0
4     bmw   False      1
5     bmw   False      2
6     bmw   False      3
7     bmw    True      0
8     bmw   False      1
9   lexus    True      0
10  lexus    True      0
11  lexus   False      1

Alternative solution, for working with boolean column only:

df['count'] = (df.groupby(['car', (~df['status'] & df['status'].shift(fill_value=False)).cumsum()])
                 .cumcount()
                 .add(1)
                 .mask(df['status'], 0))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading