Count consecutive elements in Pandas list

June 17, 2024

I have created the following pandas dataframe:

import pandas as pd
import numpy as np

ds = {'col1':[1,"S",3,4,"S"], 'col2' : [6,"S",8,9,"S"],'col3' : [67,None,87,79,"S"]}

df = pd.DataFrame(data=ds)

df['col4']= df[['col1','col2','col3']].values.tolist()

The dataframe looks like this:

print(df)

  col1 col2  col3          col4
0    1    6    67    [1, 6, 67]
1    S    S  None  [S, S, None]
2    3    8    87    [3, 8, 87]
3    4    9    79    [4, 9, 79]
4    S    S     S     [S, S, S]

For each record, I need to calculate the number of consecutive "S" inside col4. The resulting dataframe would look like this:

  col1 col2  col3          col4   iCount
0    1    6    67    [1, 6, 67]      0
1    S    S  None  [S, S, None]      2
2    3    8    87    [3, 8, 87]      0
3    4    9    79    [4, 9, 79]      0
4    S    S     S     [S, S, S]      3

I have tried this code:

col4 = np.array(df['col4'])
iCount = 0
for i in range(len(df)):
    for j in range(len(col4[i])):

        if(col4[i][j] == "S"):
            iCount += 1
    
        else:
            iCount = 0

df['iCount'] = iCount

But I get the following dataframe:

  col1 col2  col3          col4  iCount
0    1    6    67    [1, 6, 67]       3
1    S    S  None  [S, S, None]       3
2    3    8    87    [3, 8, 87]       3
3    4    9    79    [4, 9, 79]       3
4    S    S     S     [S, S, S]       3

Please can someone help me find the error?

>Solution :

I would use itertools.groupby:

from itertools import groupby

def consec(lst):
    return max((len(list(g)) for k,g in
                groupby(lst, lambda x: x=='S') if k), default=0)

df['iCount'] = df['col4'].map(consec)

NB. using max here to get the longest sequence since there could be more than one stretch of S’s, but you could use min/sum or any other logic.

If you are sure there is a maximum of one series of S per list, you could simplify to:

df['iCount'] = [sum(x=='S' for x in lst) for lst in df['col4']]

Output:

  col1 col2  col3          col4  iCount
0    1    6    67    [1, 6, 67]       0
1    S    S  None  [S, S, None]       2
2    3    8    87    [3, 8, 87]       0
3    4    9    79    [4, 9, 79]       0
4    S    S     S     [S, S, S]       3