Iterating over lists in pandas dataframe to remove everything after certain value (if the value exists) in list

January 6, 2022

I want to filter the values in my dataframe based on the occurrence of a 1 in the events column. When a 1 occurs, the 1 and everything after it should be removed.

I want to do this for my whole dataframe, which looks like this:

import pandas as pd

df = pd.DataFrame([['00000000000 ', [4, 5, 5, 3, 2, 1, 5]],
                   ['00000000001', [4, 5, 5, 1, 2, 1, 5, 5, 5]],
                   ['00000000002 ', [4, 5, 1, 3, 2, 1, 5, 5, 5, 1]]],
                  columns=['session_id', 'events'])

The following solution, taken from an answer to a related question, works:

df['events_short'] = ""
for i, row in df.iterrows():
    df.at[i, 'events_short'] = row['events'][:row['events'].index(1)]

This only works if a 1 occurs in the list. When it doesn't, I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-175-e4d3f228e32f> in <module>()
      1 df['events_short'] = ""
      2 for i, row in df.iterrows():
----> 3     df.at[i, 'events_short'] = row['events'][:row['events'].index(1)]

ValueError: 1 is not in list

Therefore, I need to handle the case where 1 does not occur in the list. Can someone help me set this up? Thanks!

Solution:

While @OnY's answer is nice, it has to scan each list twice (once to check whether a 1 is present, and once more to find its index).
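To make the double scan concrete, a two-pass version could look roughly like the sketch below. It is an illustration only and not necessarily the code from the referenced answer; the name upto1_two_pass is hypothetical.

```python
def upto1_two_pass(events):
    # Hypothetical helper for illustration; not taken from the referenced answer.
    # `1 in events` scans the list once, and `events.index(1)` scans it again.
    if 1 in events:
        return events[:events.index(1)]
    return events


print(upto1_two_pass([4, 5, 5, 1, 2, 1]))  # [4, 5, 5]
print(upto1_two_pass([0, 2, 3]))           # [0, 2, 3]
```

For short lists the extra scan hardly matters, so this is mostly a question of style.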

A more efficient approach might be to use a helper function with try/except:

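def upto1(l):
    # Return the items before the first 1; if there is no 1, return the list unchanged.
    try:
        # list.index scans the list once and raises ValueError when 1 is absent.
        return l[:l.index(1)]
    except ValueError:
        return l

df['events2'] = df['events'].apply(upto1)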

Example output (row 3 is an extra row without a 1, which shows the fallback):

    session_id                          events          events2
0  00000000000           [4, 5, 5, 3, 2, 1, 5]  [4, 5, 5, 3, 2]
1  00000000001     [4, 5, 5, 1, 2, 1, 5, 5, 5]        [4, 5, 5]
2  00000000002  [4, 5, 1, 3, 2, 1, 5, 5, 5, 1]           [4, 5]
3  00000000003                       [0, 2, 3]        [0, 2, 3]
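To reproduce a table like the one above, a self-contained script might look like the following. It is a sketch that assumes the question's data plus the fourth row from the example output; the session IDs are written without the trailing spaces that appear in the question.

```python
import pandas as pd

# Data from the question, plus one row without a 1 (taken from the example output).
df = pd.DataFrame(
    [
        ["00000000000", [4, 5, 5, 3, 2, 1, 5]],
        ["00000000001", [4, 5, 5, 1, 2, 1, 5, 5, 5]],
        ["00000000002", [4, 5, 1, 3, 2, 1, 5, 5, 5, 1]],
        ["00000000003", [0, 2, 3]],
    ],
    columns=["session_id", "events"],
)


def upto1(l):
    # Keep everything before the first 1; fall back to the full list if 1 is absent.
    try:
        return l[:l.index(1)]
    except ValueError:
        return l


df["events2"] = df["events"].apply(upto1)
print(df)
```

Note that when no 1 is found, upto1 returns the original list object rather than a copy, so for those rows events and events2 refer to the same list. If that matters, returning list(l) in the except branch would give an independent copy.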