I want to filter the values in my dataframe based on the occurrence of ‘1’ in the column events. When a 1 occurs, the 1 and everything after it should be removed.
I want to do this for my whole dataframe, which looks like this:
import pandas as pd
df = pd.DataFrame([['00000000000 ', [4, 5, 5, 3, 2, 1, 5]],
                   ['00000000001', [4, 5, 5, 1, 2, 1, 5, 5, 5]],
                   ['00000000002 ', [4, 5, 1, 3, 2, 1, 5, 5, 5, 1]]],
                  columns=['session_id', 'events'])
This works with the following solution, as answered in this question.
df['events_short'] = ""
for i, row in df.iterrows():
    df.at[i, 'events_short'] = row['events'][:row['events'].index(1)]
This only works if a ‘1’ occurs in the list. When it doesn’t, I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-175-e4d3f228e32f> in <module>()
1 df['events_short'] = ""
2 for i, row in df.iterrows():
----> 3 df.at[i, 'events_short'] = row['events'][:row['events'].index(1)]
ValueError: 1 is not in list
Therefore, I need to handle the case where the 1 does not occur in the list. Can someone help me set this up? Thanks!
>Solution :
While @OnY’s answer is nice, it requires reading each list twice (once to check whether a 1 is present, once to find its index).
A more efficient approach might be to use a helper function with try/except, so each list is scanned only once:
def upto1(l):
    try:
        return l[:l.index(1)]
    except ValueError:
        return l
df['events2'] = df['events'].apply(upto1)
Example output (with an extra fourth row whose list contains no 1):
session_id events events2
0 00000000000 [4, 5, 5, 3, 2, 1, 5] [4, 5, 5, 3, 2]
1 00000000001 [4, 5, 5, 1, 2, 1, 5, 5, 5] [4, 5, 5]
2 00000000002 [4, 5, 1, 3, 2, 1, 5, 5, 5, 1] [4, 5]
3 00000000003 [0, 2, 3] [0, 2, 3]
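One detail worth noting about the helper above: when no 1 is found, it returns the original list object itself, so the new column would share that list with events for those rows. If that matters, a variation like the following might help. This is only a minimal sketch; the name upto_first and the stop parameter are made up here for illustration and are not part of the answer above. It uses plain lists so it can be run without pandas.

```python
def upto_first(values, stop=1):
    """Return the items of `values` that come before the first `stop`.

    If `stop` never occurs, return a shallow copy of the whole list.
    """
    try:
        # list.index raises ValueError when `stop` is not in the list
        return values[:values.index(stop)]
    except ValueError:
        # copy, so the result never shares the input list object
        return list(values)


# Same lists as in the example output above
events = [
    [4, 5, 5, 3, 2, 1, 5],
    [4, 5, 5, 1, 2, 1, 5, 5, 5],
    [4, 5, 1, 3, 2, 1, 5, 5, 5, 1],
    [0, 2, 3],  # no 1 in this list
]
shortened = [upto_first(e) for e in events]
print(shortened)
```

With the dataframe from the question, it should be usable the same way as upto1, i.e. df['events2'] = df['events'].apply(upto_first).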