I want to write a script that identifies rows of a pandas dataframe in which a word (string) appears more than once.
Using a lambda function I can detect whether a string exists in a row, but I can’t find any information on how to detect two or more instances of the string. This is what I have currently:
import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 3], 'Ans1': ['Yes', 'Yes', 'Yes'], 'Ans2': ['No', 'Yes', 'No'], 'Ans3': ['No', 'No', 'No']})
df['Result'] = df.apply(lambda row: row.astype(str).str.contains('Yes').any(), axis=1)
df
Pseudocode for what I’m trying to get:
if count of 'Yes' in row > 1:
    df['Result'] = True
Desired result:
ID Ans1 Ans2 Ans3 Result
1 Yes No No False
2 Yes Yes No True
3 Yes No No False
>Solution :
Try this. You can filter columns first if you don’t want to check the entire dataframe for 'Yes'. Then use eq (equals) to compare each cell with 'Yes', sum with axis=1 to count the matches along each row, and gt (greater than) to check whether that count exceeds 1:
df['Result'] = df.eq('Yes').sum(1).gt(1)
Output:
ID Ans1 Ans2 Ans3 Result
0 1 Yes No No False
1 2 Yes Yes No True
2 3 Yes No No False
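If you only want to check some of the columns, the column filtering mentioned above could look something like the sketch below. It assumes the answer columns are the ones whose names start with 'Ans', which is a guess based on the sample data. It also shows a substring-based variant closer to the str.contains approach in the question; the column name ResultContains is hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Ans1': ['Yes', 'Yes', 'Yes'],
    'Ans2': ['No', 'Yes', 'No'],
    'Ans3': ['No', 'No', 'No'],
})

# Assumption: only columns whose names start with 'Ans' should be checked.
answers = df.filter(regex=r'^Ans')

# Exact match: count the cells equal to 'Yes' in each row, then test count > 1.
df['Result'] = answers.eq('Yes').sum(axis=1).gt(1)

# Substring match (hypothetical variant): count the cells that contain 'Yes'.
contains_yes = answers.astype(str).apply(
    lambda col: col.str.contains('Yes', regex=False)
)
df['ResultContains'] = contains_yes.sum(axis=1).gt(1)

print(df)
```

The exact-match version treats a cell such as 'Yes, maybe' as a non-match, while the substring version counts it, so which one fits depends on what the cells actually contain.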