Given the following DataFrame df:
id k1 k2
0 1 re_setup oo_setup
1 2 oo_setup oo_setup
2 3 alerting bounce
3 4 bounce re_oversetup
4 5 re_oversetup alerting
5 6 alerting_s re_setup
6 7 re_oversetup oo_setup
7 8 alerting bounce
8 9 alerting_bounce bounce
I want a new column that is True only when k1 and k2 both contain the same keyword, either setup or bounce, and False otherwise. In particular, if k1 contains setup and k2 contains bounce, or vice versa, the result should be False.
How can I achieve this? Thanks.
The expected result is:
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_oversetup False
4 5 re_oversetup alerting_bounce False
5 6 alerting_s re_setup False
6 7 re_oversetup oo_setup True
7 8 alerting bounce False
8 9 alerting_bounce bounce True
I tried df['same1'] = df[['k1', 'k2']].apply(lambda x: x.str.contains('setup|bounce')).all(1), which returns the following result:
id k1 k2 same same1
0 1 re_setup oo_setup True True
1 2 oo_setup oo_setup True True
2 3 alerting bounce False False
3 4 bounce re_oversetup False True incorrect result
4 5 re_oversetup alerting_bounce False True incorrect result
5 6 alerting_s re_setup False False
6 7 re_oversetup oo_setup True True
7 8 alerting bounce False False
8 9 alerting_bounce bounce True True
The rows with index 3 and 4 return the wrong result.
Reference:
If one row in two columns contain the same string python pandas
>Solution :
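Use str.extract to pull the first matching keyword out of each column, then compare the two results. A row where a column contains neither keyword yields NaN, which never compares equal, so that row comes out False: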
s1 = df['k1'].str.extract('(setup|bounce)', expand=False)
s2 = df['k2'].str.extract('(setup|bounce)', expand=False)
df['same'] = s1.eq(s2)
Output:
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_oversetup False
4 5 re_oversetup alerting False
5 6 alerting_s re_setup False
6 7 re_oversetup oo_setup True
7 8 alerting bounce False
8 9 alerting_bounce bounce True
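For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same str.extract approach. The DataFrame is rebuilt by hand from the data shown in the question, and the variable name pattern is just an illustrative choice.

```python
import pandas as pd

# Sample data typed in from the question's first table
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "k1": ["re_setup", "oo_setup", "alerting", "bounce", "re_oversetup",
           "alerting_s", "re_oversetup", "alerting", "alerting_bounce"],
    "k2": ["oo_setup", "oo_setup", "bounce", "re_oversetup", "alerting",
           "re_setup", "oo_setup", "bounce", "bounce"],
})

# One capture group, so expand=False returns a Series instead of a DataFrame
pattern = r"(setup|bounce)"
s1 = df["k1"].str.extract(pattern, expand=False)
s2 = df["k2"].str.extract(pattern, expand=False)

# Rows where a column has no keyword hold NaN and therefore compare as False
df["same"] = s1.eq(s2)
print(df)
```

Note that str.extract only keeps the first match per cell, so a value such as setup_bounce is treated as setup.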
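To compare all matches rather than only the first one, use str.extractall and collect each row's matches into a set: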
s1 = df['k1'].str.extractall('(setup|bounce)')[0].groupby(level=0).agg(set)
s2 = df['k2'].str.extractall('(setup|bounce)')[0].groupby(level=0).agg(set)
df['same_all'] = s1.eq(s2)
Output (on a modified input with two extra rows, 2a and 2b, that contain more than one keyword):
id k1 k2 same_all
0 1 re_setup oo_setup True
1 2a oo_setup bounce_setup False # only 1 match
2 2b setup_bounce bounce_setup True # all matches
3 3 alerting bounce False
4 4 bounce re_oversetup False
5 5 re_oversetup alerting False
6 6 alerting_s re_setup False
7 7 re_oversetup oo_setup True
8 8 alerting bounce False
9 9 alerting_bounce bounce True
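One thing to watch with the extractall variant: rows with no keyword in either column do not appear in s1 or s2 at all, so assigning the comparison back to df would leave NaN there rather than False. A hedged sketch of one way to handle that is below. The sample rows and the helper name keyword_sets are hypothetical, and the last row is added on purpose so that neither column matches.

```python
import pandas as pd

# Hypothetical sample: a few rows like those above, plus one row with no keyword at all
df = pd.DataFrame({
    "id": ["1", "2a", "2b", "3", "x"],
    "k1": ["re_setup", "oo_setup", "setup_bounce", "alerting", "alerting"],
    "k2": ["oo_setup", "bounce_setup", "bounce_setup", "bounce", "alerting_s"],
})

pattern = r"(setup|bounce)"

def keyword_sets(col: pd.Series) -> pd.Series:
    """Set of matched keywords per row; rows without any match are absent."""
    return col.str.extractall(pattern)[0].groupby(level=0).agg(set)

s1 = keyword_sets(df["k1"])
s2 = keyword_sets(df["k2"])

# eq aligns on the union of both indexes; reindex then fills rows that
# matched in neither column with False instead of leaving them as NaN
df["same_all"] = s1.eq(s2).reindex(df.index, fill_value=False)
print(df)
```

Whether a row with no keyword on either side should count as False is an assumption here, taken from the question's requirement that both columns contain the keyword.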