For the following data df
:
id k1 k2
0 1 re_setup oo_setup
1 2 oo_setup oo_setup
2 3 alerting bounce
3 4 bounce re_oversetup
4 5 re_oversetup alerting
5 6 alerting_s re_setup
6 7 re_oversetup oo_setup
7 8 alerting bounce
8 9 alerting_bounce bounce
We hope that: if the K1
and K2
columns include characters setup
or bounce
, return True
. Otherwise, return False
. Note that if K1
contains setup
and K2
contains bounce
, or vice versa, this situation returns False
.
How to achieve it? Thanks.
The expected results are as follows:
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_oversetup False
4 5 re_oversetup alerting_bounce False
5 6 alerting_s re_setup False
6 7 re_oversetup oo_setup True
7 8 alerting bounce False
8 9 alerting_bounce bounce True
I try with df['same1'] = df[['k1', 'k2']].apply(lambda x: x.str.contains('setup|bounce')).all(1)
, it returns the following result:
id k1 k2 same same1
0 1 re_setup oo_setup True True
1 2 oo_setup oo_setup True True
2 3 alerting bounce False False
3 4 bounce re_oversetup False True incorrect result
4 5 re_oversetup alerting_bounce False True incorrect result
5 6 alerting_s re_setup False False
6 7 re_oversetup oo_setup True True
7 8 alerting bounce False False
8 9 alerting_bounce bounce True True
We can see that line 3 and 4 returns the wrong results.
Reference:
If one row in two columns contain the same string python pandas
>Solution :
Use str.extract
and compare the result:
s1 = df['k1'].str.extract('(setup|bounce)', expand=False)
s2 = df['k2'].str.extract('(setup|bounce)', expand=False)
df['same'] = s1.eq(s2)
Output:
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_oversetup False
4 5 re_oversetup alerting False
5 6 alerting_s re_setup False
6 7 re_oversetup oo_setup True
7 8 alerting bounce False
8 9 alerting_bounce bounce True
all matches
s1 = df['k1'].str.extractall('(setup|bounce)')[0].groupby(level=0).agg(set)
s2 = df['k2'].str.extractall('(setup|bounce)')[0].groupby(level=0).agg(set)
df['same_all'] = s1.eq(s2)
Output:
id k1 k2 same_all
0 1 re_setup oo_setup True
1 2a oo_setup bounce_setup False # only 1 match
2 2b setup_bounce bounce_setup True # all matches
3 3 alerting bounce False
4 4 bounce re_oversetup False
5 5 re_oversetup alerting False
6 6 alerting_s re_setup False
7 7 re_oversetup oo_setup True
8 8 alerting bounce False
9 9 alerting_bounce bounce True