Home If multiple columns contains several characters simultaneously then return True

Questions

If multiple columns contains several characters simultaneously then return True

February 24, 2023

For the following data df:

   id               k1            k2
0   1         re_setup      oo_setup
1   2         oo_setup      oo_setup
2   3         alerting        bounce
3   4           bounce  re_oversetup
4   5     re_oversetup      alerting
5   6       alerting_s      re_setup
6   7     re_oversetup      oo_setup
7   8         alerting        bounce
8   9  alerting_bounce        bounce

We hope that: if the K1 and K2 columns include characters setup or bounce, return True. Otherwise, return False. Note that if K1 contains setup and K2 contains bounce, or vice versa, this situation returns False.

How to achieve it? Thanks.

The expected results are as follows:

   id               k1               k2   same
0   1         re_setup         oo_setup   True
1   2         oo_setup         oo_setup   True
2   3         alerting           bounce  False
3   4           bounce     re_oversetup  False
4   5     re_oversetup  alerting_bounce  False
5   6       alerting_s         re_setup  False
6   7     re_oversetup         oo_setup   True
7   8         alerting           bounce  False
8   9  alerting_bounce           bounce   True

I try with df['same1'] = df[['k1', 'k2']].apply(lambda x: x.str.contains('setup|bounce')).all(1), it returns the following result:

   id               k1               k2   same  same1
0   1         re_setup         oo_setup   True   True
1   2         oo_setup         oo_setup   True   True
2   3         alerting           bounce  False  False
3   4           bounce     re_oversetup  False   True  incorrect result
4   5     re_oversetup  alerting_bounce  False   True  incorrect result
5   6       alerting_s         re_setup  False  False
6   7     re_oversetup         oo_setup   True   True
7   8         alerting           bounce  False  False
8   9  alerting_bounce           bounce   True   True

We can see that line 3 and 4 returns the wrong results.

Reference:

If one row in two columns contain the same string python pandas

>Solution :

Use str.extract and compare the result:

s1 = df['k1'].str.extract('(setup|bounce)', expand=False)
s2 = df['k2'].str.extract('(setup|bounce)', expand=False)

df['same'] = s1.eq(s2)

Output:

   id               k1            k2   same
0   1         re_setup      oo_setup   True
1   2         oo_setup      oo_setup   True
2   3         alerting        bounce  False
3   4           bounce  re_oversetup  False
4   5     re_oversetup      alerting  False
5   6       alerting_s      re_setup  False
6   7     re_oversetup      oo_setup   True
7   8         alerting        bounce  False
8   9  alerting_bounce        bounce   True

all matches

s1 = df['k1'].str.extractall('(setup|bounce)')[0].groupby(level=0).agg(set)
s2 = df['k2'].str.extractall('(setup|bounce)')[0].groupby(level=0).agg(set)
df['same_all'] = s1.eq(s2)

Output:

   id               k1            k2  same_all
0   1         re_setup      oo_setup      True
1  2a         oo_setup  bounce_setup     False  # only 1 match
2  2b     setup_bounce  bounce_setup      True  # all matches
3   3         alerting        bounce     False
4   4           bounce  re_oversetup     False
5   5     re_oversetup      alerting     False
6   6       alerting_s      re_setup     False
7   7     re_oversetup      oo_setup      True
8   8         alerting        bounce     False
9   9  alerting_bounce        bounce      True