Check if string contains another column value

I have a dataframe and would like to check if a column value contains another column value.

    name1   name2
0   aa      aab
1   xyz     x

The code below doesn't work:

df = df.assign(name1_contains_name2=df.name1.str.contains(df.name2),
            name2_contains_name1=df.name2.str.contains(df.name1))

but I would like to get the dataframe below:

    name1   name2   name1_contains_name2    name2_contains_name1
0   aa      aab     False                   True
1   xyz     x       True                    False

How can I write it?

Solution:

If you need to test row by row, use list comprehensions:

# Pair the two columns row by row
z = list(zip(df.name1, df.name2))
# `b in a` is a literal substring test on each row's pair
out = df.assign(name1_contains_name2=[b in a for a, b in z],
               name2_contains_name1=[a in b for a, b in z])
print(out)
  name1 name2  name1_contains_name2  name2_contains_name1
0    aa   aab                 False                  True
1   xyz     x                  True                 False
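
For a self-contained version of the row-wise approach, here is a minimal sketch that builds the sample frame first. The frame construction is an assumption based on the table shown in the question; the rest mirrors the list comprehensions above.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed to be plain strings).
df = pd.DataFrame({"name1": ["aa", "xyz"], "name2": ["aab", "x"]})

# Pair up the two columns row by row.
pairs = list(zip(df["name1"], df["name2"]))

out = df.assign(
    # Python's `in` does a literal substring test, no regex involved.
    name1_contains_name2=[b in a for a, b in pairs],
    name2_contains_name1=[a in b for a, b in pairs],
)
print(out)
```

Because `in` compares literal text, this variant needs no escaping of special characters in either column.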

Or use a single list comprehension with the DataFrame constructor:

out = df.join(pd.DataFrame([[(b in a), (a in b)] for a, b in zip(df.name1, df.name2)],
                  columns=['name1_contains_name2', 'name2_contains_name1'],
                  index=df.index))

print(out)
  name1 name2  name1_contains_name2  name2_contains_name1
0    aa   aab                 False                  True
1   xyz     x                  True                 False
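
Both list-comprehension variants rely on Python's `in` operator, which raises a TypeError when a cell holds a missing value (None or NaN) instead of a string. If the columns may contain missing values, one possible guard is sketched below. The helper name `contains`, the sample data, and the choice to report False for missing values are all assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical data with a missing value in name1.
df = pd.DataFrame({"name1": ["aa", "xyz", None], "name2": ["aab", "x", "q"]})

def contains(haystack, needle):
    """Literal substring test that treats non-string cells as no match."""
    if not isinstance(haystack, str) or not isinstance(needle, str):
        return False
    return needle in haystack

pairs = list(zip(df["name1"], df["name2"]))
out = df.assign(
    name1_contains_name2=[contains(a, b) for a, b in pairs],
    name2_contains_name1=[contains(b, a) for a, b in pairs],
)
print(out)
```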

If you need to test each value against all values of the other column, join those values with | to build a regex OR pattern:

# '|'.join(...) builds one regex that matches any value from the other column
out = df.assign(name1_contains_name2=df.name1.str.contains('|'.join(df.name2)),
                name2_contains_name1=df.name2.str.contains('|'.join(df.name1)))
print(out)

  name1 name2  name1_contains_name2  name2_contains_name1
0    aa   aab                 False                  True
1   xyz     x                  True                 False
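
One caveat with the joined pattern: `str.contains` treats it as a regular expression by default, so values containing characters such as `.`, `+` or `(` would be interpreted as regex syntax. If the values should be matched literally, escaping each one with `re.escape` before joining is a common precaution. The sample data below is made up for illustration.

```python
import re
import pandas as pd

# Hypothetical data where one value contains a regex metacharacter.
df = pd.DataFrame({"name1": ["a.b", "axb"], "name2": ["a.b", "zzz"]})

raw = "|".join(df["name2"])                      # 'a.b|zzz'
escaped = "|".join(map(re.escape, df["name2"]))  # '.' is escaped here

# Unescaped: '.' matches any character, so 'axb' also matches.
print(df["name1"].str.contains(raw).tolist())
# Escaped: only the literal text 'a.b' matches.
print(df["name1"].str.contains(escaped).tolist())
```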

The difference between the two approaches becomes visible after adding a new row to the DataFrame:

print(df)
  name1 name2
0    aa   aab
1   xyz     x
2     b  xyza

out = df.assign(name1_contains_name2=df.name1.str.contains('|'.join(df.name2)),
                name2_contains_name1=df.name2.str.contains('|'.join(df.name1)))
print(out)
  name1 name2  name1_contains_name2  name2_contains_name1
0    aa   aab                 False                  True
1   xyz     x                  True                 False
2     b  xyza                 False                  True

z = list(zip(df.name1, df.name2))
out1 = df.assign(name1_contains_name2=[b in a for a, b in z],
               name2_contains_name1=[a in b for a, b in z])
print(out1)
  name1 name2  name1_contains_name2  name2_contains_name1
0    aa   aab                 False                  True
1   xyz     x                  True                 False
2     b  xyza                 False                 False
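
The last row shows why the two results diverge: the joined pattern is built from every value in the other column, so `xyza` matches because it contains `xyz` from row 1, even though `b`, the value on its own row, is not a substring of it. A small sketch of just that row, using the standard `re` module in place of pandas:

```python
import re

name1 = ["aa", "xyz", "b"]
name2 = ["aab", "x", "xyza"]

pattern = "|".join(name1)  # 'aa|xyz|b'

# Regex OR over the whole column: 'xyza' matches via 'xyz' (row 1's value).
any_match = bool(re.search(pattern, name2[2]))

# Row-wise check: only compares against the value on the same row.
same_row = name1[2] in name2[2]

print(any_match, same_row)
```

So the regex version answers "does this value contain any value from the other column?", while the list comprehension answers "does it contain the value on the same row?".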