
Check if word contains substrings

Question

Consider the following:

import pandas as pd

word = 'analphabetic'
df = pd.DataFrame({'substring': list('abcdefgh') + ['ab', 'phobic']})

The entries in substring are not necessarily single letters!

I want to add a column named after word (here 'analphabetic') in which each row shows True/False depending on whether that row's substring occurs in word. Can I do this with a built-in pandas method?
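For reference, the per-row test being asked for is ordinary Python substring membership: s in word checks for a contiguous run of characters, so it treats single letters and longer substrings the same way. A minimal sketch without pandas, reusing the example data from above:

```python
# Python's `in` on two strings tests for a contiguous substring,
# so single letters and multi-character substrings are handled alike.
word = 'analphabetic'
substrings = list('abcdefgh') + ['ab', 'phobic']

flags = [s in word for s in substrings]
print(flags)
# [True, True, True, False, True, False, False, True, True, False]
```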

Desired output:

  substring  analphabetic
0         a          True
1         b          True
2         c          True
3         d         False
4         e          True
5         f         False
6         g         False
7         h          True
8        ab          True
9    phobic         False

The other way around is covered by pandas.Series.str.contains: df.substring.str.contains(word) checks whether each substring contains word. For my direction, I guess you could do something like:

df[word] = [i in word for i in df.substring]

But by the same reasoning, the built-in str.contains() could itself be replaced by a list comprehension:

string = 'a'
df = pd.DataFrame({'words': ['these', 'are', 'some', 'random', 'words']})
df[string] = [string in i for i in df.words]

So my thought is that there might also be a built-in method for my direction of the check.
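A minimal sketch contrasting the two directions on the example data. Series.str.contains(pat) asks whether each element contains pat, which is the opposite of what the question wants. It treats pat as a regular expression by default, so regex=False is passed here to get a plain substring test equivalent to in. The words Series simply mirrors the question's second snippet.

```python
import pandas as pd

word = 'analphabetic'
df = pd.DataFrame({'substring': list('abcdefgh') + ['ab', 'phobic']})

# Opposite direction: does each substring contain the whole word?
other_way = df['substring'].str.contains(word, regex=False)
print(other_way.any())  # False: no substring contains 'analphabetic'

# The equivalence from the question's second snippet:
# `pat in element` per row matches str.contains(pat, regex=False).
words = pd.Series(['these', 'are', 'some', 'random', 'words'])
manual = [('a' in w) for w in words]
builtin = words.str.contains('a', regex=False).tolist()
print(manual == builtin)  # True
```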

Solution:

A possible solution, which also works for substrings longer than a single letter:

df['analphabetic'] = df['substring'].map(lambda x: x in word)

Output:

  substring  analphabetic
0         a          True
1         b          True
2         c          True
3         d         False
4         e          True
5         f         False
6         g         False
7         h          True
8        ab          True
9    phobic         False

Using a list comprehension:

df['analphabetic'] = [x in word for x in df.substring]

Using apply:

df['analphabetic'] = df['substring'].apply(lambda x: x in word)
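A lambda-free variant, offered as a sketch rather than a pandas built-in: word.__contains__ is the bound method behind s in word, so it can be passed to map directly. functools.partial(operator.contains, word) is an equivalent spelling, since operator.contains(a, b) evaluates b in a. The column is named after word, as the question asks.

```python
import operator
from functools import partial

import pandas as pd

word = 'analphabetic'
df = pd.DataFrame({'substring': list('abcdefgh') + ['ab', 'phobic']})

# word.__contains__(s) evaluates `s in word`, so no lambda is needed.
df[word] = df['substring'].map(word.__contains__)

# Same check via the operator module: operator.contains(word, s) is `s in word`.
alt = df['substring'].map(partial(operator.contains, word))

print(df[word].equals(alt))  # True
print(df)
```

Both forms still call Python once per row, so they are a readability choice rather than a speed-up over map with a lambda or the list comprehension.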