Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Check substring with 1 character difference in python

I have list of thousand of strings sub_strings, and list of millions of strings strings

I want to check if strings[i] have substring from sub_strings or substring that’s different in 1 character

sub_strings = ['hello']
strings =     ['hell dude whats good',
               'hllo',
               'hallo',
               'hello',
               'dude whats good']

is_substring_no_more_then_1_differnce(strings , sub_strings)

expected

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

[True, True, True, True, False]

>Solution :

Here’s a route for solving:

from Levenshtein import distance as lev

sub_strings = ['hello', 'bob']
strings =     ['hell dude whats good',
               'hllo',
               'hallo',
               'hello',
               'dude whats good',
               'bobo wants some food']

distance_list = []
for sentence in strings:
    distance_list.append(min([lev(word, substring) for word in sentence.split() for substring in sub_strings]))

print([x <= 1 for x in distance_list])

That will spit out [True, True, True, True, False, True]

But this is going to get very slow as you add elements to either list. Every individual word inside a string inside strings has to be checked against every word in substrings. That’s a lot of checking when you have millions of words in each string in strings and thousands of subtrings.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading