Python spellcheck function – replace standalone words but not substrings

While scraping thousands of lines of data, I created a spellcheck function that automatically corrects specific, frequently misspelled terms before writing them to file.

This works well for a standalone word: "apple" is replaced with "orange". It becomes a problem with "pineapple", which turns into "pineorange". As a workaround, I pad the original term with a space on either side, but then it misses occurrences that are followed by punctuation such as a period, "apple." for example.

What options do I have to improve the handling here? Preferably something other than a bunch of if checks on the last character.

spelling_dict = {
    "abc" : "ABC",
    "apple" : "Apple",
    "tortose" : "Tortoise"
}

def spellcheck(line):
    for word, correction in spelling_dict.items():
        
        # Pad words with a space on either side
        word = word.center( len(word) + 2 )
        correction = correction.center ( len(correction) + 2 )

        line = line.replace(word, correction)
        
    return line

myphrase = "For apple, I want to capitalize both occurrences of apple."
fixedphrase = spellcheck(myphrase)

print(fixedphrase)

Solution:

Looks like you want regular expressions. In this case, the pattern (the thing to look for) is the string apple wrapped in word boundaries, \b (written as \\b in an ordinary Python string literal):

import re

pattern = r"\bapple\b"  # raw string, so the backslashes reach the regex engine unchanged

phrase = "apple pineapple apples and apple."

print(re.sub(pattern, "orange", phrase))

Output:

orange pineapple apples and orange.

Notice how apple and apple. were replaced with orange and orange., but pineapple and apples remain unchanged.
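
The same idea can be applied to the whole dictionary from the question. The following is a minimal sketch, not the only way to do it: it assumes every key in spelling_dict begins and ends with a letter, digit, or underscore (otherwise \b will not sit where you expect) and that matching should stay case-sensitive. The name _pattern is introduced here for illustration. All terms are combined into one compiled pattern, so each line is scanned once instead of once per term.

```python
import re

spelling_dict = {
    "abc": "ABC",
    "apple": "Apple",
    "tortose": "Tortoise",
}

# Build a single pattern of the form \b(?:abc|apple|tortose)\b.
# re.escape guards against terms that contain regex metacharacters.
_pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in spelling_dict) + r")\b"
)

def spellcheck(line):
    # For every whole-word match, look up its correction in the dictionary.
    return _pattern.sub(lambda match: spelling_dict[match.group(0)], line)

myphrase = "For apple, I want to capitalize both occurrences of apple."
print(spellcheck(myphrase))
# For Apple, I want to capitalize both occurrences of Apple.
```

Because the replacement is a function rather than a string, the corrections are inserted literally, so a correction containing a backslash would not be misread as a group reference.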
