How to sub first character in a text file with regex in Python

I have some lowercase texts and I’m trying to change every standalone lowercase ‘i’ to uppercase ‘I’ (including " i’" to " I’"). I have a text file (‘i_test.txt’) with this text to test my code:

    i am a bumble 
    bee i'm not the 
    prodigal son
    i'll stop talking 
    now it's all done i think

This script amends all cases except for the first character in the text file (adapted from How can I do multiple substitutions using regex?):

    import re

    file = 'i_test.txt'

    def multiple_replace(dict, text):
        # Create a regular expression from the dictionary keys
        regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))

        # For each match, look up the corresponding value in the dictionary
        return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

    if __name__ == "__main__":

        dict = {
            "^i " : "I ",
            " i " : " I ",
            "\ni " : "\nI ",
            " i'" : " I'",
            "\ni'" : "\nI'",
            "^i'" : "I'",
        }

        with open(file) as text:
            new_text = multiple_replace(dict, text.read())
        with open("i_out.txt", "w") as result:
            result.write(new_text)

The output is:

    i am a bumble 
    bee I'm not the 
    prodigal son
    I'll stop talking 
    now it's all done I think

In the dictionary I am searching for ‘i’ preceded and followed by a space, or preceded by a newline and followed by a space (plus the equivalent patterns for ‘i’ followed by an apostrophe). I attempted to match the first character of the file with this entry:

    "^i " : "I ",

But it doesn’t work. Is there a way to substitute the first character in a text file?

Solution:

You may not need a map covering all possible occurrences of the first person singular pronoun. I believe a regex replacement on \bi\b should give the result you want:

    import re

    inp = """i am a bumble 
    bee i'm not the 
    prodigal son
    i'll stop talking 
    now it's all done i think"""

    # \b is a word boundary, so only a standalone "i" is replaced
    output = re.sub(r'\bi\b', 'I', inp)
    print(output)

This prints:

    I am a bumble 
    bee I'm not the 
    prodigal son
    I'll stop talking 
    now it's all done I think
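
As for why the `"^i "` entry never matched: the question's script passes every dictionary key through `re.escape`, which backslash-escapes the caret, so the pattern looks for a literal `^` character rather than the start of the text. The sketch below illustrates that, and then wraps the `\bi\b` substitution so it can be applied to a file, as the question asked. The helper names `capitalize_i` and `capitalize_i_in_file` are made up for this example, and UTF-8 encoding is an assumption about the input file.

```python
import re

# Part 1: why the "^i " key never matched in the question's script.
# re.escape() backslash-escapes regex metacharacters, so the caret becomes
# a literal "^" character instead of a start-of-string anchor.
literal_caret = re.compile(re.escape("^i "))
print(literal_caret.search("i am a bumble"))   # None: the text contains no "^"
print(literal_caret.search("^i am a bumble"))  # matches only when a real "^" is present

# Part 2: the word-boundary approach, wrapped for file use.
# (Hypothetical helper names, not part of the original answer.)
PRONOUN = re.compile(r"\bi\b")


def capitalize_i(text):
    """Return text with every standalone lowercase 'i' replaced by 'I'."""
    return PRONOUN.sub("I", text)


def capitalize_i_in_file(src_path, dst_path):
    """Read src_path, capitalize standalone 'i', and write the result to dst_path."""
    # Assumption: the file is UTF-8 encoded.
    with open(src_path, encoding="utf-8") as src:
        new_text = capitalize_i(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(new_text)
```

With the question's file names, the call would be `capitalize_i_in_file("i_test.txt", "i_out.txt")`. Note that `\bi\b` capitalizes any standalone `i`, so text containing something like "i.e." or a variable named `i` would be changed as well; a stricter pattern may be needed for such input.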