regex: cleaning text: remove everything up to a certain line

December 24, 2022

I have a text file containing The Tragedie of Macbeth. I want to clean it, and the first step is to remove everything up to the line The Tragedie of Macbeth and store the remaining part in removed_intro_file.

I tried:

import re
filename, title = 'MacBeth.txt', 'The Tragedie of Macbeth'
with open(filename, 'r') as file:
    removed_intro = file.read()
    with open('removed_intro_file', 'w') as output:
        removed = re.sub(title, '', removed_intro)
        print(removed)
        output.write(removed)

The print statement doesn’t print anything, so it seems nothing is matched. How can I use a regex across several lines? Should one instead use pointers to the start and end of the lines to remove? I’d also be glad to know if there is a nicer way to solve this, maybe without regex.

>Solution :

Your regex replaces only the title itself with ''. To remove the title and all the text before it, match every character (including newlines) from the beginning of the string through the title. Note that .* is greedy, so if the title occurs more than once, everything up to its last occurrence is removed. This should work (I only tested it on a sample file I wrote):

removed = re.sub(r'(?s)^.*'+re.escape(title), '', removed_intro)
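Since the question also asks for a way without regex, here is a minimal sketch of both approaches side by side. The function names and the sample text are made up for illustration; they are not from the original file. The regex variant uses a non-greedy .*? so it stops at the first occurrence of the title, which may or may not be what you want for your file.

```python
import re


def strip_intro_regex(text: str, title: str) -> str:
    """Drop everything up to and including the first occurrence of title."""
    # re.DOTALL lets '.' match newlines; '.*?' is non-greedy, so the match
    # stops at the first occurrence of the title rather than the last.
    pattern = r'\A.*?' + re.escape(title)
    return re.sub(pattern, '', text, count=1, flags=re.DOTALL)


def strip_intro_partition(text: str, title: str) -> str:
    """Same idea without regex, using str.partition."""
    before, sep, after = text.partition(title)
    # If the title is not found, sep is empty; return the text unchanged.
    return after if sep else text


# Hypothetical sample standing in for the contents of MacBeth.txt.
sample = "Project header\nlicense text\nThe Tragedie of Macbeth\nActus Primus.\n"
title = "The Tragedie of Macbeth"

print(repr(strip_intro_regex(sample, title)))
print(repr(strip_intro_partition(sample, title)))
```

Both functions leave the text untouched when the title is missing. To cut at the last occurrence instead, str.rpartition would be the non-regex counterpart of the greedy pattern above.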


byMR
