
Python regex to find all strings that start with ' and end with '.tr ignoring leading and trailing whitespaces

I am struggling to get the correct regex for my script. I would like to find all substrings in a file that start with a ' and end with '.tr, and save all these matches in a list.

This is what I've got so far:

import glob
import pathlib
import re
       
libPathString = str(pathlib.Path.cwd().parent.resolve()) 

for path in glob.glob(libPathString + "/**", recursive=True):
    if(".dart" in path):
        with open(path, 'r+', encoding="utf-8") as file:
            data = [line.strip() for line in file.readlines()]
            data = ''.join(data)
            words = re.findall(r'\'.*\'.tr', data)
            print(words)

The first problem is that words contains not just the matching substring, but the whole file content up to that substring.

It also matches this snippet:

  child: Hero(
    tag: heroTag ?? '',  // <- because of this and the line below starts with `tr`
    transitionOnUserGestures: true,
    child: Material(

But this should not match!

And then it is not finding this:

  AutoSizeText(
      'Das ist ein langer Text, der immer in einer Zeile ist.'
          .tr,
      style: AppTextStyles.montserratH4Regular,

This one should match!

What am I missing here?

Solution:

You can use

words = re.findall(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b", data)

Details:

  • '[^'\\]*(?:\\.[^'\\]*)*' – a ', zero or more chars other than ' and \, then zero or more sequences of a \ followed by any single char (other than a line break) and zero or more chars other than ' and \, and finally a closing ' (this matches a single-quoted string, including any escaped chars inside it)
  • \s* – zero or more whitespace chars (this matches any whitespace, including line breaks)
  • \.tr – the .tr string (note the escaped . that now matches a literal dot)
  • \b – a word boundary.
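
To see how the pieces described above behave together, here is a minimal sketch that runs the pattern against a sample built from the two snippets in the question. The names TR_PATTERN, find_tr_strings and SAMPLE are hypothetical, and the last two Text(...) lines are made-up additions to exercise the escaped-quote and word-boundary parts of the pattern.

```python
import re

# Pattern from the answer: a single-quoted string (backslash escapes allowed),
# optional whitespace (including line breaks), then ".tr" as a whole word.
TR_PATTERN = re.compile(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b")


def find_tr_strings(source: str) -> list:
    """Return every single-quoted string in `source` that is followed by .tr."""
    return TR_PATTERN.findall(source)


# Hypothetical sample: the two snippets from the question plus two made-up
# lines (an escaped quote, and a .trim() call that must not match).
SAMPLE = r"""
  child: Hero(
    tag: heroTag ?? '',
    transitionOnUserGestures: true,
  AutoSizeText(
      'Das ist ein langer Text, der immer in einer Zeile ist.'
          .tr,
      style: AppTextStyles.montserratH4Regular,
  Text('It\'s done'.tr),
  Text('  padded '.trim()),
"""

matches = find_tr_strings(SAMPLE)
for match in matches:
    print(repr(match))
```

With this sample, only the AutoSizeText string and the escaped-quote string are returned; the empty string before transitionOnUserGestures and the .trim() call are skipped. Because \s* already covers line breaks, the file content can probably be passed in as read by file.read(), without stripping and joining the lines first. The pattern knows nothing about Dart comments or double-quoted strings, so treat it as a starting point rather than a complete parser.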