I am struggling to get the correct regex for my script. I would like to find all substrings in a file that start with a ' and end with '.tr, and save all these matches in a list.
This is what I've got so far:
import glob
import pathlib
import re
libPathString = str(pathlib.Path.cwd().parent.resolve())
for path in glob.glob(libPathString + "/**", recursive=True):
    if(".dart" in path):
        with open(path, 'r+', encoding="utf-8") as file:
            data = [line.strip() for line in file.readlines()]
            data = ''.join(data)
            words = re.findall(r'\'.*\'.tr', data)
            print(words)
The first problem is that `words` is not just the matching substring but the whole file up to the substring.
It also matches this snippet:
child: Hero(
  tag: heroTag ?? '', // <- because of this and the line below starts with `tr`
  transitionOnUserGestures: true,
  child: Material(
But this should not match!
And then it is not finding this:
AutoSizeText(
  'Das ist ein langer Text, der immer in einer Zeile ist.'
      .tr,
  style: AppTextStyles.montserratH4Regular,
This one should match!
What am I missing here?
>Solution :
You can use
words = re.findall(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b", data)
See the Python demo. Details:
- `'[^'\\]*(?:\\.[^'\\]*)*'` - a `'`, zero or more chars other than `'` and `\`, then zero or more sequences of a `\` followed with any single char and any zero or more chars other than `'` and `\`, and finally a closing `'` (this will match strings between `'` chars with any escaped chars in between)
- `\s*` - zero or more whitespaces (this will match any whitespace, including line breaks)
- `\.tr` - the `.tr` string (note the escaped `.` that now matches a literal dot)
- `\b` - a word boundary.
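To check the pattern against the two snippets from the question, here is a minimal sketch. The helper name `find_tr_strings` and the `sample` string are only illustrative (they are not part of your script), and the sample keeps its line breaks instead of stripping and joining the lines, since `\s*` already handles the newline before `.tr`.

```python
import re

# Pattern from the answer: a single-quoted string (backslash escapes allowed),
# optional whitespace (including line breaks), then a literal ".tr" that ends
# at a word boundary.
TR_PATTERN = re.compile(r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b")


def find_tr_strings(source):
    """Return every single-quoted string in `source` that is followed by .tr."""
    return TR_PATTERN.findall(source)


# Hypothetical sample assembled from the two snippets in the question.
sample = """
child: Hero(
  tag: heroTag ?? '', // <- because of this and the line below starts with `tr`
  transitionOnUserGestures: true,
  child: Material(
AutoSizeText(
  'Das ist ein langer Text, der immer in einer Zeile ist.'
      .tr,
  style: AppTextStyles.montserratH4Regular,
"""

matches = find_tr_strings(sample)
for match in matches:
    # repr() makes the whitespace between the closing quote and .tr visible
    print(repr(match))
```

With this input only the `AutoSizeText` string is reported; the empty `''` before `transitionOnUserGestures` is skipped because it is not followed by `.tr`. Each match includes any whitespace between the closing quote and `.tr`. Keep in mind that this is still a regex and not a Dart parser, so unusual quoting in comments or double-quoted strings may need extra handling.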