I have a sentence:
"Fourth-quarter 2021 net earnings per share (EPS) of $1.26, compared with 2020 EPS of $1.01; Fourth-quarter 2021 adjusted EPS of $1.11, down 25.5 percent compared with 2020 adjusted EPS of $1.49"
and would like to get the number $1.11 that follows the first occurrence of the substring "adjusted EPS".
The best regex I could come up with is:
re.search(r"^.*Adjusted EPS.*?(\$\d+\.\d+).*", text, re.IGNORECASE).group(1)
but this gives me $1.49, the number after the second occurrence of "adjusted EPS".
How can I modify the search so I get the number $1.11?
>Solution :
The problem here is the greedy quantifier at the very beginning of your pattern:
^.*Adj ...
^ matches the start of the string. Being greedy, .* "eats" as many characters as possible, so it runs all the way up to the last "adjusted EPS".
There are two solutions here: either make it non-greedy (i.e. lazy) with ^.*?Adj ..., or remove ^.* completely. I see no use for it here, since re.search already looks for a match anywhere in the string.
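A minimal sketch comparing the three patterns on the sentence from the question. The variable names (greedy, lazy, no_prefix) are only illustrative, and the dot in the number is escaped here so that it matches a literal decimal point:

```python
import re

text = (
    "Fourth-quarter 2021 net earnings per share (EPS) of $1.26, "
    "compared with 2020 EPS of $1.01; Fourth-quarter 2021 adjusted EPS "
    "of $1.11, down 25.5 percent compared with 2020 adjusted EPS of $1.49"
)

# Greedy prefix: ^.* backtracks only as far as the LAST "adjusted EPS".
greedy = re.search(r"^.*Adjusted EPS.*?(\$\d+\.\d+)", text, re.IGNORECASE)

# Lazy prefix: ^.*? stops at the FIRST "adjusted EPS".
lazy = re.search(r"^.*?Adjusted EPS.*?(\$\d+\.\d+)", text, re.IGNORECASE)

# No prefix: re.search scans left to right, so the first occurrence wins.
no_prefix = re.search(r"Adjusted EPS.*?(\$\d+\.\d+)", text, re.IGNORECASE)

print(greedy.group(1))     # $1.49
print(lazy.group(1))       # $1.11
print(no_prefix.group(1))  # $1.11
```

The trailing .* from the original pattern is dropped as well, because nothing after the capture group needs to be matched.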