I’m parsing name strings that have strange compound formations.
The current formation that’s giving me a problem is these names:
Edward St. Loe Livermore
Henry St. George Tucker III
Henry St. John
This pattern (.*)(St\.\s\w+)\s(.*)
parses the first two names and completely ignores the third.
This pattern (.*)(St\.\s\w+)|(St\.\s\w+\s(.*))$
returns the third name as well, but leaves off the surname of the first two.
I’m using https://regex101.com/ to test the regex pattern.
So far I can’t figure out what pattern will return the surname in the match for all three names,
or if I need to add a conditional statement in my code to parse the three-element names separately, which seems inefficient.
TIA
>Solution :
Use this regex:
(.*)(St\.\s\w+)\s*.*
The regular expression matches as follows:
Node | Explanation |
---|---|
( | group and capture to \1: |
.* | any character except \n (0 or more times (matching the most amount possible)) |
) | end of \1 |
( | group and capture to \2: |
St | ‘St’ |
\. | . |
\s | whitespace (\n, \r, \t, \f, and " ") |
\w+ | word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible)) |
) | end of \2 |
\s* | whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible)) |
.* | any character except \n (0 or more times (matching the most amount possible)) |
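
As a rough illustration, here is how the pattern might be used from Python. Note that this sketch makes one change that is not in the answer above: the trailing `.*` is wrapped in a third capture group so the surname (and any suffix) can be read separately. The helper name `split_name` is hypothetical, and Python is only an assumption, since the question does not say which language is in use.

```python
import re

# The answer's pattern, with the trailing .* wrapped in a third group
# (an assumption added here so the surname is captured on its own).
PATTERN = re.compile(r"(.*)(St\.\s\w+)\s*(.*)")


def split_name(name):
    """Hypothetical helper: split a name around its 'St. X' element.

    Returns (leading part, 'St. X' part, trailing part), or None when
    the name contains no 'St. X' element.
    """
    match = PATTERN.fullmatch(name)
    if match is None:
        return None
    leading, saint, trailing = match.groups()
    # Group 1 keeps the space before "St.", so strip it off.
    return leading.strip(), saint, trailing


names = [
    "Edward St. Loe Livermore",
    "Henry St. George Tucker III",
    "Henry St. John",
]

for name in names:
    print(split_name(name))
```

Because `\s*` and the final group may both match nothing, "Henry St. John" still matches and simply yields an empty trailing part, so no separate conditional branch is needed for the three-element names.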