I have a string like the following:
19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit
How can I write a regex that would give me these two separate strings?
19990101 - John DoeLorem ipsum dolor sit amet
19990102 - Elton Johnconsectetur adipiscing elit
The regex I wrote only gets me this far:
/\d+ -/gm
But I don't know how to include the letters that follow as well.
>Solution :
You can use either of these:
const text = '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit';
console.log(text.match(/\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g))
console.log(text.split(/(?!^)\s+(?=\d+\s+-)/))
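The text.match(/\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g) approach extracts the alphanumeric/whitespace chars that follow the \d+\s+- pattern. Details:
\d+ – one or more digits
\s+ – one or more whitespaces
- – a hyphen
[A-Za-z0-9\s]* – zero or more alphanumeric or whitespace chars
[A-Za-z] – a letter
Because the match must end with a letter, the regex backtracks past the digits and whitespace of the next entry, so each match stops at the last letter before the next date.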
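As a minimal sketch of the match approach, the snippet below wraps the same pattern in a helper. The name extractEntries and the comma example are assumptions added for illustration, not part of the original answer. It also shows one limit of the pattern: punctuation is outside the character class, so a match stops before it.

```javascript
// Hypothetical helper; the regex is the one from the answer above.
function extractEntries(text) {
  // With the g flag, match() returns every match, or null when there are none.
  return text.match(/\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g) ?? [];
}

const sample = '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit';
const entries = extractEntries(sample);
console.log(entries);
// [ '19990101 - John DoeLorem ipsum dolor sit amet',
//   '19990102 - Elton Johnconsectetur adipiscing elit' ]

// Assumed edge case: the comma is not in [A-Za-z0-9\s], so the match ends at "John".
const withComma = extractEntries('19990101 - John, Doe');
console.log(withComma); // [ '19990101 - John' ]
```

If the real text can contain punctuation, the split approach is probably the safer choice, since it does not restrict which characters an entry may contain.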
The text.split(/(?!^)\s+(?=\d+\s+-)/) approach splits the string at one or more whitespaces that come right before one or more digits + one or more whitespaces + -. Details:
(?!^) – not at the start of string
\s+ – one or more whitespaces
(?=\d+\s+-) – a positive lookahead that matches a location that is immediately followed with one or more digits + one or more whitespaces + -
Since the lookahead does not consume any text, the date stays at the start of the next item; only the whitespace separator is removed.
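The split approach can be sketched the same way. The helper name splitEntries and the punctuated sample string are assumptions made up for this illustration. Since the regex only describes the separator, whatever sits between two dates is kept as-is, punctuation included.

```javascript
// Hypothetical helper; the regex is the one from the answer above.
function splitEntries(text) {
  // Split on whitespace that is immediately followed by "<digits> <whitespace> -".
  return text.split(/(?!^)\s+(?=\d+\s+-)/);
}

// Assumed sample with punctuation inside the entries.
const parts = splitEntries('19990101 - John Doe: Lorem, ipsum. 19990102 - Elton John: consectetur!');
console.log(parts);
// [ '19990101 - John Doe: Lorem, ipsum.',
//   '19990102 - Elton John: consectetur!' ]
```

One caveat under these assumptions: any "digits, whitespace, hyphen" sequence inside an entry (for example a score such as "10 - 2") would also be treated as the start of a new entry. If the dates always have eight digits, replacing \d+ with \d{8} in the lookahead should narrow that down.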