Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Modify Regex To Match String with Multiple Full Month Names And End on Hyphen

I am using a Regex to pull dates out of a series of strings. The format varies slightly, but it always contains the full month. The strings usually contain two dates to represent a range like so:

February 1, 2020 - March 18, 2020

or

February 1st 2020 - March 18th 2020

And this is working great until I come across dates like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

June 1 - July 22, 2018

where a year is not presented in the "starting" part of the range because it is the same as the "ending" year.

Below is the Regex I crudely copied and applied to my code. It is Javascript but I really think this is more of a Regex question…

const regex = /((\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)\D?)(\d{1,2}(st|nd|rd|th)?)?((\s*[,.\-\/]\s*)\D?)?\s*((19[0-9]\d|20\d{2})|\d{2})*/gm;

var myDateString1 = "January 8, 2020 - January 27, 2020"; // THIS WORKS GREAT!
var myDateString2 = "January 8 - January 27, 2020"; // THIS DOES NOT WORK GREAT!

var dates = myDateString1.match(regex);
// returns ["January 8, 2020","January 27, 2020"]

var dates2 = myDateString2.match(regex);
// returns ["January 8 - J"]

Is there a way I can modify this so if it is met with a hyphen it discontinues that given match? So myDateString2 would return ["January 8", "January 27, 2020"]?

The strings sometimes have words before or after, like

Presented from January 8, 2020 - January 27, 2020 at such and such place

so I don’t think simply having a regex based on the hyphen before/after would work.

>Solution :

You could use 2 capture groups and make the pattern more specific to match the format of the strings.

The /m flag can be omitted as there are no anchors in the pattern.

Note that the pattern matches a date like pattern, and does not validate the date itself.

\b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?(?:,\s+\d{4})?)\s+[,./-]\s+\b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?,\s+\d{4})\b

See a regex101 demo.

const regex = /\b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?(?:,\s+\d{4})?)\s+[,./-]\s+\b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?,\s+\d{4})\b/g;
const str = `January 8, 2020 - January 27, 2020
January 8 - January 27, 2020
Presented from January 8, 2020 - January 27, 2020 at such and such place
June 1 - July 22, 2018`;

console.log(Array.from(str.matchAll(regex), m => [m[1], m[2]]))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading