Regex: match string between mandatory and optional groups

I’m trying to parse a file containing a list of movies, where each line looks like this:

id,title (year),genre1|genre2|genre3

The year field is optional, and some movies have other parts of the title in parentheses.

So far I have this regex:

(?:^\s*(\d+)\s*,.*?)(?:.*?\((\d{4})\))?(?:.*,\s*(.*)$)

How can I improve it to capture the title, which sits between the id and the optional year (or the genres, if there is no year)?

Data example:

1,Ace Ventura: When Nature Calls(1995),Comedy
20,Money Train (1995),Action|Comedy|Crime|Drama|Thriller
21,Get Shorty (1995),Comedy|Crime|Thriller
22,Copycat ,Crime|Drama|Horror|Mystery|Thriller
23,Assassins (1995),Action|Crime|Thriller
24,"Powder (1995)",Drama|Sci-Fi
25,Leaving (5) Las Vegas ,Drama|Romance

>Solution :

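The year, when present, always comes immediately before a comma, so don’t put .* between the year and that comma. Capture the title with a lazy group (.*?) placed before the optional year: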

^\s*(\d+)\s*,(.*?)(?:\((\d{4})\))?\s*,\s*(.*)$
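
As a rough sketch of how this pattern might be used, here is a small Python helper. The function name `parse_movie` and the choices to strip the title, convert the id and year to integers, and split the genres on `|` are assumptions made for illustration; they are not part of the original answer.

```python
import re

# Pattern from the answer: id, lazy title, optional (year), genres.
MOVIE_RE = re.compile(r'^\s*(\d+)\s*,(.*?)(?:\((\d{4})\))?\s*,\s*(.*)$')

def parse_movie(line):
    """Return (id, title, year, genres), or None if the line does not match.

    Hypothetical helper: the post-processing below is an illustrative choice.
    """
    m = MOVIE_RE.match(line)
    if m is None:
        return None
    movie_id, title, year, genres = m.groups()
    return (
        int(movie_id),
        title.strip(),                # drop the space left before "(year)"
        int(year) if year else None,  # year group is None when absent
        genres.split('|'),
    )

print(parse_movie("20,Money Train (1995),Action|Comedy|Crime|Drama|Thriller"))
print(parse_movie("25,Leaving (5) Las Vegas ,Drama|Romance"))
```

Note that for the quoted line `24,"Powder (1995)",Drama|Sci-Fi` the year is followed by a quote rather than a comma, so the optional year group does not match and `(1995)` stays inside the captured title. If quoted fields matter, parsing the line with a CSV reader first and applying the year pattern to the title column may be more robust.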
