Parsing data with regex: splitting it into columns via capture groups

I want to use a regex to parse my data into 3 columns.

Film data:
Marvel Comics Presents (1988) #125
Spider-Man Legends Vol. II: Todd Mcfarlane Book I (Trade Paperback)
Spider-Man Legends Vol. II: Todd Mcfarlane Book I
Spider-Man Legends Vol. II: Todd Mcfarlane Book I (1998)
Marvel Comics Presents #125

Expected output: (image missing from the source)

I can see how to group it, but can’t work out the regex for it (image missing from the source).

I built this expression: (.*)\((\d{4})\)(.*)

Essentially, I want to use the ? quantifier like this:
(.*)\((\d{4})\)?(.*)
to say that this group may or may not be there.

Nevertheless, it’s not working.

Solution:

You could use 3 capture groups, where the last 2 are optional:

^(.*?)(?:\((\d{4})\))?\s*(#\d+)?$

The pattern matches:

  • ^ Start of string
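  • (.*?) Capture group 1, match any characters, as few as possible (the title)
  • (?:\((\d{4})\))? Optional non-capturing group, match 4 digits between parentheses and capture the digits in group 2
  • \s* Match optional whitespace chars
  • (#\d+)? Optional capture group 3, match # and 1+ digits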
  • $ End of string

See a regex101 demo.
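To see how the three groups map to columns, here is a minimal Python sketch that applies the pattern above to the sample data from the question. The helper name `split_columns` and the choice of Python's `re` module are assumptions for illustration; the original post does not say which regex engine is in use. Group 1 is matched lazily, so it can end with a trailing space when a year follows, and the sketch strips that.

```python
import re

# Pattern from the answer: title, optional "(year)", optional "#issue".
PATTERN = re.compile(r"^(.*?)(?:\((\d{4})\))?\s*(#\d+)?$")


def split_columns(line):
    """Return (title, year, issue); parts that are absent come back as None.

    Hypothetical helper, not part of the original answer.
    """
    match = PATTERN.match(line)
    if match is None:  # e.g. input containing embedded newlines
        return None
    title, year, issue = match.groups()
    # The lazy group 1 may keep the space before "(year)", so strip it.
    return title.strip(), year, issue


rows = [
    "Marvel Comics Presents (1988) #125",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I (Trade Paperback)",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I (1998)",
    "Marvel Comics Presents #125",
]

for row in rows:
    print(split_columns(row))
```

Note that "(Trade Paperback)" stays in the title column, because the optional group only accepts exactly 4 digits between the parentheses.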
