
How can I group a string only if it is preceded by a character, but ignore it if the character is missing?

Sorry for the confusing title, I’m a bit lost myself.

I have the following sample strings, which I retrieved while scraping government data:

- Responsable administrative et financière - 01 02 03 04 05
- Gestionnaire travaux d'entretien sur les monuments historiques (Titre 3 - fonctionnement) et contrôle scientifique et technique sur les MH inscrits - Ille-et-Vilaine - 01 02 03 04 05
- Conseiller
- Ingénieure des services culturels urbanisme et environnement

As you can see, some of them end with a phone number (the last ten digits after the final dash in the first two lines), and some don’t.

I’m looking for a way to capture everything after the leading dash in one group and, when the line ends with a final dash followed by a phone number, to capture that phone number in a second group.

So, with my input, I’d like to get the following back:

group1: "Responsable administrative et financière"
group2: "01 02 03 04 05"

group1: "Gestionnaire travaux d'entretien sur les monuments historiques (Titre 3 - fonctionnement) et contrôle scientifique et technique sur les MH inscrits - Ille-et-Vilaine"
group2: "01 02 03 04 05"

group1: "Conseiller"

group1: "Ingénieure des services culturels urbanisme et environnement"

The closest I’ve come with a regex is the following:

/- (.*)(?: - (.*))/gm

But I don’t know where to go from here: this version finds nothing on lines without a phone number, and if I add a "?" to make the second part optional, the first group matches the whole rest of the line, phone number included, so I’m a bit lost.
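To make the problem concrete, here is a small sketch using Python’s re module (my choice of language; the original demo uses a /gm-style regex). It runs the attempted pattern with and without the final "?" on two of the sample lines. The groups helper and the LINES list are illustrative names, not part of the question. Without the "?", the line with no phone number does not match at all; with it, the greedy (.*) swallows the whole line and group 2 stays empty.

```python
import re

# Two of the sample lines: one with a trailing phone number, one without.
LINES = [
    "- Responsable administrative et financière - 01 02 03 04 05",
    "- Conseiller",
]

# The question's attempt: the " - (.*)" part is mandatory.
mandatory = re.compile(r"- (.*)(?: - (.*))")
# The same pattern with the second part made optional by "?".
optional = re.compile(r"- (.*)(?: - (.*))?")


def groups(pattern, line):
    """Return the capture groups of the first match, or None if nothing matches."""
    m = pattern.search(line)
    return m.groups() if m else None


for line in LINES:
    print(repr(line))
    print("  mandatory:", groups(mandatory, line))
    print("  optional: ", groups(optional, line))
```

With the optional version, the greedy first group reaches the end of the line before the optional part is even tried, and since that part may match nothing, the engine never needs to backtrack.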

How should I proceed?

Thank you in advance

Solution:

You can match any character except - in the second part, make that part optional, and make the first part non-greedy:

^- (.*?)(?: - ([^-\n]*))?$
  • ^- Start of the line (with the m flag), then match - followed by a space
  • (.*?) Capture group 1, match any char except a newline, as few as possible
  • (?: Non-capture group
    • - Match a dash surrounded by spaces literally
    • ([^-\n]*) Capture group 2, match zero or more chars other than - or a newline
  • )? Close the non-capture group and make it optional
  • $ End of the line (with the m flag, $ matches before every newline)
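As a minimal sketch of the answer in use, assuming the input arrives as one multi-line string (as in the demo), here is the same pattern in Python. re.MULTILINE plays the role of the m flag, and finditer replaces the g flag. TEXT and parse are hypothetical names chosen for illustration.

```python
import re

# The four sample lines from the question, one per line as in the demo.
TEXT = """\
- Responsable administrative et financière - 01 02 03 04 05
- Gestionnaire travaux d'entretien sur les monuments historiques (Titre 3 - fonctionnement) et contrôle scientifique et technique sur les MH inscrits - Ille-et-Vilaine - 01 02 03 04 05
- Conseiller
- Ingénieure des services culturels urbanisme et environnement"""

# Pattern from the answer; re.MULTILINE makes ^ and $ match at each line boundary.
PATTERN = re.compile(r"^- (.*?)(?: - ([^-\n]*))?$", re.MULTILINE)


def parse(text):
    """Return one (group1, group2) tuple per line; group2 is None when there is no trailing part."""
    return [m.groups() for m in PATTERN.finditer(text)]


for label, phone in parse(TEXT):
    print(f"group1: {label!r}")
    if phone is not None:
        print(f"group2: {phone!r}")
```

Because group 1 is lazy, the engine tries the optional " - ..." part at every position and only accepts it once the rest of the line contains no further dash, which is why "(Titre 3 - fonctionnement)" and "Ille-et-Vilaine" stay inside group 1. Note that the pattern does not check for digits, so any dash-free text after a final " - " would also land in group 2.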
