How can I group a string only if preceded by a character, but ignore if the character is missing?

January 27, 2022

Sorry for the confusing title, I’m a bit lost myself.

I have the following (sample of) strings that I’ve been retrieving while scraping government data:

- Responsable administrative et financière - 01 02 03 04 05
- Gestionnaire travaux d'entretien sur les monuments historiques (Titre 3 - fonctionnement) et contrôle scientifique et technique sur les MH inscrits - Ille-et-Vilaine - 01 02 03 04 05
- Conseiller
- Ingénieure des services culturels urbanisme et environnement

As you can see, some of them end with a phone number (the ten digits after the final dash in the first two lines), and some don’t.

I’m looking for a way to capture everything after the leading dash in one group and, if the string ends with a dash followed by a phone number, to capture that phone number in a second group.

So, with my input, I’d like to get the following back:

group1: "Responsable administrative et financière"
group2: "01 02 03 04 05"

group1: "Gestionnaire travaux d'entretien sur les monuments historiques (Titre 3 - fonctionnement) et contrôle scientifique et technique sur les MH inscrits - Ille-et-Vilaine"
group2: "01 02 03 04 05"

group1: "Conseiller"

group1: "Ingénieure des services culturels urbanisme et environnement"

The closest I’ve been with regex is the following:

/- (.*)(?: - (.*))/gm

But I don’t really know where to go from there: if I add a "?" to make the second part optional, the first group matches everything up to the end of the line.
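To make the problem concrete, here is a minimal sketch in Python (an assumption; the question does not say which regex engine is used). The pattern is the attempt above with a "?" appended to the second part, and the variable names are illustrative only.

```python
import re

# The attempt from the question with the second part made optional.
# Python's re module is assumed here; the question names no engine.
greedy = re.compile(r"- (.*)(?: - (.*))?")

line = "- Responsable administrative et financière - 01 02 03 04 05"
match = greedy.search(line)

# The greedy (.*) consumes the rest of the line first. Because the
# second part is optional, the engine never has to backtrack to let it
# match, so group 2 stays empty and group 1 keeps the phone number.
print(match.group(1))
print(match.group(2))
```

The first group swallows the phone number because a greedy quantifier only gives characters back when the rest of the pattern would otherwise fail, and an optional group can never force that failure.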

How should I proceed?

Thank you in advance

Solution:

You can match any character except a - in the second part, and make that part optional while making the first part non-greedy:

^- (.*?)(?: - ([^-\n]*))?$

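^- Start of the string (or of each line with the m flag), then match a literal - and a space
(.*?) Capture group 1, match any character except a newline, as few times as possible
(?: Start of a non-capturing group
 -  Match a space, a dash and a space literally
([^-\n]*) Capture group 2, match zero or more characters other than - and a newline
)? Close the non-capturing group and make it optional
$ End of the string (or of each line with the m flag)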
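The pattern can be checked against the four sample strings from the question. The sketch below assumes Python's `re` module with `re.MULTILINE` standing in for the m flag; `PATTERN`, `text` and `results` are illustrative names, not part of the original answer.

```python
import re

# The pattern from the answer; re.MULTILINE makes ^ and $ match at
# each line boundary, like the m flag in the question.
PATTERN = re.compile(r"^- (.*?)(?: - ([^-\n]*))?$", re.MULTILINE)

text = "\n".join([
    "- Responsable administrative et financière - 01 02 03 04 05",
    "- Gestionnaire travaux d'entretien sur les monuments historiques "
    "(Titre 3 - fonctionnement) et contrôle scientifique et technique "
    "sur les MH inscrits - Ille-et-Vilaine - 01 02 03 04 05",
    "- Conseiller",
    "- Ingénieure des services culturels urbanisme et environnement",
])

# Collect (group 1, group 2) per line; group 2 is None when the line
# has no trailing " - ..." part without further dashes.
results = [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]

for title, phone in results:
    print(repr(title), repr(phone))
```

Note that group 2 accepts any trailing text without a dash, not only digits, so a title that ends in " - " followed by dash-free words would also land in group 2. If that matters for your data, a stricter character class for the second group (for example digits and spaces only) is one possible adjustment.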