Sorry for the confusing title, I’m a bit lost myself.
I have the following sample of strings that I’ve been retrieving while scraping government data:
- Responsable administrative et financière - 01 02 03 04 05
- Gestionnaire travaux d'entretien sur les monuments historiques (Titre 3 - fonctionnement) et contrôle scientifique et technique sur les MH inscrits - Ille-et-Vilaine - 01 02 03 04 05
- Conseiller
- Ingénieure des services culturels urbanisme et environnement
As you can see, some of them have a phone number (the last 10 digits after the final dash in the first two lines), and some don’t.
I’m looking for a way to capture everything after the first dash in one group and, if the line ends with a dash followed by a phone number, to capture that phone number in a second group.
So, with my input, I’d like to get the following back:
group1: "Responsable administrative et financière"
group2: "01 02 03 04 05"
group1: "Gestionnaire travaux d'entretien sur les monuments historiques (Titre 3 - fonctionnement) et contrôle scientifique et technique sur les MH inscrits - Ille-et-Vilaine"
group2: "01 02 03 04 05"
group1: "Conseiller"
group1: "Ingénieure des services culturels urbanisme et environnement"
The closest I’ve been with regex is the following:
/- (.*)(?: - (.*))/gm
But I don’t really know where to go from there: if I add a "?" to make the second part optional, the first group matches everything, so I’m a bit lost.
How should I proceed?
Thank you in advance
>Solution :
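You can match any character except a - in the second part, and make that part optional while the first part is non-greedy. The lazy first group then only grows until the rest of the pattern can reach the end of the line, so the second group can only hold the text after the last " - ":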
^- (.*?)(?: - ([^-\n]*))?$
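^- : start of string (of each line with the m flag used in the question), then match "- " literally
(.*?) : capture group 1, match any character except a newline, as few as possible
(?: : open a non-capturing group
" - " : match literally (space, dash, space)
([^-\n]*) : capture group 2, match optional characters other than - and a newline
)? : close the non-capturing group and make it optional
$ : end of string (of each line with the m flag)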
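To check the pattern against the sample strings from the question, here is a minimal sketch in Python. The language is an assumption (the question does not say which regex engine is used), and the names PATTERN, SAMPLE and split_title_and_phone are made up for illustration. re.MULTILINE plays the role of the m flag.

```python
import re

# Same pattern as above; re.MULTILINE makes ^ and $ anchor at each line.
PATTERN = re.compile(r"^- (.*?)(?: - ([^-\n]*))?$", re.MULTILINE)

# Sample input copied from the question (hypothetical variable name).
SAMPLE = (
    "- Responsable administrative et financière - 01 02 03 04 05\n"
    "- Gestionnaire travaux d'entretien sur les monuments historiques "
    "(Titre 3 - fonctionnement) et contrôle scientifique et technique "
    "sur les MH inscrits - Ille-et-Vilaine - 01 02 03 04 05\n"
    "- Conseiller\n"
    "- Ingénieure des services culturels urbanisme et environnement\n"
)


def split_title_and_phone(text):
    """Return one (group1, group2) tuple per matching line.

    group2 is None when the line has no trailing " - ..." part.
    """
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    for title, phone in split_title_and_phone(SAMPLE):
        print(f"group1: {title!r}")
        if phone is not None:
            print(f"group2: {phone!r}")
```

Note that group 2 accepts any dash-free text after the last " - ", not only phone numbers. If some titles can end that way, replacing [^-\n]* with something stricter such as \d{2}(?: \d{2}){4} would be one option.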