Best way to split a string that has titles in it

Here is my string:

data = "2.5 Excavation et terrassement 2.7 Travaux d'emplacement 3.2 Petits ouvrages de béton 4.2 Travaux de maçonnerie non structurale marbre et céramique 5.2 Ouvrages métalliques 6.2 Travaux de bois et plastique 7 Isolation étanchéité couvertures et revêtement extérieur 8 Portes et fenêtres 9 Travaux de finition 11.2 Équipements et produits spéciaux 12 Armoires et comptoirs usinés 13.5 Installations spéciales ou préfabriquées 15.6 Propane 17.2 Intercommunication téléphonie et surveillance"

I want the result to be:

result = ['2.5 Excavation et terrassement', "2.7 Travaux d'emplacement", '3.2 Petits ouvrages de béton', ...]

Thanks in advance.

Solution:

You can use re.findall:

import re
print(re.findall(r'\d[\d.]*\D+[^\s\d]', data))

Explanation:

  • \d[\d.]* will match a digit followed by any number (zero included) of digits and dots
  • \D+ will match one or more non-digit characters
  • [^\s\d] ensures the match doesn’t end with a space (equivalent of stripping) or a digit (that would belong to the next title)
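As a rough illustration of the three parts described above, here is the same pattern written in verbose mode, with one comment per part. The names `TITLE_PATTERN`, `sample` and `titles` are made up for this sketch, and `sample` is a shortened version of the string from the question.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE so that
# each part can carry its own comment.
TITLE_PATTERN = re.compile(r"""
    \d[\d.]*    # a digit, then any run of digits and dots (2.5, 12, ...)
    \D+         # one or more non-digit characters (the title text)
    [^\s\d]     # final character: neither whitespace nor a digit
""", re.VERBOSE)

# Shortened sample taken from the question's data (an assumption for brevity).
sample = "2.5 Excavation et terrassement 2.7 Travaux d'emplacement 7 Isolation 8 Portes et fenêtres"

titles = TITLE_PATTERN.findall(sample)
print(titles)
# ['2.5 Excavation et terrassement', "2.7 Travaux d'emplacement", '7 Isolation', '8 Portes et fenêtres']
```

Because `\D+` is greedy, it runs up to the next digit and then backtracks by one character so that `[^\s\d]` can match the last letter of the title, which is what drops the trailing space.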
Edit: this is not foolproof, since any digit occurring inside a title will be treated as the start of a new title. That is hard to avoid, because some of your title numbers are bare integers such as 8 or 9.
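To make that limitation concrete, here is a small sketch using a hypothetical input in which the first title itself contains a digit ("Phase 2"). The string `sample_with_digit` is invented for illustration and does not come from the question.

```python
import re

TITLE = re.compile(r'\d[\d.]*\D+[^\s\d]')

# Hypothetical input: the first title contains the digit 2 ("Phase 2").
sample_with_digit = "2.5 Excavation Phase 2 et terrassement 3.2 Petits ouvrages de béton"

pieces = TITLE.findall(sample_with_digit)
print(pieces)
# ['2.5 Excavation Phase', '2 et terrassement', '3.2 Petits ouvrages de béton']
```

The first title is cut at the embedded "2", which is then read as a title number of its own. If your real data can contain digits inside titles, you would need some extra rule to tell them apart from title numbers.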