Regex table of contents

I have a list of table-of-contents items that I need to parse with a regex. The data is not entirely uniform, and I can't get a pattern to work in all cases.

The data looks like this:

1.     Header 1
1.2.  SubHeader2
1.2.1     Subheader 
1.2.2.   Another header
1.2.2.1        Test
1.2.2.2.    Test2

I need to capture the number and the header text in separate groups. The number should not include the trailing dot, if there is one. The issue I'm struggling with is that not all of the numbers have a trailing dot.

I have tried

^([0-9\.]+)[\.]\s+(.+)$      -- Doesn't work when there is no trailing dot
^([0-9\.]+)[\.]?\s+(.+)$     -- Matches, but group 1 still contains the trailing dot when it is present

Solution:

You can use

^(\d+(?:\.\d+)*)\.?\s+(.+)

Details:

  • ^ – start of string (or start of each line, if the multiline flag is enabled)
  • (\d+(?:\.\d+)*) – Group 1: one or more digits and then zero or more repetitions of a . and one or more digits sequence
  • \.? – an optional .
  • \s+ – one or more whitespaces
  • (.+) – Group 2: any one or more chars other than line break chars, as many as possible.
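As a rough illustration, the pattern could be applied to the sample data like this in Python. This is only a sketch: the question does not say which regex engine is in use, so Python's `re` module is an assumption, and the names `TOC`, `PATTERN` and `parse_toc` are made up for the example. The `re.MULTILINE` flag is added so that `^` matches at the start of every line of a multi-line string.

```python
import re

# Sample table-of-contents lines from the question.
TOC = """\
1.     Header 1
1.2.  SubHeader2
1.2.1     Subheader
1.2.2.   Another header
1.2.2.1        Test
1.2.2.2.    Test2
"""

# re.MULTILINE lets ^ match at the start of each line,
# not only at the start of the whole string.
PATTERN = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.+)", re.MULTILINE)


def parse_toc(text):
    """Return (number, title) pairs; the number has no trailing dot."""
    # rstrip() drops trailing spaces that the greedy (.+) would keep.
    return [(m.group(1), m.group(2).rstrip()) for m in PATTERN.finditer(text)]


for number, title in parse_toc(TOC):
    print(number, "|", title)
```

Because Group 1 must end in a digit, a trailing dot can only be consumed by the optional `\.?` outside the group, which is why the number comes back clean whether or not the dot is present.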