Regex to match two consecutive h2 markdown titles?

I am trying to clean up text that may contain two consecutive markdown titles, such as ## Foo bar\n\n## Another bar\n\n. The rest of the text also contains other titles and other \n\n sequences.

The \n\n sequences are always literal characters (a backslash followed by n, twice), not actual newlines. The whole text is a single line.
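Because the sequences are literal, a regex has to match a backslash followed by n, which is written \\n in the pattern. A bare \n in a pattern would look for a real newline instead. A quick Python check of that distinction (raw strings keep the sample's backslashes literal):

```python
import re

# Raw string: r"\n\n" is four characters (backslash, n, backslash, n), not two newlines.
text = r"##Title\n\nBody"
print(len(r"\n\n"))  # 4

# \\n in the pattern matches a literal backslash followed by "n".
print(re.search(r"\\n\\n", text) is not None)  # True

# \n in the pattern matches a real newline character, which this text does not contain.
print(re.search(r"\n\n", text) is not None)  # False
```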

Let me describe it with examples:


Example 1

foo bar##Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n\n##Sed faucibus metus eu est sodales, a eleifend neque mollis.\n\nAliquam erat volutpat. Aenean ultrices odio leo, at vulputate enim porttitor non. Nam sodales vitae turpis quis sollicitudin. Mauris molestie eget purus nec scelerisque.\n\n##Sed eu erat quis nulla lobortis dapibus.\n\nPraesent suscipit, ante quis pretium varius, tellus ex consectetur elit, eu pharetra nunc metus cursus ex.\n\n##Aenean eu tempus dolor.\n\n Vivamus scelerisque sit amet mi eget dignissim. Fusce sit amet ligula vel tortor tincidunt porta.\n\n

Should match: ##Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n\n##Sed faucibus metus eu est sodales, a eleifend neque mollis.\n\n

Example 2

foo bar##Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n\n Some text here ##Sed faucibus metus eu est sodales, a eleifend neque mollis.\n\nAliquam erat volutpat. Aenean ultrices odio leo, at vulputate enim porttitor non. Nam sodales vitae turpis quis sollicitudin. Mauris molestie eget purus nec scelerisque.\n\n##Sed eu erat quis nulla lobortis dapibus.\n\nPraesent suscipit, ante quis pretium varius, tellus ex consectetur elit, eu pharetra nunc metus cursus ex.\n\n##Aenean eu tempus dolor.\n\n Vivamus scelerisque sit amet mi eget dignissim. Fusce sit amet ligula vel tortor tincidunt porta.\n\n

Should match nothing.

Every regex I have tried over-matches, i.e. it also greedily captures part of the paragraph instead of just the titles. Any ideas?

Thank you!

Solution:

This matches a double # followed by everything up to and including the first literal \n\n, and requires that unit to occur exactly twice in a row.

(?:\#\#(?:(?!\\n\\n).)*\\n\\n){2}

https://regex101.com/r/Ay8CLV/1
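The pattern can also be checked outside regex101. A minimal Python sketch, using shortened versions of the two examples from the question (the sample strings are abbreviated, not the full text):

```python
import re

# Pattern from the answer: "##", then any characters that do not start a literal \n\n,
# then the literal \n\n; the whole unit must occur twice in a row.
PAIR = re.compile(r"(?:\#\#(?:(?!\\n\\n).)*\\n\\n){2}")

# Raw strings keep the \n\n sequences as literal backslash-n characters, as in the question.
example1 = (r"foo bar##Lorem ipsum dolor sit amet.\n\n##Sed faucibus metus.\n\n"
            r"Aliquam erat volutpat.\n\n##Sed eu erat.\n\nPraesent suscipit.\n\n"
            r"##Aenean eu tempus dolor.\n\n Vivamus scelerisque.\n\n")
example2 = (r"foo bar##Lorem ipsum dolor sit amet.\n\n Some text here ##Sed faucibus metus.\n\n"
            r"Aliquam erat volutpat.\n\n##Sed eu erat.\n\nPraesent suscipit.\n\n"
            r"##Aenean eu tempus dolor.\n\n Vivamus scelerisque.\n\n")

print(PAIR.findall(example1))  # one match: the two adjacent titles
print(PAIR.findall(example2))  # no matches
```

Since the pattern contains only non-capturing groups, findall returns the full matched text.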

The same pattern in free-spacing (x-mode) layout:

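 (?:                   # one title; the group is repeated twice
    \#\#               # literal "##", escaped because # starts a comment in free-spacing mode
    (?:
       (?! \\ n \\ n )  # stop before the first literal \n\n
       .               # otherwise consume one character
    )*
    \\ n \\ n          # the literal \n\n that ends the title
 ){2}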
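The negative lookahead is what keeps each title from running into the next paragraph. For comparison, a plain lazy quantifier is not enough: in the hypothetical attempt below (not one of the asker's patterns), .*? keeps expanding past a \n\n that is not followed by ## until it finds one that is, so it matches a shortened Example 2, which should match nothing. A Python sketch:

```python
import re

# Shortened version of Example 2 from the question (literal \n\n, single line).
example2 = (r"foo bar##Lorem ipsum.\n\n Some text here ##Sed faucibus.\n\n"
            r"Aliquam erat.\n\n##Sed eu erat.\n\nPraesent suscipit.\n\n")

# Hypothetical lazy attempt: .*? may cross a literal \n\n when that lets the match succeed.
lazy = re.compile(r"(?:##.*?\\n\\n){2}")
# Tempered version from the answer: the lookahead forbids crossing a literal \n\n.
tempered = re.compile(r"(?:\#\#(?:(?!\\n\\n).)*\\n\\n){2}")

print(lazy.search(example2) is not None)      # True: the match swallows several paragraphs
print(tempered.search(example2) is not None)  # False
```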

To ensure that a title's text contains no double # either, use this variant if needed:

(?:\#\#(?:(?!\\n\\n|\#\#).)*\\n\\n){2}
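The two patterns only behave differently when a title's own text contains ##. A small Python sketch with a made-up input (not from the question): the first pattern lets the first title absorb the inner ##, while the stricter one refuses to, so its match starts at the later ##.

```python
import re

loose = re.compile(r"(?:\#\#(?:(?!\\n\\n).)*\\n\\n){2}")
strict = re.compile(r"(?:\#\#(?:(?!\\n\\n|\#\#).)*\\n\\n){2}")

# Made-up sample: the first "title" contains a second ## in the middle.
text = r"##A ##B\n\n##C\n\n"

print(loose.search(text).group())   # the whole string, starting at "##A"
print(strict.search(text).group())  # starts at "##B"
```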