I am trying to clean up text that may contain two consecutive markdown titles, like so: ## Foo bar\n\n## Another bar\n\n. The rest of the text also contains other titles and other \n\n sequences.
\n\n are always literal characters, not actual newlines. The whole text is a single line.
Let me describe it with examples:
Example 1
foo bar##Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n\n##Sed faucibus metus eu est sodales, a eleifend neque mollis.\n\nAliquam erat volutpat. Aenean ultrices odio leo, at vulputate enim porttitor non. Nam sodales vitae turpis quis sollicitudin. Mauris molestie eget purus nec scelerisque.\n\n##Sed eu erat quis nulla lobortis dapibus.\n\nPraesent suscipit, ante quis pretium varius, tellus ex consectetur elit, eu pharetra nunc metus cursus ex.\n\n##Aenean eu tempus dolor.\n\n Vivamus scelerisque sit amet mi eget dignissim. Fusce sit amet ligula vel tortor tincidunt porta.\n\n
Should match: ##Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n\n##Sed faucibus metus eu est sodales, a eleifend neque mollis.\n\n
Example 2
foo bar##Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n\n Some text here ##Sed faucibus metus eu est sodales, a eleifend neque mollis.\n\nAliquam erat volutpat. Aenean ultrices odio leo, at vulputate enim porttitor non. Nam sodales vitae turpis quis sollicitudin. Mauris molestie eget purus nec scelerisque.\n\n##Sed eu erat quis nulla lobortis dapibus.\n\nPraesent suscipit, ante quis pretium varius, tellus ex consectetur elit, eu pharetra nunc metus cursus ex.\n\n##Aenean eu tempus dolor.\n\n Vivamus scelerisque sit amet mi eget dignissim. Fusce sit amet ligula vel tortor tincidunt porta.\n\n
Should match nothing.
All the regexes I’ve tried overmatch, i.e. they also greedily capture part of the paragraph instead of just the titles. Any ideas?
Thank you!
>Solution :
This simply matches a double # followed by everything up to the first literal \n\n, and repeats that twice.
(?:\#\#(?:(?!\\n\\n).)*\\n\\n){2}
https://regex101.com/r/Ay8CLV/1
The same pattern, expanded for readability (the whitespace is not part of the pattern):
(?:
    \#\#
    (?:
        (?! \\ n \\ n )
        .
    )*
    \\ n \\ n
){2}
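As a quick check, here is a minimal Python sketch that runs the pattern against shortened versions of the two examples from the question. The helper name find_double_titles and the abbreviated sample strings are made up for illustration. Python's re module is an assumption too, since the question does not say which regex engine is in use.

```python
import re

# Pattern from the answer: "##", then any characters up to the first
# literal backslash-n backslash-n, and that whole unit twice in a row.
DOUBLE_TITLE = re.compile(r"(?:\#\#(?:(?!\\n\\n).)*\\n\\n){2}")


def find_double_titles(text):
    """Return every run of two consecutive titles found in text (hypothetical helper)."""
    return [m.group(0) for m in DOUBLE_TITLE.finditer(text)]


# Raw strings keep \n as two literal characters, as described in the question.
example_1 = r"foo bar##Lorem ipsum.\n\n##Sed faucibus.\n\nAliquam erat.\n\n##Sed eu erat.\n\nPraesent suscipit.\n\n"
example_2 = r"foo bar##Lorem ipsum.\n\n Some text here ##Sed faucibus.\n\nAliquam erat.\n\n##Sed eu erat.\n\nPraesent suscipit.\n\n"

print(find_double_titles(example_1))
print(find_double_titles(example_2))
```

The first call returns a single match covering only the two leading titles, and the second returns an empty list, because some text sits between the first title and the next ##.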
To also ensure that a title body contains no double # of its own, use this variant if needed:
(?:\#\#(?:(?!\\n\\n|\#\#).)*\\n\\n){2}
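To see where the two patterns differ, the sketch below compares them on a made-up input whose first title body contains a stray ##. The names LOOSE, STRICT and sample are hypothetical, and Python's re module is again an assumption about the engine.

```python
import re

# First pattern: a title body may contain anything except a literal \n\n.
LOOSE = re.compile(r"(?:\#\#(?:(?!\\n\\n).)*\\n\\n){2}")
# Stricter variant: a title body may not contain "##" either.
STRICT = re.compile(r"(?:\#\#(?:(?!\\n\\n|\#\#).)*\\n\\n){2}")

# Made-up input: the first "title" has another ## inside its body.
sample = r"##A ##B\n\n##C\n\n"

loose_match = LOOSE.search(sample).group(0)
strict_match = STRICT.search(sample).group(0)

print(loose_match)
print(strict_match)
```

Here the first pattern matches the whole string, starting at ##A, while the stricter variant skips ##A and matches only from ##B onward.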