Regex captures more than wanted

I want to remove references on Wikipedia with AutoWikiBrowser, an automatic editor that supports regexes, but I am facing a newbie problem with the tags.

For example, I want to remove all references containing example.com, e.g.

<ref>{{cite web|title=Bar|url=https://example.com/bar}}</ref>

I tried the basic regex <ref>.*?example.com.*?</ref> (replaced with nothing), but the match starts at the first <ref> tag encountered and swallows everything up to the reference I actually want to remove, e.g.:

<ref>{{cite web|title=Foo|url=https://zzz.com/foo}}</ref> blah-blah <ref>{{cite web|title=Bar|url=https://example.com/bar}}</ref>
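The over-match can be reproduced outside AutoWikiBrowser. The following is a minimal sketch in Python, chosen here only for illustration (it is an assumption that its `re` module behaves like the editor's regex engine for this pattern). It shows that the lazy `.*?` is still free to run past the first `</ref>`, so the match begins at the first `<ref>`.

```python
import re

# The example text from the question: one unrelated reference, then the target.
text = (
    "<ref>{{cite web|title=Foo|url=https://zzz.com/foo}}</ref> blah-blah "
    "<ref>{{cite web|title=Bar|url=https://example.com/bar}}</ref>"
)

# The naive pattern from the question, with lazy quantifiers on both sides.
naive = re.compile(r"<ref>.*?example.com.*?</ref>")

match = naive.search(text)

# The engine tries the leftmost <ref> first. Nothing stops .*? from crossing
# the first </ref>, so the match starts at index 0 and covers the whole text.
print(match.start())
print(match.group(0) == text)
```

Lazy quantifiers only prefer the shortest match from a given starting position; they do not move the starting position closer to `example.com`.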

I tried lookarounds around the tags, but then the tags themselves are not captured.

I am sorry to ask such a simple question, but I have been searching for the last hour to no avail. I speak English quite fluently, but not when it comes to technical terms…

Solution:

You can use this regex, which will match a <ref> tag that includes example.com before the closing </ref>:

<ref>(?=(?:(?!<\/ref>).)*example\.com).*?<\/ref>

This matches:

  • <ref> : the characters <ref>
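  • (?=(?:(?!<\/ref>).)*example\.com) : a positive lookahead that asserts the text example.com occurs before the next closing </ref> tag; the inner (?:(?!<\/ref>).)* is a tempered greedy token that consumes characters one at a time only while they do not begin </ref>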
  • .*? : a minimal number of characters
  • <\/ref> : the characters </ref> (the backslash only escapes the slash)
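As a rough check of the pattern above, here is a minimal Python sketch. Python is an assumption made for illustration, not the AutoWikiBrowser environment, and `strip_refs` is a hypothetical helper name. The slash is left unescaped because Python's `re` does not require escaping it.

```python
import re

# Same pattern as in the answer; "/" needs no escaping in Python.
PATTERN = re.compile(r"<ref>(?=(?:(?!</ref>).)*example\.com).*?</ref>")


def strip_refs(wikitext: str) -> str:
    """Remove every <ref>...</ref> whose body mentions example.com."""
    return PATTERN.sub("", wikitext)


sample = (
    "<ref>{{cite web|title=Foo|url=https://zzz.com/foo}}</ref> blah-blah "
    "<ref>{{cite web|title=Bar|url=https://example.com/bar}}</ref>"
)

# The Foo reference fails the lookahead (no example.com before its </ref>),
# so only the Bar reference is removed.
print(strip_refs(sample))
```

In this sketch `.` does not match newlines, so a reference that spans several lines would be left untouched unless a dot-matches-newline option (`re.DOTALL` in Python) is enabled.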
