How do I stop my regex pattern from matching all text between the first open tag and the last closing tag?

Let me clarify what I want. I have the following string:

<tag param=1>
bla bla bla
</tag>
<tag param=1>
1111111111
</tag>
<tag param=2>
1111111111
</tag>

I want to get two matches, one for each tag with param=1, including its contents. To do that I use the following code:

var matches = Regex.Matches(myString, "<tag param=1(.|\n)*</tag>");

But I get only one match, covering the whole string. I suppose this happens because the regex finds the first tag, consumes all the remaining characters with (.|\n)*, and then stops at the last </tag>. Can I somehow restrict it so that it stops matching after the first </tag> it finds?
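To reproduce the behaviour described above, here is a minimal C# sketch (top-level statements, so C# 9 or later is assumed) that runs the question's pattern against the sample string. The string is built with explicit "\n" line breaks so the result does not depend on the source file's line endings. With the greedy (.|\n)*, the single match spans the entire input, including the param=2 tag.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question, built with explicit "\n" line breaks.
string myString =
    "<tag param=1>\nbla bla bla\n</tag>\n" +
    "<tag param=1>\n1111111111\n</tag>\n" +
    "<tag param=2>\n1111111111\n</tag>";

// Greedy (.|\n)* first consumes everything to the end of the input,
// then backtracks only as far as the LAST "</tag>".
MatchCollection greedy = Regex.Matches(myString, "<tag param=1(.|\n)*</tag>");

Console.WriteLine(greedy.Count);              // 1
Console.WriteLine(greedy[0].Value == myString); // True: the one match is the whole string
```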

Solution:

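The part of the pattern that matches the characters between the opening and closing tags, (.|\n)*, is greedy: it consumes as much input as it can and gives back only enough to reach the last </tag>. Add a ? after the * to make it "lazy", so that it matches as few characters as possible and stops at the first </tag>.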

<tag param=1(.|\n)*?</tag>
                   ^

Then matches will contain 2 elements, one for each param=1 tag. Iterate over the collection to get both, or use matches[0] if you only need the first one.
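A minimal, self-contained sketch of the lazy pattern in action (C# 9+ top-level statements assumed). The second regex is a variation that is not part of the answer above: it assumes the opening tag is written exactly as <tag param=1>, uses RegexOptions.Singleline so that . also matches \n (replacing (.|\n)), and adds a named group, here called body (a hypothetical name), to capture only the contents between the tags.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question, built with explicit "\n" line breaks.
string myString =
    "<tag param=1>\nbla bla bla\n</tag>\n" +
    "<tag param=1>\n1111111111\n</tag>\n" +
    "<tag param=2>\n1111111111\n</tag>";

// Lazy *? : each match ends at the first "</tag>" after its opening tag,
// so each <tag param=1> block becomes a separate match.
MatchCollection matches = Regex.Matches(myString, "<tag param=1(.|\n)*?</tag>");
Console.WriteLine(matches.Count); // 2

// Variation (assumption: the opening tag is exactly "<tag param=1>").
// Singleline lets '.' match '\n'; the named group "body" holds only the contents.
var withBody = new Regex("<tag param=1>(?<body>.*?)</tag>", RegexOptions.Singleline);
foreach (Match m in withBody.Matches(myString))
{
    // Trim the newlines around the contents before printing.
    Console.WriteLine(m.Groups["body"].Value.Trim());
}
```

Including the closing > of the opening tag in the second pattern also keeps it from accidentally matching something like param=10.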


I'd still recommend using an HTML/XML parser instead. If your schema becomes more complex, for example a <tag> nested inside another <tag>, the lazy pattern will stop at the inner </tag>, and no amount of regex can save you.
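The parser route could look roughly like the sketch below, which uses LINQ to XML (System.Xml.Linq) as one possible choice of parser. It rests on a loud assumption: the question's markup is not well-formed XML as written (unquoted attribute values, no single root element), so the input here quotes the attributes and wraps everything in a hypothetical <root> element. If the real input is HTML rather than XML, an HTML parser would be needed instead.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Assumption: the markup has been made well-formed XML, i.e. attribute values
// are quoted (param="1") and the tags sit inside a single, hypothetical <root>.
string xml =
    "<root>\n" +
    "<tag param=\"1\">\nbla bla bla\n</tag>\n" +
    "<tag param=\"1\">\n1111111111\n</tag>\n" +
    "<tag param=\"2\">\n1111111111\n</tag>\n" +
    "</root>";

XElement root = XElement.Parse(xml);

// Select every <tag> whose param attribute is "1". Descendants() also finds
// <tag> elements nested inside other <tag> elements, which a regex cannot track.
var contents = root.Descendants("tag")
    .Where(t => (string)t.Attribute("param") == "1")
    .Select(t => t.Value.Trim())
    .ToList();

foreach (string c in contents)
{
    Console.WriteLine(c);
}
```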
