How do I stop my regex pattern from matching all text between the first open tag and the last closing tag?

To clarify what I want: I have the following string:

<tag param=1>
bla bla bla
</tag>
<tag param=1>
1111111111
</tag>
<tag param=2>
1111111111
</tag>

I want to get two matches, one for each tag with param=1, including their contents. To do this, I use the following code:

var matches = Regex.Matches(myString, "<tag param=1(.|\n)*</tag>");

But I get only one match that spans the whole string. I suppose this happens because it finds the first tag, consumes all the other characters with (.|\n)*, and then stops at the last </tag>. Can I somehow add a restriction so that it stops matching at the first </tag> it finds?

>Solution :

The group that captures the characters between the opening and closing tags, (.|\n)*, is greedy: it matches as many characters as possible. Add a ? after the * to make it "lazy", so that it matches as few characters as needed.

<tag param=1(.|\n)*?</tag>
                   ^

Then matches will have 2 elements, one for each param=1 tag. You can iterate over them, or take the first one with .First() (or matches[0]).
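As a minimal sketch of the lazy pattern in use, assuming the sample string from the question is held in myString and joined with \n line breaks (the variable names matchesSingleline and the RegexOptions.Singleline variant are additions for illustration, not part of the original code):

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question, assumed to use \n line breaks.
string myString =
    "<tag param=1>\nbla bla bla\n</tag>\n" +
    "<tag param=1>\n1111111111\n</tag>\n" +
    "<tag param=2>\n1111111111\n</tag>";

// Lazy quantifier: *? stops at the first </tag> after each opening tag.
MatchCollection matches = Regex.Matches(myString, "<tag param=1(.|\n)*?</tag>");

// Alternative (an assumption, not from the original answer):
// RegexOptions.Singleline lets '.' match '\n', so the (.|\n) group is not needed.
MatchCollection matchesSingleline =
    Regex.Matches(myString, "<tag param=1>.*?</tag>", RegexOptions.Singleline);

Console.WriteLine(matches.Count);           // 2
Console.WriteLine(matchesSingleline.Count); // 2

// Print each matched tag together with its contents.
foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}
```

Both patterns rely on the same idea: the lazy *? makes the engine try to finish the match at every position, so it ends at the nearest </tag> rather than the last one.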


I would still recommend that you use an HTML/XML parser instead. If your schema becomes more complex, such as having a <tag> nested inside another <tag>, no amount of regex can save you.
