Grep: exclude matches inside <!-- --> comments when counting occurrences in a curl response body

I am very new to Linux and bash scripting. I'm trying to read an XML file with curl and count the number of occurrences of the closing tag </entity> in it.

curl -s "https://server:port/app/collection/admin/file?wt=xml&_=12334343432&file=samplefile.xml&contentType=text%2Fxml%3Bcharset%3Dutf-8" | grep '</entity>' -oP | wc -l

This works, but the XML file contains comment blocks like the one below, and matches inside them are counted too, which gives the wrong count.

Sample XML file

.........
........
 <entity>
.......
.......
</entity>
........
........
<!--
.......
<entity>
........
</entity>
.......
.......
-->
<entity>
.......
........
</entity>

The expected output is 2, since one of the matches is inside the comment block.

>Solution :

Since you're using GNU grep, here is a PCRE solution for your problem:

curl -s "https://server:port/app/collection/admin/file?wt=xml&_=12334343432&file=samplefile.xml&contentType=text%2Fxml%3Bcharset%3Dutf-8" |
grep -ZzoP '(?s)<!--.*?-->(*SKIP)(*F)|</entity>' |
tr '\0' '\n' |
wc -l

2

RegEx Details:

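(?s): Enable DOTALL mode so that the dot also matches line breaks
<!--.*?-->: Match a comment block, lazily, up to the first -->
(*SKIP)(*F): Discard the comment block just matched and force a failure, so nothing inside it is counted
|: OR
</entity>: Match </entity> outside a comment block
tr '\0' '\n': Convert the NUL byte that grep -z emits after each match to a line break
wc -l: Count the number of lines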
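
The pipeline can also be tried without the server by feeding grep a local sample. This is only a minimal sketch: the sample variable and its contents are hypothetical stand-ins for the curl response, modeled on the sample XML in the question, and it assumes GNU grep built with PCRE (-P) support.

```shell
# Hypothetical sample standing in for the curl response body:
# two real entities plus one inside a comment block.
sample='<root>
<entity>
</entity>
<!--
<entity>
</entity>
-->
<entity>
</entity>
</root>'

# Count closing tags outside comment blocks.
# -z reads the whole input as one record, -o prints only the matches,
# -P enables PCRE so that (*SKIP)(*F) is available.
count=$(printf '%s\n' "$sample" |
  grep -zoP '(?s)<!--.*?-->(*SKIP)(*F)|</entity>' |
  tr '\0' '\n' |
  wc -l)

echo "$count"
```

For this sample the count should be 2, matching the expected output in the question. Replacing the printf line with the original curl -s call gives the pipeline from the answer.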
