Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Grep exclude count of occurence match between comments <!– –> of curl body

I am very new to linux & bash script. I’m trying to read an xml file using curl command and count the number of occurrence of the word </entity> in it.

curl -s "https://server:port/app/collection/admin/file?wt=xml&_=12334343432&file=samplefile.xml&contentType=text%2Fxml%3Bcharset%3Dutf-8" | grep '</entity>' -oP | wc -l

This works correctly, however the xml file consists of comments like below resulting in wrong count.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Sample XML file

.........
........
 <entity>
.......
.......
</entity>
........
........
<!--
.......
<entity>
........
</entity>
.......
.......
-->
<entity>
.......
........
</entity>

The expected output should be 2 since one of the match is inside the comment block.

>Solution :

Since you’re using gnu-grep here is a PCRE regex solution for your problem:

curl -s "https://server:port/app/collection/admin/file?wt=xml&_=12334343432&file=samplefile.xml&contentType=text%2Fxml%3Bcharset%3Dutf-8" |
grep -ZzoP '(?s)<!--.*?-->(*SKIP)(*F)|</entity>' |
tr '\0' '\n' |
wc -l

2

RegEx Demo

RegEx Details:

  • (?s): Enable DOTALL mode so that dot matches line breaks also
  • <!--.*?-->: Match a commented block
  • (*SKIP)(*F): skips and fails this commented block
  • |: OR
  • </entity>: Match </entity> outside commented block
  • tr '\0' '\n': Converts NUL bytes to line break
  • wc -l: Counts number of lines
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading