The sed command is not working with regex

I’m parsing the output of an HTTP GET request with sed to retrieve the contents of a given HTML tag. The response looks like this:

"<!DOCTYPE html><html><body><h1>Hello!</h1><p>v1.0.4-b</p></body></html>"

And I want to retrieve the version number inside the p element.

However, sed seems to have a bug in regex parsing.
When I use:

sed 's/.*<p>//'

It correctly removes the text to the left of the version (i.e., it outputs "v1.0.4-b</p></body></html>"). But when I try to use a capturing group, with

sed 's/.*<p>(.*)<\/p>.*/\1/'

It fails to match and gives an error:

sed: -e expression #1, char 20: invalid reference \1 on `s' command's RHS.

Despite that, when I test the regex on online regex validators it works.

Thank you in advance

>Solution :

You need to use either of the following (the first uses POSIX BRE syntax, the second uses POSIX ERE via -E):

sed -n 's~.*<p>\([^<]*\)</p>.*~\1~p'
sed -n -E 's~.*<p>([^<]*)</p>.*~\1~p'

See the online demo:

#!/bin/bash
sed -n 's~.*<p>\([^<]*\)</p>.*~\1~p' <<< \
 "<!DOCTYPE html><html><body><h1>Hello!</h1><p>v1.0.4-b</p></body></html>"
## => v1.0.4-b

The sed 's/.*<p>(.*)<\p>.*/\1/' command does not work because:

  • It is a POSIX BRE pattern, in which unescaped ( and ) match literal parenthesis characters rather than delimiting a capturing group. In POSIX BRE, a capturing group is written \(...\). With no group defined, \1 refers to nothing, which is why sed reports the invalid reference \1 error.
  • If you add the -E option to enable POSIX ERE, (...) defines a capturing group.
  • It does not match /p: the pattern contains \p rather than \/p.
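The difference between the two regex flavors can be sketched as follows. This is a minimal illustration, assuming a sed that supports -E (GNU and BSD sed both do); the function names extract_bre, extract_ere and strip_literal_parens are made up for this example.

```shell
#!/bin/bash
# Sample input taken from the question.
html='<!DOCTYPE html><html><body><h1>Hello!</h1><p>v1.0.4-b</p></body></html>'

# BRE (the default): a capturing group is written \( ... \).
extract_bre() {
  sed -n 's/.*<p>\([^<]*\)<\/p>.*/\1/p'
}

# ERE (enabled with -E): a capturing group is written ( ... ).
extract_ere() {
  sed -n -E 's/.*<p>([^<]*)<\/p>.*/\1/p'
}

# In BRE, unescaped parentheses match literal "(" and ")" characters,
# so this deletes a parenthesized chunk of text instead of capturing anything.
strip_literal_parens() {
  sed 's/(.*)//'
}

printf '%s\n' "$html" | extract_bre              # prints v1.0.4-b
printf '%s\n' "$html" | extract_ere              # prints v1.0.4-b
printf '%s\n' 'a(b)c' | strip_literal_parens     # prints ac
```

The last call shows why \1 is rejected in the original command: in BRE the pattern (.*) defines no group at all, so there is nothing for \1 to refer to.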

As there are slashes in the pattern, it is more convenient to choose a regex delimiter other than /; I chose ~ here.
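To illustrate the delimiter choice, here is a small sketch showing the same substitution written with three different delimiters. The function names are hypothetical; the only assumption is that the chosen delimiter does not otherwise occur in the pattern or replacement.

```shell
#!/bin/bash
# Sample input taken from the question.
html='<!DOCTYPE html><html><body><h1>Hello!</h1><p>v1.0.4-b</p></body></html>'

# With / as the delimiter, the slash in </p> must be escaped as \/.
with_slash() {
  sed -n 's/.*<p>\([^<]*\)<\/p>.*/\1/p'
}

# With ~ as the delimiter, </p> can be written as-is.
with_tilde() {
  sed -n 's~.*<p>\([^<]*\)</p>.*~\1~p'
}

# Any other single character that is absent from the pattern works too, e.g. #.
with_hash() {
  sed -n 's#.*<p>\([^<]*\)</p>.*#\1#p'
}

printf '%s\n' "$html" | with_slash   # prints v1.0.4-b
printf '%s\n' "$html" | with_tilde   # prints v1.0.4-b
printf '%s\n' "$html" | with_hash    # prints v1.0.4-b
```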

Also, I used the -n option to suppress the default line output and the p flag to print only the result of a successful substitution.
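The effect of -n together with the p flag is easiest to see on multi-line input. The following is a rough sketch with a made-up two-line input (the response in the question is a single line) and hypothetical function names.

```shell
#!/bin/bash
# Hypothetical two-line input: only the second line contains a <p> element.
input='<h1>Hello!</h1>
<p>v1.0.4-b</p>'

# Without -n, sed prints every line, whether or not the substitution matched.
without_n() {
  sed 's~.*<p>\([^<]*\)</p>.*~\1~'
}

# With -n and the p flag, only lines where the substitution succeeded are printed.
with_n_p() {
  sed -n 's~.*<p>\([^<]*\)</p>.*~\1~p'
}

printf '%s\n' "$input" | without_n
# prints two lines: <h1>Hello!</h1> (unchanged), then v1.0.4-b

printf '%s\n' "$input" | with_n_p
# prints one line: v1.0.4-b
```

This matters when the response spans several lines or contains no p element: without -n, the unmatched text would pass through to the output as if it were the version.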
