sed regex does not give me the expected result

sed doesn’t give me the expected result. I want the output of Group 2, but sed gives me nothing for it. I ran this command on Ubuntu 20.04.3 LTS with sed (GNU sed) 4.7. When I tried the same pattern on regex101.com, it gave me the expected result. You can see it here.

root@6ab6c9bc0d76:~# cat /etc/issue
Ubuntu 20.04.3 LTS \n \l
root@6ab6c9bc0d76:~# sed --version
sed (GNU sed) 4.7
Packaged by Debian
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Jay Fenlason, Tom Lord, Ken Pizzini,
Paolo Bonzini, Jim Meyering, and Assaf Gordon.
GNU sed home page: <https://www.gnu.org/software/sed/>.
General help using GNU software: <https://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-sed@gnu.org>.

Group 2 is empty.

root@6ab6c9bc0d76:~# echo "https://one-two-three-four-five.dev.domain.com" | sed -E "s/(https?:\/\/)([\w|-]*)(.*)/Group1: \1\nGroup2: \2\nGroup3: \3/"
Group1: https://
Group2:
Group3: one-two-three-four-five.dev.domain.com
root@6ab6c9bc0d76:~#

Solution:

With your GNU sed, you can use

#!/bin/bash
echo "https://one-two-three-four-five.dev.domain.com" | \
 sed -E "s~(https?://)([[:alnum:]_-]*)(.*)~Group1: \1\nGroup2: \2\nGroup3: \3~"

Output:

Group1: https://
Group2: one-two-three-four-five
Group3: .dev.domain.com

See the online demo.

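Inside a bracket expression, \w is not a shorthand class: POSIX bracket expressions treat the backslash literally, so [\w|-] matches a single \, w, | or - character (the | is literal there too, not an alternation). Since the host name starts with o, [\w|-]* matches zero characters, Group 2 comes out empty, and (.*) captures the rest. regex101.com defaults to a PCRE flavor (unless another one was selected), where \w is allowed inside a character class, which is why the pattern worked there. The [:alnum:] POSIX character class matches letters and digits; as \w also matches underscores, you need to combine [:alnum:] and _ inside a bracket expression that also matches a - char: [[:alnum:]_-]. Note that the - must be placed at the start or the end of the bracket expression so that it is taken literally rather than as a range operator.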
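To see the difference side by side, here is a minimal sketch that runs both bracket expressions against the URL from the question and prints only Group 2. The variable names (url, broken, fixed) are made up for illustration, and it assumes a sed that supports -E, such as the GNU sed 4.7 shown above.

```shell
#!/bin/bash
url="https://one-two-three-four-five.dev.domain.com"

# Original bracket expression: it only matches '\', 'w', '|' and '-',
# so it matches zero characters at "one..." and Group 2 is empty.
broken=$(printf '%s\n' "$url" | sed -E 's~(https?://)([\w|-]*)(.*)~\2~')

# POSIX class plus '_' and a trailing '-': the bracket-safe spelling of [\w-].
fixed=$(printf '%s\n' "$url" | sed -E 's~(https?://)([[:alnum:]_-]*)(.*)~\2~')

printf 'broken: [%s]\nfixed:  [%s]\n' "$broken" "$fixed"
```

The square brackets in the printf format are only there to make the empty value visible.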

I used ~ as the regex delimiter char because you have / chars in the regex pattern. With a different delimiter, the slashes no longer need to be escaped.
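As a quick check that the delimiter is only a matter of readability, the sketch below runs the same substitution twice, once with / and escaped slashes and once with ~. The variable names (url, with_slash, with_tilde) are illustrative, not part of the original command.

```shell
#!/bin/bash
url="https://one-two-three-four-five.dev.domain.com"

# Default '/' delimiter: every literal slash in the pattern must be escaped.
with_slash=$(printf '%s\n' "$url" | sed -E 's/(https?:\/\/)([[:alnum:]_-]*)(.*)/\2/')

# '~' delimiter: slashes are ordinary characters and need no escaping.
with_tilde=$(printf '%s\n' "$url" | sed -E 's~(https?://)([[:alnum:]_-]*)(.*)~\2~')

printf '%s\n%s\n' "$with_slash" "$with_tilde"
```

Both commands should print the same Group 2 value; the chosen delimiter just must not appear unescaped in the pattern or the replacement.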
