SED Using Capture Group In Substitution

I’m trying to use sed (GNU sed 4.2.2) on CentOS 7 (the OS doesn’t seem to matter, as the same behavior occurs on AWS Linux 2), and my capture group is not being added back to the substitution string.

I’m trying to add a directory to an m3u8 file’s resources. The regex is correct, as it does the replacement, but it loses what should be captured in the first capture group.

code:

eregex='([0-9]+_?[0-9]*[.](ts|key))'
find . -type f -exec grep -lZEe "$eregex" {} + | xargs -r0 sed -i -E "s~$eregex~CH/$1~g"

original data:

https://example.com/dir/dir2/number/12345.key

behavior after execution:

https://example.com/dir/dir2/number/CH/

expected result:

https://example.com/dir/dir2/number/CH/12345.key

I’ve tried using it as a back reference \1 but that didn’t address the issue either. Is my syntax wrong here, or are the capture groups not working as intended? Tried using a non capture group as well for the possible extensions but that didn’t seem to be supported.

https://regex101.com/r/CSWeFx/1

>Solution :

Is my syntax wrong here,

Yes. The syntax for a backreference in sed's regex dialect is \1, \2, etc., not $1.

Your command line is processed by the shell before it invokes any commands. That includes parameter expansion, on which you are depending to provide the regex via variable eregex. But $1 is a variable reference too, and it will also be expanded (apparently to nothing in your case).
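One way to see this without touching any files is to print the sed script exactly as the shell would hand it to sed. This is only a sketch: the variable name script is illustrative, and set -- is used to clear the positional parameters, on the assumption that $1 was unset in the asker's session.

```shell
eregex='([0-9]+_?[0-9]*[.](ts|key))'

# Clear the positional parameters so that $1 is unset, as it usually
# is in an interactive shell (an assumption about the original session).
set --

# Inside double quotes the shell expands both $eregex and $1
# before sed ever sees the string.
script="s~$eregex~CH/$1~g"
printf '%s\n' "$script"
# prints: s~([0-9]+_?[0-9]*[.](ts|key))~CH/~g
```

The replacement part is already just CH/ by the time sed runs, which matches the observed output.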

I’ve tried using it as a back reference \1 but that didn’t address the issue either.

The backslash (\) is a quote character to the shell. Outside any quotes, \1 is equivalent to 1: the shell converts the former to the latter during the quote removal stage of command-line processing. Inside double quotes, a backslash is special only before $, `, ", \, or a newline, so \1 is passed through unchanged and \\ is reduced to a single \. To be sure that a literal \ reaches sed, either double it inside the double quotes or enclose it in a single-quoted string. For example,

sed -i -E "s~${eregex}~CH/\\1~g"

or

sed -i -E "s~${eregex}~CH/"'\1~g'

(The curly braces are not essential in this case, but I consider it a matter of good form to use curly braces in variable references.)
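Both forms can be checked against the sample line from the question by piping it through sed instead of editing files in place. This is a minimal sketch; the variable names line, doubled, and mixed are made up for the illustration.

```shell
eregex='([0-9]+_?[0-9]*[.](ts|key))'
line='https://example.com/dir/dir2/number/12345.key'

# Form 1: \\1 inside double quotes reaches sed as \1.
doubled=$(printf '%s\n' "$line" | sed -E "s~${eregex}~CH/\\1~g")

# Form 2: the backreference sits in a single-quoted segment,
# so the shell leaves it completely alone.
mixed=$(printf '%s\n' "$line" | sed -E "s~${eregex}~CH/"'\1~g')

printf '%s\n' "$doubled" "$mixed"
# prints the following line twice:
# https://example.com/dir/dir2/number/CH/12345.key
```

Once the output looks right, the same script string can be used with sed -i in the original find/xargs pipeline.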

or are the capture groups not working as intended? Tried using a non capture group as well for the possible extensions but that didn’t seem to be supported.

Correct, sed does not recognize Perl-style non-capturing groups.
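A non-capturing group is not needed here anyway. Groups are numbered by the position of their opening parenthesis, so the inner (ts|key) group simply becomes \2 and leaves \1 as the whole file name. The sketch below illustrates this; the sample file name 12345_6.ts and the [ext=...] marker are invented for the demonstration.

```shell
line='https://example.com/dir/dir2/number/12345_6.ts'

# \1 is the outer group (the whole file name), \2 is the inner
# group (the extension). The script is single-quoted, so the shell
# does not interpret the backslashes at all.
numbered=$(printf '%s\n' "$line" |
  sed -E 's~([0-9]+_?[0-9]*[.](ts|key))~CH/\1 [ext=\2]~')

printf '%s\n' "$numbered"
# prints: https://example.com/dir/dir2/number/CH/12345_6.ts [ext=ts]
```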
