Matching a regex metacharacter literally


My understanding of regexes in AWK is that in order to match a regex metacharacter literally (for example +, $, ^, *), you must escape it, like so:

awk -F '\\+' 'program here'

However, I’ve noticed that you don’t actually need to do this with certain metacharacters, such as "+".

Input file:


AWK program:

#!/usr/bin/awk -f
BEGIN { FS = "+|^"}

{print $1,$2,$3,$4 }

Expected output (due to not escaping the +):


Actual output:

this|| is|| a|| line
this is a line

I don’t understand why this works. I’m giving AWK blatantly bad code by not escaping the metacharacter (to make it literal), yet AWK matches successfully anyway.

I own a copy of "The AWK Programming Language", so I went through the section on regular expressions just to make sure I’m not going mad, and it states the following:

In a matching expression, a quoted string like "^[0-9]+$" can normally be used interchangeably with a regular expression enclosed in slashes, such as /^[0-9]+$/. There is one exception, however. If the string in quotes is to match a literal occurrence of a regular expression metacharacter, one extra backslash is needed to protect the protecting backslash itself. That is,

$0 ~ /(\+|-)[0-9]+/

and

$0 ~ "(\\+|-)[0-9]+"

are equivalent.

This behavior may seem arcane, but it arises because one level of protecting backslashes is removed when a quoted string is parsed by awk. If a backslash is needed in front of a metacharacter to turn off its special meaning in a regular expression, then that backslash needs a preceding backslash to protect it in a string.

Can someone explain what I’m missing here?

Solution:

The + is at the start of the pattern, so there is nothing before it to repeat (a quantifier such as + means "one or more of the preceding element", and here there is no preceding element). awk therefore interprets it as a literal + character, not as a quantifier.
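For contrast, here is a small sketch of the opposite case, where the + does have something in front of it and so behaves as a quantifier. The input line and the variable name `quantified` are made up for illustration; they are not from the question.

```shell
# Hypothetical input: "+" follows "a" in the separator, so FS is the regex
# "one or more a". The literal "+" inside the data is left untouched.
quantified=$(printf 'xaaay+z\n' | awk -F 'a+' '{ print $1 "|" $2 }')
echo "$quantified"   # x|y+z
```

The run of "aaa" is consumed as a single separator, while the + in "y+z" stays in the second field, which shows that the separator was not treated as a literal "a+".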

From the gawk manual, on regexp operator details:

In POSIX awk and gawk, the ‘*’, ‘+’, and ‘?’ operators stand for themselves when there is nothing in the regexp that precedes them. For example, /+/ matches a literal plus sign. However, many other versions of awk treat such a usage as a syntax error.
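Since other awks may reject a bare leading +, it is probably safer not to rely on this. Below is a minimal sketch of two spellings that should behave the same way across awk implementations. The sample line is an assumption (the question's input file was not shown), and the variable names are only for illustration.

```shell
# Hypothetical sample input, not the asker's actual file.
line='this+is+a+line'

# Bracket expression: "+" has no special meaning inside [...].
bracket=$(printf '%s\n' "$line" | awk -F '[+]' '{ print $1, $2, $3, $4 }')

# Escaped form in a string: "\\+" becomes the regex \+ once awk has parsed
# the string, which matches a literal plus sign.
escaped=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\\+" } { print $1, $2, $3, $4 }')

echo "$bracket"   # this is a line
echo "$escaped"   # this is a line
```

The bracket form avoids the double-backslash question entirely, which is why it is often the easier one to read in an FS assignment.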
