Catch token word in bash 'heredoc'

The beginning of a ‘heredoc’ string in bash usually looks like

cat <<EOF or cat << EOF

i.e. there may or may not be a space between the two less-than characters and the token word "EOF". I would like to capture the token word, so I tried the following:

$ pcretest
PCRE version 8.45 2021-06-15

  re> "^\s*cat.*[^<]<{2}[^<](.*)"
data> cat << EOF
 0: cat << EOF
 1: EOF
data> cat <<EOF
 0: cat <<EOF
 1: OF

As you can see, for the string with no space between the << and EOF, I only capture "OF" and not "EOF". The expression must match exactly two less-than signs and fail if there are three or more. But why does it gobble up the ‘E’ so that only "OF" is returned?


Solution:

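In your pattern you are using a negated character class [^<] after <{2}. A negated character class still has to match exactly one character, just not a <. In cat << EOF that character is the space after <<, but in cat <<EOF it is the E, so the E is consumed before the capture group starts and only "OF" is left for group 1.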
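
To see the consumption concretely, here is a minimal Python sketch. It assumes Python's re module behaves like PCRE for this particular pattern (it supports \s, [^<] and <{2}). The named groups before, after and token are added only for illustration, to expose which character each [^<] matched.

```python
import re

# The question's pattern, with each [^<] wrapped in a named group
# (added for illustration) so we can see which character it consumes.
pattern = re.compile(
    r"^\s*cat.*(?P<before>[^<])<{2}(?P<after>[^<])(?P<token>.*)"
)

def show(line):
    """Return what the two [^<] classes and the final group matched."""
    m = pattern.match(line)
    return m.group("before"), m.group("after"), m.group("token")

with_space = show("cat << EOF")
without_space = show("cat <<EOF")

print(with_space)     # second [^<] takes the space after <<
print(without_space)  # second [^<] takes the 'E'
```

In the second call the after group holds 'E', which is exactly the character missing from the captured token.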

For your examples, using PCRE, you could match the leading whitespace and cat, and then match << only when it is not followed by a third <. A negative lookahead does this without consuming a character:

^\h*cat\h+<<(?!<)(.*)

The pattern matches:

  • ^ Start of string
  • \h* Match optional horizontal whitespace chars
  • cat\h+ Match cat and 1+ horizontal whitespace chars
  • <<(?!<) Match << and assert not < directly to the right
  • (.*) Capture the rest of the line in group 1 (for cat << EOF this includes the space before EOF)
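
The same idea can be sketched in Python for readers without pcretest. Python's re module does not support \h, so [ \t] stands in for it here, on the assumption that only spaces and tabs matter. The helper name heredoc_token is hypothetical, and the extra [ \t]* before the capture group is an addition that is not part of the pattern above: it keeps the optional space after << out of group 1.

```python
import re

# [ \t] stands in for PCRE's \h, which Python's re module does not support.
# The [ \t]* before the group is an addition: it skips the optional space
# between << and the token word so that it is not captured.
HEREDOC = re.compile(r"^[ \t]*cat[ \t]+<<(?!<)[ \t]*(.*)")

def heredoc_token(line):
    """Return the text after `cat <<`, or None if the line does not match."""
    m = HEREDOC.match(line)
    return m.group(1) if m else None

print(heredoc_token("cat << EOF"))
print(heredoc_token("cat <<EOF"))
print(heredoc_token("cat <<<EOF"))  # three '<' in a row: lookahead rejects it
```

Because the lookahead (?!<) only inspects the next character without consuming it, the E stays available for the capture group in both spellings.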

