Why does bash "=~" operator ignore the last part of the pattern specified?

I am trying to compare a string in bash against a regex pattern and have found something odd. I am using GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu), running within WSL.

Here is a sample program demonstrating the problem:

#!/usr/bin/env bash

name="John"

if [[ "${name}" =~ "John"* ]]; then
    echo "found"
else
    echo "not found"
fi

exit

As expected, this echoes found, since the name "John" matches the regex pattern. What I find odd is that if I drop the n in John, it still echoes found. In my opinion, "Joh" does not match the pattern "John"*.

If you drop the "hn" and set $name to just "Jo", it echoes not found. It seems to affect only the last character in the regex pattern (aside from the wildcard).

I am converting an old csh script to bash, and this behavior does not occur in csh. What is causing bash to do this?

>Solution :

You’re mixing up the syntax for shell patterns and regular expressions. After the quotes are stripped, your regular expression is John*, which means Joh followed by any number of n, including zero. It matches Joh, John, Johnn, Johnnn, …

It’s not anchored, so it also matches any string containing one of the matches above.
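As a quick check of the two points above, here is a minimal sketch that tests a few strings against the unanchored regex John*. The helper name matches_regex and the sample strings are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reports whether its argument matches the
# unanchored regular expression John* ("Joh" plus zero or more "n").
matches_regex() {
    local re='John*'
    if [[ $1 =~ $re ]]; then
        echo "found"
    else
        echo "not found"
    fi
}

# "Jo" lacks the required "Joh"; "Mr. Johansson" merely contains it.
for name in "Jo" "Joh" "John" "Johnnn" "Mr. Johansson"; do
    printf '%-15s %s\n' "$name" "$(matches_regex "$name")"
done
```

Only "Jo" should report not found here, because every other string contains "Joh" somewhere.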

Depending on what you want, you could use any of these:

  • Any string containing John should match:
    • Regex: [[ $name =~ John ]]
    • Shell pattern: [[ $name == *John* ]]
  • Any string that begins with John should match:
    • Regex: [[ $name =~ ^John ]]
    • Shell pattern: [[ $name == John* ]]

Note that shell patterns, unlike regular expressions, must match the entire string.
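To see how the four tests differ, the following sketch runs each of them against the same input. The function name classify and the labels it prints are hypothetical, chosen only for this illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the names of the tests that accept a string.
classify() {
    local s=$1 out=""
    local re_contains='John' re_prefix='^John'

    [[ $s =~ $re_contains ]] && out+="regex-contains "
    [[ $s == *John* ]]       && out+="glob-contains "
    [[ $s =~ $re_prefix ]]   && out+="regex-prefix "
    [[ $s == John* ]]        && out+="glob-prefix "

    out=${out% }              # drop the trailing space, if any
    echo "${out:-none}"
}

classify "Johnny"      # accepted by all four tests
classify "Dr. John"    # accepted only by the two "contains" tests
classify "Joh"         # accepted by none
```

The regex and shell-pattern variants in each pair accept the same strings; the shell patterns need the explicit * on each open side because they are matched against the whole string.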

A note on quoting: within [[ ... ]], the left-hand side doesn’t have to be quoted; on the right-hand side, quoted parts are interpreted literally rather than as pattern or regex syntax. For regular expressions, it’s good practice to define the pattern in a separate variable and expand it unquoted:

re='^John'
if [[ $name =~ $re ]]; then
    echo "found"
fi

This avoids a few edge cases with special characters in the regex.
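One such edge case, sketched below, is a regex containing a space and parentheses: written inline, the space would need escaping, while in a variable it needs nothing special. The sample name and pattern are made up for illustration. The sketch also shows that quoting the expansion turns the regex into a literal string, as is the case in bash 3.2 and later with default compatibility settings.

```shell
#!/usr/bin/env bash

name="John Smith"
re='^John (Smith|Doe)$'   # space and parentheses need no escaping here

# Unquoted expansion: $re is interpreted as a regular expression.
if [[ $name =~ $re ]]; then
    unquoted="match"
else
    unquoted="no match"
fi

# Quoted expansion: "$re" is matched as a literal string, so the
# characters ^ ( | ) $ lose their special meaning and the test fails.
if [[ $name =~ "$re" ]]; then
    quoted="match"
else
    quoted="no match"
fi

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The unquoted test should report a match and the quoted one should not, which mirrors what happened to the quoted "John" in the question.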
