Bash regex to check file extensions

I am trying to check whether a given file has one of the extensions I expect: .fa, .fasta or .fasta.gz. Judging by other questions this should be trivial, but none of the suggestions I have tried work for me.

This is what I have tried; none of these attempts match:

#!/bin/bash

test1="abcdef.fa"
test2="ghijkl.fasta"
test3="mnopqr.fasta.gz"
echo "test1: $test1"
echo "test2: $test2"
echo "test3: $test3"

# Attempt 1
if [[ $test1 =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt1: Match with $test1\n"; fi
if [[ $test2 =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt1: Match with $test2\n"; fi
if [[ $test3 =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt1: Match with $test3\n"; fi

# Attempt 2 - do I need to quote the string?
if [[ "$test1" =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt2: Match with $test1\n"; fi
if [[ "$test2" =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt2: Match with $test2\n"; fi
if [[ "$test3" =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt2: Match with $test3\n"; fi

# Attempt 3 - alternative regex
if [[ $test1 =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt3: Match with $test1\n"; fi
if [[ $test2 =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt3: Match with $test2\n"; fi
if [[ $test3 =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt3: Match with $test3\n"; fi

# Attempt 4 - again with the quoted string
if [[ "$test1" =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt4: Match with $test1\n"; fi
if [[ "$test2" =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt4: Match with $test2\n"; fi
if [[ "$test3" =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt4: Match with $test3\n"; fi

# Attempt 5 - put $ on end of regex
if [[ $test1 =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt5: Match with $test1\n"; fi
if [[ $test2 =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt5: Match with $test2\n"; fi
if [[ $test3 =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt5: Match with $test3\n"; fi

# Attempt 6 - again with the quoted string
if [[ "$test1" =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt6: Match with $test1\n"; fi
if [[ "$test2" =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt6: Match with $test2\n"; fi
if [[ "$test3" =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt6: Match with $test3\n"; fi

# Attempt 7 - use double ||
if [[ $test1 =~ .\*.(fa||fasta||fasta.gz) ]] &> /dev/null; then printf "Attempt7: Match with $test1\n"; fi
if [[ $test2 =~ .\*.(fa||fasta||fasta.gz) ]] &> /dev/null; then printf "Attempt7: Match with $test2\n"; fi
if [[ $test3 =~ .\*.(fa||fasta||fasta.gz) ]] &> /dev/null; then printf "Attempt7: Match with $test3\n"; fi

I am close with this:

# Attempt 8 - escape parentheses
if [[ $test1 =~ .\*.\(fa|fasta|fasta.gz\) ]] &> /dev/null; then printf "Attempt8: Match with $test1\n"; fi
if [[ $test2 =~ .\*.\(fa|fasta|fasta.gz\) ]] &> /dev/null; then printf "Attempt8: Match with $test2\n"; fi
if [[ $test3 =~ .\*.\(fa|fasta|fasta.gz\) ]] &> /dev/null; then printf "Attempt8: Match with $test3\n"; fi

However, the first test does not match, and the output looks like this:

test1: abcdef.fa
test2: ghijkl.fasta
test3: mnopqr.fasta.gz
Attempt8: Match with ghijkl.fasta
Attempt8: Match with mnopqr.fasta.gz

What am I missing?

Solution:

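=~ expects a regular expression (POSIX ERE), not a glob pattern. In a regex, .* means "any run of characters", whereas the escaped \* in attempts 3 to 7 demands a literal asterisk in the name, so those can never match. Attempt 8 only appears to work: escaping the parentheses makes them literal, so the | characters split the whole pattern into three top-level alternatives, and the bare fasta alternative matches anywhere in the second and third names. Try \.(fa|fasta|fasta\.gz)$.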
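
As a minimal sketch of the regex approach, the pattern can be stored in a variable and expanded unquoted on the right-hand side of =~, which avoids most shell quoting surprises. The helper name is_fasta_regex and the extra file names notes.txt and fasta.txt are assumptions for illustration, not part of the question.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the name ends in .fa, .fasta or .fasta.gz.
is_fasta_regex() {
    # ERE: a literal dot, one of the extensions, anchored at the end of the string.
    local re='\.(fa|fasta|fasta\.gz)$'
    # $re is left unquoted on purpose: quoting it would force a literal string match.
    [[ $1 =~ $re ]]
}

# The three names from the question plus two that should be rejected.
for f in abcdef.fa ghijkl.fasta mnopqr.fasta.gz notes.txt fasta.txt; do
    if is_fasta_regex "$f"; then
        printf 'match:    %s\n' "$f"
    else
        printf 'no match: %s\n' "$f"
    fi
done
```

The trailing $ anchor is what stops a name such as fasta.txt from matching on the substring alone.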

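Alternatively, use extended pattern matching (extglob) with ==. The comparison is against the whole string and a dot is literal in a glob, so no anchor or escaping is needed: [[ $test1 == *.@(fa|fasta|fasta.gz) ]]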
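
The extended-glob variant could be wrapped the same way. This is a sketch under the assumption that enabling extglob explicitly is acceptable in your script; the helper name is_fasta_glob is hypothetical.

```bash
#!/usr/bin/env bash

# Enable extended globs explicitly, before the pattern below is parsed,
# so the script does not depend on the shell's default inside [[ ]].
shopt -s extglob

# Hypothetical helper: succeeds if the name ends in .fa, .fasta or .fasta.gz.
is_fasta_glob() {
    # @(a|b|c) matches exactly one of the alternatives; the leading * covers the base name.
    [[ $1 == *.@(fa|fasta|fasta.gz) ]]
}

for f in abcdef.fa ghijkl.fasta mnopqr.fasta.gz notes.txt fasta.txt; do
    if is_fasta_glob "$f"; then
        printf 'match:    %s\n' "$f"
    else
        printf 'no match: %s\n' "$f"
    fi
done
```

Compared with the regex form, the glob reads closer to how file extensions are usually written, at the cost of being a bash-specific extension.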
