How to check if a string is present in a bash array using awk

I’ve got a file that looks like this:

a    12345
b    3456
c    45678

And I’ve got a bash array of strings:

mylist=("a" "b")

What I want to do is sum the numbers in the second column, but only for rows whose first-column value (e.g. "a" or "b") is present in mylist.

My non-working code:

cat myfile.txt | awk -F'\t' '{BEGIN{sum=0} {if ($1 in ${mylist[@]}) sum+=$2} END{print sum}}'

Expected result is 12345+3456=15801.
I understand that the problem is in the if statement, but I can’t figure out how to rearrange this code so that it works.
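Two things break that if statement: `${mylist[@]}` sits inside single quotes, so bash never expands it, and awk's `in` operator only tests the keys of an awk array, not a list of words. (The BEGIN and END blocks are also nested inside the main action's braces, which awk does not allow.) For comparison with the join-based answer, here is a minimal awk-only sketch. It assumes the names contain no newlines or backslashes (values passed with `-v` have their escape sequences interpreted) and that the file really is tab-separated; the temporary file created with `mktemp` only stands in for myfile.txt.

```shell
#!/usr/bin/env bash
# Sketch: pass the bash array into awk as one newline-separated string,
# then rebuild it inside awk as an associative array keyed by the names.
# Assumes no newlines or backslashes in the names (-v interprets escapes).
set -euo pipefail

mylist=("a" "b")
myfile=$(mktemp)                      # stands in for myfile.txt from the question
trap 'rm -f "$myfile"' EXIT
printf 'a\t12345\nb\t3456\nc\t45678\n' > "$myfile"

total=$(awk -F'\t' -v list="$(printf '%s\n' "${mylist[@]}")" '
  BEGIN {
    n = split(list, names, "\n")      # names[1..n] hold the mylist entries
    for (i = 1; i <= n; i++) wanted[names[i]] = 1
  }
  ($1 in wanted) { sum += $2 }        # "in" tests the keys of an awk array
  END { print sum + 0 }               # + 0 prints 0 instead of "" if nothing matched
' "$myfile")
echo "$total"
```

If the real file separates columns with spaces rather than tabs, dropping `-F'\t'` lets awk split on runs of blanks instead.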

Solution:

There’s no good reason to make awk read the array in the first place. Let join pick out the matching lines quickly; that’s exactly the job it is specialized for.

If, in real life, your array and the input file’s keys are guaranteed to be sorted (as they are in the example), you can drop the sort calls from the code below.

# Cautious code that doesn't assume input sort order
LC_ALL=C join -1 1 -2 1 -o1.2 \
  <(LC_ALL=C sort <myfile.txt) \
  <(printf '%s\n' "${mylist[@]}" | LC_ALL=C sort) \
  | awk '{ sum += $1 } END { print sum }'

…or…

# Fast code that requires both the array and the file to be pre-sorted
join -1 1 -2 1 -o1.2 myfile.txt <(printf '%s\n' "${mylist[@]}") \
  | awk '{ sum += $1 } END { print sum }'
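To see what each stage contributes, the following sketch runs the cautious pipeline on the sample data from the question and prints the intermediate join output before summing it. The temporary file created with `mktemp` is an assumption standing in for myfile.txt.

```shell
#!/usr/bin/env bash
# Sketch: run the cautious join pipeline on the question's sample data and
# show the intermediate output that awk receives.
set -euo pipefail

mylist=("a" "b")
myfile=$(mktemp)                      # stands in for myfile.txt (assumption)
trap 'rm -f "$myfile"' EXIT
printf 'a\t12345\nb\t3456\nc\t45678\n' > "$myfile"

# Stage 1: join keeps only lines whose first field matches a list entry;
# -o1.2 makes it print just the second field of the file for each match.
matches=$(LC_ALL=C join -1 1 -2 1 -o1.2 \
  <(LC_ALL=C sort <"$myfile") \
  <(printf '%s\n' "${mylist[@]}" | LC_ALL=C sort))
printf 'join output:\n%s\n' "$matches"

# Stage 2: awk adds up whatever numbers survived the join.
total=$(printf '%s\n' "$matches" | awk '{ sum += $1 } END { print sum }')
echo "sum: $total"
```

The intermediate output contains only 12345 and 3456: join has already discarded the c row, so awk just has to add up what is left.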