Why is my range pattern only working on the first file?

I have a set of files (FILE1.txt, FILE2.txt …) of the form:

foo 123
bar 456
start
foo 321
bar 654

And I want to ignore everything before start and only read lines containing foo in each file.

My attempt is this command:

awk '/start/,/EOF/ {if($1=="foo"){print $2}} ' FILE*.txt

It works on the first file, that is, it prints the 321 from the foo 321 line, but it then ignores the range pattern for the following files. Assuming all the files have the same content shown above, it prints:

$ awk '/start/,/EOF/ {if($1=="foo"){print $2}} ' FILE*.txt

321 // Expected from FILE1.txt, successfully ignore the first "foo" before "start".
123 // Unexpected from FILE2.txt
321 // Expected from FILE2.txt
123 // Unexpected from FILE3.txt
321 // Expected from FILE3.txt
...

What am I doing wrong? How can I make the range pattern work on each file, and not only once over all the files?
I've actually found a workaround based on find, but for the sake of understanding I'm looking for a solution relying on awk only.

Solution:

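awk processes all files as a single input stream, and the state of a range pattern carries over from one file to the next. /EOF/ is an ordinary regular expression that matches a line containing the text EOF, not the end of a file, so once start is seen in the first file the range never closes. You need to tell awk when it's processing a new file and reset the match state yourself.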
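To see the single-stream behaviour directly, here is a minimal sketch that prints awk's built-in FILENAME, FNR and NR for every record. The two small files and their contents (a, b, c, d) are made up for illustration and are not the OP's data.

```shell
#!/bin/sh
# Hypothetical two-line files, created in a throwaway directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/FILE1.txt"
printf 'c\nd\n' > "$dir/FILE2.txt"

# FNR restarts at 1 for each file; NR keeps counting across all files.
counters=$(cd "$dir" && awk '{ print FILENAME, FNR, NR }' FILE1.txt FILE2.txt)
printf '%s\n' "$counters"
# FILE1.txt 1 1
# FILE1.txt 2 2
# FILE2.txt 1 3
# FILE2.txt 2 4

rm -rf "$dir"
```

FNR going back to 1 is the only per-file signal plain awk gives you here, which is why a test on FNR==1 is a natural place to reset any state.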

One approach:

awk '
FNR==1             { found=0 }          # FNR==1st record of new file, reset flag
/start/            { found=1 }          # found start of range, set flag
found && $1=="foo" { print $2 }         # if flag set and 1st field == "foo" then print 2nd field
' FILE?.txt

This generates:

321
321
321

NOTE: this was run against 3 files (FILE{1..3}.txt) that all have the same content as OP’s sample input
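For anyone who wants to reproduce this, the following sketch rebuilds three sample files and runs both the original range-pattern command and the flag-based version side by side. The temporary directory and the variable names (dir, range_out, flag_out) are assumptions made for this example.

```shell
#!/bin/sh
# Recreate three files with the sample content from the question.
dir=$(mktemp -d)
for i in 1 2 3; do
    printf 'foo 123\nbar 456\nstart\nfoo 321\nbar 654\n' > "$dir/FILE$i.txt"
done

# Original attempt: the range opens at the first "start" and never closes,
# because no line contains the literal text EOF.
range_out=$(awk '/start/,/EOF/ { if ($1 == "foo") { print $2 } }' "$dir"/FILE?.txt)

# Flag-based version: the flag is cleared at the first record of each file.
flag_out=$(awk '
FNR==1             { found=0 }
/start/            { found=1 }
found && $1=="foo" { print $2 }
' "$dir"/FILE?.txt)

printf 'range pattern:\n%s\n' "$range_out"   # 321 123 321 123 321, one per line
printf 'flag reset:\n%s\n' "$flag_out"       # 321 321 321, one per line

rm -rf "$dir"
```

The first result matches the unexpected output in the question, and the second matches the output shown above.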
