
How to iterate over all folders and their subfolders and have AWK process each TXT file in the subfolders?

I want to iterate over all folders and their subfolders and print the names of the .TXT files (in the subfolders) whose first line contains the string CYCLE DATE (there may be spaces and/or underscores between CYCLE and DATE). Here’s my attempt at solving this:

In files_and_folders.sh I entered this:

#!/bin/bash
find . -name '*.TXT' -exec awk 'NR == 1 && $0 ~ /CYCLE[_ ]+DATE/ { print FILENAME }'

At the bash command line I entered this:

bash files_and_folders.sh

That produced the following error message:

find: missing argument to -exec

What is the correct way to do this?

Solution:

I’d split this problem like this:

  1. Go over all files
  2. for each file:
    1. get the first line only
    2. check for CYCLE DATE
    3. print file name if found.

So,

#!/bin/bash
# If nothing matches, expand the glob to nothing instead of
# passing the literal pattern '**/*.TXT' on to head:
shopt -s nullglob
# Enable the recursive ** glob (needs bash 4.0 or later):
shopt -s globstar

for file in **/*.TXT ; do
  # first line only   # look for regex              # print file name
  #                   #  -q:   silently             #
  # -n 1: one line    #  -E: extended regexes       #
  head -n 1 "${file}" | grep -q -E 'CYCLE[_ ]+DATE' && echo "${file}"
  # or your elegant awk, with "exit" so it stops reading after line 1:
  # awk 'NR == 1 { if (/CYCLE[_ ]+DATE/) print FILENAME; exit }' "${file}"
done

Of course, you could use awk instead of grep to analyze the line, but frankly, that’s unnecessarily complex here. Your regular expression is very simple (CYCLE, then at least one space or underscore, then DATE), so a plain grep -E does the job.
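To check what the pattern CYCLE[_ ]+DATE accepts, here is a small sketch that runs a few made-up sample lines through it. The helper name has_cycle_date is hypothetical, chosen only for this illustration. Note that the pattern is not anchored, so it matches anywhere in the line, and that a hyphen or no separator at all does not count.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given line contains CYCLE, one or
# more spaces and/or underscores, then DATE (same regex as in the loop above).
has_cycle_date() {
  grep -q -E 'CYCLE[_ ]+DATE' <<< "$1"
}

# Made-up sample first lines, for illustration only.
for line in 'CYCLE DATE' 'CYCLE_DATE' 'CYCLE _ DATE' 'CYCLEDATE' 'CYCLE-DATE'; do
  if has_cycle_date "$line"; then
    echo "match:    $line"
  else
    echo "no match: $line"
  fi
done
```

The first three samples match; CYCLEDATE (nothing between the words) and CYCLE-DATE (a hyphen is not in the character class) do not.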


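The problem with your find is that the -exec command is never terminated. find expects the command after -exec to end with ';' (run it once per file) or '+' (pass many files to a single run), and '{}' marks where the name of each found file is inserted ('+' must come right after '{}'). Without a terminator, find can’t tell where the command it should execute ends, which is what "missing argument to -exec" means; and without '{}', awk would get no file name at all and would read standard input instead.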
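For comparison, a corrected version of the find command from the question could look like the sketch below. The function name find_cycle_date_files is made up for illustration, and -type f is an addition that skips directories whose names happen to end in .TXT. Because '+' lets one awk process read several files, the condition uses FNR (line number within the current file) instead of NR (line number across all input); with NR == 1, only the very first file could ever match.

```shell
#!/bin/bash
# Hypothetical helper: print every *.TXT file under the given directory
# (default: .) whose first line contains CYCLE, one or more spaces
# and/or underscores, then DATE.
find_cycle_date_files() {
  # {} : placeholder where find inserts the file names it found
  # +  : ends the -exec command and passes many files to one awk call
  # FNR: line number within the current file (NR keeps counting across
  #      files, so NR == 1 would only be true for the first file)
  find "${1:-.}" -type f -name '*.TXT' \
    -exec awk 'FNR == 1 && /CYCLE[_ ]+DATE/ { print FILENAME }' {} +
}

find_cycle_date_files .
```

Ending the command with ';' instead (quoted, or written as \;) would start one awk per file, in which case the original NR == 1 also works, at the cost of more processes.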

But since this doesn’t need find at all, I’d personally say for file in GLOB; do … done is easier to remember than find -name 'PATTERN' -exec some complicated syntax '{}' ';'.
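If, as the question says, only .TXT files inside subfolders should be reported (not ones sitting directly in the starting folder), the glob can be tightened to */**/*.TXT: the leading */ requires at least one directory level, and with globstar the **/ then matches zero or more further levels. A sketch, wrapped in a hypothetical function report_cycle_date that takes the top folder as an argument (name and argument handling are assumptions for illustration):

```shell
#!/bin/bash
# ** needs bash 4.0+; nullglob makes an unmatched pattern expand to nothing.
shopt -s globstar nullglob

# Hypothetical helper: list *.TXT files at least one folder below "$1"
# (default: current folder) whose first line matches CYCLE[_ ]+DATE.
report_cycle_date() {
  local top=${1:-.} file
  for file in "$top"/*/**/*.TXT; do
    # Read only the first line; print the file name if it matches.
    if head -n 1 -- "$file" | grep -q -E 'CYCLE[_ ]+DATE'; then
      printf '%s\n' "$file"
    fi
  done
}

report_cycle_date .
```

Using if instead of a trailing && keeps the function's exit status at 0 even when the last file checked does not match.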
