bash – Concatenate files in different subfolders into a single file and have each file name in the first column

I am trying to concatenate a few thousand files that are in different subfolders into a single file and also have the name of each concatenated file inserted as the first column so that I know which file each data row came from. Essentially starting with something like this:

Folder1
file1.txt
123 010 ...
456 020 ...
789 030 ...

Folder2
file2.txt 
abc 100 ...
efg 200 ...
hij 300 ...

and outputting this:

CombinedFile.txt
file1  123  010 ...
file1  456  020 ...
file1  789  030 ...
file2  abc  100 ...
file2  efg  200 ...
file2  hij  300 ...

After reading this post, I have tried the following code, but end up with a syntax error (apologies, I’m super new to awk!)

shopt -s globstar
for filename in path/**/*.txt; do
    awk '{print FILENAME "\t" $0}' *.txt > CombinedFile.txt
done

Thanks for your help!

Solution:

Let’s build the command step by step.

awk works with pattern-action pairs of the form pattern { action }, executing action on the current record (line) whenever pattern is true. If pattern is omitted, it is taken to be true for every record; if action is omitted, it defaults to printing the current record.
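To make the three cases concrete, here is a small sketch. The sample file and its contents are made up for illustration, loosely following the data in the question.

```shell
# Create a small sample file in a temporary directory (hypothetical data).
tmp=$(mktemp -d)
printf '123 010\n456 020\n789 030\n' > "$tmp/file1.txt"

# Pattern and action: print the first field of lines whose second field is 020.
with_both=$(awk '$2 == "020" { print $1 }' "$tmp/file1.txt")

# Action omitted: lines matching the pattern are printed whole.
pattern_only=$(awk '$2 == "020"' "$tmp/file1.txt")

# Pattern omitted: the action runs on every line.
action_only=$(awk '{ print $1 }' "$tmp/file1.txt")

printf '%s\n' "$with_both" "$pattern_only" "$action_only"
rm -rf "$tmp"
```

The first command prints only 456, the second prints the whole matching line, and the third prints the first field of all three lines.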

The OP wants the name of each file printed at the beginning of that file's content, so we can use the built-in variables FILENAME and FNR. FILENAME holds the name of the file currently being read, and FNR holds the record (line) number within that file. So when FNR == 1 we print the filename, which in awk is written (FNR == 1){print FILENAME}. After that check, every line still needs to be printed. This is done by 1 { print $0 }, which can be shortened to 1: the pattern 1 is always true, and an omitted action defaults to printing the record.

So the following line prints what is expected for a single file:

$ awk '(FNR==1){print FILENAME}1' file

But we want to do this for multiple files, so we can do:

$ awk '(FNR==1){print FILENAME}1' file1 file2 file3 ... filen

or using a pattern/glob

$ awk '(FNR==1){print FILENAME}1' *.txt
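As a quick check of what this produces, the following sketch runs the same awk program over two throwaway files. The file names and contents are assumptions modelled on the question's example.

```shell
# Two hypothetical input files in a scratch directory.
tmp=$(mktemp -d)
printf '123 010\n456 020\n' > "$tmp/file1.txt"
printf 'abc 100\nefg 200\n' > "$tmp/file2.txt"

# FNR restarts at 1 for every input file, so the name is printed once per file.
headed=$(cd "$tmp" && awk '(FNR==1){print FILENAME}1' *.txt)
printf '%s\n' "$headed"
# file1.txt
# 123 010
# 456 020
# file2.txt
# abc 100
# efg 200
rm -rf "$tmp"
```

Note that the name appears on a line of its own ahead of each file's rows, not on every row.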

If you want to match all files in the subdirectories as well, it can easily be done using find:

$ find . -type f -iname '*.txt' -exec awk '(FNR==1){print FILENAME}1' {} \;

The output of any of these commands can then be redirected to a file of your choice.
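The commands above print the filename once per file, on a line of its own. The question asks for the name as the first column of every row, so here is a minimal sketch of that variant. It rests on a few assumptions that are not in the answer: a tab as the separator (taken from the OP's attempt), the directory and the .txt extension stripped from the name to match the desired output, a made-up sample tree, and the combined file written outside the searched directory so that find does not pick it up as input.

```shell
# Hypothetical sample tree mirroring the layout in the question.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/data/Folder1" "$tmp/data/Folder2"
printf '123 010\n456 020\n' > "$tmp/data/Folder1/file1.txt"
printf 'abc 100\nefg 200\n' > "$tmp/data/Folder2/file2.txt"

# The output lives outside the searched tree so find never feeds it back to awk.
combined="$tmp/CombinedFile.txt"

find "$tmp/data" -type f -name '*.txt' -exec awk '
    FNR == 1 {
        name = FILENAME
        sub(/.*\//, "", name)     # strip the directory part
        sub(/\.txt$/, "", name)   # strip the extension
    }
    { print name "\t" $0 }        # filename as the first column of every line
' {} + > "$combined"

cat "$combined"
```

Using {} + instead of {} \; hands many files to a single awk process, which matters with a few thousand inputs; FNR still restarts for each file, so the name is refreshed correctly. The order in which find visits the files is not guaranteed, so sort the result afterwards if the order matters.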
