Find all files with given extension in subfolders, and add substring corresponding to subfolder

I have numerous txt files, scattered across different folders.

- case1
   |
    - 0.25
       |
        - case1.txt
    - 0.35
       |
        - case1.txt
    - 0.30
       |
        - case1.txt
    - 0.45
       |
        - case1.txt

- case2
   |
    - 0.25
       |
        - case2.txt
    - 0.35
       |
        - case2.txt
    - 0.30
       |
        - case2.txt
    - 0.45
       |
        - case2.txt

.
.
.

I would like to copy them all to a single folder, but unfortunately, as you can see, some of them have the same name, so a naive find solution ends up overwriting them. I would like to copy all the txt files to a directory foo, inserting the name of the subfolder they’re in before the .txt extension. Also, since the subfolder has a dot in its name and I need to copy these files to Windows, I’d like to change 0.25 to 0_25. In other words, the file

- case2
   |
    - 0.25
       |
        - case2.txt

must be copied to foo as case2_0_25.txt. If a bash solution is too complex/unreadable, a Python solution would be fine too, but not a zsh one.


>Solution :

You can do this easily enough using the globstar option of bash (from man bash):

globstar

If set, the pattern ** used in a pathname expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a /, only directories and subdirectories match.
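
Note that globstar is off by default and only exists in bash 4.0 and later, so it has to be switched on with shopt -s globstar before ** will recurse. The following is a small sketch of the difference, run in a throwaway directory; the file case1/notes.txt is made up for the demonstration and is not part of the question's layout:

```bash
#!/usr/bin/env bash
# Throwaway tree used only for this demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/case1/0.25"
touch "$demo/case1/notes.txt"        # hypothetical file, one level deep
touch "$demo/case1/0.25/case1.txt"   # two levels deep, as in the question
cd "$demo" || exit 1

shopt -u globstar                    # default: ** behaves like a single *
without=( **/*.txt )                 # matches only case1/notes.txt

shopt -s globstar                    # ** now spans zero or more directory levels
with=( **/*.txt )                    # also matches case1/0.25/case1.txt

printf 'without globstar: %s\n' "${without[@]}"
printf 'with globstar:    %s\n' "${with[@]}"
```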

Since we can use ** to find the files, we just need to define the new name as including the original directory names and changing . to _:

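shopt -s globstar    # make ** recurse into subdirectories
mkdir -p foo         # the target directory must exist before copying
for file in **/*.txt; do
    # turn every / and . into _, then restore the final .txt extension
    newName=$(sed 's|[/.]|_|g' <<<"$file" | sed 's/_txt$/.txt/')
    cp -- "$file" foo/"$newName"
done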

Explanation

  • for file in **/*.txt; do: loop over all files (and directories, if any match) in the current directory and any of its subdirectories whose names end in .txt. This relies on globstar being enabled (shopt -s globstar).
  • newName=$(sed 's|[/.]|_|g' <<<"$file" | sed 's/_txt$/.txt/'): use sed to convert every / and . in the file name to _. Note that $file here also includes the path, so it will be something like case1/0.25/case1.txt, which becomes case1_0_25_case1_txt. We then pass the output of the first sed to a second one, which converts _txt (if found at the end of the line) back to .txt, giving us case1_0_25_case1.txt. The final output is saved in the variable $newName.
  • cp -- "$file" foo/"$newName": we now copy the file to the directory foo/ under the new name. The -- is not really needed here, but it ensures the approach will work with any file name, including those whose name starts with a -.
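
To see the renaming rule in isolation, here is a minimal sketch that applies it to a single path using bash parameter expansion instead of two sed processes. The function name flat_name is a hypothetical helper, not part of the answer, and the sketch assumes the path ends in .txt (which the glob guarantees):

```bash
#!/usr/bin/env bash
# flat_name is a hypothetical helper name used only for illustration.
# It is meant to give the same result as the sed pipeline for paths ending in .txt.
flat_name() {
    local stem=${1%.txt}      # drop the extension:  case1/0.25/case1
    stem=${stem//\//_}        # every / becomes _:   case1_0.25_case1
    stem=${stem//./_}         # every . becomes _:   case1_0_25_case1
    printf '%s.txt\n' "$stem" # put the extension back
}

flat_name case1/0.25/case1.txt    # prints case1_0_25_case1.txt
```

Avoiding sed saves two process launches per file, which may matter when there are many files; the sed version is arguably easier to read.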

I recreated the folder structure you show in your question, ran the command above and got:


$ tree
.
├── case1
│   ├── 0.25
│   │   └── case1.txt
│   ├── 0.30
│   │   └── case1.txt
│   ├── 0.35
│   │   └── case1.txt
│   └── 0.45
│       └── case1.txt
├── case2
│   ├── 0.25
│   │   └── case2.txt
│   ├── 0.30
│   │   └── case2.txt
│   ├── 0.35
│   │   └── case2.txt
│   └── 0.45
│       └── case2.txt
└── foo
    ├── case1_0_25_case1.txt
    ├── case1_0_30_case1.txt
    ├── case1_0_35_case1.txt
    ├── case1_0_45_case1.txt
    ├── case2_0_25_case2.txt
    ├── case2_0_30_case2.txt
    ├── case2_0_35_case2.txt
    └── case2_0_45_case2.txt

11 directories, 16 files
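
The names in the listing include both the case folder and the file's own name (case2_0_25_case2.txt), whereas the question asked for case2_0_25.txt. If that exact form is needed, one possible variant is sketched below. It assumes every .txt file sits inside at least one subfolder and uses only the immediate parent folder's name; new_name and copy_flat are hypothetical helper names. It also refuses to overwrite an existing target rather than silently replacing it:

```bash
#!/usr/bin/env bash
# Sketch only: new_name and copy_flat are hypothetical helper names.

# Print "<file stem>_<parent folder with dots as underscores>.txt".
new_name() {
    local path=$1 stem parent
    stem=${path##*/}          # case2/0.25/case2.txt -> case2.txt
    stem=${stem%.txt}         # -> case2
    parent=${path%/*}         # -> case2/0.25
    parent=${parent##*/}      # -> 0.25
    printf '%s_%s.txt\n' "$stem" "${parent//./_}"
}

# Copy every .txt below the current directory into the folder named by $1.
# Note: the shopt call changes options for the whole shell, not just this function.
copy_flat() {
    local dest=${1%/} file target
    shopt -s globstar nullglob
    mkdir -p -- "$dest"
    for file in **/*.txt; do
        [[ $file == */* ]] || continue          # skip files with no parent folder
        [[ $file == "$dest"/* ]] && continue    # skip copies already in the destination
        target=$dest/$(new_name "$file")
        if [[ -e $target ]]; then
            echo "skipping $file: $target already exists" >&2
            continue
        fi
        cp -- "$file" "$target"
    done
}
```

Run it from the directory that contains case1, case2, and so on, as copy_flat foo. Two files that share both a name and a parent folder name (for example in differently named top-level folders) would map to the same target; the sketch reports the second one and skips it.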