Delete a file if it lacks a duplicate (partner)

Given the success of the question I posted here, "Find empty files and their duplicates" (thank you, Freeman and Mark Setchell), I am now encouraged to ask another related question. In this case, the challenge is to delete a file if it lacks a partner.

The text2image tool in Tesseract sometimes fails to produce .box files at all.

The files are supposed to appear as triplets, as follows:

  • File1.box
  • File1.gt.txt
  • File1.tif
  • File2.box
  • File2.gt.txt
  • File2.tif

But when the tool fails to produce the .box file, I get only the two partner files:

  • File3.gt.txt
  • File3.tif

What I want is to delete those .gt.txt and .tif files that lack the .box partner.

I hope the description is clear.

Solution:

Is this what you want?

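#!/bin/bash

# Iterate over the .tif and .gt.txt files in the current directory
for file in *.{tif,gt.txt}; do
    # An unmatched pattern stays literal, so skip anything that does not exist
    [ -e "$file" ] || continue

    # Strip the full suffix. "${file%.*}" would turn File3.gt.txt into
    # File3.gt, and the script would then look for File3.gt.box.
    case "$file" in
        *.gt.txt) file_name="${file%.gt.txt}" ;;
        *)        file_name="${file%.tif}" ;;
    esac

    # Check whether the corresponding .box file exists
    box_file="${file_name}.box"
    if [ ! -f "$box_file" ]; then
        # Delete the file that lacks a .box partner
        rm -- "$file"
        echo "Deleted $file"
    fi
done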
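
Since the loop deletes files, it may be worth previewing what would be removed first. The following is only a sketch of a dry-run variant: the function name list_orphans and its optional directory argument are made up for illustration and are not part of the original answer. It prints the .tif and .gt.txt files that have no .box partner and deletes nothing.

```shell
#!/bin/bash

# list_orphans [DIR]: print .tif and .gt.txt files in DIR (default: current
# directory) that have no matching .box file. Hypothetical helper; it only
# prints names and never deletes anything.
list_orphans() {
    local dir="${1:-.}" file base
    for file in "$dir"/*.tif "$dir"/*.gt.txt; do
        # An unmatched glob stays literal, so skip anything that is not a file
        [ -f "$file" ] || continue

        # Remove the full suffix so File3.gt.txt and File3.tif both give File3
        case "$file" in
            *.gt.txt) base="${file%.gt.txt}" ;;
            *)        base="${file%.tif}" ;;
        esac

        # Report the file only when its .box partner is missing
        [ -f "$base.box" ] || printf '%s\n' "$file"
    done
}
```

Running list_orphans in the directory from the question should list File3.tif and File3.gt.txt and leave the File1 and File2 triplets alone. Once the list looks right, the deleting loop can be run with more confidence.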