Delete a file if it lacks a duplicate (partner)


Given the success of the question I posted here, "Find empty files and their duplicates" (thank you, Freeman and Mark Setchell), I am encouraged to ask a related question. This time, the challenge is to delete a file if it lacks a partner.

The text2image tool in Tesseract sometimes fails to produce .box files at all.

The files are supposed to appear as triplets, as follows:

  • File1.box
  • File1.gt.txt
  • File1.tif
  • File2.box
  • File2.gt.txt
  • File2.tif

But when the tool fails to produce the .box file, I get only the two partner files:

  • File3.gt.txt
  • File3.tif

What I want is to delete those files (.gt.txt and .tif) that lack a .box partner.

I hope the description is clear.

>Solution :

Is this what you want?

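#!/bin/bash

# Expand unmatched globs to nothing instead of the literal pattern
shopt -s nullglob

# Iterate over the .tif and .gt.txt files in the current directory
for file in *.tif *.gt.txt; do
    # Strip the full suffix: ".gt.txt" is a double extension, so "${file%.*}"
    # would leave "File3.gt" and the .box check would always fail
    case "$file" in
        *.gt.txt) file_name="${file%.gt.txt}" ;;
        *)        file_name="${file%.tif}" ;;
    esac

    # Check whether the corresponding box file exists
    box_file="${file_name}.box"
    if [ ! -f "$box_file" ]; then
        # Delete the file that has no .box partner
        rm -- "$file"
        echo "Deleted $file"
    fi
done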
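
Since rm is irreversible, it may help to preview what would be removed first. The following is a minimal sketch of the same check wrapped in a function with a dry-run default. The function name prune_orphans and the --delete flag are hypothetical, chosen for illustration, and are not part of the original answer. It removes the .gt.txt suffix explicitly, because a single "${file%.*}" would leave "File3.gt" and look for "File3.gt.box".

```shell
#!/bin/bash

# Hypothetical helper: prune_orphans DIR [--delete]
# Without --delete it only reports what it would remove (dry run).
# The body runs in a subshell, so nullglob does not leak to the caller.
prune_orphans() (
    dir="${1:-.}"
    mode="${2:-}"

    # Unmatched globs expand to nothing instead of the literal pattern
    shopt -s nullglob

    for file in "$dir"/*.tif "$dir"/*.gt.txt; do
        # Derive the shared base name by removing the full suffix
        case "$file" in
            *.gt.txt) base="${file%.gt.txt}" ;;
            *)        base="${file%.tif}" ;;
        esac

        # Act only on files whose .box partner is missing
        if [ ! -f "$base.box" ]; then
            if [ "$mode" = "--delete" ]; then
                rm -- "$file"
                echo "Deleted $file"
            else
                echo "Would delete $file"
            fi
        fi
    done
)
```

Running prune_orphans . prints the candidates without touching anything; running prune_orphans . --delete removes them.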
