Given the success of the question I posted here, "Find empty files and their duplicates" (thank you, Freeman and Mark Setchell), I am encouraged to ask a related question. This time, the challenge is to delete a file if it lacks a partner.
The text2image tool in Tesseract sometimes fails to produce .box files at all.
The files are supposed to appear as triplets, as follows:
- File1.box
- File1.gt.txt
- File1.tif
- File2.box
- File2.gt.txt
- File2.tif
But when the tool fails to produce the .box file, I get only the other two files:
- File3.gt.txt
- File3.tif
I want to delete the .gt.txt and .tif files that lack a .box partner.
I hope the description is clear.
>Solution :
Is this what you want?
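#!/bin/bash
# Iterate over the .tif and .gt.txt files in the current directory
for file in *.tif *.gt.txt; do
    # If a pattern matches nothing it stays literal, so skip non-existent names
    [ -e "$file" ] || continue
    # Strip the full suffix to get the base name.
    # Note: ${file%.*} would turn File1.gt.txt into File1.gt, not File1
    case "$file" in
        *.gt.txt) file_name="${file%.gt.txt}" ;;
        *)        file_name="${file%.tif}" ;;
    esac
    # Check whether the corresponding .box file exists
    box_file="${file_name}.box"
    if [ ! -f "$box_file" ]; then
        # Delete the file that lacks a .box partner
        rm -- "$file"
        echo "Deleted $file"
    fi
done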
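
Since deletion cannot be undone, it may help to preview which files would be removed first. The following is a minimal sketch, not part of the original answer: the function name remove_orphans and its --delete flag are hypothetical, chosen only for illustration. It strips the full .gt.txt or .tif suffix, so that File3.gt.txt is checked against File3.box, and it only prints what it would delete unless --delete is given.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original answer):
# list or delete .tif / .gt.txt files that have no matching .box file.
# Usage: remove_orphans DIR [--delete]
remove_orphans() {
    local dir="${1:-.}" mode="${2:-}" file base
    for file in "$dir"/*.tif "$dir"/*.gt.txt; do
        # An unmatched glob stays literal, so skip non-existent paths
        [ -e "$file" ] || continue
        # Strip the whole suffix; ${file%.*} would leave "File3.gt"
        case "$file" in
            *.gt.txt) base="${file%.gt.txt}" ;;
            *)        base="${file%.tif}" ;;
        esac
        # Keep files whose .box partner exists
        [ -f "${base}.box" ] && continue
        if [ "$mode" = "--delete" ]; then
            rm -- "$file" && echo "Deleted $file"
        else
            echo "Would delete $file"
        fi
    done
}
```

Calling remove_orphans . prints the candidates without touching anything; once the list looks right, remove_orphans . --delete removes them.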