Find empty files and their duplicates

September 19, 2023

I am trying to train Tesseract. The process involves creating triples of files: a box file, a text file and an image (tif) file.

The tool that creates the .box files sometimes creates empty files. Those empty files cause problems for the engine. So, I want to delete the empty box files as well as their partners.

The whole pattern looks like the following:

File1.box
File1.gt.txt
File1.tif
File2.box
File2.gt.txt
File2.tif

File2.box is an empty file (has zero size). I want to find and delete it as well as its partners (duplicates) such as File2.gt.txt and File2.tif.

Is this doable?

>Solution :

Try this simple script. It uses the find command to locate all empty .box files (-type f -name "*.box" -size 0) and deletes each one with the -delete action. For every file it deleted, -exec then runs rm to remove the corresponding .gt.txt and .tif files:

#!/bin/bash

# Specify the directory where the files are located
directory="/path/to/files"

# Change to that directory; stop if it does not exist
cd "$directory" || exit

# Find and delete empty .box files along with their partners.
# ${1%.box} strips the .box suffix, so ./File2.box yields ./File2
find . -type f -name "*.box" -size 0 -delete -exec sh -c 'rm -f "${1%.box}.gt.txt" "${1%.box}.tif"' sh {} \;
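Since the script deletes files right away, it may be worth doing a dry run first. Below is a minimal sketch of one way to do that, using the same find tests as above. The function name list_empty_box_triples is made up for illustration and is not part of any tool. It only prints the paths and removes nothing.

```shell
#!/bin/bash

# Hypothetical helper (the name is an assumption, not an existing command):
# print every empty .box file under the given directory together with the
# .gt.txt and .tif partners that would be removed alongside it.
list_empty_box_triples() {
    find "$1" -type f -name "*.box" -size 0 -exec sh -c '
        for box in "$@"; do
            base=${box%.box}    # strip the .box suffix
            printf "%s\n" "$box" "$base.gt.txt" "$base.tif"
        done
    ' sh {} +
}

# Usage (nothing is deleted, the paths are only listed):
# list_empty_box_triples /path/to/files
```

The partner names are derived from the .box name, so a partner is listed even if it does not actually exist on disk. If the list looks right, run the deletion script above on the same directory.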