Find empty files and their duplicates

I am trying to train Tesseract. The process involves creating triples of files: box files, text files, and image (tif) files.

The tool that creates the .box files sometimes creates empty files, and those empty files cause problems for the engine. So I want to delete the empty box files as well as their partners.

The whole pattern looks like the following:

  • File1.box
  • File1.gt.txt
  • File1.tif
  • File2.box
  • File2.gt.txt
  • File2.tif

File2.box is an empty file (it has zero size). I want to find and delete it along with its partner files, File2.gt.txt and File2.tif.

Is this doable?

Solution:

Try this simple script. It uses the find command to locate all empty .box files (-type f -name "*.box" -size 0), deletes each one with the -delete action, and then removes the corresponding .gt.txt and .tif files by running rm through the -exec action:

#!/bin/bash

# Directory where the files are located
directory="/path/to/files"

# Change to that directory; exit if that fails
cd "$directory" || exit

# Find and delete empty .box files along with their partner files
find . -type f -name "*.box" -size 0 -delete -exec sh -c 'rm -f "${1%.box}.gt.txt" "${1%.box}.tif"' sh {} \;
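
Since the command above deletes files immediately, it may be worth previewing the matches first. The following is a minimal sketch of a dry-run variant, not part of the original answer: the function name remove_empty_triples and its --delete switch are hypothetical names chosen for illustration. By default it only prints each empty .box file and whichever of its partners exist; files are removed only when --delete is passed.

```bash
#!/bin/bash

# Hypothetical helper (name and --delete switch are assumptions for this sketch).
# Usage: remove_empty_triples DIR [--delete]
# Without --delete it prints the empty .box files and their existing partners;
# with --delete it removes them.
remove_empty_triples() {
    local dir="$1" mode="${2:-}"

    # -size 0 matches only zero-length files; the inner shell receives the mode
    # as $1 and the matching .box paths as the remaining arguments.
    find "$dir" -type f -name '*.box' -size 0 -exec sh -c '
        mode="$1"; shift
        for box in "$@"; do
            base="${box%.box}"          # strip the .box suffix
            for f in "$box" "$base.gt.txt" "$base.tif"; do
                [ -e "$f" ] || continue # skip partners that do not exist
                if [ "$mode" = "--delete" ]; then
                    rm -f -- "$f"
                else
                    printf "%s\n" "$f"
                fi
            done
        done
    ' sh "$mode" {} +
}
```

Run remove_empty_triples /path/to/files to review the list, then repeat the call with --delete as the second argument once the output looks right.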