I have the code segment below, which converts a multi-page PDF, stores the pages as separate PNGs in a directory, and then uses OpenCV to read all the PNGs in a loop. However, cv2.imread() returns None for every page except the first. E.g., in a directory with 4 images (converted from a PDF), cv2 reads only the first image and returns None in the other 3 iterations. The PNGs are created and stored correctly in the directory when I check in Finder. Does anyone have a clue what’s going on? I’d appreciate your comments.
file_name, num_pages, file_ext = convert_file(file_name=file_name)
if not file_name:
    return {"error": "Incompatible file type. Allowed types are PNG, JPG/JPEG and PDF only."}
images = []
for n in range(num_pages):
    try:
        if file_ext == 'pdf':
            page = cv2.imread(file_name + '/' + str(n) + '.png')
            shutil.rmtree(file_name, ignore_errors=True)
        else:
            page = cv2.imread(file_name)
    except cv2.error:
        logging.exception(f"Error reading image file {file_name + '/' + str(n) + '.png'}")
        pass
    if page is not None and len(page) > 0:
        images.append(page)
if len(images) == 0:
    return {"error": "Failed reading document."}
First edit: I’ve ruled out flaws in the PDF conversion function by commenting that line out and storing the images in the directory manually. I also tried removing the first page from the PDF and passing in the resulting 3-page PDF; it still reads the first page (which was the second page before) and fails on the other 2 pages. I also tried this with different PDF files and got the same problem.
Edit 2: Below is the directory and file structure in question. My code creates a sub-directory with the same name as the PDF file and stores the PNG pages inside it.
>Solution :
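It looks like you’re deleting the directory containing your images right after reading the first one. The shutil.rmtree() call runs inside the loop, so from the second iteration on the PNGs no longer exist, and cv2.imread() returns None for a missing file instead of raising an error.
Try moving the line
shutil.rmtree(file_name, ignore_errors=True)
outside the for loop, so it runs only after all pages have been read.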
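For illustration, here is a minimal sketch of the loop with the cleanup moved out of it. The helper name read_pdf_pages and its read_image parameter are hypothetical (they are not in the original code); the sketch assumes the pages are stored as 0.png, 1.png, ... inside the directory, as in the question, and that cv2.imread is the reader unless another one is passed in.

```python
import os
import shutil


def read_pdf_pages(page_dir, num_pages, read_image=None):
    """Read 0.png .. (num_pages - 1).png from page_dir, then delete page_dir.

    read_image is a hypothetical hook: any callable that takes a path and
    returns the image, or None on failure. It defaults to cv2.imread.
    """
    if read_image is None:
        import cv2  # imported lazily, only needed for the default reader
        read_image = cv2.imread

    images = []
    try:
        for n in range(num_pages):
            page = read_image(os.path.join(page_dir, f"{n}.png"))
            # cv2.imread returns None for a missing or unreadable file,
            # so check the result rather than relying on an exception.
            if page is not None and len(page) > 0:
                images.append(page)
    finally:
        # Delete the directory once, after every page has been read.
        shutil.rmtree(page_dir, ignore_errors=True)
    return images
```

With something like this, the PDF branch of the original code could become images = read_pdf_pages(file_name, num_pages). The try/finally is a design choice, not a requirement: it just makes sure the temporary directory is removed even if reading a page fails unexpectedly.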
P.S. this would have been a comment, but not enough rep to do that yet. sorry.