Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Convert png/jpg to word file using python

I need to convert lots of jpg/png files to docx files & then to pdf. My sole concern is to write the data in an image to a pdf file & if I need to edit any text manually, I can do that in word & save it in the corresponding pdf file.

I’ve tried using API but failed as the text is not correctly matching.

My image files contain only texts & not anything else.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I already have docx to pdf conversion code in Python.

from docx2pdf import convert

input = 'INPUT_FILE_NAME.docx'
output = 'OUTPUT_FILE_NAME.pdf'
convert(input)
convert(input, output)
convert("Output")

Kindly suggest me how to convert a png/jpg file to docx. Thanks.

>Solution :

from PIL import Image
from pytesseract import pytesseract

#Define path to tessaract.exe
path_to_tesseract = r'C:\Program Files\Tesseract-OCR\tesseract.exe'


#Define path to image
path_to_image = 'texttoimage.png'

#Point tessaract_cmd to tessaract.exe
pytesseract.tesseract_cmd = path_to_tesseract

#Open image with PIL
img = Image.open(path_to_image)

#Extract text from image
text = pytesseract.image_to_string(img)

print(text)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading