How can I extract text from a PDF with Python?

I’m looking to extract some text from a PDF. I’m using this code:

import PyPDF2
Doc = open('document.pdf','rb') 
pdfreader = PyPDF2.PdfFileReader(Doc)
pageObj = pdfreader.getPage(0)
pageObj.extractText()

With this code, pageObj.extractText() returns ''. I don’t know why this happens, because the PDF I open does contain text. The document has only one page.

Does anyone know what is happening, or whether there is another way to get information from a PDF?

>Solution :

You can try pdfplumber.

Instead of printing the text, you can write it to a text file.

import pdfplumber
with pdfplumber.open(r'D:\document.pdf') as pdf:
    first_page = pdf.pages[0]
    print(first_page.extract_text())
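
For the "write it to a text file" variant, something like the following might work. It is only a sketch: the function names save_pages_as_text and extract_pdf_to_text and the output path are made up for illustration, and it assumes pdfplumber is installed. Pages with no extractable text (for example, scanned images with no text layer) are written as empty blocks, and the return value tells you how many pages actually produced text.

```python
from pathlib import Path


def save_pages_as_text(page_texts, out_path):
    """Write the text of each page to out_path, separated by blank lines.

    Hypothetical helper: entries that are None or empty are written as
    empty blocks. Returns the number of pages that contained text.
    """
    cleaned = [(text or "") for text in page_texts]
    Path(out_path).write_text("\n\n".join(cleaned), encoding="utf-8")
    return sum(1 for text in cleaned if text.strip())


def extract_pdf_to_text(pdf_path, out_path):
    """Extract text from every page of pdf_path and save it to out_path."""
    import pdfplumber  # third-party; imported here so the helper above works without it

    with pdfplumber.open(pdf_path) as pdf:
        texts = [page.extract_text() for page in pdf.pages]
    return save_pages_as_text(texts, out_path)


# Example call (paths are placeholders):
# pages_with_text = extract_pdf_to_text(r'D:\document.pdf', 'document.txt')
```

If the returned count is 0, the PDF probably has no text layer at all, and no text-extraction library will help without OCR.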