I’m looking to extract some text from a PDF. I’m using this code:
import PyPDF2
Doc = open('document.pdf','rb')
pdfreader = PyPDF2.PdfFileReader(Doc)
pageObj = pdfreader.getPage(0)
pageObj.extractText()
With this code, pageObj.extractText() returns ''. I don’t know why this happens, because the PDF I open does contain text. The document has just 1 page.
Does anyone know what is going on, or whether there is another way to get information out of a PDF?
>Solution :
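You can try pdfplumber, which often extracts text where PyPDF2 returns an empty string.
The example below prints the text of the first page; instead of printing, you can also write it to a text file.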
import pdfplumber

# Open the PDF; the context manager closes the file when the block ends
with pdfplumber.open(r'D:\document.pdf') as pdf:
    first_page = pdf.pages[0]  # pages are zero-indexed
    print(first_page.extract_text())
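
The answer mentions writing the text to a file instead of printing it. A minimal sketch of that idea could look like the following. The function names `save_page_texts` and `pdf_to_text_file` and the output path are made up for illustration and are not part of pdfplumber. The sketch also assumes that pages without a text layer (for example scanned images) yield an empty or missing result from `extract_text()`, and simply skips them.

```python
from pathlib import Path


def save_page_texts(page_texts, out_path):
    """Write the non-empty page texts to out_path, separated by blank lines.

    Hypothetical helper: returns how many pages actually contained text.
    """
    # Skip pages that produced no text (None or an empty string)
    chunks = [text for text in page_texts if text]
    Path(out_path).write_text("\n\n".join(chunks), encoding="utf-8")
    return len(chunks)


def pdf_to_text_file(pdf_path, out_path):
    """Extract the text of every page with pdfplumber and save it to out_path."""
    import pdfplumber  # third-party package: pip install pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # The generator is consumed inside the with-block, while the PDF is open
        page_texts = (page.extract_text() for page in pdf.pages)
        return save_page_texts(page_texts, out_path)
```

You would call it as pdf_to_text_file(r'D:\document.pdf', 'document.txt'). If it returns 0, no page yielded any text, which may mean the PDF holds only images and would need OCR rather than text extraction.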