Iterating over a dictionary of PDF files and their names to build a dictionary that maps each name to its extracted text

I wrote the following code to extract the text of a single PDF file into a list. How can I modify it so that it iterates over a dictionary of PDF files and their names and builds a new dictionary that maps each name to the corresponding text?

import PyPDF2

dic = {
    '0R.pdf': 'm1',
    '2R.pdf': 'm2',
    '29R.pdf': 'm3',
}

def readpdffile(pdf_file):
    # Reads one PDF and returns a list with one text string per page.
    pdfFileObj = open(pdf_file, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    output = []
    for i in range(pdfReader.numPages):
        pageObj = pdfReader.getPage(i)
        output.append(pageObj.extractText())
    return output

Solution:

You can modify the code to iterate over the dictionary of pdf files and their names, and store the extracted text and the corresponding name in a dictionary using the following code:

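import PyPDF2

dic = {
    '0R.pdf': 'm1',
    '2R.pdf': 'm2',
    '29R.pdf': 'm3',
}

def read_pdffiles(dictionary):
    # Note: PdfFileReader, numPages, getPage and extractText are the legacy
    # PyPDF2 names; PyPDF2 3.0 removed them, so this needs an older release.
    result = {}
    for pdf_file, name in dictionary.items():
        # The with block closes the file even if extraction raises.
        with open(pdf_file, 'rb') as pdfFileObj:
            pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
            output = []
            for i in range(pdfReader.numPages):
                pageObj = pdfReader.getPage(i)
                output.append(pageObj.extractText())
        # Key the list of page texts by the name mapped to this file.
        result[name] = output
    return result

result = read_pdffiles(dic)
print(result)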

The read_pdffiles function takes a dictionary that maps PDF filenames to names and returns a dictionary that maps each name to the text extracted from its file. For each entry, it opens the file, uses PyPDF2 to extract the text of every page into a list (one string per page), and stores that list in the result under the corresponding name.
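
The same steps can also be sketched with the per-file extraction split from the dictionary building. This is an illustration rather than the answer's own code: it assumes a PyPDF2 release that provides PdfReader, reader.pages and page.extract_text() (the newer names for the calls used above), and extract_pages, read_named_pdfs and the extractor parameter are hypothetical names introduced here.

```python
from typing import Callable, Dict, List


def extract_pages(pdf_path: str) -> List[str]:
    """Return one text string per page of the PDF at pdf_path.

    Assumes a PyPDF2 release that provides PdfReader.
    """
    # Imported lazily so read_named_pdfs can be used with a different
    # extractor even when PyPDF2 is not installed.
    from PyPDF2 import PdfReader

    with open(pdf_path, "rb") as fh:
        reader = PdfReader(fh)
        # Fall back to an empty string if a page yields no text.
        return [page.extract_text() or "" for page in reader.pages]


def read_named_pdfs(
    files_to_names: Dict[str, str],
    extractor: Callable[[str], List[str]] = extract_pages,
) -> Dict[str, List[str]]:
    """Map each name to the list of page texts of its PDF file."""
    return {name: extractor(path) for path, name in files_to_names.items()}
```

Passing the extractor as a parameter is a design choice made for this sketch: it lets the dictionary-building step be checked with a stand-in function before any real PDF files are involved.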

Call read_pdffiles with dic and store the return value in a variable such as result. Each key of result is a name (m1, m2, m3) and each value is the list of page texts for the matching file. Printing result lets you verify the output.
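
If one string per file is more convenient than a list of pages, the lists can be joined afterwards. A minimal sketch, assuming a result dictionary shaped like the one read_pdffiles returns; the page texts below are made-up placeholders, and full_text is a hypothetical name.

```python
# Stand-in for the dictionary returned by read_pdffiles(dic); the page
# texts are placeholders, not real extraction output.
result = {
    "m1": ["first page of 0R", "second page of 0R"],
    "m2": ["only page of 2R"],
}

# Collapse each list of page texts into one string per name.
full_text = {name: "\n".join(pages) for name, pages in result.items()}

for name, text in full_text.items():
    print(f"{name}: {len(text)} characters")
```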
