Area of coding: PDF table of contents in Python 3 using PyPDF2
Problem: I need a program that can iterate through a variable (type-hinted as a Union) that contains multiple dictionaries and multiple lists, each of which contains multiple dictionaries.
{}
[{},[{},{},{}],{},[{},{},{}],{},[{},{},{}]]
This pattern repeats multiple times.
Expected output: The output should look like this:
1 Title Goes Here
1.1 Title Goes Here
1.2 Title Goes Here
1.3 Title Goes Here
2 Title Goes Here
2.1 Title Goes Here
2.2 Title Goes Here
2.3 Title Goes Here
Program:
"""
Program finds the Table of Contents(ToC) of a pdf file
Then prints it out in the format
1
1.1
1.2
1.3
2
2.1
2.2
2.3
"""
import argparse as arp
from PyPDF2 import PdfFileReader

parser = arp.ArgumentParser()
parser.add_argument("-f", "--file", help="File to analyse")
arg = parser.parse_args()
filename = arg.file

def fileread():
    doc = PdfFileReader(filename)
    ToC = doc.getOutlines()
    # ToC: Union[List[Union[Destination, list]], {__eq__}] = doc.getOutlines()
    for elements in ToC:
        #print(elements)
        #print("\n")
        try:
            if elements is {}: # If the element is a dictionary just find the Title
                print(elements['/Title']) # TODO: This is just skipped
            else: # If the element is a list go through and print out the titles
                for nest_dict in elements:
                    try:
                        print(nest_dict["/Title"])
                    except:
                        continue
        except:
            continue

fileread()
I’m testing this program on: Compilers – Principles, Techniques, and Tools-Pearson_Addison Wesley (2006).pdf (ce.sharif.edu/courses/94-95/1/ce414-2/resources/root/Text%20Books/Compiler%20Design/Alfred%20V.%20Aho,%20Monica%20S.%20Lam,%20Ravi%20Sethi,%20Jeffrey%20D.%20Ullman-Compilers%20-%20Principles,%20Techniques,%20and%20Tools-Pearson_Addison%20Wesley%20(2006).pdf)
Any help is much appreciated.
>Solution :
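This line is not right. The `is` operator tests object identity, and `{}` creates a brand-new dictionary every time it is evaluated, so the comparison is always False and the branch is never taken: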
if elements is {}: # If the element is a dictionary just find the Title
It should instead read:
if isinstance(elements, dict):
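
With the type check fixed, the numbered layout from the question can be produced by recursing into each list. The following is only a minimal sketch, not the PyPDF2 API: `number_outline` and the `sample` data are hypothetical names made up for illustration, and it assumes every entry is a dict with a "/Title" key and every list holds the children of the entry just before it.

```python
def number_outline(outline, prefix=""):
    """Return numbered title strings such as '1 Intro' and '1.1 Scope'.

    Assumption: dicts are entries with a "/Title" key, and a list
    contains the children of the entry that immediately precedes it.
    """
    lines = []
    count = 0  # number of entries seen so far at this nesting level
    for item in outline:
        if isinstance(item, dict):
            count += 1
            lines.append(f"{prefix}{count} {item['/Title']}")
        elif isinstance(item, list):
            # Children are numbered under the most recent entry.
            lines.extend(number_outline(item, prefix=f"{prefix}{count}."))
    return lines


# Hypothetical stand-in for the nested outline structure.
sample = [
    {"/Title": "Intro"},
    [{"/Title": "Scope"}, {"/Title": "Terms"}],
    {"/Title": "Parsing"},
    [{"/Title": "Grammars"}],
]
numbered = number_outline(sample)
print("\n".join(numbered))
```

In the original program, the result of doc.getOutlines() could presumably be passed in place of `sample`, on the assumption that the outline entries behave like dictionaries, which is the same assumption the isinstance check relies on. Because the function recurses, it also handles outlines nested more than two levels deep.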