I have a metadata file that looks like this:
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:title>Princeton Review Digital SAT Premium Prep, 2024: 4 Practice Tests + Online Flashcards + Review & Tools</dc:title>
<dc:creator opf:file-as="Princeton Review, The" opf:role="aut">The Princeton Review</dc:creator>
<dc:identifier opf:scheme="ISBN">9780593516874</dc:identifier>
<dc:identifier opf:scheme="AMAZON">0593516877</dc:identifier>
<dc:identifier opf:scheme="GOODREADS">63139948</dc:identifier>
<dc:identifier opf:scheme="GOOGLE">o6i4EAAAQBAJ</dc:identifier>
</metadata>
</package>
I know how to use BeautifulSoup to extract fields like <dc:title>, but I’m struggling to extract only the ISBN field (<dc:identifier opf:scheme="ISBN">).
from bs4 import BeautifulSoup
with open('metadata.opf', 'r', encoding='utf-8') as f:
    file = f.read()
metadata = BeautifulSoup(file, 'xml')
title = metadata.find('dc:title')
print(title.text)
author = metadata.find('dc:creator')
print(author.text)
# isbn = metadata.find_all('dc:identifier')  # This finds 4 fields, as expected.
How do I limit it to just the ISBN? I can’t depend on the order of the fields, and the length of the ISBN can vary.
>Solution :
According to the documentation, find accepts an attrs argument: a dictionary mapping attribute names to the values they must have. Using it, you can select only the identifier whose opf:scheme is ISBN:
isbn = metadata.find('dc:identifier', attrs={"opf:scheme": "ISBN"})
So the full code could be written as:
from bs4 import BeautifulSoup
with open('metadata.opf', 'r', encoding='utf-8') as f:
    file = f.read()
metadata = BeautifulSoup(file, 'xml')
title = metadata.find('dc:title')
print(title.text)
author = metadata.find('dc:creator')
print(author.text)
isbn = metadata.find('dc:identifier', attrs={"opf:scheme": "ISBN"})  # Finds only the identifier whose opf:scheme is ISBN.
print(isbn.text)
and it should print:
Princeton Review Digital SAT Premium Prep, 2024: 4 Practice Tests + Online Flashcards + Review & Tools
The Princeton Review
9780593516874
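One caveat: find returns None when nothing matches, so isbn.text raises AttributeError on a file that has no ISBN entry. The sketch below guards against that, and also collects every identifier into a dictionary keyed by its opf:scheme value, so neither the field order nor the ISBN length matters. It assumes the same 'xml' parser as above (which requires lxml to be installed); the helper names get_identifier and identifiers_by_scheme and the trimmed SAMPLE string are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed copy of the metadata file from the question.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier opf:scheme="ISBN">9780593516874</dc:identifier>
<dc:identifier opf:scheme="AMAZON">0593516877</dc:identifier>
</metadata>
</package>"""


def get_identifier(soup, scheme):
    """Return the text of the dc:identifier whose opf:scheme equals `scheme`, or None."""
    tag = soup.find("dc:identifier", attrs={"opf:scheme": scheme})
    # find() returns None when nothing matches, so check before reading .text.
    return tag.text.strip() if tag is not None else None


def identifiers_by_scheme(soup):
    """Map each opf:scheme value (ISBN, AMAZON, ...) to its identifier text."""
    return {
        tag["opf:scheme"]: tag.text.strip()
        for tag in soup.find_all("dc:identifier")
        if tag.has_attr("opf:scheme")  # skip identifiers that carry no scheme
    }


metadata = BeautifulSoup(SAMPLE, "xml")
print(get_identifier(metadata, "ISBN"))       # 9780593516874
print(get_identifier(metadata, "GOODREADS"))  # None: not present in SAMPLE
print(identifiers_by_scheme(metadata))
```

The dictionary version walks the identifiers once and then answers any scheme lookup, which may be handy if you later need the Amazon or Goodreads IDs as well.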
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find
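If you would rather not depend on BeautifulSoup and lxml, the standard library's xml.etree.ElementTree can do the same lookup. This is a swapped-in technique, not the approach used in the answer above. ElementTree works with full namespace URIs rather than prefixes, so you pass it a prefix map; the map name NS, the function isbn_from_opf, and the trimmed SAMPLE_OPF string are hypothetical, while the two namespace URIs come from the file itself. A minimal sketch:

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI. The prefixes are our own labels; the URIs must match the file.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "opf": "http://www.idpf.org/2007/opf",
}

# Hypothetical trimmed copy of metadata.opf (ISBN deliberately not first).
SAMPLE_OPF = """<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier opf:scheme="AMAZON">0593516877</dc:identifier>
<dc:identifier opf:scheme="ISBN">9780593516874</dc:identifier>
</metadata>
</package>"""


def isbn_from_opf(text):
    """Return the ISBN identifier from OPF XML text, or None if there is none."""
    root = ET.fromstring(text)
    # The prefixes in both the tag name and the [@opf:scheme='ISBN'] predicate
    # are expanded through NS to their full namespace URIs.
    el = root.find(".//dc:identifier[@opf:scheme='ISBN']", NS)
    return el.text.strip() if el is not None and el.text else None


print(isbn_from_opf(SAMPLE_OPF))  # 9780593516874
```

To use it on the real file, pass it the same text that f.read() returns in the question's code.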