Extracting specific tag from XML in python using BeautifulSoup

September 20, 2023

I have a metadata file that looks like this:

<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
        <dc:title>Princeton Review Digital SAT Premium Prep, 2024: 4 Practice Tests + Online Flashcards + Review &amp; Tools</dc:title>
        <dc:creator opf:file-as="Princeton Review, The" opf:role="aut">The Princeton Review</dc:creator>
        <dc:identifier opf:scheme="ISBN">9780593516874</dc:identifier>
        <dc:identifier opf:scheme="AMAZON">0593516877</dc:identifier>
        <dc:identifier opf:scheme="GOODREADS">63139948</dc:identifier>
        <dc:identifier opf:scheme="GOOGLE">o6i4EAAAQBAJ</dc:identifier>
    </metadata>
</package>

I know how to use BeautifulSoup to extract fields like <dc:title>. I'm struggling to extract only the ISBN field (<dc:identifier opf:scheme="ISBN">).

from bs4 import BeautifulSoup

with open('metadata.opf', 'r') as f:
    file = f.read()

metadata = BeautifulSoup(file, 'xml')
title = metadata.find('dc:title')
print(title.text)

author = metadata.find('dc:creator')
print(author.text)

# isbn = metadata.find_all('dc:identifier')  # This finds 4 fields, as expected.

How do I limit it? I can’t depend on the order of the fields, and the ISBN length can vary.

Solution:

According to the documentation, the find method accepts an attrs argument, a dictionary of attribute names and values that the tag must match. Using it, you should be able to select only the ISBN identifier:

isbn = metadata.find('dc:identifier', attrs={"opf:scheme": "ISBN"})

So the code could be written like this:

from bs4 import BeautifulSoup

with open('metadata.opf', 'r') as f:
    file = f.read()

metadata = BeautifulSoup(file, 'xml')
title = metadata.find('dc:title')
print(title.text)

author = metadata.find('dc:creator')
print(author.text)

isbn = metadata.find('dc:identifier', attrs={"opf:scheme": "ISBN"})  # Matches only the identifier whose opf:scheme is "ISBN".
print(isbn.text)

and should result in

Princeton Review Digital SAT Premium Prep, 2024: 4 Practice Tests + Online Flashcards + Review & Tools
The Princeton Review
9780593516874
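Since the question says the order of the identifiers can't be relied on, the same lookup can also be sketched as a dictionary keyed by the opf:scheme attribute. This is a minimal sketch, not part of the original answer: the helper name identifiers_by_scheme and the inlined OPF string (a trimmed, reordered copy of the metadata above) are assumptions made for illustration. It also avoids the case where find returns None for a missing scheme, in which case isbn.text would raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Trimmed, reordered copy of the metadata from the question, inlined so the
# sketch runs on its own (assumption: the real file has the same structure).
OPF = """<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
        <dc:identifier opf:scheme="AMAZON">0593516877</dc:identifier>
        <dc:identifier opf:scheme="ISBN">9780593516874</dc:identifier>
    </metadata>
</package>"""


def identifiers_by_scheme(opf_text):
    """Map each opf:scheme value (e.g. "ISBN") to the text of its dc:identifier tag.

    Hypothetical helper, introduced here for illustration only.
    """
    soup = BeautifulSoup(opf_text, "xml")  # the "xml" parser needs lxml installed
    result = {}
    for tag in soup.find_all("dc:identifier"):
        scheme = tag.get("opf:scheme")  # None if the attribute is absent
        if scheme is not None:
            result[scheme] = tag.get_text(strip=True)
    return result


ids = identifiers_by_scheme(OPF)
print(ids.get("ISBN"))  # looked up by scheme, independent of element order
print(ids.get("DOI"))   # a scheme that is not present gives None instead of raising
```

Using dict.get keeps a missing ISBN from crashing the script; the caller can check for None and decide what to do.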

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find

beautifulsoup

by MR
