Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

python parsing with bs4 an xml to get a list of elements

I have an xml file containing data in this form:

<head xml:id="_2ebf9c0003">\n\nTECHNICAL FIELD</head>\n
<p n="0001" xml:id="_2ebf9c0004">whatever</p>
<p n="0002" xml:id="_2ebf9c0004">whatever</p>
<... other tags and data...>
<head xml:id="_2ebf9c0003">\n\nTITLE</head>\n

I know how to get particular elements like:

from bs4 import BeautifulSoup
soup = BeautifulSoup(PDM_description, 'lxml')
title_element = soup.title$
importing all p elements
paras = soup.findAll('p')

The question is how can I add an OR inside the query to get a list of "p" or "head" elements? More general how to get all the elements with tags belonging to a list.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

PSEUDO CODE:

paras = soup.findAll('p' OR 'head')    

>Solution :

You are close to your goal, just add a list with tags to your find_all():

soup.find_all(['p','head'])

Note: In new code use find_all() instead of older findAll() syntax

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading