Extract all text within a tag & save to dictionary using BeautifulSoup

September 12, 2022

I have an XML file that looks a bit like this:

<article id = '1'> 
  <p> This is </p> 
  <p> example A </p>
</article>

<article id = '2'> 
  <p> This is </p> 
  <p> example B </p>
</article>

I would like to create a dictionary that looks like this:

{1: 'This is example A', 2: 'This is example B'}

with the keys being the ‘id’ attribute of each article tag. What is the best way to do this using Beautiful Soup?

>Solution :

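This is how I would do it, reading each key from the article's id attribute rather than from its position in the file:

from bs4 import BeautifulSoup


output = {}

# If you're getting your XML from the web, skip this step and use the response text as `data`:
with open("xml_file.xml", mode="r") as f:
    data = f.read()

# Name the parser explicitly; html.parser is lenient enough for a file with several top-level tags.
soup = BeautifulSoup(data, "html.parser")
articles = soup.find_all('article')

for article in articles:
    # split() drops newlines and repeated spaces; join() rebuilds the text with single spaces.
    output[int(article['id'])] = ' '.join(article.text.split())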
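
To try the idea without a file on disk, here is a minimal sketch that parses the sample from the question as an inline string. The names `sample` and `result` are only for illustration, and it assumes every article tag carries an integer `id` attribute; the built-in `html.parser` is used so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# The sample from the question, inlined so the snippet runs on its own.
sample = """
<article id = '1'>
  <p> This is </p>
  <p> example A </p>
</article>

<article id = '2'>
  <p> This is </p>
  <p> example B </p>
</article>
"""

soup = BeautifulSoup(sample, "html.parser")

# Key each entry by the tag's id attribute (assumed to be an integer),
# and collapse all whitespace in the tag's text to single spaces.
result = {
    int(article["id"]): " ".join(article.get_text().split())
    for article in soup.find_all("article")
}

print(result)  # {1: 'This is example A', 2: 'This is example B'}
```

Keying on the `id` attribute means the dictionary stays correct even if the ids in the file are not consecutive or do not start at 1.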

Hope this helps!