Get the parent's text and the child's text separately and store them in a dictionary

Suppose I have this HTML:

<span class="name">
  <span class="age">21</span>
  Will Green
</span>

I want to extract the name and age text and store them in a dictionary.

So far I have been able to get the age, but getting only the name has been difficult.

This is what I tried so far.

from bs4 import BeautifulSoup

with open('test.html', 'r') as file:
    contents = file.read()
    soup = BeautifulSoup(contents, 'html.parser')

    name = soup.find(class_="name").getText()
    age = soup.find("span", class_="age").getText()

    results = {}
    results['name'] = name
    results['age'] = age

    print(results)

The output is {'name': '\n21\n Will Green\n ', 'age': '21'}

As you can see, the name contains stray newlines and spaces, and it also includes the text of the child element.

How can I resolve this?

Expected output: {'name': 'Will Green', 'age': '21'}

Solution:

If the structure is always the same, you could use stripped_strings and zip() the result with the expected keys:

dict(zip(['age','name'],soup.select_one('span.name').stripped_strings))

An alternative approach could be to select the age first and then its next_sibling:

{
    'age': soup.select_one('span.age').text,
    'name': soup.select_one('span.age').next_sibling.get_text(strip=True)
}
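
The next_sibling approach can be sketched as a self-contained script. This is only an illustration under one assumption: the name is always the text node directly after the age span. The helper name extract_person is hypothetical. Because a NavigableString is a subclass of str, the sketch uses plain str.strip() on the sibling rather than get_text(strip=True), which should avoid depending on the installed Beautiful Soup version.

```python
from bs4 import BeautifulSoup

html = '''
<span class="name">
  <span class="age">21</span>
  Will Green
</span>
'''


def extract_person(markup):
    """Return {'age': ..., 'name': ...} (hypothetical helper for illustration)."""
    soup = BeautifulSoup(markup, 'html.parser')
    age_tag = soup.select_one('span.name > span.age')
    # Assumption: the node right after the age span is the text holding the name.
    name_node = age_tag.next_sibling
    return {
        'age': age_tag.get_text(strip=True),
        'name': str(name_node).strip() if name_node is not None else '',
    }


result = extract_person(html)
print(result)
```

If the markup may place another tag between the age and the name, this sketch would need an extra check on the sibling's type.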
Example

from bs4 import BeautifulSoup

html = '''
<span class="name">
  <span class="age">21</span>
  Will Green
</span>
'''

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')
print(dict(zip(['age', 'name'], soup.select_one('span.name').stripped_strings)))

Output

{'age': '21', 'name': 'Will Green'}
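
If the order of the child and the text inside the parent is not guaranteed, one possible variant is to collect only the parent's own text nodes. This is a sketch rather than part of the answer above; it assumes the name is the only text sitting directly inside span.name, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = '''
<span class="name">
  <span class="age">21</span>
  Will Green
</span>
'''

soup = BeautifulSoup(html, 'html.parser')
parent = soup.select_one('span.name')

# recursive=False limits the search to direct children, so the text
# inside the nested age span is not picked up.
own_text = parent.find_all(string=True, recursive=False)

# Drop whitespace-only nodes and trim the rest.
name = ' '.join(s.strip() for s in own_text if s.strip())

results = {
    'name': name,
    'age': parent.select_one('span.age').get_text(strip=True),
}
print(results)
```

Unlike zip() over stripped_strings, this does not rely on the age appearing before the name.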