
How to extract specific part of html using Beautifulsoup?

I am trying to extract the value of the ‘title’ attribute from the following HTML, but so far I haven't managed to.

<div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00">

This is my code:

from bs4 import BeautifulSoup

with open("messages.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

results = soup.find_all('div', attrs={'class':'pull_right date details'})

print(results)

Instead of the title value, the output is a list of every matching <div> element in the HTML file.

Solution:

To read the value of the title attribute, index the tag like a dictionary: tag['title'].

find_all returns a list of tags, not a single tag, so you first need to pick an element by index (e.g. results[0]['title']).
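Since messages.html presumably contains many of these divs, a list comprehension over the find_all results can collect every title in one pass. This is a minimal sketch; the inline HTML string is a made-up stand-in for messages.html, not its real contents:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for messages.html
html = """
<div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00"></div>
<div class="pull_right date details" title="22.12.2022 01:50:10 UTC-03:00"></div>
<div class="body">no title here</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet of Tag objects
results = soup.find_all("div", attrs={"class": "pull_right date details"})

# Index each Tag with ['title'] to read its attribute value
titles = [div["title"] for div in results]
print(titles)
# ['22.12.2022 01:49:03 UTC-03:00', '22.12.2022 01:50:10 UTC-03:00']
```

The third div is skipped because its class does not match, so only two titles are collected.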

For example:

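from bs4 import BeautifulSoup

# Sample markup standing in for messages.html (the div is now properly closed)
html = '<html><div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00"></div></html>'
soup = BeautifulSoup(html, 'html.parser')

# find_all returns a list of matching tags
results = soup.find_all('div', attrs={'class': 'pull_right date details'})

# Take the first tag and read its title attribute
print(results[0]['title'])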

Or:

# find returns only the first matching tag, so no index is needed
result = soup.find('div', attrs={'class': 'pull_right date details'})

print(result['title'])

Output:

22.12.2022 01:49:03 UTC-03:00
22.12.2022 01:49:03 UTC-03:00
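Both versions above assume the data is always there. If no div matches, find returns None (so ['title'] fails with a TypeError) and results[0] on an empty find_all list raises an IndexError; if a matching div has no title, tag['title'] raises a KeyError. Tag.get returns None instead of raising. A minimal defensive sketch, using a hypothetical helper named get_title:

```python
from bs4 import BeautifulSoup


def get_title(html):
    """Hypothetical helper: return the title of the first matching div, or None."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", attrs={"class": "pull_right date details"})
    if div is None:
        # No matching div in the document
        return None
    # .get returns None instead of raising KeyError when title is missing
    return div.get("title")


print(get_title('<div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00"></div>'))
print(get_title('<div class="pull_right date details"></div>'))  # div without a title
print(get_title('<p>nothing here</p>'))  # no matching div at all
```

The first call prints the date string; the other two print None rather than raising.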