Scraping issue with id_tag

I’m trying to extract data from a website with BeautifulSoup.
I’m stuck on this element:
"Trad. de l’anglais par <a href="/searchinternet/advanced?all_authors_id=35534&SearchAction=1">Camille Fabien </a>"
I want to get the names of the translators, but the only thing identifying the tag is an href that contains their id.

My code is:

translater = soup.find_all("a", href="/searchinternet/advanced?all_authors_id=")

I also tried str.startswith, but it doesn’t work.
Can someone help me, please?

>Solution :

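Provided your HTML is correct and static (not loaded by JavaScript after the initial page load), this is one way to select those links. The CSS attribute selector [href^="..."] matches every link whose href starts with the given value, so the id that follows does not need to be known: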

from bs4 import BeautifulSoup as bs

html = '''<p>Trad. de l'anglais par <a href="/searchinternet/advanced?all_authors_id=35534&SearchAction=1">Camille Fabien </a></p>'''

soup = bs(html, 'html.parser')
a = soup.select('a[href^="/searchinternet/advanced?all_authors_id="]')
print(a[0])
print(a[0].get_text(strip=True))
print(a[0].get('href'))

Result in terminal:

<a href="/searchinternet/advanced?all_authors_id=35534&amp;SearchAction=1">Camille Fabien </a>
Camille Fabien
/searchinternet/advanced?all_authors_id=35534&SearchAction=1
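
The attempt in the question most likely returned nothing because a plain string passed as href= has to match the whole attribute value, not just its beginning. As a minimal sketch of an alternative that stays with find_all, a function can be passed as the href filter instead. The names PREFIX, exact, links and translators are only illustrative, and the HTML is the same snippet as above:

```python
from bs4 import BeautifulSoup

html = '''<p>Trad. de l'anglais par <a href="/searchinternet/advanced?all_authors_id=35534&SearchAction=1">Camille Fabien </a></p>'''
PREFIX = "/searchinternet/advanced?all_authors_id="  # illustrative constant

soup = BeautifulSoup(html, "html.parser")

# A string filter is compared against the complete href, so the bare prefix matches nothing.
exact = soup.find_all("a", href=PREFIX)

# A function filter is called with each href value (None when the attribute is absent).
links = soup.find_all("a", href=lambda h: h is not None and h.startswith(PREFIX))

# Collect the visible text of every matching link, with surrounding whitespace stripped.
translators = [a.get_text(strip=True) for a in links]

print(exact)
print(translators)
```

Both this and the select() call above should find the same links; which one to use is mostly a matter of taste.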

BeautifulSoup documentation can be found at https://beautiful-soup-4.readthedocs.io/en/latest/index.html
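
If the numeric id is also needed next to the name, it can be read from the href with the standard library. This is only a sketch, assuming the href always has the query-string shape shown in the question; the variable names are illustrative:

```python
from urllib.parse import urlsplit, parse_qs

# The href value from the example above, e.g. obtained via a[0].get('href').
href = "/searchinternet/advanced?all_authors_id=35534&SearchAction=1"

# Split off the query string and parse it into a dict of lists.
query = parse_qs(urlsplit(href).query)

# Assumption: the parameter is present and appears exactly once.
author_id = query["all_authors_id"][0]
print(author_id)
```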
