Scraping issue with id_tag

October 2, 2022

I’m trying to extract data from a website with BeautifulSoup.
I’m currently stuck on this:
Trad. de l’anglais par <a href="/searchinternet/advanced?all_authors_id=35534&SearchAction=1">Camille Fabien </a>
I want to get the translators’ names, but each link identifies the translator by an ID in its href, so the URL is different for every translator.

My code is:

translater = soup.find_all("a", href="/searchinternet/advanced?all_authors_id=")

I also tried using str.startswith, but that doesn’t work either.
Can someone help me, please?

Solution:

Provided your HTML is correct and static (i.e. not loaded by JavaScript after the initial page load), this is one way to select that link (or those links), using the CSS attribute selector ^=, which means “starts with”:

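from bs4 import BeautifulSoup as bs

html = '''<p>Trad. de l'anglais par <a href="/searchinternet/advanced?all_authors_id=35534&SearchAction=1">Camille Fabien </a></p>'''

soup = bs(html, 'html.parser')
# ^= matches every <a> whose href starts with the prefix, whatever the author ID is
a = soup.select('a[href^="/searchinternet/advanced?all_authors_id="]')
print(a[0])                          # the whole <a> tag
print(a[0].get_text(strip=True))     # the link text, i.e. the translator's name
print(a[0].get('href'))              # the raw href, including the author ID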

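Result in the terminal (when printing a tag, BeautifulSoup escapes & as &amp;, while .get('href') returns the original value):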

<a href="/searchinternet/advanced?all_authors_id=35534&amp;SearchAction=1">Camille Fabien </a>
Camille Fabien
/searchinternet/advanced?all_authors_id=35534&SearchAction=1
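As for the attempts in the question: find_all returns nothing there because a plain string passed as an attribute filter has to equal the whole href value. find_all also accepts a function or a compiled regular expression as the filter, so the startswith idea does work when written that way. A minimal sketch on the same sample HTML (PREFIX is just a local name introduced here):

```python
import re
from bs4 import BeautifulSoup

html = '''<p>Trad. de l'anglais par <a href="/searchinternet/advanced?all_authors_id=35534&SearchAction=1">Camille Fabien </a></p>'''
soup = BeautifulSoup(html, "html.parser")

PREFIX = "/searchinternet/advanced?all_authors_id="

# A plain string must match the entire href, so this finds nothing
exact = soup.find_all("a", href=PREFIX)

# A function receives the href value (None if the tag has no href)
by_func = soup.find_all("a", href=lambda h: h is not None and h.startswith(PREFIX))

# A compiled regex is applied with search(), so anchor it with ^
by_regex = soup.find_all("a", href=re.compile("^" + re.escape(PREFIX)))

print(len(exact))                                  # 0
print([t.get_text(strip=True) for t in by_func])   # ['Camille Fabien']
print([t.get_text(strip=True) for t in by_regex])  # ['Camille Fabien']
```

All three approaches target the same links as the select() call; which one to use is mostly a matter of taste.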

BeautifulSoup documentation can be found at https://beautiful-soup-4.readthedocs.io/en/latest/index.html
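If a book has several translators, you probably want every match rather than just a[0], and the author ID can be read from the href as well. A minimal sketch, assuming the ID always sits in the all_authors_id query parameter as in the sample; the second link (ID 12345, "Example Translator") is invented purely for illustration:

```python
from urllib.parse import urlparse, parse_qs
from bs4 import BeautifulSoup

# Hypothetical markup: the second translator and its ID are made up for this example
html = '''<p>Trad. de l'anglais par
<a href="/searchinternet/advanced?all_authors_id=35534&SearchAction=1">Camille Fabien </a> et
<a href="/searchinternet/advanced?all_authors_id=12345&SearchAction=1">Example Translator</a></p>'''

soup = BeautifulSoup(html, "html.parser")

translators = []
for link in soup.select('a[href^="/searchinternet/advanced?all_authors_id="]'):
    # parse_qs maps each query parameter name to a list of values
    query = parse_qs(urlparse(link["href"]).query)
    author_id = query["all_authors_id"][0]
    translators.append((author_id, link.get_text(strip=True)))

print(translators)  # [('35534', 'Camille Fabien'), ('12345', 'Example Translator')]
```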