I have a list of <li> elements, each containing an <a> tag with an href URL and a <span> holding that URL's title. I am trying to get the URL by the text of the span. Here is an example:
<li><a href="http://someurl"><span>Title of URL</span></a></li>
This is my last attempt:
soup.select_one('span:-soup-contains("Title of URL:")').find_previous_sibling(text=True)
But that won’t work, because the <span> is inside the <a> tag: the href is on the span’s parent, not on a previous sibling.
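Since the <span> is a child of the <a>, one option is to find the span and then walk up the tree to its parent. A minimal sketch, assuming the single-item markup above; note that the search string has no trailing colon, because "Title of URL:" does not occur in the text and would never match:

```python
from bs4 import BeautifulSoup

html_text = """\
<li><a href="http://someurl"><span>Title of URL</span></a></li>"""
soup = BeautifulSoup(html_text, "html.parser")

# Locate the <span> by its text (no trailing colon in the search string).
span = soup.select_one('span:-soup-contains("Title of URL")')

# The href lives on the enclosing <a>, so go up to the parent, not sideways.
link = span.find_parent("a") if span is not None else None
url = link["href"] if link is not None else None
print(url)  # http://someurl
```

The None checks avoid a TypeError when no span matches, since select_one returns None in that case.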
I’ve tried countless other variations that I have since deleted.
If anyone can help, I’d be grateful.
Solution:
Select the correct <a> directly, matching on its text:
from bs4 import BeautifulSoup
html_text = """\
<li><a href="http://someurl"><span>Title of URL</span></a></li>"""
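soup = BeautifulSoup(html_text, "html.parser")
# :-soup-contains checks the element's full text, including text inside
# child tags, so it matches the <a> even though the text is in its <span>.
url = soup.select_one('a:-soup-contains("Title of URL")')["href"]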
print(url)
Prints:
http://someurl
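Because :-soup-contains is a substring match, it will also match an <a> whose title merely contains "Title of URL" (for example "Title of URL 2"), and select_one returns only the first hit. When the page has a whole list of <li> items, one approach is to build a dictionary from the exact span text to the href. A minimal sketch, assuming each <a> holds one <span>; the second list item and its URL are hypothetical sample data:

```python
from bs4 import BeautifulSoup

# Hypothetical sample list; the second title contains the first as a substring.
html_text = """\
<ul>
<li><a href="http://someurl"><span>Title of URL</span></a></li>
<li><a href="http://otherurl"><span>Title of URL 2</span></a></li>
</ul>"""
soup = BeautifulSoup(html_text, "html.parser")

# Map each span's stripped text to the href of its enclosing <a>.
urls = {
    a.span.get_text(strip=True): a["href"]
    for a in soup.select("li > a[href]")
    if a.span is not None
}

print(urls["Title of URL"])  # http://someurl
```

Looking up by exact key avoids the substring ambiguity, and the dictionary is built in a single pass over the list.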