How to get an href using Beautiful Soup

I have a list of <li> elements, each containing an <a> tag whose href holds the URL and a <span> holding that URL's title. I am trying to get the URL by looking up the title text in the span. This is my example:

<li><a href="http://someurl"><span>Title of URL</span></a></li>

This is my last attempt:

soup.select_one('span:-soup-contains("Title of URL:")').find_previous_sibling(text=True)

But that won’t work since the span is IN the <a> tag.
I’ve tried countless other variations that I have since deleted.

If anyone can help I’d be grateful.

Solution:

Just select the correct <a> element directly, instead of selecting the <span> and trying to navigate back out of it:

from bs4 import BeautifulSoup

html_text = """\
<li><a href="http://someurl"><span>Title of URL</span></a></li>"""

soup = BeautifulSoup(html_text, "html.parser")

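# :-soup-contains() tests an element's text including its descendants' text,
# so the <a> matches even though the title sits inside its <span>.
url = soup.select_one('a:-soup-contains("Title of URL")')["href"]
print(url)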

Prints:

http://someurl
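
If you would rather start from the <span>, as in the original attempt, one option is to walk up to the enclosing <a> with find_parent() instead of looking for a sibling. The following is only a sketch: the helper name href_for_title and the second <li> are made up for illustration, and it compares the span text for an exact match, whereas :-soup-contains() does a substring match. It also returns None instead of raising when nothing matches.

```python
from bs4 import BeautifulSoup

# Sample markup; the second <li> is invented for illustration.
html_text = """\
<ul>
  <li><a href="http://someurl"><span>Title of URL</span></a></li>
  <li><a href="http://otherurl"><span>Another title</span></a></li>
</ul>"""

soup = BeautifulSoup(html_text, "html.parser")


def href_for_title(soup, title):
    """Hypothetical helper: return the href of the <a> that wraps a <span>
    whose text equals `title` exactly, or None if there is no such link."""
    for span in soup.find_all("span"):
        if span.get_text(strip=True) == title:
            # The span is nested inside the link, so go up, not sideways.
            link = span.find_parent("a")
            if link is not None and link.has_attr("href"):
                return link["href"]
    return None


print(href_for_title(soup, "Title of URL"))   # http://someurl
print(href_for_title(soup, "No such title"))  # None
```

The exact comparison avoids picking up a different link whose title merely contains the search text, at the cost of requiring the title to match character for character.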