Unable To Scrape url from page using Python and BeautifulSoup. Any ideas?

February 6, 2022

As the title suggests, I’m playing around with a Twitter bot that scrapes RSS feeds and tweets the title of each article along with a link.

For some reason, the code below runs without errors but doesn’t retrieve the URL. Any suggestions are gratefully received.

from bs4 import BeautifulSoup
import requests

url = "https://www.kdnuggets.com/feed"
resp = requests.get(url)
soup = BeautifulSoup(resp.content)
items = soup.findAll('item')
item = items[1]

print(item.title.text)
print(item.link.text)

The title prints fine, but the link is nowhere to be found. For reference, below is a copy of the markup that BeautifulSoup returns for this item.

<item>
<title>An Overview of Logistic Regression</title>
<link/>https://www.kdnuggets.com/2022/02/overview-logistic-regression.html
                                        <comments>https://www.kdnuggets.com/2022/02/overview-logistic-regression.html#disqus_thread</comments>
<dc:creator>&lt;![CDATA[Matt Mayo Editor]]&gt;</dc:creator>
<pubdate>Fri, 04 Feb 2022 13:00:11 +0000</pubdate>
<category>&lt;![CDATA[2022 Feb Tutorials, Overviews]]&gt;</category>
<category>&lt;![CDATA[Machine Learning]]&gt;</category>
<guid ispermalink="false">https://www.kdnuggets.com/?p=137943</guid>
<description>&lt;![CDATA[Logistic regression is an extension of linear regression to solve classification problems. Read more on the specifics of this algorithm here.]]&gt;</description>
<wfw:commentrss>https://www.kdnuggets.com/2022/02/overview-logistic-regression.html/feed</wfw:commentrss>
<slash:comments>0</slash:comments>
</item>

Thanks in advance.

>Solution :

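item.link.text is empty because the HTML parser treats link as a void (self-closing) tag, so the URL ends up as loose text after the tag rather than inside it. The link element itself is just

<link/>

One way around this is to read the URL from the next element, which here is the comments tag: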

>>> item.link
<link/>
>>> item.link.findNext().text
'https://www.kdnuggets.com/2022/02/overview-logistic-regression.html#disqus_thread'
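As an alternative to findNext(), which lands on the comments element, the URL can probably be read from the text node that directly follows the empty link tag. The snippet below is a minimal sketch, not the original poster's code: it parses a trimmed inline copy of the item (SAMPLE_FEED is a made-up name, and the explicit "html.parser" argument is an assumption chosen so that the example needs no extra parser package).

```python
from bs4 import BeautifulSoup

# Trimmed copy of the feed item from the question (assumed sample, no network access).
SAMPLE_FEED = """<rss><channel>
<item>
<title>An Overview of Logistic Regression</title>
<link>https://www.kdnuggets.com/2022/02/overview-logistic-regression.html</link>
<comments>https://www.kdnuggets.com/2022/02/overview-logistic-regression.html#disqus_thread</comments>
</item>
</channel></rss>"""

# An HTML parser treats <link> as a void tag, so the tag itself has no children.
soup = BeautifulSoup(SAMPLE_FEED, "html.parser")
item = soup.find("item")

link_text = item.link.text                  # empty string: nothing inside <link/>
link_url = item.link.next_sibling.strip()   # the URL is the text node after the tag

print(repr(link_text))
print(link_url)
```

If lxml is installed, passing "xml" as the parser instead of "html.parser" should make BeautifulSoup parse the feed as XML, in which case item.link.text would hold the URL directly.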

Because findNext() returns the comments URL, you’ll still need to strip off the trailing #disqus_thread fragment, but that’s straightforward to do.
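One way to drop the fragment is the standard library's urldefrag. This is a small sketch under the assumption that the comments URL differs from the article URL only by its fragment, as it does in the item shown in the question; the variable names are illustrative.

```python
from urllib.parse import urldefrag

# Value returned by item.link.findNext().text in the session above.
comments_url = "https://www.kdnuggets.com/2022/02/overview-logistic-regression.html#disqus_thread"

# urldefrag splits a URL into the part before '#' and the fragment after it.
article_url, fragment = urldefrag(comments_url)

print(article_url)
print(fragment)
```

Compared with splitting on "#" by hand, this also copes with URLs that have no fragment at all, returning the URL unchanged and an empty fragment.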

beautifulsoup

by MR

Published February 06, 2022
