Python/Pandas: How to convert a bs4.element.ResultSet into a Pandas DataFrame?

I want to extract the title and link of each item from a bs4.element.ResultSet into a pandas DataFrame:

Code:

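from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
from newspaper import Config  # assumption: Config comes from newspaper3k; the original post does not show its import

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'
config = Config()
config.browser_user_agent = user_agent  # note: urlopen() below does not use this config
user_input = "Solarpanels"
site = f'https://news.google.com/rss/search?q={user_input}+when:14d&hl=en-GB&gl=DE&ceid=GB:en'
op = urlopen(site)
rd = op.read()  # raw RSS bytes
sp_page = soup(rd, 'xml')  # the 'xml' parser requires lxml
news_list = sp_page.find_all('item')  # one <item> per news entry
print(type(news_list))
print(news_list)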

Output:

<class 'bs4.element.ResultSet'>
[<item><title>Australian research finds cost-effective way to recycle solar panels - The Guardian</title><link>https://www.theguardian.com/environment/2022/oct/16/australian-research-finds-cost-effective-way-to-recycle-solar-panels</link><guid isPermaLink="false">1605236140</guid><pubDate>Sat, 15 Oct 2022 23:51:00 GMT</pubDate><description>&lt;ol&gt;&lt;li&gt;&lt;a href="https://www.theguardian.com/environment/2022/oct/16/australian-research-finds-cost-effective-way-to-recycle-solar-panels" target="_blank"&gt;Australian research finds cost-effective way to recycle solar panels&lt;/a&gt;&amp;nbsp;&amp;nbsp;&lt;font color="#6f6f6f"&gt;The Guardian&lt;/font&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.techjuice.pk/australian-researchers-find-cost-effective-way-to-recycle-solar-panels/" target="_blank"&gt;Australian Researchers Find Cost-Effective Way To Recycle Solar Panels&lt;/a&gt;&amp;nbsp;&amp;nbsp;&lt;font color="#6f6f6f"&gt;TechJuice&lt;/font&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.esi-africa.com/industry-sectors/business-and-markets/how-could-recycling-solar-panels-be-scaled-up-for-sustainable-effect/" target="_blank"&gt;How could recycling solar panels be scaled up for sustainable effect&lt;/a&gt;&amp;nbsp;&amp;nbsp;&lt;font color="#6f6f6f"&gt;ESI Africa&lt;/font&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.digitaljournal.com/pr/solar-panel-recycling-market-to-rise-at-37-cagr-during-forecast-period-tmr-study" target="_blank"&gt;Solar Panel Recycling Market to Rise at 37% CAGR during Forecast Period: TMR Study&lt;/a&gt;&amp;nbsp;&amp;nbsp;&lt;font color="#6f6f6f"&gt;Digital Journal&lt;/font&gt;&lt;/li&gt;&lt;li&gt;&lt;strong&gt;&lt;a href="https://news.google.com/stories/CAAqNggKIjBDQklTSGpvSmMzUnZjbmt0TXpZd1NoRUtEd2lzNjdmOUJSR3NNT0h4Y0h5dF9TZ0FQAQ?oc=5" target="_blank"&gt;View Full coverage on Google News&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;&lt;/ol&gt;</description><source url="https://www.theguardian.com">The Guardian</source></item> 

... and much more

I have tried several approaches, but I can't get it to work.

Solution:

Try:

import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
}

user_input = "Solarpanels"
site = f"https://news.google.com/rss/search?q={user_input}+when:14d&hl=en-GB&gl=DE&ceid=GB:en"


soup = BeautifulSoup(requests.get(site, headers=headers).content, "xml")

all_data = []
for item in soup.select("item"):
    all_data.append(
        {
            "title": item.title.text,
            "link": item.link.text,
            "pubDate": item.pubDate.text,
            "description": BeautifulSoup(
                item.description.text, "html.parser"
            ).get_text(strip=True), # or .get_text(strip=True, separator=" ")
            "source": item.source.text,
            "source_url": item.source["url"],
        }
    )

df = pd.DataFrame(all_data)
# to_markdown() needs the optional "tabulate" package installed
print(df.head().to_markdown(index=False))

Prints:

| title | link | pubDate | description | source | source_url |
|:------|:-----|:--------|:------------|:-------|:-----------|
| Australian research finds cost-effective way to recycle solar panels - The Guardian | https://www.theguardian.com/environment/2022/oct/16/australian-research-finds-cost-effective-way-to-recycle-solar-panels | Sat, 15 Oct 2022 23:51:00 GMT | Australian research finds cost-effective way to recycle solar panelsThe GuardianAustralian Researchers Find Cost-Effective Way To Recycle Solar PanelsTechJuiceHow could recycling solar panels be scaled up for sustainable effectESI AfricaSolar Panel Recycling Market to Rise at 37% CAGR during Forecast Period: TMR StudyDigital JournalView Full coverage on Google News | The Guardian | https://www.theguardian.com |
| Business Matters: Solar Panels on Commercial Property: Why You Should Make the Switch - Insider Media | https://www.insidermedia.com/blogs/north-west/business-matters-solar-panels-on-commercial-property-why-you-should-make-the-switch | Mon, 17 Oct 2022 09:13:35 GMT | Business Matters: Solar Panels on Commercial Property: Why You Should Make the SwitchInsider Media | Insider Media | https://www.insidermedia.com |
| Cost of living: The people using solar panels and turbines to reduce bills - bbc.co.uk | https://www.bbc.co.uk/news/uk-england-essex-62967716 | Wed, 05 Oct 2022 07:00:00 GMT | Cost of living: The people using solar panels and turbines to reduce billsbbc.co.uk | bbc.co.uk | https://www.bbc.co.uk |
| School applies for 120 solar panels - Stamford Mercury | https://www.stamfordmercury.co.uk/news/school-applies-for-120-solar-panels-9278921/ | Mon, 17 Oct 2022 11:00:00 GMT | School applies for 120 solar panelsStamford Mercury | Stamford Mercury | https://www.stamfordmercury.co.uk |
| Solar panels enable Lanarkshire village hall to cut running costs by 80 per cent - Daily Record | https://www.dailyrecord.co.uk/in-your-area/lanarkshire/solar-panels-enable-lanarkshire-village-28211459 | Sun, 16 Oct 2022 18:50:00 GMT | Solar panels enable Lanarkshire village hall to cut running costs by 80 per centDaily Record | Daily Record | https://www.dailyrecord.co.uk |
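
The description column above runs words together (for example "panelsThe Guardian") because get_text(strip=True) joins the text fragments with no separator, and pubDate is still a plain string. The following is a minimal offline sketch of one way to handle both, not part of the original answer. It assumes bs4, lxml and pandas are installed. The inline RSS string and its example.com URLs are hypothetical stand-ins for the Google News response, and parsing the date with the standard library's email.utils.parsedate_to_datetime is a choice made for this sketch.

```python
from email.utils import parsedate_to_datetime

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical two-item feed standing in for the real Google News response.
RSS = """<rss version="2.0"><channel>
<item><title>First headline - Example Source</title><link>https://example.com/a</link>
<pubDate>Sat, 15 Oct 2022 23:51:00 GMT</pubDate>
<description>&lt;a href="https://example.com/a"&gt;First headline&lt;/a&gt;&lt;font&gt;Example Source&lt;/font&gt;</description></item>
<item><title>Second headline - Other Source</title><link>https://example.com/b</link>
<pubDate>Mon, 17 Oct 2022 09:13:35 GMT</pubDate>
<description>&lt;a href="https://example.com/b"&gt;Second headline&lt;/a&gt;&lt;font&gt;Other Source&lt;/font&gt;</description></item>
</channel></rss>"""

soup = BeautifulSoup(RSS, "xml")
items = soup.find_all("item")  # a bs4.element.ResultSet: a list of Tag objects

# Build one dict per <item>; pandas turns the list of dicts into rows.
rows = [
    {
        "title": item.title.get_text(strip=True),
        "link": item.link.get_text(strip=True),
        # RSS dates follow the RFC 822 style, which this stdlib helper parses.
        "pubDate": parsedate_to_datetime(item.pubDate.get_text(strip=True)),
        # The description holds escaped HTML; re-parse it and join the
        # text fragments with a space so words do not run together.
        "description": BeautifulSoup(
            item.description.text, "html.parser"
        ).get_text(separator=" ", strip=True),
    }
    for item in items
]

df = pd.DataFrame(rows)
print(df[["title", "link"]])
```

If only the title and link are needed, as in the question, the other two keys can simply be dropped from the dict.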