I want to download a headline dataset from a website about carbon

I want to download a dataset of headlines from a website about carbon, but my code does not work.
I also want to add the publication dates, but they are written in text form.

My code:

import requests
from bs4 import BeautifulSoup
news_titles=[]

for page in range(1,6):
    url="https://carbon-pulse.com/category/eu-ets/"+ str(page)
    result=requests.get(url)
    reshult=result.content
    soup=BeautifulSoup(reshult, "lxml")
    for title in soup.findAll("div",{"class":"posttitle"}):
        titles=title.find(text=True)
        news_titles.append(titles)

print(news_titles)

>Solution :

Your locators are not correct. You first need to find the "posttitle" element, which is an h2 tag, and then get the "a" tag that contains the title text.
With that change, the titles from the first 5 pages are scraped.

Also, the URL should be https://carbon-pulse.com/category/eu-ets/page/ followed by the page number.
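
To check the locator without hitting the site, here is a minimal sketch that runs against a small hand-written HTML snippet. The snippet and the helper name extract_titles are hypothetical; the markup is only assumed to mirror the h2 "posttitle" / "a" structure described above, so the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to mirror the structure described above:
# an <h2 class="posttitle"> wrapping an <a> that holds the headline text.
SAMPLE_HTML = """
<div class="post">
  <h2 class="posttitle"><a href="/1">First headline</a></h2>
  <h2 class="posttitle"><a href="/2">  Second headline </a></h2>
</div>
"""


def extract_titles(html):
    """Return the text of the <a> inside every <h2 class="posttitle">."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for heading in soup.find_all("h2", class_="posttitle"):
        link = heading.find("a")  # search only inside this heading
        if link is not None:
            titles.append(link.get_text(strip=True))
    return titles


print(extract_titles(SAMPLE_HTML))
```

This sketch uses find("a"), which only looks inside the heading, and skips headings that have no link. find_next("a") also works when the link is inside the h2, but it would pick up the next link anywhere later in the document if the heading had none.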

Full code

import requests
from bs4 import BeautifulSoup

news_titles = []

# Scrape the first 5 pages of the category listing
for page in range(1, 6):
    url = "https://carbon-pulse.com/category/eu-ets/page/" + str(page)
    result = requests.get(url)
    soup = BeautifulSoup(result.content, "html.parser")
    # Each headline is an <h2 class="posttitle">; the <a> tag holds the title text
    for title in soup.find_all("h2", {"class": "posttitle"}):
        news_titles.append(title.find_next("a").text)

print(news_titles)
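
For the publication dates written in text form, one possible approach is to try a few datetime.strptime formats once you have the date string for each post. This is only a sketch: the helper name parse_text_date and the candidate formats are assumptions, since the question does not show how the site writes its dates, so adjust them to the text you actually get.

```python
from datetime import datetime

# Candidate formats are assumptions; replace them with whatever the site shows.
CANDIDATE_FORMATS = (
    "%B %d, %Y",  # e.g. "March 5, 2021"
    "%d %B %Y",   # e.g. "5 March 2021"
    "%b %d, %Y",  # e.g. "Mar 5, 2021"
)


def parse_text_date(text):
    """Try each candidate format; return a date, or None if none match."""
    cleaned = " ".join(text.split())  # collapse stray whitespace and newlines
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


print(parse_text_date("March 5, 2021"))
```

Returning None instead of raising an error lets you keep the headline even when its date cannot be parsed, and then inspect the unparsed strings to see which format is missing. Month names are matched according to the current locale, which is English by default.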
