I want to download a dataset of headlines from a website about carbon, but my code does not work.
I also want to add the publication dates, but they are written in text form.
My code:
import requests
from bs4 import BeautifulSoup
news_titles=[]
for page in range(1,6):
    url="https://carbon-pulse.com/category/eu-ets/"+ str(page)
    result=requests.get(url)
    reshult=result.content
    soup=BeautifulSoup(reshult, "lxml")
    for title in soup.findAll("div",{"class":"posttitle"}):
        titles=title.find(text=True)
        news_titles.append(titles)
print(news_titles)
>Solution :
Your locators are not correct. You first need to find the "posttitle" element, which is an h2 tag rather than a div, and then get the "a" tag that contains the title text.
Also, the URL should be https://carbon-pulse.com/category/eu-ets/page/ followed by the page number.
The code below scrapes the titles from the first 5 pages.
Full code:
import requests
from bs4 import BeautifulSoup
news_titles = []
for page in range(1, 6):
    url = "https://carbon-pulse.com/category/eu-ets/page/" + str(page)
    result = requests.get(url)
    reshult = result.content
    soup = BeautifulSoup(reshult, "html.parser")
    for title in soup.findAll("h2", {"class": "posttitle"}):
        titles = title.find_next("a").text
        news_titles.append(titles)
print(news_titles)
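On the dates part of the question: the code above only collects titles. Once you have the date text for each post, something like the following could turn it into real date objects. This is only a sketch. I have not checked how the site writes its dates, so the formats in DATE_FORMATS and the helper name parse_text_date are assumptions you will need to adjust to the actual text.

```python
from datetime import datetime

# Hypothetical formats: check the site's actual date text and adjust these.
DATE_FORMATS = ("%B %d, %Y", "%d %B %Y", "%b %d, %Y")


def parse_text_date(text):
    """Return a date for the first format that matches, or None if none do."""
    cleaned = " ".join(text.split())  # collapse stray spaces and newlines
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    return None


print(parse_text_date("March 7, 2023"))  # 2023-03-07
```

Month names are matched according to the current locale, so this assumes English month names. Returning None for text that does not match lets you spot posts whose date needs a different format instead of crashing the whole scrape.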