I would like to webscrape the following page: https://www.ecb.europa.eu/press/inter/date/2021/html/index_include.en.html
In particular, I would like to get the text inside every link you see displayed clicking on the link above. I am able to do it only by clickling on the link. For example, clicking on the first one:
import pandas as pd
from bs4 import BeautifulSoup
import requests
x = "https://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in211222~5f9a709924.en.html"
x1=[requests.get(x)]
x2 = [BeautifulSoup(x1[0].text)]
x3 = [x2[0].select("p+ p") for i in range(len(x2)-1)]
The problem is that I am not able to automate the process that leads me from the url with the list of links containing text (https://www.ecb.europa.eu/press/inter/date/2021/html/index_include.en.html) to the actual link where the text I need is stored (e.g. https://www.ecb.europa.eu/press/inter/date/2021/html/ecb.in211222~5f9a709924.en.html)
Can anyone help me?
Thanks!
>Solution :
To get a list of all links on https://www.ecb.europa.eu/press/inter/date/2021/html/index_include.en.html:
from bs4 import BeautifulSoup
import requests
r = requests.get('https://www.ecb.europa.eu/press/inter/date/2021/html/index_include.en.html')
soup = BeautifulSoup(r.text, 'html.parser')
links = [link.get('href') for link in soup.find_all('a')]