I have tried numerous approaches, but this website is proving very hard to scrape with bs4.
I am trying to extract the href value for each match, as shown in the snippet below. The idea is to extract all hrefs from the page into a list. My code is not returning any values. The ideal result is a list containing all hrefs, e.g. //www.premierleague.com/match/74911
import warnings
import numpy as np
from datetime import datetime
import requests
from bs4 import BeautifulSoup
warnings.filterwarnings('ignore')
# Set up an empty list for storing results. errors collects any matches that fail to scrape.
dataframe = []
errors = []
url = "https://www.premierleague.com/results"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
matches = {}
soup.find_all("div", {"class": "competitionContainer"})
>Solution :
The data you see on the page is loaded from an external source via JavaScript, so it is not in the HTML that requests receives. (Open the Web developer tools in your browser, go to the Network tab and start scrolling the page down; you should see the Ajax request there.) You can call that API directly:
import json

import requests

api_url = "https://footballapi.pulselive.com/football/fixtures"

params = {
    "comps": "1",
    "compSeasons": "489",
    "teams": "127,1,2,130,131,4,6,7,34,9,26,10,11,12,23,15,20,21,25,38",
    "page": "1",
    "pageSize": "40",
    "sort": "desc",
    "statuses": "C",
    "altIds": "true",
}

headers = {
    "Origin": "https://www.premierleague.com",
}

page = 0
while True:
    # request the next page of results (counting starts at 0 here)
    params["page"] = page
    data = requests.get(api_url, params=params, headers=headers).json()

    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))

    for c in data["content"]:
        team1, team2 = c["teams"][0]["team"]["name"], c["teams"][1]["team"]["name"]
        print(f'{team1:<30} {team2:<30} https://www.premierleague.com/match/{int(c["id"])}')

    # with pages counted from 0, the last page is numPages - 1
    if page >= data["pageInfo"]["numPages"] - 1:
        break

    page += 1
Prints:
...
Chelsea                        Tottenham Hotspur              https://www.premierleague.com/match/74925
Nottingham Forest              West Ham United                https://www.premierleague.com/match/74928
Brentford                      Manchester United              https://www.premierleague.com/match/74923
Arsenal                        Leicester City                 https://www.premierleague.com/match/74921
Brighton & Hove Albion         Newcastle United               https://www.premierleague.com/match/74924
Manchester City                Bournemouth                    https://www.premierleague.com/match/74927
Southampton                    Leeds United                   https://www.premierleague.com/match/74929
Wolverhampton Wanderers        Fulham                         https://www.premierleague.com/match/74930
Aston Villa                    Everton                        https://www.premierleague.com/match/74922
West Ham United                Manchester City                https://www.premierleague.com/match/74920
Leicester City                 Brentford                      https://www.premierleague.com/match/74916
Manchester United              Brighton & Hove Albion         https://www.premierleague.com/match/74919
Everton                        Chelsea                        https://www.premierleague.com/match/74913
Bournemouth                    Aston Villa                    https://www.premierleague.com/match/74912
Leeds United                   Wolverhampton Wanderers        https://www.premierleague.com/match/74915
Newcastle United               Nottingham Forest              https://www.premierleague.com/match/74917
Tottenham Hotspur              Southampton                    https://www.premierleague.com/match/74918
Fulham                         Liverpool                      https://www.premierleague.com/match/74914
Crystal Palace                 Arsenal                        https://www.premierleague.com/match/74911
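
Since the question asks for a list of links rather than printed output, here is a minimal sketch of one way to collect them. The names match_urls, collect_match_urls and fetch_page are hypothetical helpers, not part of any library. The sketch assumes each response has the content and pageInfo keys used above and that page numbers start at 0:

```python
from typing import Callable, Dict, List

MATCH_URL = "https://www.premierleague.com/match/{}"


def match_urls(payload: Dict) -> List[str]:
    """Build match URLs from one page of the fixtures response."""
    return [MATCH_URL.format(int(item["id"])) for item in payload.get("content", [])]


def collect_match_urls(fetch_page: Callable[[int], Dict]) -> List[str]:
    """Gather match URLs from every page.

    fetch_page is a hypothetical callable that returns the decoded JSON
    for a given page number.
    """
    urls: List[str] = []
    page = 0
    while True:
        payload = fetch_page(page)
        urls.extend(match_urls(payload))
        # Assumption: pages are counted from 0, so the last one is numPages - 1.
        if page >= payload["pageInfo"]["numPages"] - 1:
            break
        page += 1
    return urls
```

With the api_url, params and headers from the answer, it could be called as collect_match_urls(lambda p: requests.get(api_url, params={**params, "page": p}, headers=headers).json()). Passing the fetch function in keeps the paging logic separate from the network call, so it can be checked against canned data.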