How can I scrape the team names and goals from this site into a table? I've been trying a few different methods but can't quite figure it out.

import requests
from bs4 import BeautifulSoup

URL = "https://www.hockey-reference.com/leagues/NHL_2021_games.html"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="all_games")

table = soup.find('div', attrs = {'id':'div_games'}) 
print(table.prettify())

Solution:

Select the table element itself, not the surrounding div, to print the table:

# Look up the <table> by its id instead of the wrapping <div>
table = soup.find('table', attrs={'id': 'games'})
print(table.prettify())
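
Once the table is selected, the team names and goals can be pulled out row by row. The following is a minimal sketch, not a definitive implementation: it runs against a small inline HTML sample (SAMPLE_HTML, a hypothetical stand-in for page.content) and assumes the first five cells of each body row are date, visitor, visitor goals, home, and home goals, as in the output shown in this answer. The extract_games and to_goals names are illustrative only, and the live page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.content; the real page may be laid out differently.
SAMPLE_HTML = """
<table id="games">
  <thead>
    <tr><th>Date</th><th>Visitor</th><th>G</th><th>Home</th><th>G</th></tr>
  </thead>
  <tbody>
    <tr><th>2021-01-13</th><td>St. Louis Blues</td><td>4</td><td>Colorado Avalanche</td><td>1</td></tr>
    <tr><th>2021-01-13</th><td>Vancouver Canucks</td><td>5</td><td>Edmonton Oilers</td><td>3</td></tr>
  </tbody>
</table>
"""


def to_goals(text):
    """Convert a goals cell to int; return None if it is empty or non-numeric."""
    return int(text) if text.isdigit() else None


def extract_games(html):
    """Return one dict per game row found in the table with id 'games'."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"id": "games"})
    games = []
    for row in table.find("tbody").find_all("tr"):
        # Collect header and data cells together, in document order.
        cells = [c.get_text(strip=True) for c in row.find_all(["th", "td"])]
        if len(cells) < 5:
            continue  # skip spacer or malformed rows
        date, visitor, visitor_goals, home, home_goals = cells[:5]
        games.append({
            "date": date,
            "visitor": visitor,
            "visitor_goals": to_goals(visitor_goals),
            "home": home,
            "home_goals": to_goals(home_goals),
        })
    return games


games = extract_games(SAMPLE_HTML)
for game in games:
    print(game["visitor"], game["visitor_goals"], game["home"], game["home_goals"])
```

To try this on the real page, pass page.content instead of SAMPLE_HTML and check that the column order still matches.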

Or use pandas.read_html() to fetch the table and convert it into a DataFrame:

import pandas as pd

# read_html returns a list of DataFrames; take the first match and keep the first five columns
pd.read_html('https://www.hockey-reference.com/leagues/NHL_2021_games.html', attrs={'id':'games'})[0].iloc[:,:5]

Output:

Date        Visitor              G  Home                 G.1
2021-01-13  St. Louis Blues      4  Colorado Avalanche   1
2021-01-13  Vancouver Canucks    5  Edmonton Oilers      3
2021-01-13  Pittsburgh Penguins  3  Philadelphia Flyers  6
2021-01-13  Chicago Blackhawks   1  Tampa Bay Lightning  5
2021-01-13  Montreal Canadiens   4  Toronto Maple Leafs  5
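
Because the page has two columns both headed G, pandas labels the second one G.1, which is easy to misread. A possible follow-up, sketched here under assumptions, is to rename the columns to something explicit. The example parses a small inline sample (SAMPLE_HTML, a hypothetical stand-in for the live page) so it runs without network access, and the new column names are illustrative only. It wraps the string in io.StringIO, since recent pandas versions warn when literal HTML is passed directly.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the live page, with the same duplicated "G" header.
SAMPLE_HTML = """
<table id="games">
  <thead>
    <tr><th>Date</th><th>Visitor</th><th>G</th><th>Home</th><th>G</th></tr>
  </thead>
  <tbody>
    <tr><th>2021-01-13</th><td>St. Louis Blues</td><td>4</td><td>Colorado Avalanche</td><td>1</td></tr>
    <tr><th>2021-01-13</th><td>Vancouver Canucks</td><td>5</td><td>Edmonton Oilers</td><td>3</td></tr>
  </tbody>
</table>
"""

# Parse the table by id, then keep only the first five columns.
df = pd.read_html(StringIO(SAMPLE_HTML), attrs={"id": "games"})[0].iloc[:, :5]

# Replace the auto-generated labels (G, G.1) with explicit names.
df.columns = ["date", "visitor", "visitor_goals", "home", "home_goals"]

print(df)
```

Assigning to df.columns relies on column position, so it is worth confirming the order on the real page before renaming.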