Is there a way to web scrape a site where everything has the same name?

Hi! I'm new to BeautifulSoup and was trying to scrape the info from this website (the URL is in the code below).

The problem is that when I inspect the elements on the page, every cell is a "td" with class "sch1", so when I extract them I get a big mess. How can I import this information in a way that is readable and usable? I'd like to build a DataFrame from it.

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://feeds.donbest.com/schedulemembers/getRotation.html?bookType=1&eventDate=20230129"
get_url = requests.get(url).content
soup = BeautifulSoup(get_url, "html.parser")

# A string passed as the second argument of find_all() filters by CSS class.
title = soup.find_all("td", "schtop1")
rotation = soup.find_all("td", "sch1")

title_list = []
rotation_list = []

# Collect the text of the header cells (event name and date).
for mainT in title:
    title_list.append(mainT.text)
print(title_list)

# Collect the text of every schedule cell, in document order.
for rot in rotation:
    rotation_list.append(rot.text)
print(rotation_list)

Output:
['NFL CONFERENCE CHAMPIONSHIPS', 'SUNDAY, JANUARY 29, 2023']
['321', 'SAN FRANCISCO 49ERS', '', 'P: Sun Jan 29 12:00:00 PST 2023\xa0\n C: Sun Jan 29 14:00:00 PST 2023\xa0\n E: Sun Jan 29 15:00:00 PST 2023', '322', 'PHILADELPHIA EAGLES', '323', 'CINCINNATI BENGALS', '', 'P: Sun Jan 29 15:30:00 PST 2023\xa0\n C: Sun Jan 29 17:30:00 PST 2023\xa0\n E: Sun Jan 29 18:30:00 PST 2023', '324', 'KANSAS CITY CHIEFS']

I need to be able to use this information to build a pandas DataFrame that looks like this:

Date | Rot Visitor | Visitor | Rot Home | Home | PST | ET | CT
SUNDAY, JANUARY 29, 2023 | 321 | SAN FRANCISCO 49ERS | 322 | PHILADELPHIA EAGLES | Sun Jan 29 12:00:00 PST 2023 | Sun Jan 29 15:00:00 PST 2023 | Sun Jan 29 14:00:00 PST 2023
SUNDAY, JANUARY 29, 2023 | 323 | CINCINNATI BENGALS | 324 | KANSAS CITY CHIEFS | Sun Jan 29 15:30:00 PST 2023 | Sun Jan 29 18:30:00 PST 2023 | Sun Jan 29 17:30:00 PST 2023

I think I can build the dataframe if I can get the data in a more useful format.

Solution:

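import pandas as pd

# read_html() returns a list with one DataFrame per <table> on the page;
# [0] selects the first table. It needs an HTML parser such as lxml installed.
df = pd.read_html(
    'https://feeds.donbest.com/schedulemembers/getRotation.html?bookType=1&eventDate=20230129')[0]
print(df)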

Output:

0                       NFL CONFERENCE CHAMPIONSHIPS  ...  NFL CONFERENCE CHAMPIONSHIPS
1                           SUNDAY, JANUARY 29, 2023  ...      SUNDAY, JANUARY 29, 2023
2  321  SAN FRANCISCO 49ERS  P: Sun Jan 29 12:00:...  ...                           NaN
3  323  CINCINNATI BENGALS  P: Sun Jan 29 15:30:0...  ...                           NaN

[4 rows x 7 columns]
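
To get from the flat list of cell texts to the one-row-per-game layout asked for in the question, one option is to group the cells in fixed-size chunks and split the combined time cell. The following is only a sketch. It hardcodes the two lists exactly as printed in the question, and it assumes that each game always contributes six "sch1" cells (rotation, team, blank, times, rotation, team) and that the P/C/E labels correspond to the PST/CT/ET columns, as the question's target table suggests. The names CELLS_PER_GAME, split_times and build_records are made up for this illustration.

```python
# Cell texts copied from the question's printed output.
title_list = ["NFL CONFERENCE CHAMPIONSHIPS", "SUNDAY, JANUARY 29, 2023"]
rotation_list = [
    "321", "SAN FRANCISCO 49ERS", "",
    "P: Sun Jan 29 12:00:00 PST 2023\xa0\n C: Sun Jan 29 14:00:00 PST 2023\xa0\n"
    " E: Sun Jan 29 15:00:00 PST 2023",
    "322", "PHILADELPHIA EAGLES",
    "323", "CINCINNATI BENGALS", "",
    "P: Sun Jan 29 15:30:00 PST 2023\xa0\n C: Sun Jan 29 17:30:00 PST 2023\xa0\n"
    " E: Sun Jan 29 18:30:00 PST 2023",
    "324", "KANSAS CITY CHIEFS",
]

# Assumption: every game yields exactly six cells in this order:
# visitor rotation, visitor, blank, times, home rotation, home.
CELLS_PER_GAME = 6


def split_times(cell):
    """Split 'P: ...\\n C: ...\\n E: ...' into {'P': ..., 'C': ..., 'E': ...}."""
    times = {}
    for line in cell.replace("\xa0", " ").splitlines():
        label, _, value = line.strip().partition(": ")
        if value:
            times[label] = value.strip()
    return times


def build_records(date, cells):
    """Group the flat cell list into one dict per game."""
    records = []
    for i in range(0, len(cells), CELLS_PER_GAME):
        v_rot, visitor, _blank, times, h_rot, home = cells[i:i + CELLS_PER_GAME]
        t = split_times(times)
        records.append({
            "Date": date,
            "Rot Visitor": v_rot,
            "Visitor": visitor,
            "Rot Home": h_rot,
            "Home": home,
            "PST": t.get("P"),  # assumed mapping: P = Pacific
            "ET": t.get("E"),   # assumed mapping: E = Eastern
            "CT": t.get("C"),   # assumed mapping: C = Central
        })
    return records


records = build_records(title_list[1], rotation_list)
for record in records:
    print(record)
```

Passing the result to pd.DataFrame(records) should then give one row per game with the columns in the order above. If the page ever returns a cell count that is not a multiple of six, the unpacking raises a ValueError, which is a useful signal that the six-cell assumption no longer holds.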