Scraping a web link from 247sports

August 16, 2022

I am trying to grab the rankings history link from a player page using the following scraping code:

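import requests
from bs4 import BeautifulSoup

url = 'https://247sports.com/Player/Trevor-Lawrence-61350/college-212444/'
# headers must be defined before the request; a browser-like User-Agent is used here
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0'
}

pageTree = requests.get(url, headers=headers)
Soup = BeautifulSoup(pageTree.content, 'html.parser')

# find_all returns a list-like ResultSet of every matching <ul>
past_link = Soup.find_all('ul', {'class':'ranks-list'})

past_link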

I was able to generate this output:

[<ul class="ranks-list">
 <li>
 <b>Natl.</b>
 <a href="https://247sports.com/Season/2018-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool">
 <strong>1</strong>
 </a>
 <a class="rank-history-link" href="https://247sports.com/PlayerSport/Trevor-Lawrence-at-Cartersville-116605/RecruitRankHistory/">
                     History
                 </a>
 </li>
 <li>
 <b>PRO</b>
 <a href="https://247sports.com/Season/2018-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool&amp;Position=PRO">
 <strong>1</strong>
 </a>
 </li>
 <li>
 <b>GA</b>
 <a href="https://247sports.com/Season/2018-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool&amp;State=GA">
 <strong>1</strong>
 </a>
 </li>
 <li>
 <b>All-Time</b>
 <a href="https://247sports.com/Sport/Football/AllTimeRecruitRankings/">
 <strong>6</strong>
 </a>
 </li>
 </ul>]

But going any further, for example with past_link.find_all('a'), raised errors. What should the next step be from here? Any assistance is truly appreciated. Thanks in advance.

Solution:

To get the rankings history link from that page, you can use the following example:

import requests
from bs4 import BeautifulSoup

url = "https://247sports.com/Player/Trevor-Lawrence-61350/college-212444/"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

history_link = soup.select_one(".rank-history-link")["href"]
print(history_link)

Prints:

https://247sports.com/PlayerSport/Trevor-Lawrence-at-Cartersville-116605/RecruitRankHistory/
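As for why the attempt in the question failed: find_all returns a ResultSet, which behaves like a list of tags rather than a single tag, so calling find_all on it raises an AttributeError. The sketch below shows two ways around that, under some assumptions: it runs against a trimmed copy of the markup from the question (SAMPLE_HTML) instead of the live page, and the helper name find_history_link is hypothetical, not part of BeautifulSoup. It also returns None when the link is missing, since select_one returns None when nothing matches.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question, so the sketch runs offline.
SAMPLE_HTML = """
<ul class="ranks-list">
  <li>
    <b>Natl.</b>
    <a href="https://247sports.com/Season/2018-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool"><strong>1</strong></a>
    <a class="rank-history-link" href="https://247sports.com/PlayerSport/Trevor-Lawrence-at-Cartersville-116605/RecruitRankHistory/">History</a>
  </li>
  <li>
    <b>PRO</b>
    <a href="https://247sports.com/Season/2018-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool&amp;Position=PRO"><strong>1</strong></a>
  </li>
</ul>
"""


def find_history_link(html: str) -> Optional[str]:
    """Hypothetical helper: return the rank history URL, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one returns a single Tag, or None when the selector matches nothing.
    link = soup.select_one("ul.ranks-list a.rank-history-link")
    return link["href"] if link is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all gives a ResultSet (list-like), so loop over it before searching further.
rank_lists = soup.find_all("ul", {"class": "ranks-list"})
all_hrefs = [a["href"] for ul in rank_lists for a in ul.find_all("a")]

history_link = find_history_link(SAMPLE_HTML)
print(history_link)
```

Looping over the ResultSet is the closest fix to the original code; the CSS selector is shorter when only the one link is needed.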

beautifulsoup

by MR

Published August 16, 2022
