How to scrape a specific element with a certain id in BeautifulSoup?

I am trying to scrape a table from Baseball Reference: https://www.baseball-reference.com/players/b/bondsba01.shtml. The table I want is the one with id="batting_value", but when I try to print what I have scraped, the program returns nothing (None) instead of the table. Any information or assistance is appreciated, thanks!

from bs4 import BeautifulSoup
from urllib.request import urlopen

root_page = "https://www.baseball-reference.com/players/b/bondsba01.shtml"
soup = BeautifulSoup(urlopen(root_page), features='lxml')

table = soup.find('table', id='batting_value')
print(table)

I've also tried printing the <div> with id="div_batting_value", which contains the table, but that doesn't work either. However, I can successfully print other <div> elements with different ids.
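A quick way to narrow down a symptom like this (some ids are found, others are not) is to compare the raw HTML text with the tree the parser actually built. The sketch below assumes the page source is already in a string; the markup and the helper name locate are hypothetical, made up for illustration, and do not reproduce the real page. If an id appears in the raw source but not in the parsed tree, and it shows up inside a comment node, the element is commented-out markup rather than a real tag.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: one visible table and one table wrapped in an HTML comment
html = """
<div id="div_standard"><table id="batting_standard"></table></div>
<div id="all_batting_value">
<!--
<div id="div_batting_value"><table id="batting_value"></table></div>
-->
</div>
"""


def locate(html, element_id):
    """Report whether an id appears in the raw source, the parsed tree, and inside a comment."""
    soup = BeautifulSoup(html, "html.parser")
    needle = f'id="{element_id}"'
    in_source = needle in html
    in_tree = soup.find(id=element_id) is not None
    # Comment nodes are plain strings to BeautifulSoup, so search their text
    in_comment = any(
        needle in comment
        for comment in soup.find_all(string=lambda text: isinstance(text, Comment))
    )
    return in_source, in_tree, in_comment


print(locate(html, "batting_standard"))   # (True, True, False)
print(locate(html, "batting_value"))      # (True, False, True)
print(locate(html, "div_batting_value"))  # (True, False, True)
```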

Solution:

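The main issue is that the table is hidden inside an HTML comment (<!-- ... -->) in the page source. BeautifulSoup stores everything inside a comment as a single Comment string instead of parsing it into tags, so find() cannot see the table until you expose it. The simplest solution, in my opinion, is to remove the comment markers before parsing: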

.replace('<!--','').replace('-->','')
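To see the effect of stripping the markers without hitting the network, here is a minimal sketch on a hypothetical one-line snippet (not the real page markup): the same find() call returns None on the original string and the table once the comment delimiters are removed.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the target table sits inside an HTML comment
html = '<div id="all_batting_value"><!--<table id="batting_value"><tr><td>1986</td></tr></table>--></div>'

# The commented-out table is only a Comment string, so find() cannot match it
before = BeautifulSoup(html, "html.parser").find("table", id="batting_value")

# Removing the comment delimiters turns the commented-out markup into real tags
cleaned = html.replace("<!--", "").replace("-->", "")
after = BeautifulSoup(cleaned, "html.parser").find("table", id="batting_value")

print(before)         # None
print(after.td.text)  # 1986
```

Keep in mind that this uncomments every comment on the page, so any other commented-out markup also becomes live content in the parsed tree.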

An alternative is to be more specific and use bs4.Comment: find the comment nodes and parse only the one that contains the table.
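A sketch of the bs4.Comment approach, assuming the page HTML is already in a string. The sample markup and the helper name find_commented_element are hypothetical, chosen for illustration only. The helper checks the normal tree first, then re-parses each comment that mentions the id, leaving all other comments untouched.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup mimicking a table that is shipped inside an HTML comment
html = """<div id="all_batting_value">
<!--
<div class="table_container" id="div_batting_value">
<table id="batting_value"><tr><td>1986</td></tr></table>
</div>
-->
</div>"""


def find_commented_element(soup, element_id):
    """Return the element with element_id, looking inside HTML comments if needed."""
    # Look in the regular parsed tree first
    found = soup.find(id=element_id)
    if found is not None:
        return found
    # Otherwise re-parse only the comments whose text mentions the id
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if element_id not in comment:
            continue
        found = BeautifulSoup(comment, "html.parser").find(id=element_id)
        if found is not None:
            return found
    return None


soup = BeautifulSoup(html, "html.parser")
table = find_commented_element(soup, "batting_value")
print(table.find("td").text)  # 1986
```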

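Example (stripping the comment markers):
import requests
from bs4 import BeautifulSoup

url = 'https://www.baseball-reference.com/players/b/bondsba01.shtml'
# Remove the comment markers so the commented-out tables become regular HTML
html = requests.get(url).text.replace('<!--', '').replace('-->', '')
soup = BeautifulSoup(html, 'lxml')
print(soup.select_one('#batting_value'))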

Or, combined with pandas.read_html():

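import io
import requests
import pandas as pd

url = 'https://www.baseball-reference.com/players/b/bondsba01.shtml'
html = requests.get(url).text.replace('<!--', '').replace('-->', '')
# Recent pandas versions expect a file-like object instead of a literal HTML string
df = pd.read_html(io.StringIO(html), attrs={'id': 'batting_value'})[0]
# Keep only rows with a real league value (drops repeated header rows and blank-league rows)
df = df[df.Lg.notna() & (df.Lg != 'Lg')]
print(df)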
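The row filter in the last step can be checked in isolation. This sketch uses a small hypothetical DataFrame (made-up rows, not real scraped data) containing a repeated header row, where the Lg column holds the literal text "Lg", and a summary-style row with no league value; only the regular season rows survive.

```python
import pandas as pd

# Hypothetical rows shaped like a scraped stats table: a repeated header row
# ("Lg" in the Lg column) and a summary-style row with no league value
df = pd.DataFrame(
    {
        "Year": ["1986", "Year", "1987", "22 Yrs"],
        "Tm": ["PIT", "Tm", "PIT", None],
        "Lg": ["NL", "Lg", "NL", None],
    }
)

# Keep rows whose Lg is present and is not the literal header text "Lg"
clean = df[df["Lg"].notna() & (df["Lg"] != "Lg")].reset_index(drop=True)
print(clean)
```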

Results in (abridged):

Year Age Tm Lg G PA Rbat Rbaser Rdp Rfield Rpos RAA WAA Rrep RAR WAR waaWL% 162WL% oWAR dWAR oRAR Salary Pos Awards
0 1986 21 PIT NL 113 484 3 5 0 8 1 17 1.9 16 34 3.5 0.517 0.512 2.6 1 25 $60,000 *8/H RoY-6
1 1987 22 PIT NL 150 611 11 3 1 24 -3 36 3.7 21 57 5.8 0.525 0.523 3.2 2.1 33 $100,000 *78H/9 nan
20 2006 41 SFG NL 130 493 30 1 0 1 -4 27 2.5 15 42 4 0.52 0.516 3.9 -0.4 41 $19,331,470 *7H/D nan
21 2007 42 SFG NL 126 477 37 -1 -1 -10 -4 21 2 15 36 3.4 0.516 0.513 4.4 -1.5 46 $15,533,970 *7H/D AS