I am trying to scrape a table from Baseball Reference: https://www.baseball-reference.com/players/b/bondsba01.shtml. The table I want is the one with id="batting_value", but when I try to print what I have scraped, the program returns an empty list instead. Any information or assistance is appreciated, thanks!
from bs4 import BeautifulSoup
from urllib.request import urlopen
root_page = "https://www.baseball-reference.com/players/b/bondsba01.shtml"
soup = BeautifulSoup(urlopen(root_page), features = 'lxml')
table = soup.find('table', id = 'batting_value')
print(table)
I’ve tried to print the <div> with id="div_batting_value", which contains the table, but that doesn’t work either. However, I can successfully print other <div> elements with different ids.
>Solution :
The main issue here is that the table is wrapped in an HTML comment, so BeautifulSoup treats it as comment text rather than as elements. You have to uncomment it before BeautifulSoup can find it. The simplest solution, in my opinion, is to strip the comment delimiters from the response text:
.replace('<!--','').replace('-->','')
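To see the effect in isolation, here is a minimal sketch that runs offline. The `sample_html` string is a hypothetical miniature of the page structure, not the real markup, and `html.parser` is used only so the sketch has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: the table sits inside an HTML comment.
sample_html = (
    '<div id="all_batting_value"><!-- '
    '<table id="batting_value"><tr><td>1986</td></tr></table>'
    ' --></div>'
)

# Parsed as-is, the table is only comment text, so find() returns None.
before = BeautifulSoup(sample_html, "html.parser").find("table", id="batting_value")

# After stripping the comment delimiters, the table is regular markup.
uncommented = sample_html.replace("<!--", "").replace("-->", "")
after = BeautifulSoup(uncommented, "html.parser").find("table", id="batting_value")

print(before)             # None
print(after is not None)  # True
```

Note that this removes every comment delimiter in the document, which is fine here but may expose other commented-out markup on different pages.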
An alternative is to be more specific and use bs4.Comment to pick out only the comment that contains the table.
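A rough sketch of the bs4.Comment approach could look like the following. The helper name `find_commented_table` and the `sample_html` string are made up for illustration; on the real page you would pass the response text instead.

```python
from bs4 import BeautifulSoup, Comment


def find_commented_table(html, table_id):
    """Return the <table> with the given id, even if it sits inside an HTML comment."""
    soup = BeautifulSoup(html, "html.parser")
    # Try the normal lookup first, in case the table is not commented out.
    table = soup.find("table", id=table_id)
    if table is not None:
        return table
    # Otherwise look at each comment node and parse its text as HTML.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if table_id not in comment:
            continue  # cheap check to skip unrelated comments
        inner = BeautifulSoup(comment, "html.parser")
        table = inner.find("table", id=table_id)
        if table is not None:
            return table
    return None


# Hypothetical miniature of the page structure, for illustration only.
sample_html = """
<div id="all_batting_value">
<!--
<div id="div_batting_value">
<table id="batting_value"><tr><th>Year</th></tr><tr><td>1986</td></tr></table>
</div>
-->
</div>
"""

table = find_commented_table(sample_html, "batting_value")
print(table.find("td").text)  # 1986
```

This leaves the rest of the document untouched, at the cost of a second parse for the matching comment.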
Example
import requests
from bs4 import BeautifulSoup
soup = BeautifulSoup(
    # Strip the comment delimiters so the hidden table becomes regular markup
    requests.get('https://www.baseball-reference.com/players/b/bondsba01.shtml').text.replace('<!--','').replace('-->',''),
    'lxml'
)
soup.select_one('#batting_value')
Or, combined with pandas.read_html():
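from io import StringIO
import requests
import pandas as pd
html = requests.get('https://www.baseball-reference.com/players/b/bondsba01.shtml').text.replace('<!--','').replace('-->','')
# StringIO wrapper: pandas 2.1+ deprecates passing a literal HTML string
df = pd.read_html(StringIO(html), attrs={'id':'batting_value'})[0]
# Keep only rows that have a league value and are not repeated header rows
df[(~df.Lg.isna()) & (df.Lg != 'Lg')]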
Results in:
| | Year | Age | Tm | Lg | G | PA | Rbat | Rbaser | Rdp | Rfield | Rpos | RAA | WAA | Rrep | RAR | WAR | waaWL% | 162WL% | oWAR | dWAR | oRAR | Salary | Pos | Awards |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1986 | 21 | PIT | NL | 113 | 484 | 3 | 5 | 0 | 8 | 1 | 17 | 1.9 | 16 | 34 | 3.5 | 0.517 | 0.512 | 2.6 | 1 | 25 | $60,000 | *8/H | RoY-6 |
| 1 | 1987 | 22 | PIT | NL | 150 | 611 | 11 | 3 | 1 | 24 | -3 | 36 | 3.7 | 21 | 57 | 5.8 | 0.525 | 0.523 | 3.2 | 2.1 | 33 | $100,000 | *78H/9 | nan |
| … | | | | | | | | | | | | | | | | | | | | | | | | |
| 20 | 2006 | 41 | SFG | NL | 130 | 493 | 30 | 1 | 0 | 1 | -4 | 27 | 2.5 | 15 | 42 | 4 | 0.52 | 0.516 | 3.9 | -0.4 | 41 | $19,331,470 | *7H/D | nan |
| 21 | 2007 | 42 | SFG | NL | 126 | 477 | 37 | -1 | -1 | -10 | -4 | 21 | 2 | 15 | 36 | 3.4 | 0.516 | 0.513 | 4.4 | -1.5 | 46 | $15,533,970 | *7H/D | AS |
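The final filter in the pandas snippet keeps only rows whose Lg value is present and is not the literal string 'Lg'. Presumably this drops summary rows and header rows that repeat inside the table body. A small sketch with a hand-built DataFrame shows the behaviour; the rows below are hypothetical and are not taken from the real page.

```python
import pandas as pd

# Hypothetical rows: two seasons, one repeated header row, one summary row
# with no league value.
df = pd.DataFrame({
    "Year": ["1986", "Year", "1987", "Total"],
    "Lg": ["NL", "Lg", "NL", None],
    "WAR": ["3.5", "WAR", "5.8", "9.3"],
})

# Same boolean mask as in the answer: Lg must be non-null and not the header text.
clean = df[(~df.Lg.isna()) & (df.Lg != "Lg")]
print(clean.Year.tolist())  # ['1986', '1987']
```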