BeautifulSoup and Pandas read_html is not pulling all of the rows in a table

February 7, 2022

When I scrape a table from a website, the bottom 5 rows of data are missing, and I do not know how to retrieve them. I am using a combination of BeautifulSoup and Selenium. I thought the rows were not loading, so I tried scrolling to the bottom of the page with Selenium, but that did not help.

Code trials:

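import bs4 as bs
import pandas as pd
from selenium import webdriver

site = 'https://fbref.com//en/comps/15/10733/schedule/2020-2021-League-One'
PATH = my_path  # path to the local chromedriver executable
driver = webdriver.Chrome(PATH)
driver.get(site)
# Parse whatever HTML the browser holds immediately after the page load
webpage = bs.BeautifulSoup(driver.page_source, features='html.parser')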

table = webpage.find('table', {'class': 'stats_table sortable min_width now_sortable'})
print(table.prettify())
df = pd.read_html(str(table))[0]

print(df.tail())

Please could you help with scraping the full table?

Solution:

To pull all the rows from the table using Selenium alone, induce WebDriverWait for visibility_of_element_located() on the table, read its outerHTML attribute, and pass that HTML to read_html() from Pandas. You can use either of the following Locator Strategies:

Using CSS_SELECTOR:

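# Wait up to 20 seconds for the table to become visible, then take its HTML
tabledata = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, "table.stats_table.sortable.min_width.now_sortable"))
).get_attribute("outerHTML")
tabledf = pd.read_html(tabledata)  # a list of DataFrames, one per table
print(tabledf)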

Using XPATH:

driver.get('https://fbref.com//en/comps/15/10733/schedule/2020-2021-League-One')
# Wait up to 20 seconds for the table to become visible, then take its HTML
data = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.XPATH, "//table[@class='stats_table sortable min_width now_sortable']"))
).get_attribute("outerHTML")
df = pd.read_html(data)  # a list of DataFrames, one per table
print(df)

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Console Output:

[              Round   Wk  Day  ...             Referee  Match Report                         Notes
0    Regular Season    1  Sat  ...  Charles Breakspear  Match Report                           NaN
1    Regular Season    1  Sat  ...       Andrew Davies  Match Report                           NaN
2    Regular Season    1  Sat  ...       Kevin Johnson  Match Report                           NaN
3    Regular Season    1  Sat  ...   Anthony Backhouse  Match Report                           NaN
4    Regular Season    1  Sat  ...        Marc Edwards  Match Report                           NaN
..              ...  ...  ...  ...                 ...           ...                           ...
685     Semi-finals  NaN  Tue  ...       Robert Madley  Match Report                    Leg 1 of 2
686     Semi-finals  NaN  Wed  ...         Craig Hicks  Match Report                    Leg 1 of 2
687     Semi-finals  NaN  Fri  ...        Keith Stroud  Match Report     Leg 2 of 2; Blackpool won
688     Semi-finals  NaN  Sat  ...   Michael Salisbury  Match Report  Leg 2 of 2; Lincoln City won
689           Final  NaN  Sun  ...     Tony Harrington  Match Report                           NaN

    [690 rows x 13 columns]]
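
The square brackets around the output above appear because read_html() returns a list of DataFrames rather than a single DataFrame. Below is a minimal sketch of pulling the first table out of that list. The sample_html string is a hypothetical stand-in for the outerHTML that Selenium returns, not data from the site. Wrapping the string in StringIO is an assumption about your Pandas version: newer releases warn when given a literal HTML string, and the wrapper avoids that.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the outerHTML string returned by Selenium.
sample_html = """
<table class="stats_table">
  <thead><tr><th>Round</th><th>Wk</th></tr></thead>
  <tbody>
    <tr><td>Regular Season</td><td>1</td></tr>
    <tr><td>Final</td><td></td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(sample_html))

# Index into the list to get the DataFrame itself.
df = tables[0]

print(len(tables))
print(df.shape)
print(df.tail())
```

With the answer's code, the equivalent step would be tabledf[0] (or df[0] in the XPATH variant), after which tail() shows the last rows of the schedule.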