When I scrape a table from a website, the bottom 5 rows of data are missing, and I do not know how to pull them. I am using a combination of BeautifulSoup and Selenium. I suspected that the rows were not loading, so I tried scrolling to the bottom of the page with Selenium, but that did not help.
Code trials:
import bs4 as bs
import pandas as pd
from selenium import webdriver

site = 'https://fbref.com//en/comps/15/10733/schedule/2020-2021-League-One'
PATH = my_path  # path to the chromedriver executable
driver = webdriver.Chrome(PATH)
driver.get(site)
# Parse whatever HTML the browser holds immediately after the page load
webpage = bs.BeautifulSoup(driver.page_source, features='html.parser')
table = webpage.find('table', {'class': 'stats_table sortable min_width now_sortable'})
print(table.prettify())
df = pd.read_html(str(table))[0]
print(df.tail())
Please could you help with scraping the full table?
>Solution :
To pull all the rows from the table using only Selenium, induce WebDriverWait for visibility_of_element_located(), read the table's outerHTML, and pass it to pd.read_html() from Pandas. You can use either of the following Locator Strategies:
- Using CSS_SELECTOR:

tabledata = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.stats_table.sortable.min_width.now_sortable"))).get_attribute("outerHTML")
tabledf = pd.read_html(tabledata)
print(tabledf)

- Using XPATH:

driver.get('https://fbref.com//en/comps/15/10733/schedule/2020-2021-League-One')
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='stats_table sortable min_width now_sortable']"))).get_attribute("outerHTML")
df = pd.read_html(data)
print(df)
Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
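Also note that pd.read_html() returns a list of DataFrames, one per table it finds, so the snippets above print a list rather than a single DataFrame. A minimal sketch of selecting the first table is shown here. The `html` string is a hypothetical stand-in for the outerHTML returned by get_attribute(), and the sketch assumes pandas is installed together with an HTML parser such as lxml.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the outerHTML string returned by get_attribute().
html = """
<table class="stats_table">
  <thead><tr><th>Round</th><th>Wk</th></tr></thead>
  <tbody>
    <tr><td>Regular Season</td><td>1</td></tr>
    <tr><td>Final</td><td></td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
# Wrapping the string in StringIO avoids the deprecation warning that
# newer pandas versions emit for literal HTML strings.
tables = pd.read_html(StringIO(html))
df = tables[0]  # the only table in this sample

print(len(tables))
print(df.tail())
```

With the real page, `html` would be the `tabledata` or `data` string from the snippets above.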
Console Output:
[             Round   Wk  Day  ...             Referee  Match Report                         Notes
0    Regular Season    1  Sat  ...  Charles Breakspear  Match Report                           NaN
1    Regular Season    1  Sat  ...       Andrew Davies  Match Report                           NaN
2    Regular Season    1  Sat  ...       Kevin Johnson  Match Report                           NaN
3    Regular Season    1  Sat  ...   Anthony Backhouse  Match Report                           NaN
4    Regular Season    1  Sat  ...        Marc Edwards  Match Report                           NaN
..              ...  ...  ...  ...                 ...           ...                           ...
685     Semi-finals  NaN  Tue  ...       Robert Madley  Match Report                    Leg 1 of 2
686     Semi-finals  NaN  Wed  ...         Craig Hicks  Match Report                    Leg 1 of 2
687     Semi-finals  NaN  Fri  ...        Keith Stroud  Match Report     Leg 2 of 2; Blackpool won
688     Semi-finals  NaN  Sat  ...   Michael Salisbury  Match Report  Leg 2 of 2; Lincoln City won
689           Final  NaN  Sun  ...     Tony Harrington  Match Report                           NaN

[690 rows x 13 columns]]
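If rows still appear to be missing, one way to tell whether they are absent from the captured HTML or lost later during parsing is to count the rows in the outerHTML string before handing it to pandas. The following is a minimal sketch using only the standard library. The names `RowCounter` and `count_body_rows` and the `sample` markup are hypothetical and are not part of Selenium or pandas.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> elements that appear inside <tbody> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False


def count_body_rows(table_html: str) -> int:
    """Return the number of body rows in a table's outerHTML string."""
    counter = RowCounter()
    counter.feed(table_html)
    counter.close()
    return counter.rows


# Hypothetical sample; with the real page this would be the outerHTML string.
sample = (
    "<table><thead><tr><th>Round</th></tr></thead>"
    "<tbody><tr><td>Semi-finals</td></tr><tr><td>Final</td></tr></tbody></table>"
)
print(count_body_rows(sample))  # 2
```

Comparing this count for `driver.page_source` taken immediately after `driver.get()` with the count for the outerHTML obtained after the explicit wait would show whether the rows were simply not yet present when the page source was read.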