I am trying to scrape the list of DAOs from messari.io, but I am having trouble because I get the following errors:
DeprecationWarning: executable_path has been deprecated, please pass in a Service object
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
DevTools listening on ws://127.0.0.1:56691/devtools/browser/b4609671-5e6e-4d25-b09e-4116b3dde4bf
[0525/100030.252:INFO:CONSOLE(1)] "enabling sentry error tracker", source: https://messari.io/static/js/main.977a4794.chunk.js (1)
[0525/100030.951:INFO:CONSOLE(2)] "Unable to refresh token: Login required", source: https://messari.io/static/js/23.778d04d0.chunk.js (2)
[0525/100031.065:INFO:CONSOLE(2)] "
88b d88 88
888b d888 ""
88'8b d8'88
88 '8b d8' 88 ,adPPYba, ,adPPYba, ,adPPYba, ,adPPYYba, 8b,dPPYba, 88
88 '8b d8' 88 a8P_____88 I8[ "" I8[ "" "" 'Y8 88P' "Y8 88
88 '8b d8' 88 8PP""""""" '"Y8ba, '"Y8ba, ,adPPPPP88 88 88
88 '888' 88 "8b, ,aa aa ]8I aa ]8I 88, ,88 88 88
88 '8' 88 '"Ybbd8"' '"YbbdP"' '"YbbdP"' '"8bbdP"Y8 88 88
", source: https://messari.io/static/js/23.778d04d0.chunk.js (2)
[0525/100031.069:INFO:CONSOLE(2)] "Interested in a CHALLENGE? Check out: https://messari.io/quiz", source: https://messari.io/static/js/23.778d04d0.chunk.js (2)
Traceback (most recent call last):
File "c:/Users/Student/webScrape/scraper.py", line 21, in <module>
matches = WebDriverWait(driver, 10).until(
File "C:\Users\Student\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\support\wait.py", line 89, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
Backtrace:
Ordinal0 [0x0096B8F3+2406643]
Ordinal0 [0x008FAF31+1945393]
Ordinal0 [0x007EC748+837448]
Ordinal0 [0x008192E0+1020640]
Ordinal0 [0x0081957B+1021307]
Ordinal0 [0x00846372+1205106]
Ordinal0 [0x008342C4+1131204]
Ordinal0 [0x00844682+1197698]
Ordinal0 [0x00834096+1130646]
Ordinal0 [0x0080E636+976438]
Ordinal0 [0x0080F546+980294]
GetHandleVerifier [0x00BD9612+2498066]
GetHandleVerifier [0x00BCC920+2445600]
GetHandleVerifier [0x00A04F2A+579370]
GetHandleVerifier [0x00A03D36+574774]
Ordinal0 [0x00901C0B+1973259]
Ordinal0 [0x00906688+1992328]
Ordinal0 [0x00906775+1992565]
Ordinal0 [0x0090F8D1+2029777]
BaseThreadInitThunk [0x777BFA29+25]
RtlGetAppContainerNamedObjectPath [0x77B77A7E+286]
RtlGetAppContainerNamedObjectPath [0x77B77A4E+238]
I know there is an API for messari.io, but I am almost certain it is only for their assets and not their list of DAOs. I tried using Selenium since it is a dynamic page but I am still having trouble. Here is my code:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import requests
url = 'https://messari.io/governor/daos'
DRIVER_PATH = 'PATH_TO_DRIVER_ON_MY_PC'
options = Options()
options.headless = True
options.add_argument("--window-size=1920, 1200")
# s = Service('PATH_TO_DRIVER_ON_MY_PC')
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
driver.get('https://messari.io/governor/daos')
try:
    matches = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "td")))
    # for match in matches:
    #     print(match.text)
finally:
    driver.quit()
Update: I fixed the executable_path warning, but I still get the same TimeoutException. When I run it without headless mode, I also get the following messages:
DevTools listening on ws://127.0.0.1:57773/devtools/browser/4450b78d-3a9f-401a-b39c-2c716ecad924
[9628:20616:0525/102300.840:ERROR:device_event_log_impl.cc(214)] [10:23:00.840] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[9628:20616:0525/102300.841:ERROR:device_event_log_impl.cc(214)] [10:23:00.841] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
I assume this is a hardware message that I shouldn't worry about, based on similar questions, because unplugging my mouse removed one of them.
Solution:
This page doesn't use <td> to display the list of DAOs. It uses <div> (with CSS) to lay it out like a table, and it keeps the name of each DAO in an <h4>. That is why waiting for a <td> times out: no such element ever appears.
At least it uses <div> and <h4> in my Firefox on a laptop with Linux.
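To check which tags a rendered page actually uses before picking a locator, one option is to count the tags in the HTML that Selenium returns from driver.page_source. Below is a minimal sketch using only the standard library. TagCounter and count_tags are hypothetical helper names, and the sample HTML is made up for illustration; it is not Messari's real markup.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts the opening tags seen while parsing an HTML string."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Called once for every opening tag; tag names arrive lowercased.
        self.counts[tag] += 1


def count_tags(html):
    # Hypothetical helper: returns a Counter mapping tag name -> occurrences.
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Made-up sample standing in for driver.page_source.
sample = "<div><div><h4>Fei</h4></div><div><h4>Rook</h4></div></div>"
counts = count_tags(sample)
print(counts["td"], counts["h4"])
```

With the sample above this prints 0 2. A count of zero for td on the real page source would explain why the wait for <td> never succeeds.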
I used the webdriver_manager module so that it automatically downloads a fresh driver whenever Linux installs a newer version of Chrome.
I have to use find_elements() (with s in the word elements) or presence_of_all_elements_located() to get all the <h4> elements. find_element() and presence_of_element_located() return only the first match.
Full working code (tested on Linux Mint, Python 3.8, Selenium 4.x, Chrome 101.x):
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://messari.io/governor/daos'

options = Options()
options.add_argument("--headless")  # same effect as options.headless = True
options.add_argument("--window-size=1920,1200")

# webdriver_manager downloads a driver that matches the installed Chrome
driver = webdriver.Chrome(options=options, service=Service(ChromeDriverManager().install()))
driver.get(url)

try:
    # wait up to 10 seconds for <h4> elements to be present, then get all of them
    matches = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.TAG_NAME, "h4")))
    #matches = driver.find_elements(By.TAG_NAME, "h4")
    for match in matches:
        if match.text:  # skip <h4> elements without visible text
            print(match.text)
finally:
    driver.quit()
Result:
Fei
Rook
Cosmos
Stargate Finance
Aave
Treasure DAO
DODO
Radicle
Goldfinch
Merit Circle
EPNS
Perpetual Protocol
Gitcoin
SuperRare
Indexed
Doodles
Rome DAO
Badger
Paraswap
Unlock
Terra
Shapeshift
Lobis
Pool Together
The Graph
Yearn Finance
Ampleforth
Alpaca Finance
Balancer
Gro Protocol
Sismo DAO
BeethovenX
ENS
Lido
Alchemist
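If the names are needed as a list instead of printed output, the `if match.text` check from the loop can be moved into a small helper. This is only a sketch: visible_names is a hypothetical name, and stripping whitespace and dropping duplicates are my own assumptions, not something the page requires. It works on any objects with a .text attribute, so it is shown here with stand-ins instead of real WebElement objects.

```python
from types import SimpleNamespace


def visible_names(elements):
    # Hypothetical helper: collect stripped, non-empty, unique texts from
    # objects exposing a .text attribute, keeping first-seen order.
    seen = set()
    names = []
    for element in elements:
        text = (element.text or "").strip()
        if text and text not in seen:
            seen.add(text)
            names.append(text)
    return names


# Stand-ins for WebElement objects; a real run would pass `matches` instead.
fake_matches = [
    SimpleNamespace(text="Fei"),
    SimpleNamespace(text=""),
    SimpleNamespace(text=" Rook "),
    SimpleNamespace(text="Fei"),
]
print(visible_names(fake_matches))
```

For the stand-ins above this prints ['Fei', 'Rook']. In the real script, the call would need to happen inside the try block, before driver.quit(), because element text cannot be read after the browser is closed.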