I'm trying to scrape data from this website: https://www.rad.cvm.gov.br/ENETCONSULTA/frmGerenciaPaginaFRE.aspx?NumeroSequencialDocumento=102142&CodigoTipoInstituicao=2. After switching from "Demonstração do Resultado" to "Balanço Patrimonial Ativo" in the upper-right box, the whole table sits under the CSS selector "#ctl00_cphPopUp_tbDados", but I can't get the data using Selenium WebDriver. I think the table is dynamic and is loaded by a script, but I don't know another way to get this data. Here is the complete code so far:
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

cvm = input('Códigos CVM separados por vírgula: ')
lstcvm = list(map(str, cvm.split(',')))
for i in lstcvm:
    url = "https://bvmf.bmfbovespa.com.br/cias-listadas/empresas-listadas/ResumoDemonstrativosFinanceiros.aspx?codigoCvm=" + i + "&idioma=pt-br"
    driver = webdriver.Firefox()
    driver.get(url)
    # Click the first DFP document link; the document opens in a second window
    dfp = driver.find_element(By.CSS_SELECTOR, "#ctl00_contentPlaceHolderConteudo_rptDocumentosDFP_ctl00_lnkDocumento")
    webdriver.ActionChains(driver).click(dfp).perform()
    time.sleep(10)
    tabs = driver.window_handles
    driver.switch_to.window(tabs[1])
    print(driver.current_url)
    # Move the statement dropdown to its first option
    box = driver.find_element(By.CSS_SELECTOR, "#cmbQuadro")
    box.send_keys(Keys.HOME, Keys.RETURN)
    time.sleep(1)
    driver.maximize_window()
    time.sleep(1)
    # This lookup is the one that does not return the table data
    balanco = driver.find_element(By.CSS_SELECTOR, "#ctl00_cphPopUp_tbDados").text
    print(balanco)
    driver.switch_to.window(tabs[0])
    print(driver.current_url)
print("Finalizado")
The sample input here is 9512.
The part of the code that tries to scrape the data is this line:
balanco=driver.find_element(By.CSS_SELECTOR, "#ctl00_cphPopUp_tbDados").text
>Solution :
To select Balanço Patrimonial Ativo and then extract the data from the DFs Consolidadas / Balanço Patrimonial Ativo – (Reais Mil) table, you need to induce WebDriverWait for visibility_of_element_located(). The table is rendered inside an iframe, so you also have to switch to that frame before locating it. Once the table is visible, you can pass its outerHTML to Pandas to get a DataFrame. You can use the following Locator Strategy:
Code Block:
from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

# Assumes a Chrome driver is available on this machine
driver = webdriver.Chrome()
driver.get("https://www.rad.cvm.gov.br/ENETCONSULTA/frmGerenciaPaginaFRE.aspx?NumeroSequencialDocumento=102142&CodigoTipoInstituicao=2")
# Choose the statement in the dropdown once it is visible
Select(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "select#cmbQuadro")))).select_by_visible_text("Balanço Patrimonial Ativo")
# The table lives inside an iframe, so switch into it before locating the table
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe#iFrameFormulariosFilho")))
# Grab the table's HTML once it is visible
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#ctl00_cphPopUp_tbDados"))).get_attribute("outerHTML")
# read_html returns a list of DataFrames; StringIO avoids the literal-HTML deprecation warning in recent pandas
df = pd.read_html(StringIO(data))
print(df)
Console Output:
[                0                              1            2            3
0            Conta                      Descrição   31/12/2020   31/12/2019
1                1                    Ativo Total  987.419.000  926.011.000
2             1.01               Ativo Circulante  142.323.000  112.101.000
3          1.01.01  Caixa e Equivalentes de Caixa   60.856.000   29.714.000
4          1.01.02         Aplicações Financeiras    3.424.000    3.580.000
..            ...                            ...          ...          ...
61     1.02.03.03       Imobilizado em Andamento          NaN          NaN
62        1.02.04                     Intangível   77.678.000   78.489.000
63     1.02.04.01                    Intangíveis          NaN          NaN
64  1.02.04.01.01          Contrato de Concessão          NaN          NaN
65     1.02.04.02                       Goodwill          NaN          NaN
[66 rows x 4 columns]]
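Note that the console output above is a list holding one DataFrame, that its first row is really the header (Conta, Descrição and the two dates), and that the figures are strings using "." as a thousands separator. If you want numeric columns, something along these lines may help. This is only a minimal sketch: tidy_statement is a hypothetical helper name, the sample rows are copied from the output above to stand in for pd.read_html(...)[0], and it assumes every column after the second holds whole-number amounts.

```python
import pandas as pd


def tidy_statement(raw: pd.DataFrame) -> pd.DataFrame:
    """Promote the first row to the header and parse the value columns as integers."""
    tidy = raw.iloc[1:].reset_index(drop=True)
    tidy.columns = raw.iloc[0].tolist()
    # Assumption: columns after "Conta" and "Descrição" are amounts
    for col in tidy.columns[2:]:
        # Drop the "." thousands separators, e.g. "987.419.000" becomes "987419000"
        digits = tidy[col].astype(str).str.replace(".", "", regex=False)
        # Cells that are not numbers (missing values) end up as <NA>
        tidy[col] = pd.to_numeric(digits, errors="coerce").astype("Int64")
    return tidy


# Stand-in for pd.read_html(...)[0], using rows copied from the console output
raw = pd.DataFrame([
    ["Conta", "Descrição", "31/12/2020", "31/12/2019"],
    ["1", "Ativo Total", "987.419.000", "926.011.000"],
    ["1.01", "Ativo Circulante", "142.323.000", "112.101.000"],
    ["1.02.04.02", "Goodwill", None, None],
])
balanco = tidy_statement(raw)
print(balanco)
```

With the real page you would call tidy_statement(df[0]) on the list returned by pd.read_html. The Conta column is left as text on purpose, since account codes such as 1.01 are identifiers rather than numbers.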