
How to programmatically inspect and retrieve dynamic content from an Angular website using Python?

I’m trying to scrape a website built with Angular using Python, but I’m encountering issues with retrieving the dynamically generated content. When I make a direct HTTP request or view the page source, I only get the initial HTML, which contains the

    <app-root>
     <!-- empty app root -->
    </app-root> 

placeholder. However, when I inspect the rendered page in a browser, I can see the full content.
Here's what the browser's inspector shows when I select the element in the rendered page:

    <app-root _nghost-ynj-c115 ng-version="14.3.0">
      <!-- Rendered HTML content from browser inspection -->
      ...


    </app-root>
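To check programmatically that a plain HTTP request really returns only the empty shell, a minimal standard-library sketch could parse the response and look inside the app-root element. The names fetch_html and app_root_is_empty are hypothetical helpers, not part of any library, and the check assumes the root component is called app-root, as in the snippets above.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _AppRootInspector(HTMLParser):
    """Counts elements and visible text found inside <app-root>."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0        # > 0 while the parser is inside <app-root>
        self.child_tags = 0   # elements nested in <app-root>
        self.text = []        # non-whitespace text nested in <app-root>

    def handle_starttag(self, tag, attrs):
        if tag == "app-root":
            self.depth += 1
        elif self.depth:
            self.child_tags += 1

    def handle_endtag(self, tag):
        if tag == "app-root" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # HTML comments go to handle_comment, so they are not counted here.
        if self.depth and data.strip():
            self.text.append(data.strip())


def app_root_is_empty(html: str) -> bool:
    """True if <app-root> contains no elements and no text (comments ignored)."""
    parser = _AppRootInspector()
    parser.feed(html)
    parser.close()
    return parser.child_tags == 0 and not parser.text


def fetch_html(url: str) -> str:
    """Plain HTTP GET: returns the server's HTML without executing any JavaScript."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For instance, app_root_is_empty(fetch_html("https://www.fedlex.admin.ch/de/cc/international-law/0.1")) would be expected to return True if the server sends only the shell shown above, because the content is only added after a browser runs the Angular bundle.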

I’ve tried using Selenium to wait for the content to be rendered, but I’m not sure if I’m using the correct selectors or if there’s a better approach. Here’s the code I’ve been using:


    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from webdriver_manager.chrome import ChromeDriverManager

    service = Service(ChromeDriverManager().install())
    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(service=service, options=options)

    try:
        driver.get("https://www.fedlex.admin.ch/de/cc/international-law/0.1")
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "app-root ng-version"))
        )
        page_source = driver.page_source
    finally:
        driver.quit()

    print(page_source)

This code doesn’t seem to retrieve the dynamic content as expected. How can I programmatically inspect the page and retrieve the full content that’s rendered by Angular? Is there a specific way to interact with Angular applications using Selenium, or is there another tool or method I should consider for this task?

Solution:

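Your problem is that "app-root" is already present in the initial HTML, but empty, so waiting for it tells you nothing about whether Angular has finished rendering. Note also that the CSS selector "app-root ng-version" matches an element named ng-version nested inside app-root, not the ng-version attribute; an attribute selector would be "app-root[ng-version]".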

Change the wait condition so that it targets the element where the data is actually presented:

    EC.presence_of_element_located((By.XPATH, "//div[@id='content']"))