Python Selenium get text out of tags

I have been trying to scrape a webpage with Python and Selenium and ran into this problem. The webpage that I’m scraping shows information in a paginated table, and I want to get the information from all pages. This is the HTML for the pagination system when I’m on a page that is not the last page (page 2 in this case):

<span class="pagelinks">
   " ["
   <a href="?page=1">First</a>
   "/"
   <a href="?page=2">Previous</a>
   "] "
   <a href="?page=1" title="Go to page 1">1</a>
   ", "
   <strong>2</strong>
   ", "
   <a href="?page=3" title="Go to page 3">3</a>
   " ["
   <a href="?page=3">Next</a>
   "/"
   <a href="?page=3">Last</a>
   "] "
</span>

And this is the HTML I get when I reach the last page (page 3 in this case):

<span class="pagelinks">
   " ["
   <a href="?page=1">First</a>
   "/"
   <a href="?page=2">Previous</a>
   "] "
   <a href="?page=1" title="Go to page 1">1</a>
   ", "
   <a href="?page=2" title="Go to page 2">2</a>
   ", "
   <strong>3</strong>
   " [Next/Last]"
</span>

In this case page 3 is selected and appears as <strong>, but this changes depending on the current page.


To check whether I’m on the last page, I want to test whether the text "[Next/Last]" comes right after the <strong> tag, so that I can stop the while loop that retrieves the information. Since this text is not inside any tag, I have found no way to check this. How can I check it?

Solution:

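According to your updated explanation, we can look for an <a> element that has an href attribute and the text content Next. The same can be done for the Last text. On the last page, Next and Last are plain text rather than links, so no such element is found.
With Selenium and Python you can simply use this check: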

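from selenium.webdriver.common.by import By
if driver.find_elements(By.XPATH, "//span[@class='pagelinks']//a[@href][contains(text(),'Next')]"):
    # Not on the last page yet: "Next" is still a link, so do your work here.
    # On the last page "Next" is plain text, the list is empty, and this block is skipped.
    pass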
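The check above can be folded into the while loop from the question. The following is only a minimal sketch under a few assumptions: `scrape_all_pages`, `scrape_page` and `NEXT_LINK_XPATH` are hypothetical names that do not appear in the original post, `scrape_page` stands for whatever code reads the table on the current page, and clicking the Next link is assumed to load the next page. The locator strategy is passed as the plain string "xpath", which is the value of `By.XPATH`, so the sketch has no import of its own.

```python
# XPath for a clickable "Next" link inside the pagination span.
# On the last page "Next" is plain text, so nothing matches.
NEXT_LINK_XPATH = "//span[@class='pagelinks']//a[@href][contains(text(),'Next')]"


def scrape_all_pages(driver, scrape_page):
    """Collect rows from every page until no "Next" link is left.

    driver      -- a Selenium WebDriver already showing the first page
    scrape_page -- hypothetical callable: takes the driver and returns
                   a list of rows for the page currently displayed
    """
    results = []
    while True:
        results.extend(scrape_page(driver))

        # find_elements returns an empty list instead of raising,
        # so it doubles as an "is there a Next link?" test.
        # "xpath" is the string value of By.XPATH.
        next_links = driver.find_elements("xpath", NEXT_LINK_XPATH)
        if not next_links:
            break  # last page reached

        next_links[0].click()  # assumed to navigate to the next page
    return results
```

The Next link is looked up again on every iteration, so no element reference is reused after the page changes. If the table takes time to load after the click, an explicit wait (for example with Selenium's `WebDriverWait`) would probably be needed before scraping the new page.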