Python/Selenium: Any way to wildcard the end of an xpath? Or search for a specifically formatted piece of an xpath?

I am using Python and Selenium to archive some posts, which are simple text plus images. Because the site requires a login, I’m using Selenium to access it.

The problem is that the page lists all the posts, but each one is only fully readable after clicking a link labeled "read more", which opens a popup with the full text and images.

So I’m writing a script to scroll the page, click read more, scrape the post, close it, and move on to the next one.

The problem I’m running into is that every "read more" button is an identical element:

<a href="javascript:;" style="font-weight: 400">read more</a>

If I try to loop through them using XPaths, I run into the problem that the paths are structured differently as well, for example:

//*[@id="page"]/div[2]/article[10]/div[2]/ul/li/a

//*[@id="page"]/div[2]/article[14]/div[2]/p[3]/a

I tried writing my loop to step through the article numbers, but of course the XPaths end differently. Is there a way to add a wildcard to the back half of my XPaths, or to search by article number alone?

Solution:

/ selects only direct children. Use // instead to match any descendant, which lets you go from <article> straight to the <a>:

//*[@id="page"]/div[2]/article//a[.="read more"]

This gives you a list of elements you can iterate over. You might be able to drop the [.="read more"] predicate, but then the expression may also match unrelated <a> tags, depending on the rest of the HTML structure.
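To see the difference between / and // without a browser, here is a minimal sketch using Python's standard-library ElementTree, which supports a subset of XPath. The sample markup is invented for illustration and only mimics the two layouts from the question (one link under ul/li, one under a p); the real page and Selenium's full XPath engine may behave differently in other respects.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two articles whose "read more" links sit at
# different depths, plus one unrelated "share" link.
SAMPLE = """
<div id="page">
  <div>header</div>
  <div>
    <article><div>teaser</div><div><ul><li><a>read more</a></li></ul></div></article>
    <article><div>teaser</div><div><p>one</p><p>two</p><p><a>read more</a></p><a>share</a></div></article>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)  # root is the <div id="page"> element

# "/" only looks at direct children of <article>, so nothing matches.
direct_children = root.findall("./div[2]/article/a")

# "//" searches every level below <article>, whatever the path in between.
all_links = root.findall("./div[2]/article//a")

# The text predicate filters out the unrelated "share" link.
read_more_links = root.findall("./div[2]/article//a[.='read more']")

print(len(direct_children), len(all_links), len(read_more_links))
```

Without the predicate the "share" link is matched too, which is the risk mentioned above.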

You can also look for the read more elements directly by their text:

//a[.="read more"]
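For the "search just by the article numbers" part of the question, the two expressions above can be combined into a small helper. This is only a sketch: read_more_xpath and find_read_more_links are hypothetical names, driver is assumed to be an already logged-in Selenium WebDriver, and the locator string "xpath" is assumed to be equivalent to Selenium's By.XPATH constant.

```python
def read_more_xpath(article_number=None):
    """Build an XPath for "read more" links (hypothetical helper).

    With no article number, match every such link on the page;
    otherwise match only links anywhere inside that one <article>.
    """
    link = '//a[.="read more"]'
    if article_number is None:
        return link
    # "//" after article[n] acts as the wildcard for the rest of the path.
    return f'//*[@id="page"]/div[2]/article[{article_number}]{link}'


def find_read_more_links(driver, article_number=None):
    """Return the matching elements from a Selenium WebDriver.

    "xpath" is assumed here to be the string value of By.XPATH, so
    this block needs no Selenium import of its own.
    """
    return driver.find_elements("xpath", read_more_xpath(article_number))
```

A loop such as for link in find_read_more_links(driver): link.click() could then open each popup in turn. Scraping and closing the popup depend on the site's markup, and the links may need to be looked up again if the page re-renders after a popup closes.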