XPath – select first occurrence of node with specific type

I am trying to select all of the first occurrences of a specific type in the following structure:

<div class="jobs-list">
    <div class="job-listing">
        <h3>Title1</h3>
        <span class="organization">
            <a href="https://www.domain1.org/" target="_blank">Org1</a>
        </span>
        <span class="location">Loc1</span>
        <div class="description">
            desc1
            <a href="https://www.domain1-1.org/" target="_blank">https://www.domain1-1.org/</a>
            <span class="list-date">Posted on: 01/19/2022</span>
        </div>
    </div>
    <div class="job-listing">
        <h3>Title2</h3>
        <span class="organization">
            <a href="https://www.domain2.org/" target="_blank">Org2</a>
        </span>
        <span class="location">Loc2</span>
        <div class="description">
            desc2
            <a href="https://www.domain2.org/" target="_blank">https://www.domain2.org/</a>
            <span class="list-date">Posted on: 01/18/2022</span>
        </div>
    </div>
    <div class="job-listing">
        <h3>Title3</h3>
        <span class="organization">
            <a href="https://www.domain3.org/" target="_blank">Org3</a>
        </span>
        <span class="location">Loc3</span>
        <div class="description">
            desc3            
            <a href="mailto:user@domain3.org">user@domain3.org</a>
            <span class="list-date">Posted on: 01/19/2022</span>
        </div>
    </div>
    <div class="job-listing">
        <h3>Title4</h3>
        <span class="organization">Org4</span>
        <span class="location">Loc4</span>
        <div class="description">
            desc4
            <a href="mailto:user@domain4.org">user@domain4.org</a>
            <a href="https://www.domain4.org/" target="_blank">https://www.domain4.org/</a>
            <a href="https://www.domain4-1.org/" target="_blank">https://www.domain4-1.org/</a>
            <span class="list-date">Posted on: 01/06/2022</span>
        </div>
    </div>
</div>

Specifically, I need the result to be the following:

https://www.domain1.org/
https://www.domain2.org/
https://www.domain3.org/
https://www.domain4.org/

That is, the first a/@href under each div[@class='job-listing'], but I’m not sure how to express that. Some things to note:

  • The <a> is always two levels below the root (job-listing).
  • The first <a> isn’t always the one I want (I’m only looking for http links), but I can filter those out easily enough; I’m stuck on how to select the node, not on filtering by content.
  • I need the value of a/@href, not the contents of <a>.

Thanks!

Solution:

//div[@class='job-listing']/descendant::a[1] selects the first a descendant of each of those divs. To add the http check, use e.g. //div[@class='job-listing']/descendant::a[starts-with(@href, 'http')][1].
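The descendant axis matters here: in XPath, //a[1] is short for /descendant-or-self::node()/child::a[1], so it selects every a that is the first a child of its own parent rather than the first a in the whole subtree. Here is a minimal Python sketch of the difference, assuming a trimmed copy of the first listing (the LISTING string and variable names are illustrative only). Python's standard xml.etree.ElementTree implements only a subset of XPath and has no descendant:: axis, so the "first in document order" selection is reproduced with Element.iter() instead:

```python
import xml.etree.ElementTree as ET

# Assumption: one listing from the question, trimmed to the anchors that matter.
LISTING = """
<div class="job-listing">
    <span class="organization"><a href="https://www.domain1.org/">Org1</a></span>
    <div class="description">
        <a href="https://www.domain1-1.org/">https://www.domain1-1.org/</a>
    </div>
</div>
"""

div = ET.fromstring(LISTING)

# Counterpart of descendant::a[1]: Element.iter() walks the subtree in
# document order, so the first item is the first <a> descendant.
first_descendant = next(div.iter("a"))

# Counterpart of //a[1]: every <a> that is the first <a> child of its parent.
first_per_parent = div.findall(".//a[1]")

print(first_descendant.get("href"))
print([a.get("href") for a in first_per_parent])
```

The first print shows a single URL, while the second shows both, because each a is the first a child of its own parent.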

If you need the href attribute node, use //div[@class='job-listing']/descendant::a[starts-with(@href, 'http')][1]/@href. Note that the default serialization in some XSLT or XQuery processors doesn’t allow a sequence of standalone attribute nodes to be serialized, but in XPath 2 or 3 you can use e.g. //div[@class='job-listing']/descendant::a[starts-with(@href, 'http')][1]/@href/string() to get a sequence of attribute values instead.
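To see which values the filtered expression is meant to produce, here is a rough Python sketch that mirrors its logic step by step. It is not an XPath evaluation: the standard library's xml.etree.ElementTree does not support starts-with() or the descendant:: axis, so the selection is written out procedurally. The HTML string is a shortened version of the question's markup, and first_http_hrefs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: listings 1 and 4 from the question, trimmed for brevity.
HTML = """
<div class="jobs-list">
    <div class="job-listing">
        <span class="organization"><a href="https://www.domain1.org/">Org1</a></span>
        <div class="description">
            <a href="https://www.domain1-1.org/">https://www.domain1-1.org/</a>
        </div>
    </div>
    <div class="job-listing">
        <span class="organization">Org4</span>
        <div class="description">
            <a href="mailto:user@domain4.org">user@domain4.org</a>
            <a href="https://www.domain4.org/">https://www.domain4.org/</a>
            <a href="https://www.domain4-1.org/">https://www.domain4-1.org/</a>
        </div>
    </div>
</div>
"""


def first_http_hrefs(root):
    """Return, per job-listing div, the href of its first http(s) link."""
    hrefs = []
    for listing in root.iter("div"):
        # Mirrors the predicate [@class='job-listing'].
        if listing.get("class") != "job-listing":
            continue
        # Mirrors descendant::a[starts-with(@href, 'http')][1]:
        # scan descendants in document order, keep the first match only.
        for a in listing.iter("a"):
            href = a.get("href", "")
            if href.startswith("http"):
                hrefs.append(href)  # plain string, like /@href/string()
                break
    return hrefs


hrefs = first_http_hrefs(ET.fromstring(HTML))
print(hrefs)
```

For the second listing the mailto: link is skipped and the first http link is kept, which matches the result the question asks for.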
