xpath to select text preceded by specific element

I have the following HTML:

<body>
    <h1 id = 'example'>text</h1>
    "My car is a "
    <abbr>
        <a href = 'exampleRef'>
            Ferrari
        </a>
    </abbr>
    "that goes 100 km/h"
</body>

I’m trying to extract the text "My car is a Ferrari that goes 100 km/h". The text is not contained in any single element, so I thought of using the following-sibling axis to extract at least "My car is a". I tried the following expression:

//h1[@id ='example']/following-sibling::text()

and also

//h1[@id ='example']/following-sibling

but got no matches.

Solution:

The text "My car is a Ferrari that goes 100 km/h" is spread over three text nodes: two are direct children of <body> (siblings of the <h1>), and one sits inside the <a> within the <abbr>. An XPath expression that selects nodes returns them separately, so no single node-selecting expression yields the concatenated string. Note also that your second expression ends in /following-sibling with no :: and node test, so it is read as a search for a child element literally named following-sibling, which does not exist.

Instead, you can use XPath to individually select the relevant text nodes and then concatenate them programmatically. Here’s a step-by-step approach:

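  1. Identify Relevant Nodes: First, identify the nodes that contain the text parts you want to concatenate, in document order:
  • The text "My car is a "
  • The text "Ferrari" within the <a> tag
  • The text "that goes 100 km/h"
  2. XPath to Select Specific Nodes:
  • To select the <h1> element with id="example":
    //h1[@id='example']
  • To select the loose text nodes that follow the <h1> at the same level (valid XPath 1.0, but it only returns results in tools that can return text nodes rather than elements):
    //h1[@id='example']/following-sibling::text()
  • To select the text within the <a> tag:
    //h1[@id='example']/following-sibling::abbr/a/text()
  3. Extract Text Content: Use XPath to extract the text content of these nodes.
  4. Concatenate Text: Combine the extracted text content programmatically, in document order, and normalize the whitespace to form the desired string.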
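The steps above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the only way to do it: it assumes the markup parses as well-formed XML, and it treats the double quotes around the loose text in the question as browser DevTools display rather than part of the page, so they are left out of the sample. `xml.etree.ElementTree` supports only a subset of XPath (no following-sibling axis and no text() node test), so the sketch uses XPath to locate the <h1> and then walks the following siblings in code. The helper name `text_after` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question (assumption: the quotes shown around the
# loose text are DevTools rendering, so they are omitted here).
HTML = """
<body>
    <h1 id='example'>text</h1>
    My car is a
    <abbr>
        <a href='exampleRef'>
            Ferrari
        </a>
    </abbr>
    that goes 100 km/h
</body>
"""


def text_after(root, xpath):
    """Return the whitespace-normalized text that follows the first element
    matched by `xpath`, up to the end of its parent."""
    anchor = root.find(xpath)
    if anchor is None:
        return ""
    # ElementTree has no parent pointers, so search for the parent element.
    parent = next(p for p in root.iter() if anchor in list(p))
    siblings = list(parent)
    start = siblings.index(anchor)
    # Text directly after the anchor is stored as its "tail".
    parts = [anchor.tail or ""]
    for sibling in siblings[start + 1:]:
        parts.extend(sibling.itertext())   # text nested inside the sibling
        parts.append(sibling.tail or "")   # loose text after the sibling
    # Collapse runs of whitespace and trim the ends.
    return " ".join(" ".join(parts).split())


root = ET.fromstring(HTML)
result = text_after(root, ".//h1[@id='example']")
print(result)
```

With a library that implements full XPath 1.0, the sibling walk could instead be replaced by the following-sibling expressions listed above; the concatenation and whitespace normalization would stay the same.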