lxml XPath syntax to access the ancestor of an XML element at a specific depth?

I am trying to access the depth-3 ancestors of elements in an XML file, i.e. for the element /a/b/c/d/e/f, I want to get element c.
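In lxml, every element can walk up its own ancestor chain with `iterancestors()`, which yields the parent first and the root element last. A minimal sketch of a depth-based lookup, assuming depth 1 means the root element (so `c` is at depth 3 in `/a/b/c/d/e/f`); the helper name `ancestor_at_depth` is hypothetical, not part of lxml:

```python
from lxml import etree


def ancestor_at_depth(el, depth):
    """Hypothetical helper: return the ancestor-or-self of `el` at `depth`
    (root element = depth 1), or None if `el` is not that deep."""
    # iterancestors() yields the parent first and the root last;
    # reverse it so index 0 is the root element.
    chain = list(el.iterancestors())[::-1]
    chain.append(el)  # include the element itself as the deepest entry
    return chain[depth - 1] if 1 <= depth <= len(chain) else None


root = etree.fromstring("<a><b><c><d><e><f/></e></d></c></b></a>")
f = root.find(".//f")
print(ancestor_at_depth(f, 3).tag)  # c
```

This works for any tag and any depth of the starting element, at the cost of building the ancestor list in Python.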

Here is my more realistic example input file:

<?xml version="1.0" encoding="utf-8"?>
<Project xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:QDA-XML:project:1.0">
  <Sources>
    <TextSource name="document example">
      <Description />
      <PlainTextSelection>
        <Description />
        <Coding>
          <CodeRef targetGUID="a2a627dd-f7e7-4fc7-b8db-918e3ad50450" />
        </Coding>
      </PlainTextSelection>
    </TextSource>
    <VideoSource name="myvideo">
      <Transcript>
        <SyncPoint/>
        <SyncPoint/>
        <TranscriptSelection>
          <Description />
          <Coding>
            <CodeRef targetGUID="a2a627dd-f7e7-4fc7-b8db-918e3ad50450" />
          </Coding>
        </TranscriptSelection>
      </Transcript>
      <VideoSelection>
        <Coding>
          <CodeRef targetGUID="a2a627dd-f7e7-4fc7-b8db-918e3ad50450" />
        </Coding>
      </VideoSelection>
    </VideoSource>
  </Sources>
  <Notes>
    <Note name="some text">
      <Description />
      <PlainTextSelection>
        <Description />
        <Coding>
          <CodeRef targetGUID="a2a627dd-f7e7-4fc7-b8db-918e3ad50450" />
        </Coding>
      </PlainTextSelection>
    </Note>
  </Notes>
</Project>

In this case, for instance, I want to access the Note, TextSource and VideoSource elements that are ancestors of CodeRef elements.

I have the following working code, but I am wondering if there is a nicer way to go about it, perhaps using XPath syntax:

import lxml.etree as ET

tree = ET.parse('coderef_examples/project_simplified.xml')
root = tree.getroot()

for i in root.findall('.//CodeRef', root.nsmap):
    p = tree.getelementpath(i)
    p = p.replace('{urn:QDA-XML:project:1.0}', '')
    print('namespace-free path: ', p)

    p = tree.getpath(i)  # absolute XPath of this CodeRef element
    s = '/'.join(p.split('/')[:4])  # keep only the first three steps: the XPath of its depth-3 ancestor
    print('xpath string: ', s)
    ancestor = root.xpath(s)[0]
    print('source tag: ', ancestor.tag, ', source name: ', ancestor.get('name'))

Output:

namespace-free path:  Sources/TextSource/PlainTextSelection/Coding/CodeRef
xpath string:  /*/*[1]/*[1]
source tag:  {urn:QDA-XML:project:1.0}TextSource , source name:  document example
namespace-free path:  Sources/VideoSource/Transcript/TranscriptSelection/Coding/CodeRef
xpath string:  /*/*[1]/*[2]
source tag:  {urn:QDA-XML:project:1.0}VideoSource , source name:  myvideo
namespace-free path:  Sources/VideoSource/VideoSelection/Coding/CodeRef
xpath string:  /*/*[1]/*[2]
source tag:  {urn:QDA-XML:project:1.0}VideoSource , source name:  myvideo
namespace-free path:  Notes/Note/PlainTextSelection/Coding/CodeRef
xpath string:  /*/*[2]/*
source tag:  {urn:QDA-XML:project:1.0}Note , source name:  some text

Can it be done directly in XPath, ideally in a way that is independent of the ancestor's tag and of the depth of the CodeRef elements?
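XPath can express this directly from each CodeRef, independent of both the CodeRef's depth and the ancestor's tag. The `ancestor` axis is a reverse axis, so inside a step predicate position 1 is the parent and `last()` is the root element; `ancestor::*[last()-2]` is therefore the ancestor at depth 3. The parenthesized form `(ancestor::*)[3]` counts in document order instead and selects the same element. This is a sketch of an alternative, not the approach taken in the solution; it runs against a trimmed copy of the sample file (Description and SyncPoint elements, the xsd/xsi declarations and the targetGUID attributes dropped):

```python
from lxml import etree

# Trimmed copy of the question's sample file (same nesting, fewer elements).
SAMPLE = b"""<Project xmlns="urn:QDA-XML:project:1.0">
  <Sources>
    <TextSource name="document example">
      <PlainTextSelection><Coding><CodeRef/></Coding></PlainTextSelection>
    </TextSource>
    <VideoSource name="myvideo">
      <Transcript>
        <TranscriptSelection><Coding><CodeRef/></Coding></TranscriptSelection>
      </Transcript>
      <VideoSelection><Coding><CodeRef/></Coding></VideoSelection>
    </VideoSource>
  </Sources>
  <Notes>
    <Note name="some text">
      <PlainTextSelection><Coding><CodeRef/></Coding></PlainTextSelection>
    </Note>
  </Notes>
</Project>"""

NS = {"qda": "urn:QDA-XML:project:1.0"}
root = etree.fromstring(SAMPLE)

results = []
for coderef in root.xpath("//qda:CodeRef", namespaces=NS):
    # Reverse axis: [1] is the parent, [last()] the root element, [last()-2] depth 3.
    ancestor = coderef.xpath("ancestor::*[last()-2]")[0]
    # QName(...).localname strips the namespace for display only.
    results.append((etree.QName(ancestor).localname, ancestor.get("name")))
    print("source tag:", ancestor.tag, ", source name:", ancestor.get("name"))
```

Like the original loop, this yields one entry per CodeRef, so VideoSource is reported twice.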

edit: Solution based on Conal Tuohy’s answer:

import lxml.etree as ET

tree = ET.parse('coderef_examples/project_simplified.xml')
root = tree.getroot()

for ancestor in root.xpath('/*/*/*[descendant::qda:CodeRef]', namespaces={'qda': 'urn:QDA-XML:project:1.0'}):
    print('source tag: ', ancestor.tag, ', source name: ', ancestor.get('name'))

Much faster and more efficient. It prints each matching source once rather than once per CodeRef (VideoSource, for example, appears only once even though it contains two CodeRef elements), which in my case is exactly what I want.
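The difference in output comes from what is being iterated: the per-CodeRef loop visits VideoSource twice because it holds two CodeRef descendants, while an XPath node-set never contains duplicates, so the filtered query returns each qualifying element once, in document order. A small check of that claim against a trimmed copy of the sample file (an illustration, not part of the original solution):

```python
from lxml import etree

# Trimmed copy of the question's sample file (same nesting, fewer elements).
SAMPLE = b"""<Project xmlns="urn:QDA-XML:project:1.0">
  <Sources>
    <TextSource name="document example">
      <PlainTextSelection><Coding><CodeRef/></Coding></PlainTextSelection>
    </TextSource>
    <VideoSource name="myvideo">
      <Transcript>
        <TranscriptSelection><Coding><CodeRef/></Coding></TranscriptSelection>
      </Transcript>
      <VideoSelection><Coding><CodeRef/></Coding></VideoSelection>
    </VideoSource>
  </Sources>
  <Notes>
    <Note name="some text">
      <PlainTextSelection><Coding><CodeRef/></Coding></PlainTextSelection>
    </Note>
  </Notes>
</Project>"""

NS = {"qda": "urn:QDA-XML:project:1.0"}
root = etree.fromstring(SAMPLE)

coderefs = root.xpath("//qda:CodeRef", namespaces=NS)
# One node per qualifying depth-3 element, no matter how many CodeRefs it contains.
sources = root.xpath("/*/*/*[descendant::qda:CodeRef]", namespaces=NS)
print(len(coderefs), len(sources))  # 4 3
print([s.get("name") for s in sources])  # ['document example', 'myvideo', 'some text']
```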

Solution:

This XPath will return all elements which are at depth 3:

/*/*/*

(read as "any element which is the child of an element which is the child of an element which is the child of the document root")
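Because `*` matches an element in any namespace, this path needs no prefix binding and no tag name. A sketch of what it selects, using a cut-down copy of the sample file plus one extra, hypothetical `Note` with no CodeRef (its `name="no coding"` is invented for illustration):

```python
from lxml import etree

# Cut-down sample plus a hypothetical Note that has no CodeRef descendant.
SAMPLE = b"""<Project xmlns="urn:QDA-XML:project:1.0">
  <Sources>
    <TextSource name="document example"><PlainTextSelection><Coding><CodeRef/></Coding></PlainTextSelection></TextSource>
    <VideoSource name="myvideo"><VideoSelection><Coding><CodeRef/></Coding></VideoSelection></VideoSource>
  </Sources>
  <Notes>
    <Note name="some text"><PlainTextSelection><Coding><CodeRef/></Coding></PlainTextSelection></Note>
    <Note name="no coding"/>
  </Notes>
</Project>"""

root = etree.fromstring(SAMPLE)
# /*/*/*: grandchildren of the root element (depth 3), with no tag test at all.
depth3 = root.xpath("/*/*/*")
print([etree.QName(el).localname for el in depth3])
# ['TextSource', 'VideoSource', 'Note', 'Note']
```

The unfiltered path also returns the Note without any coding, which is why a predicate is needed.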

You mention that you want elements which are ancestors of CodeRef elements. To add that as a filter, you could do this:

/*/*/*[descendant::qda:CodeRef]

(where qda is a namespace prefix bound to the URI urn:QDA-XML:project:1.0)
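Two details matter here. In XPath 1.0 an unprefixed name test means "no namespace", so because the sample declares a default namespace, `CodeRef` without a bound prefix matches nothing. Name tests are also case-sensitive, so the element must be written `qda:CodeRef`, not `qda:codeRef`. A sketch showing both, on a cut-down copy of the sample file plus one hypothetical `Note` without a CodeRef (its `name="no coding"` is invented):

```python
from lxml import etree

# Cut-down sample plus a hypothetical Note that has no CodeRef descendant.
SAMPLE = b"""<Project xmlns="urn:QDA-XML:project:1.0">
  <Sources>
    <TextSource name="document example"><PlainTextSelection><Coding><CodeRef/></Coding></PlainTextSelection></TextSource>
    <VideoSource name="myvideo"><VideoSelection><Coding><CodeRef/></Coding></VideoSelection></VideoSource>
  </Sources>
  <Notes>
    <Note name="some text"><PlainTextSelection><Coding><CodeRef/></Coding></PlainTextSelection></Note>
    <Note name="no coding"/>
  </Notes>
</Project>"""

root = etree.fromstring(SAMPLE)
NS = {"qda": "urn:QDA-XML:project:1.0"}  # the prefix name is arbitrary; the URI must match

# Bound prefix and correct case: only depth-3 elements that contain a CodeRef.
hits = root.xpath("/*/*/*[descendant::qda:CodeRef]", namespaces=NS)
print([el.get("name") for el in hits])  # ['document example', 'myvideo', 'some text']

# Unprefixed name test = "no namespace", so the default-namespace CodeRefs are missed.
print(root.xpath("/*/*/*[descendant::CodeRef]"))  # []

# Name tests are case-sensitive: codeRef is not CodeRef.
print(root.xpath("/*/*/*[descendant::qda:codeRef]", namespaces=NS))  # []
```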
