Python, ElementTree: Find specific content in XML tag?

I’m trying to do something I thought should be very simple in ElementTree: find elements with specific tag content. The docs give the example:

*[tag='text']* Selects all elements that have a child named *tag* whose complete text content, including descendants, equals the given *text*.

Which seems straightforward enough. However, it does not work as I expect. Suppose I want to find all examples of <note>NEW</note>. The following complete example:

#!/usr/bin/env python
import xml.etree.ElementTree as ET

xml = """<?xml version="1.0"?>
<entry>
<foo>blah</foo>
<foo>bblic</foo>
<foo>fjdks<note>NEW</note></foo>
<foo>fdfsd</foo>
<foo>ljklj<note>NEW</note></foo>
</entry>
"""

root = ET.fromstring(xml)

print("Number of 'foo' elements: %d" % len(root.findall('.//foo')))
print("Number of new 'foo' elements: %d" % len(root.findall('.//[note="NEW"]')))

Yields:

$ python foo.py 
Number of 'foo' elements: 5
Traceback (most recent call last):
  File "/usr/lib/python3.10/xml/etree/ElementPath.py", line 370, in iterfind
    selector = _cache[cache_key]
KeyError: ('.//[note="NEW"]',)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/foo.py", line 17, in <module>
    print("Number of new 'foo' elements: %d" % len(root.findall('.//[note="NEW"]')))
  File "/usr/lib/python3.10/xml/etree/ElementPath.py", line 411, in findall
    return list(iterfind(elem, path, namespaces))
  File "/usr/lib/python3.10/xml/etree/ElementPath.py", line 384, in iterfind
    selector.append(ops[token[0]](next, token))
  File "/usr/lib/python3.10/xml/etree/ElementPath.py", line 193, in prepare_descendant
    raise SyntaxError("invalid descendant")
SyntaxError: invalid descendant

How am I meant to do this simple task?

Solution:

The docs also say that:

Predicates (expressions within square brackets) must be preceded by a
tag name, an asterisk, or another predicate.

Taking this into account,

root.findall('.//[note="NEW"]')

is illegal. You should add * before [ to match any tag, i.e.

root.findall('.//*[note="NEW"]')

or put a tag name before [ to match only that tag, i.e.

root.findall('.//foo[note="NEW"]')
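
For reference, here is a minimal sketch that runs both corrected queries against the sample document from the question. The variable names `any_tag` and `foo_only` are only illustrative, and this assumes the standard library `xml.etree.ElementTree` as used in the question.

```python
import xml.etree.ElementTree as ET

# Same sample document as in the question.
xml = """<?xml version="1.0"?>
<entry>
<foo>blah</foo>
<foo>bblic</foo>
<foo>fjdks<note>NEW</note></foo>
<foo>fdfsd</foo>
<foo>ljklj<note>NEW</note></foo>
</entry>
"""

root = ET.fromstring(xml)

# '*' before the predicate: any descendant element with a <note>NEW</note> child.
any_tag = root.findall('.//*[note="NEW"]')

# Tag name before the predicate: only <foo> elements with a <note>NEW</note> child.
foo_only = root.findall('.//foo[note="NEW"]')

print("Number of 'foo' elements: %d" % len(root.findall('.//foo')))
print("Number of new elements (any tag): %d" % len(any_tag))
print("Number of new 'foo' elements: %d" % len(foo_only))
```

With this particular document both queries select the same two `foo` elements, since only `foo` elements have a `note` child. They would differ if some other tag also contained `<note>NEW</note>`.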