Python xmltodict shows inconsistent behaviour in XML arrays

I have seen some, IMHO, inconsistent behaviour in Python's xmltodict.parse function.

  • When an array has only 1 element, iterating over it yields the names of the child elements
  • When an array has more than 1 element, iterating over it yields each element as an OrderedDict

See the example below:

import xmltodict

if __name__ == "__main__":
    xml01 = """
            <A>
                    <C>
                        <D>DDDD</D>
                        <E>EEEE</E>
                    </C>
            </A>
    """

    xd = xmltodict.parse(xml01)
    for x in xd['A']['C']:
        print(f"xml01: {x}")

    xml02 = """
            <A>
                    <C>
                        <D>DDDD</D>
                        <E>EEEE</E>
                    </C>
                    <C>
                        <D>DDDD</D>
                        <E>EEEE</E>
                    </C>
            </A>
    """

    xd = xmltodict.parse(xml02)
    for x in xd['A']['C']:
        print(f"xml02: {x}")

The output is:

xml01: D
xml01: E
xml02: OrderedDict([('D', 'DDDD'), ('E', 'EEEE')])
xml02: OrderedDict([('D', 'DDDD'), ('E', 'EEEE')])

I would expect that the output of the first iterator is:

xml01: OrderedDict([('D', 'DDDD'), ('E', 'EEEE')])

As it stands, you need to type-check the value returned for the element to know whether it holds one element or more.
Only when there is more than one element can you loop over the elements themselves.
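
One way to keep that type checking in a single place is a small helper that wraps a lone element in a list. This is only a sketch: `as_list` is a hypothetical name, not part of xmltodict, and the two dicts are hand-written stand-ins for what the parse calls above return for xml01 and xml02.

```python
def as_list(value):
    """Return value unchanged if it is already a list, else wrap it in one."""
    # Hypothetical helper, not part of xmltodict.
    return value if isinstance(value, list) else [value]


# Hand-written stand-ins for the parse results of xml01 and xml02.
single = {"A": {"C": {"D": "DDDD", "E": "EEEE"}}}
multiple = {"A": {"C": [{"D": "DDDD", "E": "EEEE"},
                        {"D": "DDDD", "E": "EEEE"}]}}

for name, parsed in (("xml01", single), ("xml02", multiple)):
    # The same loop now works whether <C> occurred once or several times.
    for c in as_list(parsed["A"]["C"]):
        print(f"{name}: {c}")
```

With the helper in place, both loops yield one mapping per C element instead of the keys D and E.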

I’m curious what Python experts think of this and what their solution would be.

>Solution :

You are right.

If you are asking for my opinion, I think the behaviour should be changed so that xml01 returns a list with one child.

However, according to https://github.com/martinblech/xmltodict/issues/14, the devs are aware of this but will not fix it.

The accepted workaround is to pass force_list as a parameter to parse, which forces the named child element to be returned as a list of OrderedDict even when it occurs only once.

In your case, this would look like this:

import xmltodict

if __name__ == "__main__":
    xml01 = """
            <A>
                    <C>
                        <D>DDDD</D>
                        <E>EEEE</E>
                    </C>
            </A>
    """

    # Always return C as a list, even when it occurs only once.
    xd = xmltodict.parse(xml01, force_list={'C'})
    for x in xd['A']['C']:
        print(f"xml01: {x}")

    xml02 = """
            <A>
                    <C>
                        <D>DDDD</D>
                        <E>EEEE</E>
                    </C>
                    <C>
                        <D>DDDD</D>
                        <E>EEEE</E>
                    </C>
            </A>
    """

    xd = xmltodict.parse(xml02, force_list={'C'})
    for x in xd['A']['C']:
        print(f"xml02: {x}")
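
One caveat when building the force_list collection: calling `set()` on a string splits it into single characters, so that form only matches one-character tag names such as C. The check below is plain Python and uses a made-up tag name, `Item`, purely for illustration.

```python
# set() over a string yields its characters, not the string itself.
single_char = set("C")
longer_name = set("Item")  # "Item" is a hypothetical tag name

print(single_char)            # {'C'}
print("Item" in longer_name)  # False: the set holds 'I', 't', 'e', 'm'
print("Item" in {"Item"})     # True: a set literal keeps the whole name
print("Item" in ("Item",))    # True: so does a one-element tuple
```

For tag names longer than one character, a set literal or a tuple is therefore the safer way to spell the argument.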

There is also a proposal in that issue to override the dict_constructor to use defaultdict.

You can browse the issue if that’s the way you want to go.
