BeautifulSoup unable to find XML element with special characters in name

I’m trying to parse an XML document written by a BI software program (Tableau, specifically!). I’m using BS4 and have followed multiple other StackOverflow solutions which haven’t worked for me. Hoping someone will be able to point out what I’m doing wrong.

This is my XML:
<datasources>
  <datasource>
    <_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
      <objects>
        <object caption='table' id='table'>
          <properties context='extract'>
            <relation name='Extract' table='[Extract].[Extract]' type='table' />
          </properties>
        </object>
      </objects>
    </_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
  </datasource>
</datasources>

And I’ve cleaned up code below so I can post it here:

# Parsing the tree
soup = BeautifulSoup(xmlstr, 'lxml')
print(soup.find("_.fcp.objectmodelencapsulatelegacy.true...object-graph"))
# This works! Prints the object markup

datasources = soup.find('datasources').find_all('datasource')
for ds in datasources:
    print(ds['caption'])
    print(ds['name'])
    # This works!

    result = ds.find("_.fcp.objectmodelencapsulatelegacy.true...object-graph")
    print(result.name)
    # This doesn't work! returns none

    for tag in ds:
        if tag.name == "_.fcp.objectmodelencapsulatelegacy.true...object-graph":
            print(tag.name)
            # This works ^^

As you can tell, the element definitely exists inside the tag it’s supposed to be in. Iterating over the children of the datasource yields the element I’m looking for, and comparing tag.name against the expected name confirms it’s there. But when I call find or find_all on the datasource itself, I keep getting None. I thought the issue was the name (as some StackOverflow posts suggested), but apparently not, since soup.find catches the element. So I’m at a loss; any help would be appreciated.

Thanks!

>Solution :

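Try the following code. It searches from the root datasources element, reads caption and name from the object and relation elements (the datasource element in the posted sample has neither attribute), and looks up the long tag name in lowercase, because the lxml HTML parser lowercases tag names.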

from bs4 import BeautifulSoup

xmlstr = '''
<datasources>
  <datasource>
    <_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
      <objects>
        <object caption='table' id='table'>
          <properties context='extract'>
            <relation name='Extract' table='[Extract].[Extract]' type='table' />
          </properties>
        </object>
      </objects>
    </_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
  </datasource>
</datasources>
'''
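# The 'lxml' HTML parser lowercases tag names, so searches use lowercase names.
soup = BeautifulSoup(xmlstr, 'lxml')

# Search from the root <datasources> element; find() looks through all descendants.
datasources = soup.find_all('datasources')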
for ds in datasources:
    print(ds.find('object')['caption'])
    print(ds.find('relation')['name'])
    # This works!

    result = ds.find("_.fcp.objectmodelencapsulatelegacy.true...object-graph")
    print(result.name)

Output:

table
Extract
_.fcp.objectmodelencapsulatelegacy.true...object-graph
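
One possible explanation for the None in the question, offered as an assumption: the real workbook may contain several datasource elements, and only some of them contain the object graph (the posted sample has just one). In that case find returns None for the others, and result.name raises AttributeError. Below is a minimal sketch of a guarded lookup. It uses Beautiful Soup's 'xml' parser instead of the 'lxml' HTML parser used above. That parser requires lxml and keeps tag names case-sensitive, so the search string uses the original capitalisation. The two-datasource sample, the find_object_graphs helper, and the TARGET constant are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two datasources, only the second holds the target element.
xmlstr = """<datasources>
  <datasource caption='first' name='ds1'>
    <connection />
  </datasource>
  <datasource caption='second' name='ds2'>
    <_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
      <objects>
        <object caption='table' id='table' />
      </objects>
    </_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
  </datasource>
</datasources>"""

# With the 'xml' parser, tag names keep their original case.
TARGET = "_.fcp.ObjectModelEncapsulateLegacy.true...object-graph"


def find_object_graphs(markup):
    """Map each datasource name to the matched tag name, or None if absent."""
    soup = BeautifulSoup(markup, "xml")
    found = {}
    for ds in soup.find_all("datasource"):
        result = ds.find(TARGET)
        # find() returns None when nothing matches, so check before using .name
        found[ds.get("name")] = result.name if result is not None else None
    return found


results = find_object_graphs(xmlstr)
print(results)
```

If every datasource in the real file does contain the element, this guard changes nothing, and the cause lies elsewhere.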