How to parse XML children using Python

I have parsed XML from a website and found that the root has two branches (children).

How can I separate the two branches into two lists of dictionaries?

Here's my code so far:

import pandas as pd
import xml.etree.ElementTree as ET
import requests
url = "http://cs.stir.ac.uk/~soh/BD2spring2022/assignmentdata.php"
params = {'data':'spurpyr'}
response = requests.get(url, params=params)
tree = response.content

#extract the root element as separate variable, and display the root tag.
root = ET.fromstring(tree)
print(root.tag)

#Get attributes of root
root_attr = root.attrib
print(root_attr)

#Finding children of root
for child in root:
    print(child.tag, child.attrib)

#extract the two children of the root element into two separate variables, and display their tags as well
children = list(root)  # keep the Element objects, not just their tag names
tweets_branch = children[0]
cities_branch = children[1]
print(tweets_branch.tag, cities_branch.tag)

#the elements in the entire tree
[elem.tag for elem in root.iter()]

#serialize the whole tree to UTF-8 bytes, then decode the bytes to a string for display
print(ET.tostring(root, encoding='utf8').decode('utf8'))

Solution:

You can use the beautifulsoup module (its "xml" parser requires lxml to be installed). To parse the tweets and cities into lists of dictionaries, you can use this example:

import requests
from bs4 import BeautifulSoup

url = "http://cs.stir.ac.uk/~soh/BD2spring2022/assignmentdata.php"
params = {"data": "spurpyr"}

soup = BeautifulSoup(requests.get(url, params=params).content, "xml")

tweets = []
for t in soup.select("tweets > tweet"):
    tweets.append({"id": t["id"], **{x.name: x.text for x in t.find_all()}})

cities = []
for c in soup.select("cities > city"):
    cities.append({"id": c["id"], **{x.name: x.text for x in c.find_all()}})

print(tweets)
print(cities)

Prints:

[
    {
        "id": "16620625 5686",
        "Name": "Kenyon Conley",
        "Phone": "0327 103 9485",
        "Email": "malesuada@lobortisClassaptent.edu",
        "Location": "45.5333, -73.2833",
        "GenderID": "male",
        "Tweet": "#FollowFriday @DanielleMorrill - She's with @Seattle20 and @Twilio. Also fun to talk to.  #entrepreneur",
        "City": "Saint-Basile-le-Grand",
        "Country": "Canada",
        "Age": "34",
    },
    {
        "id": "16310427-5502",
        "Name": "Griffin Norton",
        "Phone": "0306 178 7917",
        "Email": "in.dolor.Fusce@necmalesuadaut.ca",
        "Location": "52.0000, 84.9833",
        "GenderID": "male",
        "Tweet": "!!!Veryy Bored!!!  ~~Craving Million's Of MilkShakes~~",
        "City": "Belokurikha",
        "Country": "Russia",
        "Age": "33",
    },

...
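
If you would rather stay with xml.etree.ElementTree, as in the question, the same idea can be sketched without BeautifulSoup. This is only a minimal sketch: the inline SAMPLE_XML, the root tag `data`, and the helper name `branch_to_dicts` are made up for illustration, and the layout (a `tweets` branch and a `cities` branch whose records carry an `id` attribute plus simple child elements) is assumed from the output above rather than checked against the real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the real response; the layout is an assumption.
SAMPLE_XML = """
<data>
  <tweets>
    <tweet id="t1"><Name>Alice</Name><Age>34</Age></tweet>
    <tweet id="t2"><Name>Bob</Name><Age>33</Age></tweet>
  </tweets>
  <cities>
    <city id="c1"><City>Stirling</City><Country>UK</Country></city>
  </cities>
</data>
"""


def branch_to_dicts(branch):
    """Return one dict per record: its id attribute plus each direct child's tag and text."""
    return [
        {"id": record.get("id"), **{field.tag: field.text for field in record}}
        for record in branch
    ]


root = ET.fromstring(SAMPLE_XML.strip())

# The root is assumed to have exactly two children, in this order.
tweets_branch, cities_branch = list(root)

tweets = branch_to_dicts(tweets_branch)
cities = branch_to_dicts(cities_branch)

print(tweets)
print(cities)
```

With the real data you would pass response.content to ET.fromstring instead of SAMPLE_XML. Note that iterating over a record only visits its direct children, so nested elements, if the feed has any, would need extra handling.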