How to parse XML children using Python

April 9, 2022

I have parsed XML from a website and found that the root has two branches (children).

How can I separate the two branches into two lists of dictionaries?

Here's my code so far:

import pandas as pd
import xml.etree.ElementTree as ET
import requests
url = "http://cs.stir.ac.uk/~soh/BD2spring2022/assignmentdata.php"
params = {'data':'spurpyr'}
response = requests.get(url, params=params)
tree = response.content

#extract the root element as separate variable, and display the root tag.
root = ET.fromstring(tree)
print(root.tag)

#Get attributes of root
root_attr = root.attrib
print(root_attr)

#Finding children of root
for child in root:
    print(child.tag, child.attrib)

#extract the two children of the root element into two separate variables, and display their tags as well
children = list(root)

tweets_branch = children[0]
cities_branch = children[1]
print(tweets_branch.tag, cities_branch.tag)

#the tags of all elements in the entire tree
print([elem.tag for elem in root.iter()])

#specify both the encoding and decoding of the document you are displaying as the string
print(ET.tostring(root, encoding='utf8').decode('utf8'))

Solution:

You can use the BeautifulSoup module (its "xml" parser requires lxml to be installed). This example parses the tweets and cities into two lists of dictionaries:

import requests
from bs4 import BeautifulSoup

url = "http://cs.stir.ac.uk/~soh/BD2spring2022/assignmentdata.php"
params = {"data": "spurpyr"}

soup = BeautifulSoup(requests.get(url, params=params).content, "xml")

tweets = []
for t in soup.select("tweets > tweet"):
    tweets.append({"id": t["id"], **{x.name: x.text for x in t.find_all()}})

cities = []
for c in soup.select("cities > city"):
    cities.append({"id": c["id"], **{x.name: x.text for x in c.find_all()}})

print(tweets)
print(cities)

Prints:

[
    {
        "id": "16620625 5686",
        "Name": "Kenyon Conley",
        "Phone": "0327 103 9485",
        "Email": "malesuada@lobortisClassaptent.edu",
        "Location": "45.5333, -73.2833",
        "GenderID": "male",
        "Tweet": "#FollowFriday @DanielleMorrill - She's with @Seattle20 and @Twilio. Also fun to talk to.  #entrepreneur",
        "City": "Saint-Basile-le-Grand",
        "Country": "Canada",
        "Age": "34",
    },
    {
        "id": "16310427-5502",
        "Name": "Griffin Norton",
        "Phone": "0306 178 7917",
        "Email": "in.dolor.Fusce@necmalesuadaut.ca",
        "Location": "52.0000, 84.9833",
        "GenderID": "male",
        "Tweet": "!!!Veryy Bored!!!  ~~Craving Million's Of MilkShakes~~",
        "City": "Belokurikha",
        "Country": "Russia",
        "Age": "33",
    },

...
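
The same result should also be reachable with xml.etree.ElementTree alone, which the question already imports. What follows is a minimal sketch, not a tested solution against the real feed: the inline SAMPLE_XML string, its ids and the helper name branch_to_dicts are hypothetical, and the layout (a root with "tweets" and "cities" children, whose items carry an "id" attribute and plain-text child fields) is assumed from the output shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the structure implied by the printed output;
# the real feed may differ. In the question, `root` comes from
# ET.fromstring(response.content) instead.
SAMPLE_XML = """
<data>
  <tweets>
    <tweet id="t1"><Name>Kenyon Conley</Name><Age>34</Age></tweet>
    <tweet id="t2"><Name>Griffin Norton</Name><Age>33</Age></tweet>
  </tweets>
  <cities>
    <city id="c1"><City>Belokurikha</City><Country>Russia</Country></city>
  </cities>
</data>
"""


def branch_to_dicts(branch):
    """Convert every item under `branch` into one dictionary.

    Each dictionary holds the item's attributes (for example "id")
    plus one entry per direct child element, mapping tag -> text.
    """
    rows = []
    for item in branch:
        row = dict(item.attrib)          # copy attributes such as id
        for field in item:               # direct children only
            row[field.tag] = field.text
        rows.append(row)
    return rows


root = ET.fromstring(SAMPLE_XML.strip())

# Look the branches up by tag name rather than by position,
# so the order of the two children does not matter.
tweets = branch_to_dicts(root.find("tweets"))
cities = branch_to_dicts(root.find("cities"))

print(tweets)
print(cities)
```

Unlike find_all() in the BeautifulSoup version, this sketch only reads the direct children of each item, which is enough when the fields are not nested further.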