parsing XML to CSV using python "None type error"

January 27, 2022

Trying to convert XML to CSV.
new to python parsing
Data sample ("Dummy data")

<users>
<user firstName="Hannah" lastName="Jones" age="21" sex="Female" retired="False" dependants="2" marital_status="married or civil partner" salary="20603" pension="0" company="Ward and Sons" commute_distance="6.56" address_postcode="N06 4LG"/>
<user firstName="Tracy" lastName="Rowley" age="50" sex="Female" retired="False" dependants="1" marital_status="single" salary="39509" pension="0" company="Fuller, King and Robinson" commute_distance="11.01" address_postcode="M1 6JD"/>
<user firstName="Shane" lastName="Thompson" age="87" sex="Male" retired="True" dependants="2" marital_status="single" salary="53134" pension="13409" company="N/A" commute_distance="0" address_postcode="WF84 1EA"/>
<user firstName="Michael" lastName="Anderson" age="85" sex="Male" retired="True" dependants="2" marital_status="married or civil partner" salary="58524" pension="39479" company="N/A" commute_distance="0" address_postcode="BN1 7TL"/></users>

I tried this code
import csv

import xml.etree.ElementTree as Xet

import pandas as pd

cols = ["firstName", "lastName", "age", "sex", "retired", "dependants", "marital_status", "salary", "pension", "company", "commute_distance", "address_postcode"]
rows = []
  
# Parsing the XML file
xmlparse = Xet.parse('/content/drive/MyDrive/DATAtask1/user_data.xml')
root = xmlparse.getroot()
for user in root:
    firstName = user.find("firstName").text
    lastName = user.find("lastName").text
    age = user.find("age").text
    sex = user.find("sex").text
    retired = user.find("retired").text
    dependants = user.find("dependants").text
    marital_status = user.find("marital_status").text
    salary = user.find("salary").text
    pension = user.find("pension").text
    company = user.find("company").text
    commute_distance = user.find("commute_distance").text
    address_postcode = user.find("address_postcode").text
  
    rows.append({"firstName": firstName,
                 "lastName": lastName,
                 "age": age,
                 "sex": sex,
                 "retired": retired,
                 "dependants": dependants,
                 "marital_status": marital_status,
                 "pension": pension,
                 "salary": salary,
                 "company": company,
                 "commute_distance": commute_distance,
                 "address_postcode": address_postcode})
  
df = pd.DataFrame(rows, columns=cols)
  
# Writing dataframe to csv
df.to_csv('/content/drive/MyDrive/DATAtask1/XMLtoCSV.csv')

Getting this error

AttributeError Traceback (most recent call last)

<ipython-input-3-c6016197ed71> in <module>()
      7 root = xmlparse.getroot()
      8 for user in root:
----> 9     firstName = user.find("firstName").text
     10     lastName = user.find("lastName").text
     11     age = user.find("age").text

AttributeError: 'NoneType' object has no attribute 'text'

>Solution :

The tag has an attribute with firstName. Therefore you should use:

user.attrib['firstName']

If you check: user.attrib, it will return a dictionary (this is not true, it returns lxml.etree._Attrib which can be converted to a dictionary using (dict(user.attrib)). That will give you an opportunity to make your code easier since you can just use the dictionary like a normal python dictionary.

For example you can create a list and append all dictionaries to the list. At the end it is possible to convert a list of dictionaries to a pandas dataframe:

d1 = {'name': 'john', 'age': 19}
d2 = {'name': 'Steve', 'age': 16}
# A dictionary with an extra key:
d3 = {'name': 'Jim', 'age': 25, 'additional': 'something'}
df = pd.DataFrame([d1, d2, d3])

    name  age additional
0   john   19        NaN
1  Steve   16        NaN
2    Jim   25  something