Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

parsing XML to CSV using python "None type error"

Trying to convert XML to CSV.
new to python parsing
Data sample ("Dummy data")

<users>
<user firstName="Hannah" lastName="Jones" age="21" sex="Female" retired="False" dependants="2" marital_status="married or civil partner" salary="20603" pension="0" company="Ward and Sons" commute_distance="6.56" address_postcode="N06 4LG"/>
<user firstName="Tracy" lastName="Rowley" age="50" sex="Female" retired="False" dependants="1" marital_status="single" salary="39509" pension="0" company="Fuller, King and Robinson" commute_distance="11.01" address_postcode="M1 6JD"/>
<user firstName="Shane" lastName="Thompson" age="87" sex="Male" retired="True" dependants="2" marital_status="single" salary="53134" pension="13409" company="N/A" commute_distance="0" address_postcode="WF84 1EA"/>
<user firstName="Michael" lastName="Anderson" age="85" sex="Male" retired="True" dependants="2" marital_status="married or civil partner" salary="58524" pension="39479" company="N/A" commute_distance="0" address_postcode="BN1 7TL"/></users>

I tried this code
import csv

import xml.etree.ElementTree as Xet

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

import pandas as pd

cols = ["firstName", "lastName", "age", "sex", "retired", "dependants", "marital_status", "salary", "pension", "company", "commute_distance", "address_postcode"]
rows = []
  
# Parsing the XML file
xmlparse = Xet.parse('/content/drive/MyDrive/DATAtask1/user_data.xml')
root = xmlparse.getroot()
for user in root:
    firstName = user.find("firstName").text
    lastName = user.find("lastName").text
    age = user.find("age").text
    sex = user.find("sex").text
    retired = user.find("retired").text
    dependants = user.find("dependants").text
    marital_status = user.find("marital_status").text
    salary = user.find("salary").text
    pension = user.find("pension").text
    company = user.find("company").text
    commute_distance = user.find("commute_distance").text
    address_postcode = user.find("address_postcode").text
  
    rows.append({"firstName": firstName,
                 "lastName": lastName,
                 "age": age,
                 "sex": sex,
                 "retired": retired,
                 "dependants": dependants,
                 "marital_status": marital_status,
                 "pension": pension,
                 "salary": salary,
                 "company": company,
                 "commute_distance": commute_distance,
                 "address_postcode": address_postcode})
  
df = pd.DataFrame(rows, columns=cols)
  
# Writing dataframe to csv
df.to_csv('/content/drive/MyDrive/DATAtask1/XMLtoCSV.csv') 

Getting this error

AttributeError Traceback (most recent call last)

<ipython-input-3-c6016197ed71> in <module>()
      7 root = xmlparse.getroot()
      8 for user in root:
----> 9     firstName = user.find("firstName").text
     10     lastName = user.find("lastName").text
     11     age = user.find("age").text

AttributeError: 'NoneType' object has no attribute 'text'

>Solution :

The tag has an attribute with firstName. Therefore you should use:

user.attrib['firstName']

If you check: user.attrib, it will return a dictionary (this is not true, it returns lxml.etree._Attrib which can be converted to a dictionary using (dict(user.attrib)). That will give you an opportunity to make your code easier since you can just use the dictionary like a normal python dictionary.

For example you can create a list and append all dictionaries to the list. At the end it is possible to convert a list of dictionaries to a pandas dataframe:

d1 = {'name': 'john', 'age': 19}
d2 = {'name': 'Steve', 'age': 16}
# A dictionary with an extra key:
d3 = {'name': 'Jim', 'age': 25, 'additional': 'something'}
df = pd.DataFrame([d1, d2, d3])

    name  age additional
0   john   19        NaN
1  Steve   16        NaN
2    Jim   25  something
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading