This is my first time asking for help on Stack Overflow, and I am in way over my head.
I am working on a project where I need to take percentage-based coordinate tuples from very large, variable-length XML files, split them into separate X and Y lists, and then find the average difference between values in the lists.
I am currently stuck on splitting the tuples into X and Y lists.
import xml.etree.ElementTree as ET

lem = []
tree = ET.parse('testdata.xml')
root = tree.getroot()
for GazePointOnDisplayArea in root.findall("./GazeData/Left/GazePointOnDisplayArea"):
    # 'Value' holds the coordinate pair as a string, e.g. '(0.48734050, 0.50727710)'
    le = GazePointOnDisplayArea.get('Value')
    lem.append(le)
print(lem)
# A test XML file shortened to five elements gives the following output:
['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)', '(0.48709830, 0.50172430)', '(0.48531740, 0.50473010)', '(0.48797150, 0.51031550)']
Ideally I’d like to end up with
x = [0.48734050, 0.48989120, 0.48709830, 0.48531740, 0.48797150]
y = [0.50727710, 0.50335540, 0.50172430, 0.50473010, 0.51031550]
I’ve tried zip(*...) and map-based approaches, but nothing seems to work on this data. I’m unsure whether I’ve made a parsing error, whether the decimal points are the problem, or whether it is something else entirely.
I am open to using python, numpy, or pandas.
Please advise.
>Solution :
Given the output you already have, the result you want is one line away.
First extract the numbers from each string with a regular expression, then use numpy to transpose them into separate X and Y arrays:
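import re
import numpy as np

text = ['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)', '(0.48709830, 0.50172430)', '(0.48531740, 0.50473010)', '(0.48797150, 0.51031550)']
# Pull the decimal numbers out of each string, convert them to float, and transpose so the rows become X and Y.
# Note: this pattern only matches unsigned decimals such as 0.4873 (no minus sign, no bare integers).
x, y = np.array([[float(num) for num in re.findall(r"\d+\.\d+", line)] for line in text]).T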
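If you would rather avoid regular expressions, here is a minimal standard-library sketch. It assumes every 'Value' string is a well-formed Python tuple literal, which is why zip(*...) did not work directly: the list holds strings, not tuples. The helper name mean_step is hypothetical, and it reads "average difference" as the mean absolute difference between consecutive values, which may not be the definition you need.

```python
import ast
from statistics import mean

raw = ['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)',
       '(0.48709830, 0.50172430)', '(0.48531740, 0.50473010)',
       '(0.48797150, 0.51031550)']

# Each element is a string that looks like a tuple; literal_eval
# safely parses it into a real (float, float) tuple.
points = [ast.literal_eval(s) for s in raw]

# zip(*points) regroups the pairs into one sequence per axis.
x, y = (list(axis) for axis in zip(*points))


def mean_step(values):
    """Mean absolute difference between consecutive values.

    Hypothetical helper: assumes "average difference" means the
    average step from one sample to the next.
    """
    return mean(abs(b - a) for a, b in zip(values, values[1:]))


print(x)
print(y)
print(mean_step(x), mean_step(y))
```

For very large files, the numpy route above is likely faster; with arrays, np.abs(np.diff(x)).mean() computes the same consecutive-difference average.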