First time asking for help on Stack Overflow, and I am way in over my head.
I am currently working on a project where I need to take percentage-based coordinate tuples from very large, variable-length XML files, split them into separate X and Y lists, and then find the average difference between values in the lists.
I am currently stuck on splitting the tuples into X and Y lists.
import xml.etree.ElementTree as ET

lem = []
tree = ET.parse('testdata.xml')
root = tree.getroot()
for GazePointOnDisplayArea in root.findall("./GazeData/Left/GazePointOnDisplayArea"):
    le = GazePointOnDisplayArea.get('Value')
    lem.append(le)
print(lem)
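One way to skip the string-splitting step entirely is to convert each Value attribute to two floats while walking the XML. The sketch below assumes every Value looks like "(x, y)" and that the file layout matches the "./GazeData/Left/GazePointOnDisplayArea" path from the question; the SAMPLE string, its Root tag, and the parse_point helper are hypothetical stand-ins so the example runs without testdata.xml.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL sample document; the <Root> tag name is an assumption.
SAMPLE = """<Root>
  <GazeData><Left><GazePointOnDisplayArea Value="(0.48734050, 0.50727710)"/></Left></GazeData>
  <GazeData><Left><GazePointOnDisplayArea Value="(0.48989120, 0.50335540)"/></Left></GazeData>
</Root>"""


def parse_point(value):
    """Hypothetical helper: turn '(0.1, 0.2)' into the tuple (0.1, 0.2)."""
    x_str, y_str = value.strip("()").split(",")
    # float() ignores the leading space left after the comma
    return float(x_str), float(y_str)


root = ET.fromstring(SAMPLE)  # for a real file: ET.parse('testdata.xml').getroot()
x, y = [], []
for point in root.findall("./GazeData/Left/GazePointOnDisplayArea"):
    px, py = parse_point(point.get("Value"))
    x.append(px)
    y.append(py)

print(x)  # [0.4873405, 0.4898912]
print(y)  # [0.5072771, 0.5033554]
```

For very large files, xml.etree.ElementTree.iterparse can stream elements instead of loading the whole tree into memory.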
A test XML file shortened to five elements gives the following output:
['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)', '(0.48709830, 0.50172430)', '(0.48531740, 0.50473010)', '(0.48797150, 0.51031550)']
Ideally, I’d like to end up with:
x = [0.48734050, 0.48989120, 0.48709830, 0.48531740, 0.48797150]
y = [0.50727710, 0.50335540, 0.50172430, 0.50473010, 0.51031550]
I’ve tried zip(*...) and map approaches, but nothing seems to work. I’m unsure whether I’ve made a parsing error, whether the decimal points are the problem, or something else entirely.
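A likely reason the zip(*...) attempts fail: each item in lem is a string such as '(0.48734050, 0.50727710)', not a tuple, so unpacking it hands zip individual characters. The sketch below, using a two-item excerpt of the output above, shows that behaviour and then parses each string into a real tuple with ast.literal_eval (a standard-library function that safely evaluates Python literals) before transposing with zip.

```python
import ast

lem = ['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)']

# Each item is a string, so zip(*lem) pairs up characters, not numbers
print(list(zip(*lem))[:2])  # [('(', '('), ('0', '0')]

# Turn each string into a tuple of floats first, then transpose
points = [ast.literal_eval(s) for s in lem]
x, y = map(list, zip(*points))

print(x)  # [0.4873405, 0.4898912]
print(y)  # [0.5072771, 0.5033554]
```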
I am open to using Python, NumPy, or pandas.
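Since pandas is an option, a minimal sketch of the same split uses Series.str.extract with a regular expression whose named groups become the column names. It assumes every string has the "(x, y)" shape shown in the output above; the two-item lem is an excerpt of that output.

```python
import pandas as pd

lem = ['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)']

# Named groups "x" and "y" become DataFrame columns; then convert text to float
df = pd.Series(lem).str.extract(r"\((?P<x>[^,]+),\s*(?P<y>[^)]+)\)").astype(float)

x = df["x"].tolist()
y = df["y"].tolist()
print(x)  # [0.4873405, 0.4898912]
print(y)  # [0.5072771, 0.5033554]
```

Keeping the result as a DataFrame may also be convenient for the later averaging step.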
From the output you’re getting, it’s a one-liner to reach the output you want.
First extract the numbers with a regular expression, then use NumPy to transpose them into separate x and y arrays:
import re
import numpy as np

text = ['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)', '(0.48709830, 0.50172430)', '(0.48531740, 0.50473010)', '(0.48797150, 0.51031550)']

# Pull both numbers out of each string into an (n, 2) array, then transpose to (2, n)
x, y = np.array([[float(v) for v in re.findall(r"(\d+\.\d+)", line)] for line in text]).T
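For the question’s final step, "average difference between values" could mean several things; the sketch below assumes it means the mean absolute change between consecutive samples and uses np.diff on the five-sample x and y arrays from above. The absolute value matters: the plain mean of np.diff(a) telescopes to (a[-1] - a[0]) / (len(a) - 1), so it only reflects the first and last samples.

```python
import numpy as np

# x and y as produced by the one-liner above (five-sample test data)
x = np.array([0.48734050, 0.48989120, 0.48709830, 0.48531740, 0.48797150])
y = np.array([0.50727710, 0.50335540, 0.50172430, 0.50473010, 0.51031550])

# ASSUMPTION: "average difference" = mean absolute change between neighbours.
# np.diff(a) returns a[1:] - a[:-1].
mean_dx = np.abs(np.diff(x)).mean()
mean_dy = np.abs(np.diff(y)).mean()
print(mean_dx, mean_dy)
```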