How to Parse XY Coordinate Tuples and Split Them Into Separate X and Y Lists

This is my first time asking for help on Stack Overflow, and I am way in over my head.

I am currently working on a project where I need to take percentage-based coordinate tuples from very large, variable-length XML files, split them into separate X and Y lists, and then find the average difference between values in the lists.

I am currently stuck on splitting the tuples into X and Y lists.

import xml.etree.ElementTree as ET

lem = []

tree = ET.parse('testdata.xml')
root = tree.getroot()

# Collect the raw "Value" attribute (a string such as "(0.48734050, 0.50727710)")
# of every left-eye gaze point.
for GazePointOnDisplayArea in root.findall("./GazeData/Left/GazePointOnDisplayArea"):
    le = GazePointOnDisplayArea.get('Value')
    lem.append(le)

print(lem)
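The split into x and y can also be done while reading the XML, so the intermediate list of strings is never needed. The sketch below assumes every Value attribute has the "(x, y)" form shown in the output. Because testdata.xml is not available, it parses a small hypothetical XML string whose layout is only inferred from the findall path; read_left_gaze_points is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for testdata.xml; the element layout is inferred
# from the findall path in the question, not taken from the real file.
SAMPLE_XML = """
<Data>
  <GazeData><Left><GazePointOnDisplayArea Value="(0.48734050, 0.50727710)"/></Left></GazeData>
  <GazeData><Left><GazePointOnDisplayArea Value="(0.48989120, 0.50335540)"/></Left></GazeData>
  <GazeData><Left><GazePointOnDisplayArea Value="(0.48709830, 0.50172430)"/></Left></GazeData>
</Data>
"""


def read_left_gaze_points(root):
    """Return separate x and y lists of floats for the left-eye gaze points."""
    x, y = [], []
    for point in root.findall("./GazeData/Left/GazePointOnDisplayArea"):
        # "(0.48734050, 0.50727710)" -> ["0.48734050", " 0.50727710"]
        x_text, y_text = point.get("Value").strip("()").split(",")
        x.append(float(x_text))  # float() ignores the leading space
        y.append(float(y_text))
    return x, y


root = ET.fromstring(SAMPLE_XML.strip())
x, y = read_left_gaze_points(root)
print(x)
print(y)
```

With the real file, root = ET.parse('testdata.xml').getroot() would take the place of the fromstring call.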

A test XML file shortened to five elements gives the following output:

['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)', '(0.48709830, 0.50172430)', '(0.48531740, 0.50473010)', '(0.48797150, 0.51031550)']

Ideally, I’d like to end up with:

x = [0.48734050, 0.48989120, 0.48709830, 0.48531740, 0.48797150]
y = [0.50727710, 0.50335540, 0.50172430, 0.50473010, 0.51031550]

I’ve tried zip(*...) and map(), but neither works here. I’m unsure whether I’ve made a parsing error, whether the decimal points are the problem, or whether it is something else.
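A likely reason zip(*...) does not help is that every element of lem is a string such as '(0.48734050, 0.50727710)', not a tuple, so unpacking it iterates over individual characters. A minimal sketch of one fix, assuming every Value string is a valid Python tuple literal: convert each string with ast.literal_eval, then transpose with zip.

```python
import ast

lem = ['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)', '(0.48709830, 0.50172430)']

# On the raw strings, zip(*lem) pairs up characters: the first group is ('(', '(', '(')
print(next(zip(*lem)))

# literal_eval safely turns "(0.48734050, 0.50727710)" into the tuple (0.4873405, 0.5072771)
points = [ast.literal_eval(s) for s in lem]

# zip(*points) now transposes the pairs into one group of x values and one of y values
x, y = map(list, zip(*points))
print(x)
print(y)
```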

I am open to using Python, NumPy, or pandas.
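Since pandas is an option, here is a sketch using its vectorised string methods, assuming the same "(x, y)" string format: strip the parentheses, split on the comma into two columns, and convert them to float.

```python
import pandas as pd

lem = ['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)', '(0.48709830, 0.50172430)']

# Strip "(" and ")", then split on a comma plus any following whitespace into two columns
coords = pd.Series(lem).str.strip("()").str.split(r",\s*", expand=True).astype(float)
coords.columns = ["x", "y"]

x = coords["x"].tolist()
y = coords["y"].tolist()
print(coords)
```

Keeping the data in a DataFrame may also be convenient for the later averaging step on very large files.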

Please advise.

Solution:

From the output you’re already getting, it’s a one-liner to reach the output you want.
First extract the numbers with a regular expression, then use NumPy to transpose them:

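import re
import numpy as np

text = ['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)', '(0.48709830, 0.50172430)', '(0.48531740, 0.50473010)', '(0.48797150, 0.51031550)']
# re.findall pulls the two unsigned decimals out of each "(x, y)" string, np.array builds
# an N x 2 array, and .T transposes it into an x row and a y row.
# x and y are NumPy arrays; use x.tolist() if plain Python lists are needed.
x, y = np.array([[float(v) for v in re.findall(r"\d+\.\d+", line)] for line in text]).T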
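The question's last step, the average difference between values in the lists, is not defined precisely. Assuming it means the mean absolute change between consecutive samples within each list, np.diff gives it directly; the sketch starts from the x and y values in the desired output.

```python
import numpy as np

x = np.array([0.48734050, 0.48989120, 0.48709830, 0.48531740, 0.48797150])
y = np.array([0.50727710, 0.50335540, 0.50172430, 0.50473010, 0.51031550])

# Assumption: "average difference" = mean absolute change between consecutive samples.
# np.diff(x) returns x[1] - x[0], x[2] - x[1], ...; drop np.abs for signed differences.
mean_dx = np.abs(np.diff(x)).mean()
mean_dy = np.abs(np.diff(y)).mean()
print(mean_dx, mean_dy)
```

If it instead means the average gap between each x and its paired y, np.mean(x - y) would be the one-line alternative.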
