I’m currently using basic image mapping (for shelf tests, if anyone is into market research). I upload an image to an image-mapping app and then manually draw boxes around particular products, which generates co-ordinates. I have an Excel sheet with a macro that, once these co-ordinates are pasted in, creates code that I can put directly into my surveys. Initially, the co-ordinates are embedded in HTML code like the following:
<area alt="" title="" href="http://www.image-maps.com/" shape="rect"
coords="10,32,202,115" style="outline:none;" target="_self" />
<area alt="" title="" href="http://www.image-maps.com/" shape="rect"
coords="206,25,346,124" style="outline:none;" target="_self" />
<area alt="" title="" href="http://www.image-maps.com/" shape="rect"
coords="340,27,448,121" style="outline:none;" target="_self" />
<area alt="" title="" href="http://www.image-maps.com/" shape="rect"
coords="446,2,559,119" style="outline:none;" target="_self" />
<area shape="rect" coords="998,567,1000,569" alt="Image Map" style="outline:none;"
title="Image Map"/>
These can get very long. I was just wondering how I could extract just the co-ordinates, with each set printed on a new line. All I have currently is the following, which simply returns all the numbers and only works if the pasted HTML code is all on one line:
import re
html = '#paste html block onto one line'
coords = re.findall(r"\d+", html)
print(coords)
My ideal output here would be:
10,32,202,115
206,25,346,124
340,27,448,121
446,2,559,119
998,567,1000,569
Any suggestions?
>Solution :
The Beautiful Soup package for Python can do this very easily.
You can install it with pip install beautifulsoup4.
Using the HTML you provided (I saved it in a file called "stuff.xml"), the following code will get the co-ordinates for you:
from bs4 import BeautifulSoup

# Read the saved markup and parse it with the built-in "html.parser" backend
with open("stuff.xml") as xml_file:
    soup = BeautifulSoup(xml_file.read(), "html.parser")

# Collect the coords attribute of every <area> tag
coords = [area["coords"] for area in soup.find_all("area")]
print(coords)
# -> ['10,32,202,115', '206,25,346,124', '340,27,448,121', '446,2,559,119', '998,567,1000,569']
Using regular expressions on HTML or XML is going to cause problems sooner rather than later, so it is better to use an actual parser.
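If installing a third-party package is not an option, roughly the same idea can be sketched with the standard library's html.parser module. This is not the Beautiful Soup approach above but a stand-in for it. The names CoordsCollector and extract_coords are made up for this illustration, and the sample markup is the block from the question pasted into a triple-quoted string, which may span several lines. It prints one set of co-ordinates per line, as the question asks.

```python
from html.parser import HTMLParser


class CoordsCollector(HTMLParser):
    """Hypothetical helper: collects the coords attribute of every <area> tag."""

    def __init__(self):
        super().__init__()
        self.coords = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        # Self-closing tags such as <area ... /> also end up here by default.
        if tag == "area":
            value = dict(attrs).get("coords")
            if value is not None:
                self.coords.append(value)


def extract_coords(markup):
    """Return the coords strings of all <area> tags, in document order."""
    parser = CoordsCollector()
    parser.feed(markup)
    parser.close()
    return parser.coords


# Sample markup copied from the question; line breaks inside it are fine.
html = """
<area alt="" title="" href="http://www.image-maps.com/" shape="rect"
coords="10,32,202,115" style="outline:none;" target="_self" />
<area alt="" title="" href="http://www.image-maps.com/" shape="rect"
coords="206,25,346,124" style="outline:none;" target="_self" />
<area alt="" title="" href="http://www.image-maps.com/" shape="rect"
coords="340,27,448,121" style="outline:none;" target="_self" />
<area alt="" title="" href="http://www.image-maps.com/" shape="rect"
coords="446,2,559,119" style="outline:none;" target="_self" />
<area shape="rect" coords="998,567,1000,569" alt="Image Map" style="outline:none;"
title="Image Map"/>
"""

# Print each set of co-ordinates on its own line.
print("\n".join(extract_coords(html)))
```

Because each coords value stays a single string, splitting it with value.split(",") would give the four numbers separately if the macro needs them that way.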