XML: Print the element preceding a findall() match

I’m working with an XML corpus that looks like this:

  <dialogue speaker="A">
    <sentence tag1="attribute1" tag2="attribute2"> Hello </sentence>
  </dialogue>
  <dialogue speaker="B">
    <sentence tag1="different_attribute1" tag2="different_attribute2"> How are you </sentence>
  </dialogue>

I use root.findall() to find every sentence whose tag2 attribute is "different_attribute2". For each match, I would like to print not only the matching sentence but also the dialogue element that comes before its parent:

{'speaker': 'A'}
How are you

I’m quite new to coding, and I’ve tried a bunch of for loops and if statements without success. I start with:

for words in root.findall('.//sentence[@tag2="different_attribute2"]'):
    # '/..' selects the parent of each matching sentence (speaker B's dialogue)
    for speaker in root.findall('.//sentence[@tag2="different_attribute2"]/..'):
        ...

But then I have absolutely no idea how to retrieve speaker A. Can anyone help me?

Solution:

Using lxml and a single XPath union expression that finds all four values:

>>> from lxml import etree
>>> tree = etree.parse('/home/lmc/tmp/test.xml')
>>> for e in tree.xpath('//sentence[@tag2="different_attribute2"]/parent::dialogue/@speaker | //sentence[@tag2="different_attribute2"]/text() | //dialogue[following-sibling::dialogue/sentence[@tag2="different_attribute2"]]/sentence/text() | //dialogue[following-sibling::dialogue/sentence[@tag2="different_attribute2"]]/@speaker'):
...     print(e)
...
A
 Hello 
B
 How are you 

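XPath details

The expression is a union (|) of four paths, and lxml returns the combined results in document order.

Find speaker B:

  //sentence[@tag2="different_attribute2"]/parent::dialogue/@speaker

Find sentence of B:

  //sentence[@tag2="different_attribute2"]/text()

Find sentence of A given B:

  //dialogue[following-sibling::dialogue/sentence[@tag2="different_attribute2"]]/sentence/text()

Find speaker=A given B:

  //dialogue[following-sibling::dialogue/sentence[@tag2="different_attribute2"]]/@speaker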
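
Since the question uses root.findall() from the standard library, here is a minimal sketch of the same idea without lxml. It pairs each dialogue with the one directly before it and prints the earlier speaker's attributes next to the matching sentence. The <corpus> root element, the SAMPLE string and the helper name previous_speaker_matches are assumptions made for illustration; they are not part of the original corpus or answer. Note also that following-sibling in the XPath above matches every earlier dialogue, whereas this sketch only looks at the immediately preceding one.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the question's corpus,
# wrapped in an assumed <corpus> root so it is well-formed.
SAMPLE = """
<corpus>
  <dialogue speaker="A">
    <sentence tag1="attribute1" tag2="attribute2"> Hello </sentence>
  </dialogue>
  <dialogue speaker="B">
    <sentence tag1="different_attribute1" tag2="different_attribute2"> How are you </sentence>
  </dialogue>
</corpus>
"""


def previous_speaker_matches(root, tag2_value):
    """Return (previous dialogue's attributes, matching sentence text) pairs."""
    results = []
    # Visit every element so that sibling order among its <dialogue> children is known.
    for parent in root.iter():
        dialogues = parent.findall("dialogue")
        # Pair each dialogue with the one directly before it.
        for previous, current in zip(dialogues, dialogues[1:]):
            for sentence in current.findall(f"sentence[@tag2='{tag2_value}']"):
                results.append((previous.attrib, (sentence.text or "").strip()))
    return results


root = ET.fromstring(SAMPLE)
for attrib, text in previous_speaker_matches(root, "different_attribute2"):
    print(attrib)
    print(text)
```

With the sample above this prints {'speaker': 'A'} followed by How are you, which is the output asked for in the question. A matching sentence in the very first dialogue has no predecessor and is skipped.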
