I have an XML file that has ref tags nested inside para tags:
<para>here be text<ref> REF 1 </ref>and here be some more text</para>
Is there a way, using Beautiful Soup, to extract the string between the opening para tag and the opening ref tag, i.e.:
here be text
I’ve tried various things to no avail, including find_previous:
from bs4 import BeautifulSoup

soup = BeautifulSoup(file, 'xml')
ref = soup.find('ref')
ref_before = ref.find_previous('para')
But (obviously) ref_before holds the entire para tag, so its text is the full contents, i.e.:
here be text REF 1 and here be some more text
I think this ought to be really simple but I don’t have much experience and just can’t crack it. Any help much appreciated.
>Solution :
You can use the .contents list of the para tag and select its first element, which is the text node that precedes the ref tag:
soup.find('para').contents[0]
Output:
'here be text'
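Note that contents[0] only works when the text you want is the very first child of para. If the starting point is the ref tag itself, as in the question, a possible alternative is to look at the node immediately before it with .previous_sibling. The following is a minimal sketch, not the only way to do it: the helper name text_before is made up for illustration, and the built-in 'html.parser' is used here only so the snippet runs without lxml (the 'xml' parser from the question should behave the same way for this input).

```python
from bs4 import BeautifulSoup, NavigableString

xml = "<para>here be text<ref> REF 1 </ref>and here be some more text</para>"

# Assumption: 'html.parser' is used so the example needs no extra install;
# the question itself uses the 'xml' parser, which requires lxml.
soup = BeautifulSoup(xml, "html.parser")


def text_before(tag):
    """Hypothetical helper: return the text node directly before `tag`.

    Returns None if there is no preceding sibling or if the preceding
    sibling is another tag rather than plain text.
    """
    prev = tag.previous_sibling
    return str(prev) if isinstance(prev, NavigableString) else None


ref = soup.find("ref")
print(text_before(ref))       # here be text
print(str(ref.next_sibling))  # and here be some more text
```

This way the lookup is anchored on ref rather than on the position inside para, so it still returns the right string if other nodes appear earlier in the paragraph.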