I have the following HTML structure:
<p>File name</p>
<a href="https://somelink.pdf">Download</a>
I need to capture the link (the a element) and its name (the p element) using CSS and XPath. First, I use a CSS selector to find all links whose href value ends in .pdf (a[href$=".pdf"]):
for i in response.css('a[href$=".pdf"]'):
    link = i.css('::attr("href")').get()
    name = i.xpath(?????????)
    print(name, link)
How do I capture the text in the p element using XPath?
>Solution :
Starting from a
This XPath,
//a[.="Download"]/preceding-sibling::p[1]
selects, for each a element whose string value equals "Download", the first preceding sibling p element (that is, the nearest p before it).
Starting from p
This XPath,
//p[.="File name"]/following-sibling::a[1]
selects, for each p element whose string value equals "File name", the first following sibling a element (that is, the nearest a after it).
In either case, you can select the text node child by appending /text() to the XPaths.