I’m trying to find the correct XPath expression to extract only the URLs from all my documents, whatever the tag is. I’m testing with this document:
<urlset xmlns="https://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://url
</loc>
<lastmod>2019-08-07T15:01:51+00:00
</lastmod>
</url>
</urlset>
The following expression gives me these results:
//*[contains(.,'http')]//text()
https://url
2019-08-07T15:01:51+00:00
What I’m looking for is a way to get rid of the second line. I need to be able to get only the URLs from any XML file.
>Solution :
Well, let’s ignore the fact that not all URLs contain "http" and not everything that contains "http" is a URL…
To find all text nodes containing "http", just use //text()[contains(., 'http')].
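For context, the expression in the question returns the date because the string value of an element includes the text of all its descendants, so urlset and url also match contains(., 'http'), and //text() then returns every text node beneath them. Putting the predicate on the text node itself avoids that. As a rough sketch of the same filter in Python: the standard library's ElementTree only implements a subset of XPath (no text() step, no contains()), so the version below walks the text nodes with itertext() instead of evaluating the XPath directly. The names SITEMAP and text_nodes_containing are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (hypothetical constant name).
SITEMAP = """<urlset xmlns="https://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://url
</loc>
<lastmod>2019-08-07T15:01:51+00:00
</lastmod>
</url>
</urlset>"""


def text_nodes_containing(xml_source, needle="http"):
    """Return the stripped text chunks that contain `needle`.

    Rough equivalent of //text()[contains(., 'http')]: each text chunk
    is tested on its own, not the concatenated text of a whole element.
    """
    root = ET.fromstring(xml_source)
    # itertext() yields the text inside the root in document order.
    return [chunk.strip() for chunk in root.itertext() if needle in chunk]


urls = text_nodes_containing(SITEMAP)
print(urls)  # ['https://url']
```

The namespace URI on urlset is not picked up because it is a namespace declaration, not a text node. With a library that supports full XPath 1.0, the expression from the answer should be usable as written.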
Or you could find