Python XML parser with a specific rule

December 8, 2021

I have an XML file like this:

<?xml version="1.0" encoding="utf-8"?><!--Generated by Screaming Frog SEO Spider 16.3-->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://orinab.com/</loc>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://orinab.com/cabinet/%DA%A9%D8%A7%D8%A8%DB%8C%D9%86%D8%AA-%D8%A2%D8%B4%D9%BE%D8%B2%D8%AE%D8%A7%D9%86%D9%87-%D8%A2%D9%85%D8%A7%D8%AF%D9%87-%D9%81%D9%84%D8%B2%DB%8C-%D8%AF%D8%B1%D8%A8-%DA%86%D9%88%D8%A8%DB%8C</loc>
    <changefreq>daily</changefreq>
    <priority>0.9</priority>
  </url>
  <url>
    <loc>https://orinab.com/sales-associates</loc>
    <changefreq>daily</changefreq>
    <priority>0.9</priority>
  </url>
  <url>
    <loc>https://orinab.com/kitchen-cabinet/%DA%A9%D8%A7%D8%A8%DB%8C%D9%86%D8%AA-%D8%A2%D8%A8%DA%86%DA%A9%D8%A7%D9%86-%D9%81%D9%84%D8%B2%DB%8C-%D8%B1%D9%86%DA%AF-%DA%A9%D8%A7%D8%AC</loc>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  ...

I want to collect every link whose URL contains kitchen-cabinet into a list.
Any suggestions would be appreciated.

Solution:

I am not that good with XML, but one option is a regular expression:

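import re

# xml is a string holding the sitemap shown in the question.
# Capture the text of each <loc> element whose URL contains "kitchen-cabinet".
# [^<]* cannot cross a tag boundary, so this also works if the XML is on one line.
reg = re.compile(r'<loc>(https?://[^<]*kitchen-cabinet[^<]*)</loc>')
reg.findall(xml)

>>> ['https://orinab.com/kitchen-cabinet/%DA%A9%D8%A7%D8%A8%DB%8C%D9%86%D8%AA-%D8%A2%D8%A8%DA%86%DA%A9%D8%A7%D9%86-%D9%81%D9%84%D8%B2%DB%8C-%D8%B1%D9%86%DA%AF-%DA%A9%D8%A7%D8%AC']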

# A second example: here xml holds three kitchen-cabinet entries.
xml = '''
  <url>
    <loc>https://orinab.com/kitchen-cabinet/%DA%A9%D8%A7%D8%A8%DB%8C%D9%86%D8%AA-%D8%A2%D8%A8%DA%86%DA%A9%D8%A7%D9%86-%D9%81%D9%84%D8%B2%DB%8C-%D8%B1%D9%86%DA%AF-%DA%A9%D8%A7%D8%AC</loc>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://orinab.com/kitchen-cabinet/%DA%A9%D8%A7%D8%A8%DB%8C%D9%86%D8%AA-%D8%A2%D8%A8%DA%86%DA%A9%D8%A7%D9%86-%D9%81%D9%84%D8%B2%DB%8C-%D8%B1%D9%86%DA%AF-%DA%A9%D8%A7%D8%AC</loc>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://orinab.com/kitchen-cabinet/%DA%A9%D8%A7%D8%A8%DB%8C%D9%86%D8%AA-%D8%A2%D8%A8%DA%86%DA%A9%D8%A7%D9%86-%D9%81%D9%84%D8%B2%DB%8C-%D8%B1%D9%86%DA%AF-%DA%A9%D8%A7%D8%AC</loc>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
'''
reg.findall(xml)
>>> ['https://orinab.com/kitchen-cabinet/%DA%A9%D8%A7%D8%A8%DB%8C%D9%86%D8%AA-%D8%A2%D8%A8%DA%86%DA%A9%D8%A7%D9%86-%D9%81%D9%84%D8%B2%DB%8C-%D8%B1%D9%86%DA%AF-%DA%A9%D8%A7%D8%AC',
 'https://orinab.com/kitchen-cabinet/%DA%A9%D8%A7%D8%A8%DB%8C%D9%86%D8%AA-%D8%A2%D8%A8%DA%86%DA%A9%D8%A7%D9%86-%D9%81%D9%84%D8%B2%DB%8C-%D8%B1%D9%86%DA%AF-%DA%A9%D8%A7%D8%AC',
 'https://orinab.com/kitchen-cabinet/%DA%A9%D8%A7%D8%A8%DB%8C%D9%86%D8%AA-%D8%A2%D8%A8%DA%86%DA%A9%D8%A7%D9%86-%D9%81%D9%84%D8%B2%DB%8C-%D8%B1%D9%86%DA%AF-%DA%A9%D8%A7%D8%AC']

Edit: to run this against a file, read its contents into a string first:

with open('file.xml', 'r', encoding='utf-8') as f:
    links = reg.findall(f.read())
print(links)
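
Since the question asks about an XML parser, here is a minimal sketch of the same filter using the standard library's xml.etree.ElementTree instead of a regex. It is not part of the original answer. The helper name kitchen_cabinet_links, the needle parameter, and the shortened sample URL ending in /example are assumptions made for illustration. Only the namespace URI is taken from the <urlset> element in the question.

```python
import xml.etree.ElementTree as ET

# Namespace declared on <urlset> in the sitemap from the question.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def kitchen_cabinet_links(xml_data, needle="kitchen-cabinet"):
    """Return the text of every <loc> element whose URL contains needle.

    Hypothetical helper: the name and the substring rule are assumptions.
    """
    root = ET.fromstring(xml_data)
    links = []
    # Elements in a namespaced document must be looked up with the namespace.
    for loc in root.iterfind("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        if needle in url:
            links.append(url)
    return links


# Shortened, made-up sample in the same shape as the sitemap in the question.
sample = b"""<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://orinab.com/sales-associates</loc></url>
  <url><loc>https://orinab.com/kitchen-cabinet/example</loc></url>
</urlset>"""

print(kitchen_cabinet_links(sample))
# prints ['https://orinab.com/kitchen-cabinet/example']
```

For a file on disk, passing the bytes from open('file.xml', 'rb').read() to the helper should work the same way. Unlike the regex, this only looks inside <loc> elements and does not depend on how the file is laid out across lines.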

by MR

Published December 08, 2021
