Remove substring of digits from string (Python)

<elem1><elem2>20,000 Leagues Under the Sea1050251</elem2></elem1>
<elem1><elem2>1002321Robinson Crusoe1050251</elem2></elem1>

I’m working with an XML file and have to insert the elements above, extracted from it, into another XML file. The problem is that I have no idea how to remove the IDs (7-digit substrings used to track position) from the strings. Removing characters between ">" and "<" isn’t feasible, because the text sometimes starts with an ID and sometimes with a title that begins with numbers.
What I need is something that removes every 7-digit substring, and only those, from a string, but I’ve only found code that does it for specified substrings.

Solution:

You can try with regex:

import re


string = """<elem1><elem2>20,000 Leagues Under the Sea1050251</elem2></elem1>
<elem1><elem2>1002321Robinson Crusoe1050251</elem2></elem1>"""

# Matches any 7 consecutive digits, including the first 7 of a longer run.
# For str patterns, \d also matches non-ASCII Unicode digits unless re.ASCII is passed.
pattern = re.compile(r"\d{7}")
result = pattern.sub("", string)  # replace every match with an empty string
print(result)

Output:

<elem1><elem2>20,000 Leagues Under the Sea</elem2></elem1>
<elem1><elem2>Robinson Crusoe</elem2></elem1>
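
One caveat with the plain `\d{7}` pattern: it removes every 7-digit run, so a title that itself contains seven or more consecutive digits would lose them too. A minimal sketch of a stricter variant follows, assuming the IDs always sit directly after the opening tag or directly before the closing tag, as in the sample. The helper name `strip_edge_ids` and the second sample title are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical helper, not from the original answer.
# First alternative: 7 digits immediately after ">" (ID at the start of the text).
# Second alternative: 7 digits immediately before "<" (ID at the end of the text).
# re.ASCII restricts \d to the characters 0-9.
EDGE_ID = re.compile(r"(?<=>)\d{7}|\d{7}(?=<)", re.ASCII)


def strip_edge_ids(xml_text: str) -> str:
    """Remove 7-digit runs that touch a tag boundary; leave other digits alone."""
    return EDGE_ID.sub("", xml_text)


# Made-up sample: the title contains a 7-digit number that should survive.
sample = "<elem1><elem2>1002321Call 8675309 Now1050251</elem2></elem1>"
print(strip_edge_ids(sample))
```

Here the digits in the middle of the title are kept, while the plain `\d{7}` pattern would strip them as well. A title that begins or ends with seven or more digits is still ambiguous and cannot be told apart from an ID by pattern alone.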
