
Match two regex patterns multiple times

I have the string "Energy (kWh/m²)" and I want to get "Energy__kWh_m__", that is, replace every non-word character and every sub/superscript character with an underscore.

I have one regex for replacing the non-word characters, re.sub("[\W]", "_", column_name), and another for replacing the superscript characters, re.sub("[²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ]", "", column_name).

I have tried combining these into a single regex, but with no luck. Every attempt gives only a partial replacement such as "Energy (kWh_m__", for example with a regex like ([²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ]).*(\W)

Any help? Thanks!

>Solution :

Going by your current code, if you want to remove the superscript characters and replace all other non-word characters with an underscore, you can use

re.sub(r'([²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ])|\W', lambda x: '' if x.group(1) else '_', text)

If you instead want to replace both the non-word characters and the characters in your character class with an underscore, just merge the two:

re.sub(r'[\W²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ]', '_', text)

Note that \W already matches the symbol characters ⁺⁻⁼⁽⁾, so you can even shorten this to r'[\W²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹ⁿ]'.
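
The reason \W alone misses "²" is that, for str patterns in Python 3, \w is Unicode-aware and counts superscript digits (and modifier letters such as "ⁿ") as word characters. The sketch below illustrates this and shows one possible alternative, the re.ASCII flag. That flag is not part of the answer above, and it would also replace non-ASCII letters such as "é", so it only fits if the names are otherwise plain ASCII.

```python
import re

text = "Energy (kWh/m²)"

# Default (Unicode) mode: superscript digits count as word characters,
# so \W leaves "²" untouched.
unicode_result = re.sub(r"\W", "_", text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so \W also matches "²".
# Caveat: this replaces every non-ASCII letter as well.
ascii_result = re.sub(r"\W", "_", text, flags=re.ASCII)

print(unicode_result)  # Energy__kWh_m²_
print(ascii_result)    # Energy__kWh_m__
```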

See the Python demo:

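import re

text = "Energy (kWh/m²)"

# Remove superscript characters, replace any other non-word character with "_"
print(re.sub(r'([²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ])|\W', lambda x: '' if x.group(1) else '_', text))  # => Energy__kWh_m_

# Replace both superscript and non-word characters with "_"
print(re.sub(r'[\W²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ]', '_', text))  # => Energy__kWh_m__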
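
Since the question mentions column_name, the merged pattern could be wrapped in a small helper and applied to a list of names. This is only a sketch: the function name sanitize_column_name and the sample column names are hypothetical and do not come from the question.

```python
import re

# Merged character class from the answer, compiled once for reuse.
_NON_IDENTIFIER = re.compile(r"[\W²³¹⁰ⁱ⁴⁵⁶⁷⁸⁹ⁿ]")


def sanitize_column_name(name: str) -> str:
    """Replace every non-word or superscript character with an underscore.

    Hypothetical helper, shown for illustration only.
    """
    return _NON_IDENTIFIER.sub("_", name)


# Illustrative sample data (assumed, not from the question).
columns = ["Energy (kWh/m²)", "Area (m²)", "Flow (m³/h)"]
cleaned = [sanitize_column_name(c) for c in columns]
print(cleaned)  # ['Energy__kWh_m__', 'Area__m__', 'Flow__m__h_']
```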