Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Turn camel case fields with numerical digits into snake case using Python

I have the below method in Python (3.10):

import re

def camel_to_screaming_snake(value):
    return '_'.join(re.findall('[A-Z][a-z]+|[0-9A-Z]+(?=[A-Z][a-z])|[0-9A-Z]{2,}|[a-z0-9]{2,}|[a-zA-Z0-9]'
                                        , value)).upper()

The goal I am trying to accomplish is to insert an underscore every time the case in a string changes, or between a non-numeric character and a numeric character. However, the following cases being passed in are not yielding expected results. (left is what is passed in, right is what is returned)

vn3b -> VN3B
vnRbb250V -> VN_RBB_250V

I am expecting the following return values:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

vn3b -> VN_3_B
vnRbb250V -> VN_RBB_250_V

What is wrong with my regex preventing this from working as expected?

>Solution :

You want to catch particular boundaries and insert an _ in them. To find a boundary, you can use the following regex format: (?<=<before>)(?=<after>). These are non capturing lookaheads and lookbehinds where before and after are replaced with the conditions.

In this case you have defined 3 ways that you might need to find a boundary:

  • A lowercase letter followed by an uppercase letter: (?<=[a-z])(?=[A-Z])
  • A number followed by a letter: (?<=[0-9])(?=[a-zA-Z])
  • A letter followed by a number: (?<=[a-zA-Z])(?=[0-9])

You can combine these into a single regex with an or | and then call sub on it to replace the boundaries. Afterwards you can uppercase the output:

import re

boundaries_re = re.compile(
    r"((?<=[a-z])(?=[A-Z]))"
    r"|((?<=[0-9])(?=[a-zA-Z]))"
    r"|((?<=[a-zA-Z])(?=[0-9]))"
)

def format_var(text):
    return boundaries_re.sub("_", text).upper()
>>> format_var("vn3b")
VN_3_B
>>> format_var("vnRbb250V")
VN_RBB_250_V
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading