Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

regex conditional extract in python-re

I am trying to extract the SQLCODE which typically is 4 integers from error text like below :

1. SQLSTATE: 22018, SQLCODE: 3535.
2. SQLSTATE: 40001, SQLCODE: 2631 Session Id           629709103
3. SQLSTATE: 40001, SQLCODE: 2631                Session Id  594700603
4. SQLSTATE: T7547, SQLCODE:   754Session Id613234380

I have below pattern matching regex for the same currently. However, there is aedge case like 4 which is failing.

error_cd = re.findall(r'SQLCODE:\s([^.,\s]+)', err_log)

If there are no 4 integers after SQLCODE, i would like to extract the SQLSTATE text after the letter ‘T’

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Expected Output :

1. 3535
2. 2631
3. 2631
4. 7547

Any suggestions on how to achieve this is appreciated. Thank you.

>Solution :

This could probably be done purely in regex, using a lookahead/lookbehind to handle the conditional extraction, but that could get really messy.

SOLUTION 1: Pure Regex:

EDIT: Here’s the pure regex solution… simpler than I thought it would be (though definitely more error-prone than the mixed approach… would need some additional logic to make it more robust):

re.findall(r'((?:(?<=SQLSTATE: T)[0-9]{4})|(?:(?<=SQLCODE: )[0-9]{4}))', err_log)

SOLUTION 2: Regex and Python:

The following solution uses regex to pull both the SQLSTATE and SQLCODE values, and uses a list comprehension to do the conditional extraction:

err_log = '''
1. SQLSTATE: 22018, SQLCODE: 3535.
2. SQLSTATE: 40001, SQLCODE: 2631 Session Id           629709103
3. SQLSTATE: 40001, SQLCODE: 2631                Session Id  594700603
4. SQLSTATE: T7547, SQLCODE:   754Session Id613234380
'''

error_st_cd = re.findall(r'SQLSTATE: [A-Z]*([0-9]+), SQLCODE:\s([0-9]{4})?', err_log)
error_cd = [codes[1] or codes[0] for codes in error_st_cd]

for i, cd in enumerate(error_cd):
    print(f'{i+1}. {cd}')

Output:

  1. 3535
  2. 2631
  3. 2631
  4. 7547
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading