Common/unified regex for a set of patterns

I am doing some text processing and would like to know whether a single common/unified regex can cover a certain pattern. The pattern of interest is any string ending in {string}_{i}, where i is a number, in the second column of test.csv. Once the regex matches, I want to replace it with {string}[i].

For now, the Python script works as expected only for the strings whose regex pattern I spell out explicitly. I want a more generic regex that matches every string of the form {string}_{i}, instead of writing a separate regex for each pattern (which is not scalable).

input test.csv
bom_a14 , COMP_NUM_0
bom_a17 , COMP_NUM_2
bom_a27 , COMP_NUM_11
bom_a35 , FUNC_1V8_OLED_OUT_7
bom_a38 , FUNC_1V8_OLED_OUT_9
bom_a39 , FUNC_1V8_OLED_OUT_10
bom_a46 , CAP_4
bom_a47 , CAP_3
bom_a48 , CAP_6
test.py
import csv
import re

# Read test.csv and reformat the value in the second column of each row
with open('test.csv', 'r') as file2:
    reader = csv.reader(file2)
    for row in reader:
        row_1=row[1]
        # for matching COMP_NUM_{X}
        match_data = re.match(r'([A-Z]+)_([A-Z]+)_(\d+)',row_1.strip())
        # for matching FUNC_1V8_OLED_OUT_{X}
        match_data2 = re.match(r'([A-Z]+)_([A-Z0-9]+)_([A-Z]+)_([A-Z]+)_(\d+)',row_1.strip())
        # if match found, reformat the data
        if match_data:
            new_row_1 = match_data.group(1) +'_'+ match_data.group(2)+ '[' + match_data.group(3) + ']'
        elif match_data2:
            new_row_1 = match_data2.group(1) +'_'+ match_data2.group(2)+ '_'+ match_data2.group(3)+'_'+ match_data2.group(4)+'[' + match_data2.group(5) + ']'
        else:
            new_row_1 = row_1
        print(new_row_1)

output
COMP_NUM[0]
COMP_NUM[2]
COMP_NUM[11]
FUNC_1V8_OLED_OUT[7]
FUNC_1V8_OLED_OUT[9]
FUNC_1V8_OLED_OUT[10]
 CAP_4
 CAP_3
 CAP_6
expected output
COMP_NUM[0]
COMP_NUM[2]
COMP_NUM[11]
FUNC_1V8_OLED_OUT[7]
FUNC_1V8_OLED_OUT[9]
FUNC_1V8_OLED_OUT[10]
CAP[4]
CAP[3]
CAP[6]
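
A quick check of why the CAP rows fall through may help here. Both explicit patterns in test.py require at least two underscore-separated name parts before the number, so a one-part name such as CAP_4 matches neither and is printed unchanged (and unstripped). The sketch below is only a diagnostic; the names PATTERN_TWO_PARTS, PATTERN_FOUR_PARTS and matched_by are made up for illustration and are not part of the original script.

```python
import re

# The two explicit patterns copied from test.py above.
PATTERN_TWO_PARTS = re.compile(r'([A-Z]+)_([A-Z]+)_(\d+)')
PATTERN_FOUR_PARTS = re.compile(r'([A-Z]+)_([A-Z0-9]+)_([A-Z]+)_([A-Z]+)_(\d+)')


def matched_by(value):
    """Return a label for the explicit pattern that matches, or None."""
    value = value.strip()
    if PATTERN_TWO_PARTS.match(value):
        return "two-part"
    if PATTERN_FOUR_PARTS.match(value):
        return "four-part"
    return None


# Values as csv.reader returns them for the second column (note the leading space).
for sample in [" COMP_NUM_0", " FUNC_1V8_OLED_OUT_7", " CAP_4"]:
    print(repr(sample), "->", matched_by(sample))
```

Every new name shape would need yet another pattern under this approach, which is the scalability problem described above.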

Solution:

I would use re.sub with a single generic pattern:

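import csv
import re

with open("test.csv", "r") as file2:
    for row in csv.reader(file2):
        # Greedy (.+) keeps everything up to the last underscore; (\d+)$ is the trailing number
        s = re.sub(r"(.+)_(\d+)$", r"\1[\2]", row[-1].strip())
        print(s)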

Output:

COMP_NUM[0]
COMP_NUM[2]
COMP_NUM[11]
FUNC_1V8_OLED_OUT[7]
FUNC_1V8_OLED_OUT[9]
FUNC_1V8_OLED_OUT[10]
CAP[4]
CAP[3]
CAP[6]
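
To see how the pattern behaves on edge cases, here is a minimal sketch that wraps the same substitution in a function and runs it on a few in-memory rows instead of the file. The helper names to_indexed and convert_rows, the TRAILING_INDEX constant and the SAMPLE_CSV string are assumptions made for this example, not part of the question. Because (.+) is greedy, only the last _{i} is rewritten, and a value with no trailing number is returned unchanged apart from the strip.

```python
import csv
import io
import re

# (.+) greedily captures everything up to the LAST underscore;
# (\d+)$ captures the digits that end the string.
TRAILING_INDEX = re.compile(r"(.+)_(\d+)$")


def to_indexed(name):
    """Rewrite a trailing _{i} as [i]; leave other values as they are."""
    return TRAILING_INDEX.sub(r"\1[\2]", name.strip())


# A few rows in the same layout as test.csv (illustrative subset).
SAMPLE_CSV = (
    "bom_a14 , COMP_NUM_0\n"
    "bom_a35 , FUNC_1V8_OLED_OUT_7\n"
    "bom_a46 , CAP_4\n"
)


def convert_rows(text):
    """Apply to_indexed to the last column of every non-empty CSV row."""
    return [to_indexed(row[-1]) for row in csv.reader(io.StringIO(text)) if row]


for value in convert_rows(SAMPLE_CSV):
    print(value)

print(to_indexed("A_1_2"))     # only the last index is rewritten
print(to_indexed("NO_INDEX"))  # no trailing number, so no change
```

The strip() call matters: the CSV values carry a leading space after the comma, and stripping before the substitution keeps that space out of the result.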