I am trying to do some text processing and would like to know whether a single, unified regex can handle a certain pattern. The pattern of interest is any string in the second column of test.csv that ends with {string}_{i}, where i is a number. When the regex matches, I want to replace the value with {string}[i].
For now, the Python script works as expected, but only for strings whose pattern I spell out explicitly in a regex. I want a more generic regex that matches every string of the form {string}_{i}, instead of writing one regex per pattern (which does not scale).
Input (test.csv):
bom_a14 , COMP_NUM_0
bom_a17 , COMP_NUM_2
bom_a27 , COMP_NUM_11
bom_a35 , FUNC_1V8_OLED_OUT_7
bom_a38 , FUNC_1V8_OLED_OUT_9
bom_a39 , FUNC_1V8_OLED_OUT_10
bom_a46 , CAP_4
bom_a47 , CAP_3
bom_a48 , CAP_6
test.py:
import csv
import re

# Read test.csv and reformat the value in the second column of each row
with open('test.csv', 'r') as file2:
    reader = csv.reader(file2)
    for row in reader:
        row_1 = row[1].strip()
        # for matching COMP_NUM_{X}
        match_data = re.match(r'([A-Z]+)_([A-Z]+)_(\d+)', row_1)
        # for matching FUNC_1V8_OLED_OUT_{X}
        match_data2 = re.match(r'([A-Z]+)_([A-Z0-9]+)_([A-Z]+)_([A-Z]+)_(\d+)', row_1)
        # if a match is found, reformat the data
        if match_data:
            new_row_1 = match_data.group(1) + '_' + match_data.group(2) + '[' + match_data.group(3) + ']'
        elif match_data2:
            new_row_1 = match_data2.group(1) + '_' + match_data2.group(2) + '_' + match_data2.group(3) + '_' + match_data2.group(4) + '[' + match_data2.group(5) + ']'
        else:
            new_row_1 = row_1
        print(new_row_1)
Output:
COMP_NUM[0]
COMP_NUM[2]
COMP_NUM[11]
FUNC_1V8_OLED_OUT[7]
FUNC_1V8_OLED_OUT[9]
FUNC_1V8_OLED_OUT[10]
CAP_4
CAP_3
CAP_6
Expected output:
COMP_NUM[0]
COMP_NUM[2]
COMP_NUM[11]
FUNC_1V8_OLED_OUT[7]
FUNC_1V8_OLED_OUT[9]
FUNC_1V8_OLED_OUT[10]
CAP[4]
CAP[3]
CAP[6]
Solution:
I would use re.sub with a single generic pattern:
import csv
import re

with open("test.csv", "r") as file2:
    for row in csv.reader(file2):
        # "NAME_<digits>" at the end of the value becomes "NAME[<digits>]"
        s = re.sub(r"(.+)_(\d+)$", r"\1[\2]", row[-1].strip())
        print(s)
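How the regex works: (.+) is greedy, so after backtracking it holds everything before the last underscore that is followed only by digits; _(\d+)$ requires that underscore plus one or more digits running to the end of the string. The replacement \1[\2] writes the prefix back and wraps the digits in square brackets. Values that do not end in _<digits> are left unchanged, because re.sub returns the input as-is when nothing matches.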
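To check the pattern on its own, the same substitution can be wrapped in a small helper with a precompiled regex and run against the sample values plus a few strings that should be left alone. This is a minimal sketch; the names SUFFIX_INDEX and to_bracket_index are hypothetical and not part of the original answer.

```python
import re

# Greedy (.+) keeps everything before the final "_<digits>" suffix;
# the $ anchor makes sure the digits run to the end of the string.
SUFFIX_INDEX = re.compile(r"(.+)_(\d+)$")


def to_bracket_index(value):
    """Rewrite 'NAME_<digits>' as 'NAME[<digits>]'; return other strings unchanged."""
    return SUFFIX_INDEX.sub(r"\1[\2]", value.strip())


samples = ["COMP_NUM_11", "FUNC_1V8_OLED_OUT_10", "A_1_2", "CAP_", "CAP4", "_7"]
for s in samples:
    print(s, "->", to_bracket_index(s))
```

Only the last underscore is affected, so A_1_2 becomes A_1[2]. The last three samples come back unchanged: CAP_ has no digits, CAP4 has no underscore, and _7 has nothing before the underscore for (.+) to capture.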
Output:
COMP_NUM[0]
COMP_NUM[2]
COMP_NUM[11]
FUNC_1V8_OLED_OUT[7]
FUNC_1V8_OLED_OUT[9]
FUNC_1V8_OLED_OUT[10]
CAP[4]
CAP[3]
CAP[6]
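If the goal is to write the converted values back out rather than just print them, the same substitution can be applied row by row while keeping the first column. This is a sketch under stated assumptions: the CSV text is supplied inline so the example runs on its own, convert_rows is a hypothetical helper name, and csv.reader's skipinitialspace=True option is used to drop the blank after each comma instead of relying only on strip().

```python
import csv
import io
import re

SUFFIX_INDEX = re.compile(r"(.+)_(\d+)$")

# Inline copy of a few rows from test.csv so the sketch is self-contained.
SAMPLE = """bom_a14 , COMP_NUM_0
bom_a35 , FUNC_1V8_OLED_OUT_7
bom_a46 , CAP_4
"""


def convert_rows(src, dst):
    """Copy rows from src to dst, rewriting NAME_<n> in column 2 as NAME[n]."""
    # skipinitialspace drops the blank after each comma; trailing blanks
    # such as the one in "bom_a14 " are removed with strip().
    reader = csv.reader(src, skipinitialspace=True)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if len(row) < 2:
            writer.writerow(row)  # pass short or blank rows through untouched
            continue
        first = row[0].strip()
        second = SUFFIX_INDEX.sub(r"\1[\2]", row[1].strip())
        writer.writerow([first, second] + row[2:])


out = io.StringIO()
convert_rows(io.StringIO(SAMPLE), out)
print(out.getvalue(), end="")
```

To run it on real files, pass open file objects instead, for example test.csv opened for reading and a hypothetical out.csv opened for writing, both opened with newline="" as the csv module documentation recommends.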