I am trying to process input.txt using the test.py script to extract specific information, as shown in the expected output below. I have a basic stub, but the regex is not extracting the column details I expect.
In general, I am looking for a [XXXYY] {TAG} pattern. Once I find that pattern, if the first column of the next line starts with J, I want to extract column 1, column 2 and the first 3 characters of column 3. I would also like to know how to skip the lines after [00033] GND (and [00272] POS_3V3) until the next [XXXYY] {TAG} pattern appears. I am restricted to Python 2.7.5 with the re and csv libraries, and cannot use pandas.
input.txt
<<< Test List >>>
Mounting Hole MH1 APBC_MH_3.2x7cm
Mounting Hole MH2 APBC_MH_3.2x7cm
Mounting Hole MH3 APBC_MH_3.2x7cm
Mounting Hole MH4 APBC_MH_3.2x7cm
[00001] DEBUG_SCAR_RX
J1 B30 PIO37 PASSIVE TRA6-70-01.7-R-4-7-F-UG
R2 2 2 PASSIVE 4.7kR
[00002] DEBUG_SCAR_TX
J1 B29 PIO36 PASSIVE TRA6-70-01.7-R-4-7-F-UG
[00003] DYOR_DAT_0
J2 B12 APB10_CC_P PASSIVE TRA6-70-01.7-R-4-7-F-UG
[00033] GND
DP1 5 5 PASSIVE MECH, DIP_SWITCH, FFFN-04F-V
DP1 6 6 PASSIVE MECH, DIP_SWITCH, FFFN-04F-V
DP1 7 7 PASSIVE MECH, DIP_SWITCH, FFFN-04F-V
[00271] POS_3.3V_INH
Q2 3 DRAIN PASSIVE 2N7002
R34 2 2 PASSIVE 4.7kR
[00272] POS_3V3
J1 B13 FETO_FAT PASSIVE TRA6-70-01.7-R-4-7-F-UG
J1 B14 FETO_FAT PASSIVE TRA6-70-01.7-R-4-7-F-UG
J2 B59 FETO_HDB PASSIVE TRA6-70-01.7-R-4-7-F-UG
test.py
import re

# Read the input file
with open('input.txt', 'r') as file:
    content = file.readlines()

# Process the data and extract the required information
result = []
component_name = ""
for line in content:
    line = line.strip()
    if line.startswith("["):
        s = re.sub(r"([\[0-9]+\]) (\w+)$", r"\2", line)
    elif line.startswith("J"):
        sp = re.sub(r"^(\w+)\s+(\w+)\s+(\w+)", r"\1\2", line)
        print("%s\t%s" % (s, sp))
output
DEBUG_SCAR_RX J1B30 PASSIVE TRA6-70-01.7-R-4-7-F-UG
DEBUG_SCAR_TX J1B29 PASSIVE TRA6-70-01.7-R-4-7-F-UG
DYOR_DAT_0 J2B12 PASSIVE TRA6-70-01.7-R-4-7-F-UG
POS_3V3 J1B13 PASSIVE TRA6-70-01.7-R-4-7-F-UG
POS_3V3 J1B14 PASSIVE TRA6-70-01.7-R-4-7-F-UG
POS_3V3 J2B59 PASSIVE TRA6-70-01.7-R-4-7-F-UG
expected
DEBUG_SCAR_RX J1 B30 PIO
DEBUG_SCAR_TX J1 B29 PIO
DYOR_DAT_0 J2 B12 APB
>Solution :
Maybe you can split each row on whitespace and keep only the J rows that fall under the tags you care about:
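import re

# Tags whose J rows should be kept; GND and POS_3V3 are left out on purpose
TAGS = ['DEBUG_SCAR_RX', 'DEBUG_SCAR_TX', 'DYOR_DAT_0']
data = []
tag = None  # no [XXXYY] header seen yet
with open('input.txt') as file:
    for row in file:
        row = row.strip()
        if row.startswith('['):
            # '[00001] DEBUG_SCAR_RX' -> 'DEBUG_SCAR_RX'
            tag = row.split(']')[1].strip()
        elif row == '':
            continue
        else:
            cols = re.split(r'\s+', row)
            if cols[0].startswith('J') and tag in TAGS:
                # column 1, column 2, first 3 characters of column 3
                data.append([tag, cols[0], cols[1], cols[2][:3]])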
Output:
# '2.7.18 (default, Jan 23 2023, 08:22:06) \n[GCC 12.2.0]'
>>> data
[['DEBUG_SCAR_RX', 'J1', 'B30', 'PIO'],
 ['DEBUG_SCAR_TX', 'J1', 'B29', 'PIO'],
 ['DYOR_DAT_0', 'J2', 'B12', 'APB']]
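As for why the original stub printed the extra columns: re.sub only replaces the part of the line that matched and leaves the rest (PASSIVE ...) in place, and the replacement \1\2 joins the first two groups with no separator while dropping the third. If listing the wanted tags by hand is not practical, a possible variation is to list the tags to skip instead. The sketch below is only an illustration under a few assumptions: the names TAG_RE, SKIP_TAGS, extract, write_tsv and SAMPLE are made up for this example, the skip list is taken from the two tags named in the question, and every J row under a kept tag is collected, not just the first one. It uses only re and csv and should run on Python 2.7 as well as Python 3.

```python
import csv
import re

# Matches a header such as "[00001] DEBUG_SCAR_RX" and captures the tag.
# \S+ is used instead of \w+ so tags with dots (POS_3.3V_INH) also match.
TAG_RE = re.compile(r'^\[(\d+)\]\s+(\S+)$')

# Hypothetical skip list: rows under these tags are dropped until the
# next [XXXYY] {TAG} header appears.
SKIP_TAGS = frozenset(['GND', 'POS_3V3'])


def extract(lines, skip_tags=SKIP_TAGS):
    """Return [tag, col1, col2, col3[:3]] for each J row under a kept tag."""
    rows = []
    tag = None  # stays None until the first header is seen
    for line in lines:
        line = line.strip()
        header = TAG_RE.match(line)
        if header:
            tag = header.group(2)
            continue
        if tag is None or tag in skip_tags:
            continue  # preamble, or inside a block that should be dropped
        cols = line.split()
        if len(cols) >= 3 and cols[0].startswith('J'):
            rows.append([tag, cols[0], cols[1], cols[2][:3]])
    return rows


def write_tsv(rows, out):
    """Write rows tab-separated to an already open file-like object."""
    csv.writer(out, delimiter='\t').writerows(rows)


# A few lines copied from input.txt, used here only to try the function.
SAMPLE = [
    '<<< Test List >>>',
    'Mounting Hole MH1 APBC_MH_3.2x7cm',
    '[00001] DEBUG_SCAR_RX',
    'J1 B30 PIO37 PASSIVE TRA6-70-01.7-R-4-7-F-UG',
    'R2 2 2 PASSIVE 4.7kR',
    '[00002] DEBUG_SCAR_TX',
    'J1 B29 PIO36 PASSIVE TRA6-70-01.7-R-4-7-F-UG',
    '[00003] DYOR_DAT_0',
    'J2 B12 APB10_CC_P PASSIVE TRA6-70-01.7-R-4-7-F-UG',
    '[00033] GND',
    'DP1 5 5 PASSIVE MECH, DIP_SWITCH, FFFN-04F-V',
    '[00271] POS_3.3V_INH',
    'Q2 3 DRAIN PASSIVE 2N7002',
    '[00272] POS_3V3',
    'J1 B13 FETO_FAT PASSIVE TRA6-70-01.7-R-4-7-F-UG',
]

rows = extract(SAMPLE)
```

With the real file, extract(open('input.txt')) should give the same three rows as the data list above, since a file object can be iterated line by line. To save them, write_tsv(rows, f) can be called with f opened in 'wb' mode on Python 2.7, which is the mode the csv module expects there.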