Extracting columns and skipping certain rows in a file for data processing

I am trying to process input.txt with the test.py script to extract specific information, as shown in the expected output below. I have a basic stub, but the regex is not extracting the column details I expect.

In general, I am looking for a [XXXYY] {TAG} pattern. Once I find it, for each following row whose first column starts with J, I want to extract column 1, column 2 and the first 3 characters of column 3. I would also like to know how to skip the lines that follow [00033] GND and [00272] POS_3V3 until the next [XXXYY] {TAG} pattern appears. I am restricted to Python 2.7.5 with the re and csv libraries, and cannot use pandas.

input.txt
<<< Test List >>>
Mounting Hole                   MH1            APBC_MH_3.2x7cm
Mounting Hole                   MH2            APBC_MH_3.2x7cm
Mounting Hole                   MH3            APBC_MH_3.2x7cm
Mounting Hole                   MH4            APBC_MH_3.2x7cm

[00001] DEBUG_SCAR_RX
        J1         B30     PIO37          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
        R2         2       2              PASSIVE     4.7kR

[00002] DEBUG_SCAR_TX
        J1         B29     PIO36          PASSIVE     TRA6-70-01.7-R-4-7-F-UG

[00003] DYOR_DAT_0
        J2         B12     APB10_CC_P     PASSIVE     TRA6-70-01.7-R-4-7-F-UG

[00033] GND
        DP1        5       5              PASSIVE     MECH, DIP_SWITCH, FFFN-04F-V
        DP1        6       6              PASSIVE     MECH, DIP_SWITCH, FFFN-04F-V
        DP1        7       7              PASSIVE     MECH, DIP_SWITCH, FFFN-04F-V

[00271] POS_3.3V_INH
        Q2         3       DRAIN          PASSIVE     2N7002
        R34        2       2              PASSIVE     4.7kR

[00272] POS_3V3
        J1         B13     FETO_FAT       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
        J1         B14     FETO_FAT       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
        J2         B59     FETO_HDB       PASSIVE     TRA6-70-01.7-R-4-7-F-UG

test.py
import re

# Read the input file
with open('input.txt', 'r') as file:
    content = file.readlines()

# Process the data and extract the required information
result = []
component_name = ""
for line in content:
    line = line.strip()
    if line.startswith("["):
        s = re.sub(r"([\[0-9]+\]) (\w+)$", r"\2", line)
    elif line.startswith("J"):
        sp = re.sub(r"^(\w+)\s+(\w+)\s+(\w+)", r"\1\2", line)
        print("%s\t%s" % (s, sp))

current output
DEBUG_SCAR_RX   J1B30          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
DEBUG_SCAR_TX   J1B29          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
DYOR_DAT_0  J2B12     PASSIVE     TRA6-70-01.7-R-4-7-F-UG
POS_3V3 J1B13       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
POS_3V3 J1B14       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
POS_3V3 J2B59       PASSIVE     TRA6-70-01.7-R-4-7-F-UG

expected output
DEBUG_SCAR_RX   J1 B30 PIO
DEBUG_SCAR_TX   J1 B29 PIO
DYOR_DAT_0  J2 B12 APB

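Solution:

One option is to remember the current tag while reading the file, keep only the rows whose first column starts with J, and restrict the result to a list of wanted tags: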

import re

TAGS = ['DEBUG_SCAR_RX', 'DEBUG_SCAR_TX', 'DYOR_DAT_0']

data = []
tag = None  # no [XXXYY] header seen yet; avoids a NameError on early J rows
with open('input.txt') as file:
    for row in file:
        row = row.strip()       
        if row.startswith('['):
            tag = row.split(']')[1].strip()
        elif row == '':
            continue
        else:
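            # Split on runs of whitespace (raw string keeps \s intact)
            cols = re.split(r'\s+', row)
            if cols[0].startswith('J') and tag in TAGS:
                # cols[2] is the third column (e.g. PIO37); keep its first 3 characters
                data.append([tag, cols[0], cols[1], cols[2][:3]])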

Output:

# '2.7.18 (default, Jan 23 2023, 08:22:06) \n[GCC 12.2.0]'
>>> data
[['DEBUG_SCAR_RX', 'J1', 'B30', 'PIO'],
 ['DEBUG_SCAR_TX', 'J1', 'B29', 'PIO'],
 ['DYOR_DAT_0', 'J2', 'B12', 'APB']]
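
The question also asks how to drop everything after [00033] GND and [00272] POS_3V3 until the next header. A minimal sketch of that variant follows, using a skip list instead of a list of wanted tags. The names `HEADER`, `SKIP_TAGS` and `extract_rows` are made up for this illustration, and the sketch assumes the tags to skip are known in advance. It relies only on `re` and plain string methods, so it should run on Python 2.7 as well as Python 3.

```python
import re

# Matches a header line such as "[00001] DEBUG_SCAR_RX" and captures the tag.
# \S+ is used rather than \w+ because tags like POS_3.3V_INH contain a dot.
HEADER = re.compile(r'^\[\d+\]\s+(\S+)$')

# Assumption: the tags whose blocks should be dropped are known up front.
SKIP_TAGS = set(['GND', 'POS_3V3'])


def extract_rows(lines, skip_tags=SKIP_TAGS):
    """Return [tag, col1, col2, first 3 chars of col3] for each J row."""
    rows = []
    tag = None  # no header seen yet
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            # A new header starts a new block and ends any skipped one.
            tag = match.group(1)
            continue
        if tag is None or tag in skip_tags:
            # Either before the first header or inside a skipped block.
            continue
        cols = line.split()
        if len(cols) >= 3 and cols[0].startswith('J'):
            rows.append([tag, cols[0], cols[1], cols[2][:3]])
    return rows
```

Passing the open file object for input.txt to `extract_rows` and printing `'\t'.join(row)` for each result gives one tab-separated line per match. If a file is preferred, `csv.writer` with `delimiter='\t'` can write the same rows.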