
Python – Extracting only necessary elements from a string

I’m trying to extract only the parts I need from the table.

    2555    texttext    0   100 100 0   0   0   0   lowness 0
    2557    texttext    10  650 660 0   0   0   0   lowness 0
    2564    texttext    0   30  30  0   0   0   0   lowness 0
    2566    texttext    0   0   0   0   0   0   0   lowness 0
    2567    texttext    10  70  80  0   0   0   0   lowness 0

All I need is the ‘texttext’ field, the two numbers immediately following it, and the ‘lowness’ field, as shown below.

    texttext    0   100 lowness
    texttext    10  650 lowness
    texttext    0   30  lowness
    texttext    0   0   lowness
    texttext    10  70  lowness

I tried this but failed.

import re

text = """
    2555    texttext    0   100 100 0   0   0   0   lowness 0
    2557    texttext    10  650 660 0   0   0   0   lowness 0
    2564    texttext    0   30  30  0   0   0   0   lowness 0
    2566    texttext    0   0   0   0   0   0   0   lowness 0
    2567    texttext    10  70  80  0   0   0   0   lowness 0
"""

for a in text.split('\n'):
    if a == "":
        continue
    else:
        print(a)
        m = re.match('(^\D\d*\D)(\w*\s)(\d*\s)(\d*\s)(\d*\s\d*\s\d*\s\d*\s\d*\s)(\w+)', a)
        print(m)
        print(m.group(2), m.group(3), m.group(4), m.group(6))

I tried to capture the fields as regex groups and print the ones I need, but I got the following error:

print(m.group(2), m.group(3), m.group(4), m.group(6))
AttributeError: 'NoneType' object has no attribute 'group'

Solution:

If you absolutely want to use a regular expression:

import re

text = """
    2555    texttext    0   100 100 0   0   0   0   lowness 0
    2557    texttext    10  650 660 0   0   0   0   lowness 0
    2564    texttext    0   30  30  0   0   0   0   lowness 0
    2566    texttext    0   0   0   0   0   0   0   lowness 0
    2567    texttext    10  70  80  0   0   0   0   lowness 0
"""
pattern = re.compile(
    r"\s*\d+\s+(\w+)\s+(\d+)\s+(\d+)\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+(\w+)\s+"
)

for line in text.strip().split('\n'):
    # Call search on the compiled pattern directly.
    match = pattern.search(line)
    print(*match.groups())

Output:

texttext 0 100 lowness
texttext 10 650 lowness
texttext 0 30 lowness
texttext 0 0 lowness
texttext 10 70 lowness
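
As for the AttributeError in the question: re.match returns None when the pattern does not match, and calling .group() on None raises exactly that error. In the original pattern each \s matches only a single whitespace character, while the columns are separated by runs of whitespace, so the match fails. Below is a minimal sketch of a more defensive version, assuming the column layout of the sample data; extract_fields is a hypothetical helper name, not something from the original post.

```python
import re

# Assumed layout (taken from the sample rows): id, name, two numbers to keep,
# five numbers to skip, then the label to keep.
PATTERN = re.compile(
    r"\s*\d+\s+(\w+)\s+(\d+)\s+(\d+)(?:\s+\d+){5}\s+(\w+)"
)


def extract_fields(line):
    """Return (name, first, second, label), or None if the line does not fit."""
    match = PATTERN.match(line)
    if match is None:
        # No match: report it to the caller instead of raising AttributeError.
        return None
    return match.groups()


print(extract_fields("    2555    texttext    0   100 100 0   0   0   0   lowness 0"))
print(extract_fields("not a data row"))
```

Checking for None before calling .groups() lets blank or malformed lines be skipped rather than crashing the loop.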

But if every line really does contain the same number of whitespace-separated fields, you are probably better off just splitting each line on whitespace:

for line in text.strip().split('\n'):
    parts = line.split()
    print(parts[1], parts[2], parts[3], parts[9])

Same output.
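
The splitting approach raises an IndexError on blank or short lines. The sketch below adds a length check and collects the rows instead of printing them. It assumes the same column positions as the sample data; pick_columns and its indexes parameter are hypothetical names introduced only for illustration.

```python
def pick_columns(text, indexes=(1, 2, 3, 9)):
    """Collect the chosen whitespace-separated columns from each line of text."""
    rows = []
    for line in text.splitlines():
        parts = line.split()  # splits on any run of spaces or tabs
        if len(parts) <= max(indexes):
            continue  # skip blank lines and rows with too few columns
        rows.append(tuple(parts[i] for i in indexes))
    return rows


sample = """
    2555    texttext    0   100 100 0   0   0   0   lowness 0
    2557    texttext    10  650 660 0   0   0   0   lowness 0
"""

for row in pick_columns(sample):
    print(*row)
```

Because str.split() with no arguments treats any run of whitespace as one separator, this works whether the columns are separated by tabs or by several spaces.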
