Python – Extracting only necessary elements from a string

September 13, 2022

I’m trying to extract only the parts I need from the table.

    2555    texttext    0   100 100 0   0   0   0   lowness 0
    2557    texttext    10  650 660 0   0   0   0   lowness 0
    2564    texttext    0   30  30  0   0   0   0   lowness 0
    2566    texttext    0   0   0   0   0   0   0   lowness 0
    2567    texttext    10  70  80  0   0   0   0   lowness 0

All I need is ‘texttext’, the two numbers immediately after it, and ‘lowness’, as shown below.

    texttext    0   100 lowness
    texttext    10  650 lowness
    texttext    0   30  lowness
    texttext    0   0   lowness
    texttext    10  70  lowness

I tried this but failed.

import re

text = """
    2555    texttext    0   100 100 0   0   0   0   lowness 0
    2557    texttext    10  650 660 0   0   0   0   lowness 0
    2564    texttext    0   30  30  0   0   0   0   lowness 0
    2566    texttext    0   0   0   0   0   0   0   lowness 0
    2567    texttext    10  70  80  0   0   0   0   lowness 0
"""

for a in text.split('\n'):
    if a == "":
        continue
    else:
        print(a)
        m = re.match('(^\D\d*\D)(\w*\s)(\d*\s)(\d*\s)(\d*\s\d*\s\d*\s\d*\s\d*\s)(\w+)', a)
        print(m)
        print(m.group(2), m.group(3), m.group(4), m.group(6))

I tried to group the fields with a regex and print the parts I need, but I got the following error. Please help.

print(m.group(2), m.group(3), m.group(4), m.group(6))
AttributeError: 'NoneType' object has no attribute 'group'

Solution:

If you absolutely want to use a regular expression:

import re

text = """
    2555    texttext    0   100 100 0   0   0   0   lowness 0
    2557    texttext    10  650 660 0   0   0   0   lowness 0
    2564    texttext    0   30  30  0   0   0   0   lowness 0
    2566    texttext    0   0   0   0   0   0   0   lowness 0
    2567    texttext    10  70  80  0   0   0   0   lowness 0
"""
pattern = re.compile(
    r"\s*\d+\s+(\w+)\s+(\d+)\s+(\d+)\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+(\w+)\s+"
)

for line in text.strip().split('\n'):
    # Use the compiled pattern directly; search() returns None if a line does not match.
    match = pattern.search(line)
    if match:
        print(*match.groups())

Output:

texttext 0 100 lowness
texttext 10 650 lowness
texttext 0 30 lowness
texttext 0 0 lowness
texttext 10 70 lowness
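
The AttributeError in the question arises because `re.match` returns `None` when the pattern does not match, and the code then calls `.group()` on it. One likely cause, assuming the rows are separated by runs of spaces as shown, is that the original pattern allows only a single `\s` between fields. The following is a minimal sketch of a guarded variant; the function name `extract` and the group names `label`, `first`, `second`, and `level` are made up for illustration, since the question does not name the columns.

```python
import re

# Named groups document which column each capture refers to.
# The group names are hypothetical; the question does not name the columns.
ROW = re.compile(
    r"\s*\d+\s+(?P<label>\w+)\s+(?P<first>\d+)\s+(?P<second>\d+)"
    r"(?:\s+\d+){5}\s+(?P<level>\w+)"
)


def extract(line):
    """Return (label, first, second, level), or None if the line does not match."""
    m = ROW.match(line)
    if m is None:
        # re.match returns None on failure, so check before calling .group()
        return None
    return m.group("label", "first", "second", "level")


sample = "    2555    texttext    0   100 100 0   0   0   0   lowness 0"
print(extract(sample))
print(extract("not a data row"))
```

Returning `None` for lines that do not match lets the caller decide whether to skip them or report them, instead of crashing on the first unexpected line.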

But if every line always has the same number of whitespace-separated fields, you are probably better off just splitting each line on whitespace:

for line in text.strip().split('\n'):
    parts = line.split()
    print(parts[1], parts[2], parts[3], parts[9])

Same output.
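
The split-based approach assumes every line has at least ten fields, so a blank or truncated line would raise an IndexError. Here is a rough sketch that adds a length check; the helper name `pick_fields` and its `indexes` parameter are hypothetical and not part of the original answer.

```python
def pick_fields(line, indexes=(1, 2, 3, 9)):
    """Split on whitespace and return the selected fields, or None for short lines."""
    parts = line.split()  # with no argument, split() treats any run of whitespace as one separator
    if len(parts) <= max(indexes):
        return None  # blank or malformed line
    return [parts[i] for i in indexes]


text = """
    2555    texttext    0   100 100 0   0   0   0   lowness 0
    2567    texttext    10  70  80  0   0   0   0   lowness 0
"""

rows = []
for line in text.splitlines():
    fields = pick_fields(line)
    if fields is not None:
        rows.append(fields)

for fields in rows:
    print(*fields)
```

Because the blank first and last lines are skipped by the length check, there is no need to call `strip()` on the text before splitting it into lines.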

by MR

Published September 13, 2022
