I’m trying to extract only the parts I need from each line of this table:
```
2555 texttext 0 100 100 0 0 0 0 lowness 0
2557 texttext 10 650 660 0 0 0 0 lowness 0
2564 texttext 0 30 30 0 0 0 0 lowness 0
2566 texttext 0 0 0 0 0 0 0 lowness 0
2567 texttext 10 70 80 0 0 0 0 lowness 0
```
All I need is the word `texttext`, the two numbers immediately after it, and the word `lowness`, as shown below:
```
texttext 0 100 lowness
texttext 10 650 lowness
texttext 0 30 lowness
texttext 0 0 lowness
texttext 10 70 lowness
```
I tried this, but it failed:
```python
import re

text = """
2555 texttext 0 100 100 0 0 0 0 lowness 0
2557 texttext 10 650 660 0 0 0 0 lowness 0
2564 texttext 0 30 30 0 0 0 0 lowness 0
2566 texttext 0 0 0 0 0 0 0 lowness 0
2567 texttext 10 70 80 0 0 0 0 lowness 0
"""
for a in text.split('\n'):
    if a == "":
        continue
    else:
        print(a)
        m = re.match(r'(^\D\d*\D)(\w*\s)(\d*\s)(\d*\s)(\d*\s\d*\s\d*\s\d*\s\d*\s)(\w+)', a)
        print(m)
        print(m.group(2), m.group(3), m.group(4), m.group(6))
```
I tried to capture the parts with regex groups, but I got the following error:
```
    print(m.group(2), m.group(3), m.group(4), m.group(6))
AttributeError: 'NoneType' object has no attribute 'group'
```
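The `None` comes from the first group: `^\D` requires the line to begin with a non-digit, but every line begins with an ID such as `2555`, so `re.match` fails at the very first character and returns `None`. Calling `.group()` on `None` then raises the `AttributeError`. The sketch below reproduces that and shows one minimal adjustment of the question's own pattern, assuming every line starts with a numeric ID: the separators are moved outside the groups (so the captured values carry no trailing spaces) and the result is checked for `None` before it is used. The group numbers 2, 3, 4 and 6 are kept so the `print` call stays the same.

```python
import re

line = "2555 texttext 0 100 100 0 0 0 0 lowness 0"

# The question's pattern: ^\D needs a non-digit at the start of the line,
# but the line starts with "2555", so re.match() returns None.
original = r'(^\D\d*\D)(\w*\s)(\d*\s)(\d*\s)(\d*\s\d*\s\d*\s\d*\s\d*\s)(\w+)'
print(re.match(original, line))  # None

# Adjusted pattern (assumption: each line starts with a numeric ID).
# Group 2 = word, groups 3 and 4 = the two numbers, group 6 = trailing word.
fixed = r'^(\d+)\s(\w+)\s(\d+)\s(\d+)\s(\d+\s\d+\s\d+\s\d+\s\d+)\s(\w+)'
m = re.match(fixed, line)
if m is not None:  # guard against non-matching lines instead of crashing
    print(m.group(2), m.group(3), m.group(4), m.group(6))
```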
Solution:
If you absolutely want to use a regular expression:
```python
import re

text = """
2555 texttext 0 100 100 0 0 0 0 lowness 0
2557 texttext 10 650 660 0 0 0 0 lowness 0
2564 texttext 0 30 30 0 0 0 0 lowness 0
2566 texttext 0 0 0 0 0 0 0 lowness 0
2567 texttext 10 70 80 0 0 0 0 lowness 0
"""

# Capture: the word, the two numbers after it, and the word after the next five numbers
pattern = re.compile(
    r"\s*\d+\s+(\w+)\s+(\d+)\s+(\d+)\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+(\w+)\s+"
)

for line in text.strip().split('\n'):
    match = pattern.search(line)  # use the compiled pattern's own search method
    if match:  # skip non-matching lines instead of crashing on None
        print(*match.groups())
```
Output:
```
texttext 0 100 lowness
texttext 10 650 lowness
texttext 0 30 lowness
texttext 0 0 lowness
texttext 10 70 lowness
```
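If you keep the regex, one readability option (not required by the pattern above) is to name the groups with `(?P<name>...)` and write the pattern in `re.VERBOSE` mode so each field can be commented. The group names `label`, `first`, `second` and `level` are hypothetical, chosen here for illustration, and the pattern assumes exactly five numeric columns between the two wanted numbers and the trailing word:

```python
import re

text = """
2555 texttext 0 100 100 0 0 0 0 lowness 0
2557 texttext 10 650 660 0 0 0 0 lowness 0
2564 texttext 0 30 30 0 0 0 0 lowness 0
2566 texttext 0 0 0 0 0 0 0 lowness 0
2567 texttext 10 70 80 0 0 0 0 lowness 0
"""

# Hypothetical group names; whitespace and comments are ignored under re.VERBOSE.
pattern = re.compile(r"""
    ^\d+\s+                 # leading ID, e.g. 2555 (discarded)
    (?P<label>\w+)\s+       # the word we want, e.g. texttext
    (?P<first>\d+)\s+       # first number we want
    (?P<second>\d+)\s+      # second number we want
    (?:\d+\s+){5}           # five numeric columns we skip
    (?P<level>\w+)          # trailing word, e.g. lowness
""", re.VERBOSE)

rows = []
for line in text.strip().splitlines():
    match = pattern.match(line)
    if match is None:  # skip lines that do not have the expected shape
        continue
    rows.append(match.groupdict())  # dict keyed by group name

for row in rows:
    print(row["label"], row["first"], row["second"], row["level"])
```

Accessing fields by name means the printing code does not break if a group is later added or reordered in the pattern.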
But if every line really always has the same number of whitespace-separated fields, you are probably better off just splitting each line on whitespace:
```python
for line in text.strip().split('\n'):
    parts = line.split()  # split on any run of whitespace
    print(parts[1], parts[2], parts[3], parts[9])
```
This prints the same output.
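A variant of the split approach, sketched under the same assumption that every line has exactly eleven whitespace-separated fields: `operator.itemgetter(1, 2, 3, 9)` picks the four fields in one call and returns them as a tuple, and lines with a different field count are skipped instead of raising `IndexError`. `EXPECTED_FIELDS` is a hypothetical constant introduced here for illustration.

```python
from operator import itemgetter

text = """
2555 texttext 0 100 100 0 0 0 0 lowness 0
2557 texttext 10 650 660 0 0 0 0 lowness 0
2564 texttext 0 30 30 0 0 0 0 lowness 0
2566 texttext 0 0 0 0 0 0 0 lowness 0
2567 texttext 10 70 80 0 0 0 0 lowness 0
"""

EXPECTED_FIELDS = 11  # hypothetical: ID, word, seven numbers, word, number
pick = itemgetter(1, 2, 3, 9)  # word, first number, second number, trailing word

rows = []
for line in text.strip().splitlines():
    parts = line.split()  # no argument: split on any run of whitespace
    if len(parts) != EXPECTED_FIELDS:
        continue  # skip malformed lines instead of raising IndexError
    rows.append(pick(parts))

for row in rows:
    print(*row)
```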