Divide a sentence into words using regex

February 16, 2022

I want to divide a sentence into words using regex. I’m using this code:

import re
sentence='<30>Jan 11 11:45:50 test-tt systemd[1]: tester-test.service: activation successfully.'
sentence = re.split(r'\s|,|>|<|\[|\]:', sentence)

but I’m not getting the output I expect.

The expected output is:

['30', 'Jan', '11', '11:45:50', 'test-tt', 'systemd', '1', 'tester-test.service: activation successfully.']

But what I’m getting is:

['', '30', 'Jan', '11', '11:45:50', 'test-tt', 'systemd', '1', '', 'tester-test.service:', 'activation', 'successfully.']

I tried to ignore the whitespace, but it should be ignored only in the last long chunk, and I have no idea how to do that.
Any suggestions or help?
Thank you in advance.

>Solution :

You can use

import re
sentence='<30>Jan 11 11:45:50 test-tt systemd[1]: tester-test.service: activation successfully.'
chunks = sentence.split(': ', 1)
result = re.findall(r'[^][\s,<>]+', chunks[0])
result.append(chunks[1])
print(result)
# => ['30', 'Jan', '11', '11:45:50', 'test-tt', 'systemd', '1', 'tester-test.service: activation successfully.']


Here,

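chunks = sentence.split(': ', 1) – splits the sentence into two chunks at the first ': ' substring (a colon followed by a space); the colons in 11:45:50 are not followed by a space, so the timestamp stays intact
result = re.findall(r'[^][\s,<>]+', chunks[0]) – extracts all substrings consisting of one or more chars other than ], [, whitespace, ,, < and > from the first chunk; because each match needs at least one char, no empty strings appear in the result
result.append(chunks[1]) – appends the second chunk to the result list.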
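
Note that chunks[1] raises an IndexError if the sentence contains no ': ' at all. As a minimal sketch of one way to guard against that, the same three steps can be wrapped in a helper that uses str.partition instead of split. The function name split_log_line and the second sample string are made up for illustration; they are not part of the original answer.

```python
import re

def split_log_line(line):
    # Hypothetical helper: split once at the first ': ' (colon + space).
    # partition always returns three parts, even when ': ' is missing.
    head, sep, tail = line.partition(': ')
    # Tokenize the head: runs of chars other than ], [, whitespace, comma, < and >.
    tokens = re.findall(r'[^][\s,<>]+', head)
    # Append the untouched tail only when the separator was actually found.
    if sep:
        tokens.append(tail)
    return tokens

sentence = '<30>Jan 11 11:45:50 test-tt systemd[1]: tester-test.service: activation successfully.'
print(split_log_line(sentence))
# => ['30', 'Jan', '11', '11:45:50', 'test-tt', 'systemd', '1', 'tester-test.service: activation successfully.']
print(split_log_line('no separator here'))
# => ['no', 'separator', 'here']
```

With partition, a line without ': ' is simply tokenized as a whole instead of failing.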