How to split a string in Python by certain characters?

I am trying to solve a problem with prefix notation, but I am stuck on the part where I want to split my string into a list.
If I have the input +22 2, I want to get a list that looks like this: ['+', '22', '2']
I tried using the

import re 

module, but I am not sure how it works.
I tried the

word.split(' ')

method, but it only splits on spaces. Any ideas?
P.S.:
In the prefix notation I will also have +, - and *.
So I need to split the string so that the spaces are not in the list, but +, - and * are.
I am thinking of

word = input()
array = word.split(' ')

Then, after that, I am thinking of splitting each piece further by these three characters.

Sample input:
'+-12 23*67 1'

Output:
['+', '-', '12', '23', '*', '67', '1']
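For reference, the two-step idea from the question (split on whitespace, then separate the operators) can work without regular expressions. Note that '+-12 23*67 1'.split(' ') gives ['+-12', '23*67', '1'], so the operators still have to be peeled off each piece. The sketch below assumes the input contains only digits, whitespace and the operators +, - and *; the function name tokenize is hypothetical. It uses split() with no argument, which splits on any run of whitespace rather than on single spaces.

```python
def tokenize(expression):
    """Hypothetical helper: split on whitespace, then peel +, - and * off each chunk.

    Assumes the input contains only digits, whitespace and the three operators.
    """
    operators = '+-*'
    tokens = []
    for chunk in expression.split():  # no argument: split on any run of whitespace
        number = ''
        for char in chunk:
            if char in operators:
                if number:  # flush digits collected before this operator
                    tokens.append(number)
                    number = ''
                tokens.append(char)
            else:
                number += char  # keep building the current number
        if number:  # flush trailing digits at the end of the chunk
            tokens.append(number)
    return tokens


print(tokenize('+-12 23*67 1'))  # ['+', '-', '12', '23', '*', '67', '1']
print(tokenize('+22 2'))  # ['+', '22', '2']
```

This works, but the regular expression in the answer expresses the same rule in a single line.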

Solution:

You can use re to find patterns in text. It seems you are looking for either one of the operators +, - and *, or a group of digits. So compile a pattern that matches either of those and use findall to collect every match; the result is a list:


import re

pattern = re.compile(r'([-+*]|\d+)')

string = '+-12 23*67 1'
array = pattern.findall(string)
print(array)

# Output:
# ['+', '-', '12', '23', '*', '67', '1']
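Since the question mentions trying re without success, here is an alternative sketch using re.split instead of findall (not part of the original answer). When the split pattern contains a capturing group, re.split keeps the matched text in the result, so the operators survive while plain whitespace is discarded. The extra filtering step is one reason findall is usually the tidier choice for tokenizing.

```python
import re

# The operator is inside a capturing group, so re.split keeps it in the output;
# whitespace is matched outside the group, so it is dropped.
parts = re.split(r'([-+*])|\s+', '+-12 23*67 1')
print(parts)  # ['', '+', '', '-', '12', None, '23', '*', '67', None, '1']

# Remove the empty strings (between adjacent separators) and the None values
# (the group did not take part when whitespace was the separator).
tokens = [part for part in parts if part]
print(tokens)  # ['+', '-', '12', '23', '*', '67', '1']
```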

Also a bit of testing (comparing your sample strings with the expected output):

test_cases = {
    '+-12 23*67 1': ['+', '-', '12', '23', '*', '67', '1'],
    '+22 2': ['+', '22', '2']
}

for string, correct in test_cases.items():
    assert pattern.findall(string) == correct

print('Tests completed successfully!')
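One thing these tests do not cover: findall silently skips any character the pattern does not match, so an input such as '+2 x 3' yields ['+', '2', '3'] with no error. If invalid input should be rejected instead, one option is the tokenizer style shown in the re documentation: a single alternation of named groups, walked with finditer and dispatched on Match.lastgroup. This is a sketch under that assumption; the names TOKEN_RE and strict_tokenize are hypothetical.

```python
import re

# findall drops the 'x' without complaint:
print(re.findall(r'[-+*]|\d+', '+2 x 3'))  # ['+', '2', '3']

# Hypothetical stricter tokenizer: every character is matched by exactly one
# named group, either a token, whitespace to skip, or a "bad" catch-all.
TOKEN_RE = re.compile(r'(?P<token>[-+*]|\d+)|(?P<space>\s+)|(?P<bad>.)')


def strict_tokenize(expression):
    tokens = []
    for match in TOKEN_RE.finditer(expression):
        if match.lastgroup == 'token':
            tokens.append(match.group())
        elif match.lastgroup == 'bad':
            raise ValueError(
                f'unexpected character {match.group()!r} at index {match.start()}'
            )
        # 'space' matches are skipped
    return tokens


print(strict_tokenize('+-12 23*67 1'))  # ['+', '-', '12', '23', '*', '67', '1']
try:
    strict_tokenize('+2 x 3')
except ValueError as err:
    print(err)  # unexpected character 'x' at index 3
```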

Pattern explanation (you can read about this in the docs linked below):
r'([-+*]|\d+)'
r in front makes it a raw string, so Python does not process backslash escapes itself and passes \d through to the regex engine unchanged; without it you would need to double every backslash, e.g. '\\d+'
(...) parentheses create a capturing group. They are not necessary here: when a pattern contains exactly one group, findall returns the text of that group, which in this pattern is the whole match anyway
[...] is a character class that matches any single character listed inside it, so [-+*] matches -, + or *. The - is placed first so it is treated as a literal hyphen rather than a range
| means logical or: the pattern matches either the left side or the right side (here, an operator or a number)
\d matches any digit, and the + after it means one or more, so \d+ matches a whole run of digits such as 12 or 67
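To see what each piece of the pattern does on its own, you can run the parts separately on the sample string. This is just an exploratory snippet built from the pattern in the answer; the outputs in the comments follow the documented behaviour of re.findall.

```python
import re

text = '+-12 23*67 1'

# Raw string: r'\d+' is the same string as '\\d+' (backslash, d, plus).
print(r'\d+' == '\\d+')  # True

# With exactly one group, findall returns the group's text,
# so the parentheses do not change the result here.
print(re.findall(r'([-+*]|\d+)', text) == re.findall(r'[-+*]|\d+', text))  # True

# [-+*] alone: the leading '-' inside [...] is a literal hyphen, not a range.
print(re.findall(r'[-+*]', text))  # ['+', '-', '*']

# \d+ alone: greedy runs of one or more digits.
print(re.findall(r'\d+', text))  # ['12', '23', '67', '1']

# \d without the + matches one digit at a time, breaking numbers apart.
print(re.findall(r'\d', text))  # ['1', '2', '2', '3', '6', '7', '1']
```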

Useful:

  • re module documentation, which explains what each special character in a pattern does