python – re.split a string with a keyword unless there is a specific keyword preceding it

Here is the code:

import re
text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
splitter = re.split('Sir|Mrs', text)

I want to split the text on the words ‘Sir’ or ‘Mrs’, unless the word is immediately preceded by the string ‘married to’.

Current output:

''
'John Doe, married to'
'Jane Doe,'
'Jack Doe,'
'Mary Doe'

Desired output:

''
'John Doe, married to Mrs Jane Doe,'
'Jack Doe,'
'Mary Doe'

Solution:

I would use an re.findall approach here, matching each entry directly instead of splitting on the titles:

import re
text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
matches = re.findall(r'\b(?:Sir|Mrs) \w+ \w+(?:, married to (?:Mrs|Sir) \w+ \w+)?', text)
print(matches)

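This prints the following (unlike the desired output in the question, each match keeps its title and there is no leading empty string):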

['Sir John Doe, married to Mrs Jane Doe', 'Sir Jack Doe', 'Mrs Mary Doe']
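If the split-style output from the question is needed instead, one possible alternative (not part of the original answer) is to keep re.split and add a negative lookbehind. This is only a sketch, and it assumes the exact text 'married to ' with a single trailing space sits directly before the title. Python's re module only accepts fixed-width lookbehinds, which this literal satisfies. The variable name parts is illustrative.

```python
import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"

# Split on a title only when it is NOT immediately preceded by "married to ".
# The lookbehind must be fixed-width in Python's re module; this literal is.
parts = re.split(r'(?<!married to )\b(?:Sir|Mrs)\b', text)

# re.split leaves the surrounding whitespace in place, so strip each piece.
parts = [p.strip() for p in parts]
print(parts)
```

The leading empty string appears because the text starts with a title, so there is nothing before the first split point.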

The pattern passed to re.findall says to match:

\b(?:Sir|Mrs)                         leading Sir/Mrs
  \w+ \w+                             first and last names
(?:
    , married to (?:Mrs|Sir) \w+ \w+  optional 'married to' followed by another name
)?                                    zero or one time
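The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE. This is a sketch of that idea rather than something from the original answer. In verbose mode, unescaped whitespace in the pattern is ignored, so each literal space is written as [ ] here.

```python
import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"

# Same pattern as above, written with re.VERBOSE so the comments live
# inside the regex. Literal spaces are spelled [ ] because verbose mode
# ignores bare whitespace outside character classes.
pattern = re.compile(r"""
    \b(?:Sir|Mrs)            # leading Sir/Mrs
    [ ]\w+[ ]\w+             # first and last names
    (?:
        ,[ ]married[ ]to[ ]  # optional 'married to' ...
        (?:Mrs|Sir)          # ... followed by another title
        [ ]\w+[ ]\w+         # ... and another first and last name
    )?                       # zero or one time
""", re.VERBOSE)

matches = pattern.findall(text)
print(matches)
```

Like the original pattern, this assumes every person is written as a title followed by exactly two words. Names with middle names or hyphens would need a looser name sub-pattern.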