Python Regex Lookaround newline behavior

November 23, 2021

I am using Python in Google Colab. As shown below, I have three strings that each contain 'abcd' twice. I am trying to extract only '5678' from each string. For string2, I pressed Enter and then indented the second line with tabs and spaces. For string3, I only pressed Enter to move to the next line.

import re

string1 = 'abcd1234ppppabcd5678oooo'
string2 = '''abcd1234pppp
             abcd5678oooo'''
string3 = '''abcd1234pppp
abcd5678oooo'''

# Same pattern for all three strings: text after 'abcd' and before 'oooo'
reg1 = re.search('(?<=abcd)(.*)(?=oooo)', string1)
print(reg1.group(0))
reg2 = re.search('(?<=abcd)(.*)(?=oooo)', string2)
print(reg2.group(0))
reg3 = re.search('(?<=abcd)(.*)(?=oooo)', string3)
print(reg3.group(0))

Here is the output:

1234ppppabcd5678
5678
5678

I understand the result for the first string, but why did the code 'work' for string2 and string3? Does regex automatically shorten the match when the text is split across multiple lines?

Solution:

According to the Python docs,

The special characters are:

. (Dot.) In the default mode, this matches any character except a newline. If the
DOTALL flag has been specified, this matches any character including a newline.
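
The quoted behavior can be checked directly. This is a minimal sketch; the sample text 'a\nb' and the variable names are made up for illustration and are not part of the question.

```python
import re

text = "a\nb"

# By default, '.' does not match the newline between 'a' and 'b',
# so the search finds nothing.
default_match = re.search("a.b", text)

# With re.DOTALL, '.' matches the newline as well.
dotall_match = re.search("a.b", text, flags=re.DOTALL)

print(default_match)                     # None
print(dotall_match.group(0) == "a\nb")   # True
```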

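Because . does not match a newline by default, (.*) cannot cross a line break, so the whole match has to sit on a single line. In string2 and string3, the first 'abcd' has no 'oooo' on its line, so the search moves on and matches after the second 'abcd', which gives '5678'. Regex is not shortening anything on its own. To let . match across lines, pass the re.DOTALL flag: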

reg1 = re.search('(?<=abcd)(.*)(?=oooo)', string1, re.DOTALL)
print(reg1.group(0))
reg2 = re.search('(?<=abcd)(.*)(?=oooo)', string2, re.DOTALL)
print(reg2.group(0))
reg3 = re.search('(?<=abcd)(.*)(?=oooo)', string3, re.DOTALL)
print(reg3.group(0))

Alternatively, compile the pattern once and reuse it:

pattern = re.compile('(?<=abcd)(.*)(?=oooo)', re.DOTALL)
for string in (string1, string2, string3):
    reg = pattern.search(string)
    print(reg.group(0))

Either version prints:

1234ppppabcd5678
1234pppp
             abcd5678
1234pppp
abcd5678
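
With re.DOTALL the greedy (.*) spans the line break, so none of the three results is just '5678'. If the goal is the text between 'oooo' and the nearest 'abcd' before it, one possible approach is to forbid the match from stepping over another 'abcd'. The sketch below is an assumption about what the asker wants, not part of the original answer, and the name last_segment is made up for illustration.

```python
import re

string1 = 'abcd1234ppppabcd5678oooo'
string2 = 'abcd1234pppp\n             abcd5678oooo'
string3 = 'abcd1234pppp\nabcd5678oooo'

# (?:(?!abcd).)*? consumes one character at a time, lazily, and refuses to
# consume a character at any position where another 'abcd' begins.
# re.DOTALL lets '.' match newlines so the behavior is the same on every string.
last_segment = re.compile(r'(?<=abcd)(?:(?!abcd).)*?(?=oooo)', re.DOTALL)

results = [last_segment.search(s).group(0) for s in (string1, string2, string3)]
print(results)  # ['5678', '5678', '5678']
```

If the wanted text is known to be digits only, a simpler pattern such as (?<=abcd)\d+(?=oooo) should give the same result for these three strings.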

regex

by MR

Published November 23, 2021
