How expensive is accessing Match.group()?

January 27, 2022

While trying to optimize some code that reuses a matched group, I wondered whether calling Match.group() is expensive. I tried digging into the source of re.py, but the code was a bit cryptic.

A few tests suggest that it is better to store the output of Match.group() in a variable, but I would like to understand what exactly happens when Match.group() is called, and whether there is another, internal way to access the content of the group directly.

Some example code to illustrate a potential use:

import re

m = re.search('X+', f'__{"X"*10000}__')

# do something
# m.group()

# do something else
# m.group()

Timings

direct access:

%%timeit
len(m.group())
# 220 ns ± 1.31 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

intermediate variable:

X = m.group()

%%timeit
len(X)
# 51 ns ± 0.172 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

References:
current re.py code (python 3.10)
current sre_compile.py code (python 3.10)

Removing the effect of attribute access (doesn’t change much):

G = m.group

%%timeit
len(G())
# 230 ns ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Solution:

The match object holds a reference to the original string you searched in, and indexes where each group starts and ends, including group 0, the whole matched string. Every call to group() slices the original string to create a new string to return.
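As a rough illustration of that description, the public attributes of a match object expose the same pieces: Match.string is the searched string, and Match.span() gives the start and end indexes of a group. The sketch below (the variable names are only illustrative) rebuilds group 0 by slicing by hand. It shows an equivalent result in pure Python, not the C implementation itself.

```python
import re

text = f'__{"X" * 10000}__'
m = re.search('X+', text)

# The match object keeps a reference to the searched string...
assert m.string is text

# ...and the start/end indexes of each group (group 0 is the whole match).
start, end = m.span(0)

# Slicing the original string by hand yields the same text as group().
manual = m.string[start:end]
assert manual == m.group()
print(start, end, len(manual))  # 2 10002 10000
```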

Saving the return value to a variable avoids the time and memory cost of having to slice the string every time. (It also avoids repeating the method call overhead.)
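For readers without IPython, the comparison can be sketched with the standard timeit module. The function names and the loop count below are arbitrary choices for illustration, and the absolute numbers will vary by machine and Python version, so treat the output as indicative only.

```python
import re
import timeit

m = re.search('X+', f'__{"X" * 10000}__')

def repeated_calls():
    # Two separate calls to group(), each producing its own result.
    return len(m.group()) + len(m.group())

def cached_value():
    # A single call to group(), reused through a local variable.
    matched = m.group()
    return len(matched) + len(matched)

n = 100_000  # arbitrary number of repetitions
t_repeated = timeit.timeit(repeated_calls, number=n)
t_cached = timeit.timeit(cached_value, number=n)
print(f'repeated group(): {t_repeated / n * 1e9:.0f} ns per call')
print(f'cached variable:  {t_cached / n * 1e9:.0f} ns per call')
```

Both functions return the same value; only the number of group() calls differs, which is the cost the answer above describes.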

You can see that group() isn’t just returning a cached string by the fact that the return value isn’t always the same object:

>>> import re
>>> x = re.search(r'sd', 'asdf')
>>> x.group() is x.group()
False

If you want to see the implementation of group(), it’s match_group in Modules/_sre.c in the Python source code.

optimization

by MR

Published January 27, 2022
