Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Complexity of string += operation

I am kinda confused. I was sure that due do immutability of str operation like s += "abc" need to copy whole s content to another place in memory so effectively adding character to very long string will consume much time.

I wrote snippet to prove my theory:

import timeit

def foo(i):
    res = ""
    for _ in range(i):
        res += "a"  
    return res


def foo2(i):
    res = []
    for _ in range(i):
        res.append("a")
    return "".join(res)


iterations = 100000
print(timeit.timeit('foo(iterations)', globals=globals(), number=100))
print(timeit.timeit('foo2(iterations)', globals=globals(), number=100))

However it looks like

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  1. foo execution time grows linearly based on iterations
  2. foo2 is like just two times faster than foo

I tried to inspect bytecode searching for some hidden optimizations. I tried to change constant string to randomized one with proper length to deny interning as well but couldn’t find any explanation of that behaviour.

Was I wrong then? Does += depend on string length or not? If so, how can I prove that?

>Solution :

CPython, the reference implementation of Python, has an optimization for repeated concatenation of strings that is keeping your performance close to O(n) here. It’s a brittle optimization, and PEP 8’s very first programming recommendation tells you not to rely on it:

Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

For example, do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn’t present at all in implementations that don’t use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.

It works because the optimization handles the specific case of a concatenation occurring where the left-hand side is being reassigned and there is only one reference to that string, so it can directly realloc the string’s storage and mutate it in place, rather than constructing a new string. But as noted, it’s brittle even on CPython, and not available at all on other implementations of Python, so avoid it.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading