Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Why sublist access time increases with sublist size?

The code below initializes a list of random integers, and iterates over it. Given a subset_size, at every iteration i, a sublist of i: i + subset_size is accessed. The time to access the sublist grows with subset_size. For n = 100000 and subset_size = 50000, it takes 15+ seconds on my i5 mbp. I thought sublists are retrieved using 2 pointers and lazy evaluation but it looks like there’s some c loop behind the scenes that populates a new list and returns it as a result. Is this a proper description to what actually happens or is there another explanation?

import random
from datetime import timedelta
from time import perf_counter


def example(n, subset_size):
    x = [random.randint(0, 10000) for _ in range(n)]
    t = perf_counter()
    for i in range(n - subset_size):
        _ = x[i : i + subset_size]
    print(timedelta(seconds=perf_counter() - t))


if __name__ == '__main__':
    example(100000, 50000)

0:00:15.131059

>Solution :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I thought sublists are retrieved using 2 pointers and lazy evaluation
but it looks like there’s some c loop behind the scenes that populates
a new list and returns it as a result.

Your assumption is correct. slicing a list always creates new list. Here is the relevant part of the source code. I have added some comments to understand what is being going on each steps.

static PyObject *
list_slice(PyListObject *a, Py_ssize_t ilow, Py_ssize_t ihigh)
{
    PyListObject *np;
    PyObject **src, **dest;
    Py_ssize_t i, len;
    len = ihigh - ilow;
    if (len <= 0) {
        return PyList_New(0);
    }
    # create new list which is long enough to hold the slice length elements.
    np = (PyListObject *) list_new_prealloc(len);
    if (np == NULL)
        return NULL;
    # Adjust the pointer offset, because list internally uses an array of pointers.
    src = a->ob_item + ilow;
    dest = np->ob_item;
    # Copy the elements back.
    for (i = 0; i < len; i++) {
        PyObject *v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }
    Py_SET_SIZE(np, len);
    return (PyObject *)np;
}

As you can see when you slice a list it has to first call list_new_prealloc which is where an empty list is created and allocates the memory upto the slice length.

static PyObject *
list_new_prealloc(Py_ssize_t size)
{
    assert(size > 0);
    # Create new list
    PyListObject *op = (PyListObject *) PyList_New(0);
    if (op == NULL) {
        return NULL;
    }
    assert(op->ob_item == NULL);
    # Allocating memory
    op->ob_item = PyMem_New(PyObject *, size);
    if (op->ob_item == NULL) {
        Py_DECREF(op);
        return PyErr_NoMemory();
    }
    op->allocated = size;
    return (PyObject *) op;
}
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading