Random sampling a large Cartesian product of iterables

I have multiple iterables and I need to create the Cartesian product of those iterables and then randomly sample from the resulting pool of tuples. The problem is that the total number of combinations of these iterables is somewhere around 1e19, so I can’t possibly load all of this into memory.

My idea was to use itertools.product together with a random number generator: skip a random number of items, run my calculations on the item I land on, and repeat until the generator is exhausted. The plan was to do something like:

from itertools import product
from random import randint

iterables = () # tuple of 18 iterables
versions = product(*iterables)  # unpack so each iterable is a separate argument

def do_stuff(*values):
    pass  # do stuff

STEP_SIZE = int(1e6)

# start both counts from 0. 
# First value to be taken is start + step
# after that increment start to be equal to count and repeat
start = 0
count = 0

while True:
    try:
        step = randint(1, 100) * STEP_SIZE

        for v in versions:
            # if the count is less than required skip values while incrementing count
            if count < start + step:
                versions.next()
                count += 1
            else:
                do_stuff(*v)
                start = count             
    except StopIteration:
        break

Unfortunately, itertools.product objects don’t have the next() method, so this doesn’t work. What other way is there to go through this large number of tuples and either take a random sample or directly run calculations on the values?

Solution:

Which version of Python are you using? Somewhere along the way, .next() methods were dropped in favor of the built-in next() function (Python 3 renamed the iterator method to __next__()). The built-in works fine with all iterators. Here, for example, under 3.10.1, the current release at the time of writing:

>>> import itertools
>>> itp = itertools.product(range(5), repeat=6)
>>> next(itp)
(0, 0, 0, 0, 0, 0)
>>> next(itp)
(0, 0, 0, 0, 0, 1)
>>> next(itp)
(0, 0, 0, 0, 0, 2)
>>> next(itp)
(0, 0, 0, 0, 0, 3)
>>> for ignore in range(50):
...     ignore = next(itp)
...
>>> next(itp)
(0, 0, 0, 2, 0, 4)
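
For the skipping loop in the question, a minimal sketch built on the built-in next() and itertools.islice might look like the following. The function name sample_by_skipping and the max_step parameter are made up for illustration. Note that skipping still walks past every tuple one by one, so this is only practical for products far smaller than 1e19.

```python
import random
from itertools import islice, product

def sample_by_skipping(iterator, max_step, rng=random):
    """Yield items from `iterator` separated by random gaps of 1..max_step positions."""
    sentinel = object()  # marks exhaustion without raising StopIteration
    while True:
        step = rng.randint(1, max_step)
        # islice consumes step - 1 items, then next() returns the step-th one
        item = next(islice(iterator, step - 1, None), sentinel)
        if item is sentinel:
            return
        yield item

# Small stand-in product (125 tuples) so the loop finishes quickly
versions = product(range(5), repeat=3)
for v in sample_by_skipping(versions, max_step=10):
    pass  # call do_stuff(*v) here
```

Passing a seeded random.Random instance as rng makes the selection reproducible.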

Beyond that, you didn’t show us the most important part of your code: how you build your product.

Without seeing that, I can only guess that it would be far more efficient to make a random choice from the first sequence passed to product(), then another from the second, and so on. Build a random element of the product from one component at a time.
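
The component-at-a-time idea can be sketched as below. This is only an illustration under some assumptions: the names random_product_element, sample_product and pools are hypothetical, each input must be an indexable sequence (wrap generators in tuple() first), and the elements must be hashable for the duplicate check to work.

```python
import math
import random

def random_product_element(pools, rng=random):
    """Return one uniformly random tuple from the Cartesian product of `pools`."""
    # One independent choice per pool; the product itself is never built
    return tuple(rng.choice(pool) for pool in pools)

def sample_product(pools, k, rng=random):
    """Return k distinct random tuples from the product of `pools`."""
    total = math.prod(len(pool) for pool in pools)
    if k > total:
        raise ValueError("sample larger than the Cartesian product")
    seen = set()
    # Redraw on collisions; cheap as long as k is much smaller than total
    while len(seen) < k:
        seen.add(random_product_element(pools, rng))
    return list(seen)

# Hypothetical stand-in for the 18 iterables from the question
pools = [range(10), "abc", (True, False)]
for values in sample_product(pools, 5):
    pass  # call do_stuff(*values) here
```

Memory use depends only on the sample size k, not on the size of the product, so the same code works unchanged when the product has around 1e19 elements.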
