What is the most efficient way to generate the joint distribution of outcomes given a numpy matrix?

November 11, 2021

Suppose there are five clients (i = 1,…,5), and the demand D_i of client i for a good may be low, medium, or high. I have the following NumPy matrix of demand figures, with one row per client and one column per demand level:

import numpy as np

# Client demand
demand = np.array([[150, 160, 170],
                   [100, 120, 135],
                   [250, 270, 300],
                   [300, 325, 350],
                   [600, 700, 800]])

Now I want the joint distribution of outcomes for all clients. Since each client has 3 (statistically independent) possible outcomes and there are 5 clients, there are 3^5 = 243 possible combinations. What is the most efficient way to build a new matrix that gives the demand figure of each client i in each scenario j (j = 1,…,243)?
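One direct (though not necessarily the fastest) way to build that matrix, and to confirm the 243 count, is `itertools.product` with the five rows of `demand` unpacked as separate iterables: each yielded tuple picks one demand level per client, i.e. one scenario. A sketch, assuming the same `demand` matrix as above:

```python
import itertools

import numpy as np

# Client demand: one row per client, columns = low / medium / high
demand = np.array([[150, 160, 170],
                   [100, 120, 135],
                   [250, 270, 300],
                   [300, 325, 350],
                   [600, 700, 800]])

# *demand passes each client's row as a separate iterable,
# so every tuple from product() is one combination of demand levels
scenarios = np.array(list(itertools.product(*demand)))

print(scenarios.shape)  # (243, 5): one row per scenario j, one column per client i
print(scenarios[0])     # [150 100 250 300 600]: every client at low demand
```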

EDIT:

I found that np.meshgrid does what I am looking for when I pass each client's row as a separate 1-D array. Passing the whole NumPy matrix as a single argument, however, does not give the expected result:

import numpy as np

# Client demand
demand = np.array([[150, 160, 170],
                   [100, 120, 135],
                   [250, 270, 300],
                   [300, 325, 350],
                   [600, 700, 800]])

# Works: each client's row is passed as a separate 1-D array
scenarios = np.array(np.meshgrid([150, 160, 170],
                                 [100, 120, 135],
                                 [250, 270, 300],
                                 [300, 325, 350],
                                 [600, 700, 800])).T.reshape(-1, 5)

print(scenarios.shape)

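# Does not work: meshgrid receives a single argument (the whole matrix),
# flattens it into 15 values, and the reshape yields a (3, 5) array instead of (243, 5)
np.array(np.meshgrid(demand)).T.reshape(-1, 5)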
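np.meshgrid expects its coordinate vectors as separate positional arguments, so one way to feed it the matrix is to unpack the rows with `*demand`. The sketch below also uses `indexing='ij'` and stacks the grids along a new last axis, so the scenarios come out in the same order as `itertools.product`; the `.T` version above produces the same 243 rows, but in a different order.

```python
import numpy as np

demand = np.array([[150, 160, 170],
                   [100, 120, 135],
                   [250, 270, 300],
                   [300, 325, 350],
                   [600, 700, 800]])

# *demand passes the five rows as five separate 1-D coordinate vectors;
# indexing='ij' keeps axis k of every grid tied to client k
grids = np.meshgrid(*demand, indexing='ij')  # five arrays of shape (3, 3, 3, 3, 3)

# Stack along a new last axis -> (3, 3, 3, 3, 3, 5), then flatten the grid axes
scenarios = np.stack(grids, axis=-1).reshape(-1, demand.shape[0])

print(scenarios.shape)  # (243, 5)
```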

Solution:

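Here are two options, both using NumPy integer-array (fancy) indexing to pick one demand level per client. The second is fully vectorized and, in the timings below, roughly five times faster than the first: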

import numpy as np

# Client demand
demand = np.array([[150, 160, 170],
                   [100, 120, 135],
                   [250, 270, 300],
                   [300, 325, 350],
                   [600, 700, 800]])

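import itertools

# All 3**5 = 243 index combinations (0 = low, 1 = medium, 2 = high), one tuple per scenario
idxs = list(itertools.product(np.arange(3), repeat=5))

# Option 1: Python loop, selecting one column per client (row) for each scenario
# 803 µs ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
res = np.array([demand[np.arange(5), idx] for idx in idxs])

# Option 2: a single indexing call; np.arange(5) (shape (5,)) broadcasts against
# idxs (shape (243, 5)), so res2[j, i] = demand[i, idxs[j][i]]
# 155 µs ± 3.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
res2 = demand[np.arange(5), idxs]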
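If building the list of tuples with `itertools` matters, the index array can also be generated inside NumPy with `np.indices`, which returns one integer grid per axis. A sketch for a general matrix with `n_clients` rows and `n_levels` columns (both names are illustrative, read from `demand.shape`); it was not timed here, so whether it beats `res2` on an input this small is untested:

```python
import numpy as np

demand = np.array([[150, 160, 170],
                   [100, 120, 135],
                   [250, 270, 300],
                   [300, 325, 350],
                   [600, 700, 800]])

n_clients, n_levels = demand.shape

# np.indices((3,) * 5) has shape (5, 3, 3, 3, 3, 3); slice i holds client i's level index.
# Reshape to (5, 243) and transpose, so each row is one scenario's level choices,
# in the same order as itertools.product(range(3), repeat=5)
idxs = np.indices((n_levels,) * n_clients).reshape(n_clients, -1).T

# Same indexing as res2: the row index np.arange(n_clients) broadcasts against idxs
res3 = demand[np.arange(n_clients), idxs]

print(res3.shape)  # (243, 5)
```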