Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Define a random variable

I’m new with Python. I have to create a new rv U(t). Assuming that Z has a standard normal distribution c = 1.57, I have that:

U(t) = 0 if Z(t) <= c
U(t) = Φ(Z(t)) Z(t) > c

Where Φ(·) is the cdf of the standard normal distribution N(0, 1).

I start sampling random numbers from the normal distribution and I create an array of zeros:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

z = np.random.normal(0, 1, 100)
u = np.zeros([1, 100])

Then, I write the following loop:

for i in list(range(1, 101, 1)):
    if z[:i] < c:
        u[:i] = 0
    else z[:i] > c:
        u[:i] = norm.cdf(z[:i], loc=0, scale=1) 

However, there is something wrong. I got this error:

File "<ipython-input-837-1cbdd0641a75>", line 4
    else z[:i] > c:
         ^
SyntaxError: invalid syntax

Can someone please help me find out where the error is? Or suggest another way to deal with the problem?

Thank you!

>Solution :

Use the power of numpy!

z = np.random.normal(0, 1, 100)
u = np.zeros(z.shape)

Since you initialized u to zeros, you don’t need to do anything for the z <= c cases. For the others, you can use numpy’s logical indexing to only set the elements that fulfill the condition

# Get only the elements of z where z > c
z_filt = z[z > c]

# Calculate the norm.cdf at these values of z, 
norm_vals = norm.cdf(z_filt, loc=0, scale=1)

# Assign those to the elements of u where z > c
u[z > c] = norm_vals

Of course, you can condense these three lines down to a single line:

u[z > c] = norm.cdf(z[z > c], loc=0, scale=1)

This approach will be significantly faster than iterating over the arrays and setting individual elements.


If you’re curious why your code didn’t work and how to fix it,

  • You don’t need to list out a range to iterate on it. Just for i in range(100) is good enough
  • z[:i] < c will give an array containing i boolean values. Putting an if condition on that will give you an error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). I suspect you meant to do z[i] < c
  • u[:i] = 0 will set all elements of u from the start to the ith index to zero. You probably only wanted u[i] = 0. This is really not even necessary since you already initialized u to be zeros
  • else <condition> doesn’t work. You want elif <condition>. Although you don’t really need that here, since you could just do if z[i] > c and that’s the only condition you need.
for i in range(100):
    if z[i] > c:
        u[i] = norm.cdf(z[i], loc=0, scale=1) 

Comparing the runtimes of these two approaches:

import timeit
import numpy as np
from scipy.stats import norm
from matplotlib import pyplot as plt

def f_numpy(size):
    z = np.random.normal(0, 1, size)
    u = np.zeros(z.shape)
    u[z > c] = norm.cdf(z[z > c], loc=0, scale=1)
    return u

def f_loopy(size):
    z = np.random.normal(0, 1, size)
    u = np.zeros(z.shape)
    for i in range(size):
        if z[i] > c:
            u[i] = norm.cdf(z[i], loc=0, scale=1)    
    return u

c = 0

sizes = [10, 100, 1000, 10_000, 100_000]
times = np.zeros((len(sizes), 2))
for i, s in enumerate(sizes):
    times[i, 0] = timeit.timeit('f_numpy(s)', globals=globals(), number=100) / 100
    print(">")
    times[i, 1] = timeit.timeit('f_loopy(s)', globals=globals(), number=10) / 10
    print(".")
    
fig, ax = plt.subplots()
ax.plot(sizes, times[:, 0], label="numpy")
ax.plot(sizes, times[:, 1], label="loopy")
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('Array size')
ax.set_ylabel('Time per function call (s)')
ax.legend()
fig.tight_layout()

we get the following plot, which shows that the numpy approach is orders of magnitude faster than the loopy approach.
Runtime comparison -- numpy vs. loopy

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading