Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to calculate binomial probabilities in Python with very small numbers?

I’m trying to calculate the likelihood of successfully guessing a password in Python. For example, if we take a 10 character lowercase password (26**10 possible passwords) and we can make 1 billion guesses a second, we can calculate the probability of successfully guessing the password in one hour with:

from scipy.stats import binom
(1 - binom.pmf(k=0, n=1000000000*3600, p=1 / 26**10))

Which gives us the the result of 0.02525515384826793 (i.e, 2.5%). However, this doesn’t work as we increase the length of the password (or more strictly, as p gets closer to zero). For instance if we increase the length of the password to 12 characters:

from scipy.stats import binom
(1 - binom.pmf(k=0, n=1000000000*3600, p=1 / 26**12))

Then the returned value is just 0.0, which is incorrect – presumably due to the float rounding down to zero at some point. How can I calculate this to get a more precise answer?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

I think this is about the limitations of floating-point arithmetic which I also often face (i.e, float overflow).

you can use the mpmath library, which provides arbitrary-precision arithmetic:

from mpmath import mp
mp.dps = 50  # Set the desired precision (adjust as needed)
total_passwords = 26**12
guesses_per_second = 1000000000
seconds_in_an_hour = 3600

p = mp.mpf(1) / mp.mpf(total_passwords)
n = mp.mpf(guesses_per_second) * mp.mpf(seconds_in_an_hour)

probability = 1 - mp.power(1 - p, n)
print(float(probability))

Extra-Note: Floating-point numbers lose precision even when you are just working with such seemingly harmless numbers like 0.2 or 76.5. You should be extra careful when working with a large amount of floating-point operations over the same data as errors may build up rather quickly.

I hope this can help somehow.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading