Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

What causes this arithmetic discrepancy between numpy and MATLAB and how can I force either behavior in Python?

I tried to normalize a probability distribution of the form $p_k := 2^{-k^2}$ for $k \in {1,\dots,n}$ for $n = 8$ in numpy/Python 3.8 along the following lines, using an equivalent of MATLAB’s num2hex a la C++ / Python Equivalent of Matlab's num2hex. The sums of the normalized distributions differ in Python and MATLAB R2020a.

If $n < 8$, there is no discrepancy.

What is going on, and how can I force Python to produce the same result as MATLAB for $n > 7$? It’s hard for me to tell which of these is IEEE 754 compliant (maybe both, with a discrepancy in grouping that affects a carry[?]) but I need the MATLAB behavior.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I note that there are still discrepancies in rounding between numpy and MATLAB per Differences between Matlab and Numpy and Python's `round` function (which I verified myself) but not sure this has any bearing.

import numpy as np
import struct # for MATLAB num2hex equivalent below
n = 8
unnormalizedPDF = np.array([2**-(k**2) for k in range(1,n+1)])
# MATLAB num2hex equivalent a la https://stackoverflow.com/questions/24790722/
num2hex = lambda x : hex(struct.unpack('!q', struct.pack('!d',x))[0])
hexPDF = [num2hex(unnormalizedPDF[k]/np.sum(unnormalizedPDF)) for k in range(0,n)]
print(hexPDF)
# ['0x3fec5862805436a4', 
#  '0x3fbc5862805436a4', 
#  '0x3f6c5862805436a4', 
#  '0x3efc5862805436a4', 
#  '0x3e6c5862805436a4', 
#  '0x3dbc5862805436a4', 
#  '0x3cec5862805436a4', 
#  '0x3bfc5862805436a4']
hexPDFSum = num2hex(np.sum(unnormalizedPDF/np.sum(unnormalizedPDF)))
print(hexPDFSum)
# 0x3ff0000000000000

Here is the equivalent in MATLAB:

n = 8;
unnormalizedPDF = 2.^-((1:n).^2);
num2hex(unnormalizedPDF/sum(unnormalizedPDF))
% ans =
% 
%   8×16 char array
% 
%     '3fec5862805436a4'
%     '3fbc5862805436a4'
%     '3f6c5862805436a4'
%     '3efc5862805436a4'
%     '3e6c5862805436a4'
%     '3dbc5862805436a4'
%     '3cec5862805436a4'
%     '3bfc5862805436a4'
num2hex(sum(unnormalizedPDF/sum(unnormalizedPDF)))
% ans =
% 
%     '3fefffffffffffff'

Note that the unnormalized distributions are exactly the same, but the sums of their normalizations differ by a single bit. If I use $n = 7$, everything agrees (both give 0x3fefffffffffffff), and both give the same results for $n < 7$ as well.

>Solution :

According to the manual, numpy.sum uses pairwise summation to get more precision. Another common algorithm is Kahan summation.

Anyway, I wouldn’t count too much on Numpy and MATLAB giving the same result up to the last bit, as there might me operation reordering if computations are made in parallel. See this for the kind of problem that can arise.

However, we can cheat a little bit and force Python to do the sum without optimization:

hexPDFSum = num2hex(np.sum(np.hstack((np.reshape(unnormalizedPDF / np.sum(unnormalizedPDF), (n, 1)), np.zeros((n, 1)))), 0)[0])
hexPDFSum
'0x3fefffffffffffff'
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading