How to maintain decimals when dividing with numpy arrays in Python

January 28, 2023

So, I was working on implementing my own version of the statistical test of homogeneity in Python, where the user would submit a list of lists and the function would compute the corresponding chi value.

One issue I found was that my function was removing decimals when performing division, resulting in a somewhat inaccurate chi value for small sample sizes.

Here is the code:

import numpy as np
from scipy import stats

def test_of_homo(list1):
    a = np.array(list1)
    #n = a.size
    num_rows = a.shape[0]
    num_cols = a.shape[1]
    dof = (num_cols-1)*(num_rows-1)
    column_totals = np.sum(a, axis=0)
    row_totals = np.sum(a, axis=1)
    n = sum(row_totals)
    b = np.array(list1)
    c = 0
    for x in range(num_rows):
      for y in range(num_cols):
        print("X is " + str(x))
        print("Y is " + str(y))
        print("a[x][y] is " + str(a[x][y]))
        print("row_totals[x] is " + str(row_totals[x]))
        print("column_total[y] is " + str(column_totals[y]))
        b[x][y] = (float(row_totals[x])*float(column_totals[y]))/float(n)
        print("b[x][y] is " + str(b[x][y]))
        numerator = ((a[x][y]) - b[x][y])**2
        chi =  float(numerator)/float(b[x][y])
        c = float(c)+ float(chi)
    print(b)
    print(c)
    print(stats.chi2.cdf(c, df=dof))
    print(1-(stats.chi2.cdf(c, df=dof)))

listc = [(21, 36, 30), (48, 26, 19)]

test_of_homo(listc)

When the results were printed, I saw that the b[x][y] values were [[33 29 23] [35 32 25]] instead of something like "33.35, 29.97, 23.68", etc. This caused my resulting chi value to be 15.58 with a p of 0.0004 instead of the expected 14.5.

I tried to convert everything to float but that didn’t seem to work. Using decimal.Decimal(b[x][y]) resulted in a TypeError. Any help?

Solution:

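I think the problem is the data type of the arrays, which comes from the numbers you are providing to the function in the list. All of the values are integers, so both a and b end up as integer arrays, and any float you assign into b[x][y] is truncated when it is stored, no matter how many float() calls appear on the right-hand side. Note that if you convert a list to a NumPy array without specifying the data type, it will try to guess based on the values: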

>>> listc = [(21, 36, 30), (48, 26, 19)]
>>> a = np.array(listc)
>>> a.dtype
dtype('int64')
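
As a minimal sketch of why the float() calls do not help, the snippet below stores a float into an integer array built from the same data. The names counts, expected and value are only illustrative and do not come from the original function.

```python
import numpy as np

# All values are integers, so NumPy infers an integer dtype.
counts = np.array([(21, 36, 30), (48, 26, 19)])
expected = np.array(counts)  # a copy with the same integer dtype

# Expected count for the first cell: row total 87, column total 69, n = 180.
value = 87 * 69 / 180        # an ordinary Python float with a fractional part

# Storing the float into the integer array drops the fractional part.
expected[0][0] = value

print(counts.dtype)
print(value)
print(expected[0][0])
```

The division itself keeps its decimals; they are only lost at the moment the result is written into the integer array.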

Here is how you force conversion to a desired data type:

>>> a = np.array(listc, dtype=float)
>>> a.dtype
dtype('float64')

Try that in the first and ninth lines of your function body (the lines that create a and b) and see if it solves the problem. If you do this, you shouldn’t need to call float() everywhere.
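
For comparison, here is a hedged sketch of the same calculation with dtype=float applied, written without the nested loops. The function name chi_square_homogeneity and the use of np.outer are my own choices for illustration, not part of the original code, and the p-value step is left out so the sketch depends only on NumPy.

```python
import numpy as np

def chi_square_homogeneity(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    # dtype=float keeps the fractional part of the expected counts.
    observed = np.array(table, dtype=float)
    row_totals = observed.sum(axis=1)
    column_totals = observed.sum(axis=0)
    n = observed.sum()

    # expected[x][y] = row_totals[x] * column_totals[y] / n for every cell.
    expected = np.outer(row_totals, column_totals) / n

    chi_sq = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return chi_sq, dof, expected

listc = [(21, 36, 30), (48, 26, 19)]
chi_sq, dof, expected = chi_square_homogeneity(listc)
print(expected)
print(chi_sq, dof)
```

With the data from the question, the statistic should come out close to the expected 14.5 rather than 15.58. The result can then be passed to stats.chi2.cdf(chi_sq, df=dof) exactly as in the original function to get the p-value.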