How to maintain decimals when dividing with numpy arrays in Python

So, I was working on implementing my own version of the statistical test of homogeneity in Python, where the user submits a list of lists and the function computes the corresponding chi-square value.

One issue I found was that my function was dropping the decimals when performing division, resulting in a somewhat inaccurate chi-square value for small sample sizes.

Here is the code:

import numpy as np
from scipy import stats

def test_of_homo(list1):
    a = np.array(list1)
    #n = a.size
    num_rows = a.shape[0]
    num_cols = a.shape[1]
    dof = (num_cols-1)*(num_rows-1)
    column_totals = np.sum(a, axis=0)
    row_totals = np.sum(a, axis=1)
    n = sum(row_totals)
    b = np.array(list1)
    c = 0
    for x in range(num_rows):
      for y in range(num_cols):
        print("X is " + str(x))
        print("Y is " + str(y))
        print("a[x][y] is " + str(a[x][y]))
        print("row_totals[x] is " + str(row_totals[x]))
        print("column_total[y] is " + str(column_totals[y]))
        b[x][y] = (float(row_totals[x])*float(column_totals[y]))/float(n)
        print("b[x][y] is " + str(b[x][y]))
        numerator = ((a[x][y]) - b[x][y])**2
        chi =  float(numerator)/float(b[x][y])
        c = float(c)+ float(chi)
    print(b)
    print(c)
    print(stats.chi2.cdf(c, df=dof))
    print(1-(stats.chi2.cdf(c, df=dof)))

listc = [(21, 36, 30), (48, 26, 19)]

test_of_homo(listc)


When the results were printed, I saw that the b[x][y] values were [[33 29 23] [35 32 25]] instead of something like "33.35, 29.97, 23.68", etc. This caused my resulting chi value to be 15.58 with a p of 0.0004 instead of the expected 14.5.

I tried converting everything to float, but that didn't seem to work. Using decimal.Decimal(b[x][y]) resulted in a type error. Any help?

Solution:

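I think the problem is the data type of the arrays you build from the list. If you convert a list to a NumPy array without specifying the data type, NumPy infers it from the values, so a list of integers becomes an integer array. Any float you later assign into that array (as you do with b[x][y]) is truncated to an integer, no matter how many float() calls appear on the right-hand side: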

>>> listc = [(21, 36, 30), (48, 26, 19)]
>>> a = np.array(listc)
>>> a.dtype
dtype('int64')
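
As a minimal sketch of why this matters (the names `counts`, `expected` and `value` are illustrative and not from the question), the snippet below stores a float in an integer array, which is effectively what happens to b[x][y] in the loop. The division itself keeps its decimals; they are lost only when the result is written into the array:

```python
import numpy as np

counts = np.array([(21, 36, 30), (48, 26, 19)])  # dtype inferred as integer
expected = np.array(counts)                      # also an integer array, like b in the question

value = 87 * 69 / 180    # row total * column total / n for cell [0][0], a Python float
expected[0][0] = value   # the fractional part is dropped when stored in an integer array

print(value)             # 33.35
print(expected[0][0])    # 33
```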

Here is how you force conversion to a desired data type:

>>> a = np.array(listc, dtype=float)
>>> a.dtype
dtype('float64')

Try that in the first and ninth lines of your function body (the two np.array(list1) calls that create a and b) and see if it solves the problem. Once both arrays are float, you shouldn't need to call float() everywhere.
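
For reference, here is one possible way to write the whole computation with float arrays and no explicit loop. This is only a sketch, not the original function: the name `homogeneity_test` and its return values are my own choices, and it assumes a rectangular table of counts with no zero row or column totals.

```python
import numpy as np
from scipy import stats

def homogeneity_test(table):
    """Chi-square test of homogeneity for a table of observed counts (sketch)."""
    observed = np.array(table, dtype=float)   # force float so nothing is truncated
    row_totals = observed.sum(axis=1)
    column_totals = observed.sum(axis=0)
    n = observed.sum()

    # Expected count for each cell: row total * column total / grand total
    expected = np.outer(row_totals, column_totals) / n

    chi2 = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    p_value = stats.chi2.sf(chi2, df=dof)     # survival function, same as 1 - cdf
    return chi2, dof, p_value, expected

chi2, dof, p_value, expected = homogeneity_test([(21, 36, 30), (48, 26, 19)])
print(expected)        # first cell is 33.35 rather than 33
print(round(chi2, 2))  # 14.46
print(p_value)
```

With the data from the question this gives a chi-square value of about 14.46, in line with the expected 14.5, instead of the 15.58 produced by the truncated integers.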
