Calculating the averages of elements in one array based on data in another array

June 14, 2022

I need to average the Y values that correspond to each distinct value in the X array.

X=np.array([  1,  1,  2,  2,  2,  2,  3,  3 ... ])

Y=np.array([ 10, 30, 15, 10, 16, 10, 15, 20 ... ])

In other words, the value 1 in the X array corresponds to 10 and 30 in the Y array, whose average is 20; the value 2 corresponds to 15, 10, 16, and 10, whose average is 12.75; and so on.

How can I calculate these average values?

Solution:

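One option is to use a property of linear regression with categorical variables. If y is regressed on one indicator (dummy) column per distinct value of x, with no intercept, the least-squares coefficients are the per-group means of y: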

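import numpy as np

x = np.array([  1,  1,  2,  2,  2,  2,  3,  3 ])
y = np.array([ 10, 30, 15, 10, 16, 10, 15, 20 ])

# One indicator column per distinct x value, shape (len(x), n_groups)
x_dummies = (x[:, None] == np.unique(x)).astype(float)
# With no intercept column, the least-squares coefficients are the group means
means = np.linalg.lstsq(x_dummies, y, rcond=None)[0]
print(means) # [20.   12.75 17.5 ], in the order of np.unique(x)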
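If the regression route feels indirect, the same averages can be computed by grouping explicitly. The following is a minimal sketch of an alternative, not part of the answer above; it assumes x is one-dimensional, and the names groups, inverse, sums, counts, and group_means are chosen here for illustration only:

```python
import numpy as np

x = np.array([1, 1, 2, 2, 2, 2, 3, 3])
y = np.array([10, 30, 15, 10, 16, 10, 15, 20])

# groups holds the sorted distinct x values; inverse maps every element
# of x to the index of its value in groups.
groups, inverse = np.unique(x, return_inverse=True)

# Per-group sum of y divided by per-group element count gives the mean.
sums = np.bincount(inverse, weights=y)
counts = np.bincount(inverse)
group_means = sums / counts

print(dict(zip(groups.tolist(), group_means.tolist())))
# {1: 20.0, 2: 12.75, 3: 17.5}
```

This avoids building the dense indicator matrix, which may matter when x has many distinct values.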