I need to average the Y values that correspond to each distinct value in the X array:
X=np.array([ 1, 1, 2, 2, 2, 2, 3, 3 ... ])
Y=np.array([ 10, 30, 15, 10, 16, 10, 15, 20 ... ])
In other words, the X value 1 corresponds to the Y values 10 and 30, whose average is 20; the X value 2 corresponds to 15, 10, 16, and 10, whose average is 12.75; and so on.
How can I calculate these average values?
>Solution :
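One option is to use a property of linear regression with categorical variables: regressing y on one indicator (dummy) column per distinct x value, with no intercept, gives the mean of y within each group as the fitted coefficients: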
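import numpy as np
x = np.array([1, 1, 2, 2, 2, 2, 3, 3])
y = np.array([10, 30, 15, 10, 16, 10, 15, 20])
# One indicator (dummy) column per distinct x value, shape (len(x), n_groups)
x_dummies = x[:, None] == np.unique(x)
# Least-squares coefficients on the indicator columns are the per-group means of y
means = np.linalg.lstsq(x_dummies, y, rcond=None)[0]
print(means) # [20. 12.75 17.5 ]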
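If the regression route feels indirect, the same group means can probably be obtained more directly by summing and counting per group. The sketch below is an alternative to the answer above, not part of it; it assumes the sample arrays from the question, and the names labels, inverse, sums, counts and group_means are only illustrative.

```python
import numpy as np

x = np.array([1, 1, 2, 2, 2, 2, 3, 3])
y = np.array([10, 30, 15, 10, 16, 10, 15, 20])

# labels: sorted distinct x values
# inverse: for each entry of x, the index of its value in labels
labels, inverse = np.unique(x, return_inverse=True)

# Per-group sum of y divided by per-group count gives the per-group mean
sums = np.bincount(inverse, weights=y)
counts = np.bincount(inverse)
group_means = sums / counts

print(labels)       # distinct x values: 1, 2, 3
print(group_means)  # means per group: 20.0, 12.75, 17.5
```

Going through np.unique with return_inverse means x does not need to be sorted or made of small non-negative integers, which np.bincount on x itself would require.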