Average distance within group in pandas

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({
    'id': ['A','A','B','B','B'],
    'x': [1,1,2,2,3],
    'y': [1,2,2,3,3]
})

The output I want is the average pairwise distance between the points in each group. In this example:

group A: (distance((1,1),(1,2))) /1 = 1

group B: (distance((2,2),(2,3)) + distance((2,3),(3,3)) + distance((2,2),(3,3))) /3 = 1.138

I can calculate the distance using np.linalg.norm, but I am confused about how to use it with pandas groupby. Thank you.

Note: one idea I had is to first build a dataframe containing the pairs of points whose distance I need (this is where I am stuck). After that, I would only need to compute each distance and take the mean per group with groupby.

>Solution :

A possible solution, based on numpy broadcasting:

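import numpy as np

def calc_avg_distance(group):
    # Coordinates of the group's points as an (n, 2) array
    pts = group[['x', 'y']].to_numpy()
    # Broadcasting produces the full n x n matrix of Euclidean distances
    dist_matrix = np.sqrt(((pts - pts[:, np.newaxis])**2).sum(axis=2))
    # Mask the zero self-distances so they do not lower the mean
    np.fill_diagonal(dist_matrix, np.nan)
    # Each pair appears twice in the symmetric matrix, which leaves the mean unchanged
    return np.nanmean(dist_matrix)


(df.groupby('id')[['x', 'y']].apply(calc_avg_distance)
 .reset_index(name='avg_distance'))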

Output:

  id  avg_distance
0  A      1.000000
1  B      1.138071
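
The question also outlines a different route: first build a table of point pairs, then compute each distance and take the group mean. A minimal sketch of that idea follows, assuming a self-merge on id is acceptable for the data size (it creates n² rows per group before filtering). The helper column pt and the _1/_2 suffixes are names chosen here for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'A', 'B', 'B', 'B'],
    'x': [1, 1, 2, 2, 3],
    'y': [1, 2, 2, 3, 3],
})

# Number the points so that each unordered pair can be kept exactly once.
# 'pt' is a hypothetical helper column.
points = df.assign(pt=np.arange(len(df)))

# Self-merge on 'id' pairs every point with every point of the same group.
pairs = points.merge(points, on='id', suffixes=('_1', '_2'))

# Keep only pt_1 < pt_2: drops self-pairs and mirrored duplicates.
pairs = pairs[pairs['pt_1'] < pairs['pt_2']].copy()

# Euclidean distance between the two points of each pair.
pairs['dist'] = np.hypot(pairs['x_1'] - pairs['x_2'],
                         pairs['y_1'] - pairs['y_2'])

# Mean pairwise distance per group.
avg = pairs.groupby('id')['dist'].mean().reset_index(name='avg_distance')
print(avg)
```

On the sample data this gives the same group means as the broadcasting version. One difference to be aware of: a group with a single point has no pairs, so it is absent from this result, whereas the broadcasting version returns NaN for it.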