How to exclude duplicate values in sklearn.metrics.pairwise.euclidean_distances results

I am measuring the Euclidean distances between multiple points, whose coordinates are stored in an array.

from sklearn.metrics.pairwise import euclidean_distances
points = [[1,2], [1,3], [4,5], [2,6]]

distances = euclidean_distances(points)
distances
array([[0.        , 1.        , 4.24264069, 4.12310563],
       [1.        , 0.        , 3.60555128, 3.16227766],
       [4.24264069, 3.60555128, 0.        , 2.23606798],
       [4.12310563, 3.16227766, 2.23606798, 0.        ]])

The returned array is symmetric, so every pairwise distance occurs twice (and the diagonal is all zeros). Is there an efficient way to return each pairwise distance only once?
This would be my preferred outcome:

[1.0, 4.242640687119285, 4.123105625617661, 3.605551275463989, 3.1622776601683795, 2.23606797749979]

I looked at the documentation for the euclidean_distances function, but there does not seem to be an argument to exclude duplicate values.

I can exclude the duplicate values the following way:

dist_list = []
for i in range(len(distances)):
    unique_dist = distances[i][i+1:]
    dist_list.extend(unique_dist)

but I am wondering if there is a more efficient way. I do not want to use unique(), because my data may contain genuinely equal distances between different pairs of points, and those must be kept.

Solution:

NumPy has a very useful function, np.triu_indices_from, that returns the indices of the upper triangular part of a matrix (np.tril_indices_from does the same for the lower part). I set k=1 here to exclude the diagonal; if you want to include it, use k=0.

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
points = [[1,2], [1,3], [4,5], [2,6]]

distances = euclidean_distances(points)
distances[np.triu_indices_from(distances, k=1)]

array([1.        , 4.24264069, 4.12310563, 3.60555128, 3.16227766,
       2.23606798])
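
As a possible alternative that is not part of the answer above: if SciPy is available, scipy.spatial.distance.pdist computes the pairwise distances directly in "condensed" form, one value per pair, without building the full square matrix first. The sketch below assumes the same points as in the question and cross-checks the result against the upper-triangle extraction. The variable names condensed, upper, rows and cols are only illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.metrics.pairwise import euclidean_distances

points = [[1, 2], [1, 3], [4, 5], [2, 6]]

# Condensed distances: one entry per unordered pair (i, j) with i < j,
# ordered row by row like the upper triangle of the full matrix.
condensed = pdist(points, metric="euclidean")

# Cross-check against the upper-triangle extraction from the full matrix.
full = euclidean_distances(points)
upper = full[np.triu_indices_from(full, k=1)]
print(np.allclose(condensed, upper))

# Index pairs that each condensed entry belongs to.
rows, cols = np.triu_indices(len(points), k=1)
for i, j, d in zip(rows, cols, condensed):
    print(f"points {i} and {j}: {d:.4f}")
```

For n points both approaches yield n * (n - 1) / 2 values. Whether pdist is actually faster for your data is worth measuring rather than assuming, but it does avoid allocating the n x n matrix.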