Flatten only part of a dataframe shape for Euclidean calculation?

I have a data frame with shape:

(20,30,1024)

I want to find the Euclidean distance between every entry and every other entry in the dataframe (ideally non-redundantly, i.e. not computing the distance between rows 1 and 5 and then again between rows 5 and 1, but I'm not there yet). I have this code:

from scipy.spatial.distance import pdist,squareform

distances = pdist(df_test,metric='euclidean')
dist_matrix = squareform(distances)

print(dist_matrix)

The error says:

A 2-dimensional array must be passed.

So I guess I want to convert my matrix from shape (20,30,1024) to (20,30720), and then calculate the pdist/squareform between the rows (i.e. 20 rows of vectors that are 30720 in length).

I know that I can use df_test[0:20].flatten().tolist()

But that completely flattened my matrix; the output shape was (1,614400).

Can someone show me how to convert a shape from (20,30,1024) to (20,30720), or tell me if I'm not going about this the right way?

The ultimate end goal is to calculate Euclidean distance between all non-redundant pairs in a data set, but the data set is big, so I need to do it as efficiently as possible/not duplicating calculations.

>Solution :

The most straightforward way to reshape that I can think of, according to how you described the problem, is:

df_test.values.reshape(20, -1)

By calling .values, you retrieve your dataframe's data as a NumPy array. From there, .reshape finishes the job. Since you need a 2D array, you provide the size of the first dimension (in your case, 20), and by passing -1 NumPy will calculate the size of the second dimension for you (in this case, it multiplies the remaining dimension sizes of the original 3D array: 30 × 1024 = 30720). If df_test is already a NumPy array rather than a pandas DataFrame, it has no .values attribute, so call df_test.reshape(20, -1) directly.
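
To tie the reshape back to the distance calculation, here is a minimal sketch. It assumes the data is available as a NumPy array; the name data and its random values are hypothetical stand-ins for df_test. Note that pdist already returns the condensed form, with one distance per unordered pair of rows, so the non-redundant calculation asked about comes for free, and squareform is only needed if you want the full symmetric matrix.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for df_test: random values with the shape from the question.
rng = np.random.default_rng(0)
data = rng.random((20, 30, 1024))

# Collapse the last two axes: (20, 30, 1024) -> (20, 30720).
flat = data.reshape(data.shape[0], -1)

# Condensed distances: one entry per unordered pair (i, j) with i < j.
distances = pdist(flat, metric="euclidean")

# Optional: expand to the full symmetric 20 x 20 matrix.
dist_matrix = squareform(distances)

print(flat.shape)         # (20, 30720)
print(distances.shape)    # (190,), i.e. 20 * 19 / 2 pairs
print(dist_matrix.shape)  # (20, 20)
```

If memory is a concern for a larger data set, it may be enough to keep only the condensed distances array and skip squareform, since the square matrix stores every pair twice plus a zero diagonal.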
