Euclidean distance between two rows

I have a number of objects that I get via an API.
Each object consists of several boolean fields.

I am struggling to compute the Euclidean distance between my dataframe (df_survey) and each object that I get from the API (df is the dataframe with all objects, df_first is the first of them):

df_survey = pd.DataFrame([["True", "True", "False", "True", "True"]], columns=columns, index=["survey"])
similarities = np.zeros((data["count"], 1))

dataset = pd.json_normalize(data["results"])
df = pd.DataFrame(dataset, columns=columns)
df_first = pd.DataFrame(dataset.head(1), columns=columns, index=[0])

euclidean = scipy.spatial.distance.cdist(df_survey, df_first, metric='euclidean')
distance = pd.DataFrame(euclidean, columns=df_survey.index.values, index=df_first.index.values)

With this code I get an error: ValueError: Unsupported dtype object

I also tried scipy.spatial.distance.euclidean, but it expects integer values, not boolean or str. I could convert every value to int, but I don’t know whether there is a better solution.

Thanks in advance!

>Solution :

You are declaring the booleans as strings rather than actual booleans, because you wrote ["True", "False"] with quotes. Declare them as [True, False] without the quotes. By default pandas stores strings in columns of the generic object dtype, which cdist cannot convert to numbers. That’s why you see this error.

I suggest you fix this and try calculating the distance again. If it still doesn’t work (for example, because the values coming from the API are also strings), convert everything to 0s and 1s before calling cdist.
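As a rough illustration of the conversion to 0s and 1s, here is a minimal sketch. The column names (q1 to q5), the two sample objects, and the helper to_int are made up for the example, since the question does not show the real columns or API data. It assumes every value is either a real boolean or the string "True"/"False"; anything else (including missing values) ends up as 0.

```python
import pandas as pd
from scipy.spatial.distance import cdist

# Hypothetical column names; the question does not show the real ones.
columns = ["q1", "q2", "q3", "q4", "q5"]

# Survey answers declared as real booleans (no quotes).
df_survey = pd.DataFrame(
    [[True, True, False, True, True]], columns=columns, index=["survey"]
)

# Stand-in for the API objects; the values are strings to mimic the problem.
df = pd.DataFrame(
    [["True", "True", "False", "True", "True"],
     ["False", "True", "True", "True", "False"]],
    columns=columns,
)

def to_int(frame):
    """Map True / "True" to 1 and everything else to 0 (hypothetical helper)."""
    return frame.astype(str).eq("True").astype(int)

survey_num = to_int(df_survey)
objects_num = to_int(df)

# cdist returns one row per survey row and one column per object.
euclidean = cdist(survey_num.to_numpy(), objects_num.to_numpy(), metric="euclidean")

# Transpose so that each object gets its own row, labelled by the object's index.
distance = pd.DataFrame(euclidean.T, index=df.index, columns=df_survey.index)
print(distance)
```

Note that the result of cdist has shape (rows of the first argument, rows of the second argument), so once you compare against more than one object, the index and columns passed to pd.DataFrame have to match that orientation (or the array has to be transposed, as above).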
