I have a number of objects that I get from an API.
Each object consists of several boolean fields.
I am struggling to compute the Euclidean distance between my dataframe (df_survey) and each object that I get from the API (df is the dataframe with all objects, df_first is the first of them):
df_survey = pd.DataFrame([["True", "True", "False", "True", "True"]], columns=columns, index=["survey"])
similarities = np.zeros((data["count"], 1))
dataset = pd.json_normalize(data["results"])
df = pd.DataFrame(dataset, columns=columns, index=dataset.id-1)
df_first = pd.DataFrame(dataset.head(1), columns=columns, index=)
euclidean = scipy.spatial.distance.cdist(df_survey, df_first, metric='euclidean')
distance = pd.DataFrame(euclidean, columns=df_survey.index.values, index=df_first.index.values)
In this solution I get an error: ValueError: Unsupported dtype object
I also tried using scipy.spatial.distance.euclidean, but it expects integer values, not booleans or strings. I could convert every value to int, but I don't know whether there is a better solution.
Thanks in advance!
You are declaring the booleans as strings rather than actual booleans, since you wrote ["True", "False"]. You should declare them as [True, False], without the quotes. In pandas, strings are stored with the generic object dtype, which is why you see this error.
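A quick way to see the difference is to compare the dtypes pandas infers in each case. This is only a minimal sketch: the column names "a" and "b" are made up for illustration, and the exact dtype reported for the string columns may depend on your pandas version.

```python
import pandas as pd

# Hypothetical two-column frames, one built from strings and one from real booleans
as_strings = pd.DataFrame([["True", "False"]], columns=["a", "b"])
as_bools = pd.DataFrame([[True, False]], columns=["a", "b"])

# The quoted values are not inferred as bool (typically they show up as object)
print(as_strings.dtypes)

# The unquoted values are inferred as bool
print(as_bools.dtypes)
```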
I suggest you fix this and try calculating the distance again. If it still doesn't work, just convert the values to 0s and 1s.
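Something along these lines should work. It is a rough sketch rather than a drop-in fix: the column names and the two sample objects are invented because I don't have your API data, and the string comparison is only needed if the values really do arrive as "True"/"False" strings.

```python
import pandas as pd
from scipy.spatial.distance import cdist

# Hypothetical column names; use your own `columns` list here
columns = ["f1", "f2", "f3", "f4", "f5"]

# Survey row as it was declared in the question (strings, not booleans)
survey_str = pd.DataFrame(
    [["True", "True", "False", "True", "True"]],
    columns=columns, index=["survey"],
)

# Turn the strings into real booleans: True where the cell equals "True"
df_survey = survey_str == "True"

# Made-up stand-in for the objects returned by the API
df = pd.DataFrame(
    [[True, True, False, True, True],
     [False, True, True, True, False]],
    columns=columns, index=[0, 1],
)

# Convert booleans to 0s and 1s before computing the distances
distances = cdist(
    df_survey.astype(int).to_numpy(),
    df.astype(int).to_numpy(),
    metric="euclidean",
)

# One row per API object, one column for the survey
distance = pd.DataFrame(distances.T, index=df.index, columns=df_survey.index)
print(distance)
```

With boolean fields, the result is the square root of the number of fields in which the two rows differ, so identical rows give 0.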