scikit-learn: fitting KNeighborsClassifier without labels

July 26, 2022

I’m trying to fit a simple KNN classifier and want to use scikit-learn for its efficient implementation (multiprocessing, tree-based algorithms).

However, what I want to get as a result is just the list of distances and nearest neighbours for each data point, rather than the predicted label.
I will then compute the label separately in a non-standard way.

The kneighbors method seems to be exactly what I need, but I cannot call it without first fitting the model with fit(), and fit() requires the labels (y) as a parameter.

Is there a way to achieve what I’m after? Perhaps I can pass fake labels to fit(). Are there any issues I’m missing by doing this? E.g. is this going to affect the results (the computed distances and the list of nearest neighbours for each data point) in any way? I wouldn’t expect so, but I’m not familiar with the workings of the scikit-learn implementation.

>Solution :

scikit-learn has a separate estimator for exactly this: NearestNeighbors.

This estimator is unsupervised (you don’t need the y labels), and its kneighbors method returns both the distances to the nearest neighbours and the indices of those neighbouring samples.
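
Here is a minimal sketch of how that could look. The toy arrays X and queries are made up for illustration, and n_neighbors=2 is an arbitrary choice:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy data, made up for illustration: six points in 2-D.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0],
])

# fit() takes only X; no labels are involved.
nn = NearestNeighbors(n_neighbors=2)
nn.fit(X)

# Neighbours of new query points: two arrays of shape (n_queries, n_neighbors).
queries = np.array([[0.1, 0.2], [5.2, 5.1]])
distances, indices = nn.kneighbors(queries)

# Called without arguments, kneighbors() returns the neighbours of each
# training point, and a point is not counted as its own neighbour.
self_distances, self_indices = nn.kneighbors()
```

indices holds row positions into X, so you can look up whatever labels you keep separately and combine them however you like. The constructor also accepts algorithm and n_jobs if you want to control the tree-based search and parallelism mentioned in the question.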

Check the NearestNeighbors documentation; it is quite clear.
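
As for the fake-labels idea from the question: as far as I know, kneighbors on a fitted KNeighborsClassifier only looks at X, so dummy labels should not change the distances or indices. A quick sanity check, using random data and arbitrary placeholder labels (both made up here), could look like this:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

# Random data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

# Placeholder labels with no meaning; they exist only to satisfy fit(X, y).
dummy_y = np.arange(len(X)) % 2

knn = KNeighborsClassifier(n_neighbors=3).fit(X, dummy_y)
nn = NearestNeighbors(n_neighbors=3).fit(X)

d_knn, i_knn = knn.kneighbors(X)
d_nn, i_nn = nn.kneighbors(X)

# Both estimators should report the same neighbours and distances.
same_indices = np.array_equal(i_knn, i_nn)
same_distances = np.allclose(d_knn, d_nn)
print(same_indices, same_distances)
```

So the workaround would do the job, but NearestNeighbors says what you mean and avoids carrying around labels that are never used.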