Unknown label type error in sklearn log_loss

December 6, 2022

I am trying to calculate the log loss with sklearn, but I keep getting a ValueError. How can I resolve it? The code is simple: fit a LabelEncoder to an array of class values, then call sklearn's log_loss with three arguments: the ground truth, the probability values of each class, and the list of labels.

from sklearn import preprocessing
from sklearn.metrics import log_loss

le = preprocessing.LabelEncoder()
le.fit([2.5, 3.0, 3.5, 3.8, 4.0, 4.5, 5.0, 5.5, 6.0])
le.classes_  # the fitted classes, in sorted order

# Raises ValueError: Unknown label type
log_loss([6.0], [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]],
         labels=list(le.classes_))

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
C:\Users\PRANAV~1\AppData\Local\Temp/ipykernel_25368/2311544075.py in <module>
----> 1 log_loss([6.0], [[0.,         0.,         0.,         0.,         0.28571429, 0.14285714,
      2   0.,         0.57142857, 0.        ]], labels=list(le.classes_))

~\AppData\Roaming\Python\Python39\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\AppData\Roaming\Python\Python39\site-packages\sklearn\metrics\_classification.py in log_loss(y_true, y_pred, eps, normalize, sample_weight, labels)
   2233 
   2234     if labels is not None:
-> 2235         lb.fit(labels)
   2236     else:
   2237         lb.fit(y_true)

~\AppData\Roaming\Python\Python39\site-packages\sklearn\preprocessing\_label.py in fit(self, y)
    295 
    296         self.sparse_input_ = sp.issparse(y)
--> 297         self.classes_ = unique_labels(y)
    298         return self
    299 

~\AppData\Roaming\Python\Python39\site-packages\sklearn\utils\multiclass.py in unique_labels(*ys)
     96     _unique_labels = _FN_UNIQUE_LABELS.get(label_type, None)
     97     if not _unique_labels:
---> 98         raise ValueError("Unknown label type: %s" % repr(ys))
     99 
    100     ys_labels = set(chain.from_iterable(_unique_labels(y) for y in ys))

ValueError: Unknown label type: ([2.5, 3.0, 3.5, 3.8, 4.0, 4.5, 5.0, 5.5, 6.0],)

Solution:

What you are doing is not valid: log_loss cannot handle class labels that are non-integer floats.

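log_loss expects as input arguments (y_true, y_pred), which are, respectively, the ground-truth (correct) labels for n_samples samples and the predicted probabilities, as returned by a classifier's predict_proba method. The optional labels argument lists the classes, and the columns of y_pred are assumed to follow the sorted order of those labels.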
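A minimal sketch of what the two arguments look like. The string labels "x", "y", "z" and the probabilities are made up for illustration: y_true holds one label per sample, y_pred holds one row per sample with one column per class, and the loss is the mean negative log of the probability assigned to each sample's true class.

```python
import math

from sklearn.metrics import log_loss

# Two samples (n_samples = 2) and three classes; labels are illustrative only.
y_true = ["x", "z"]  # one ground-truth label per sample
# One row per sample, one column per class, columns in sorted label order.
y_pred = [
    [0.7, 0.2, 0.1],  # sample 1: P(x), P(y), P(z)
    [0.1, 0.3, 0.6],  # sample 2: P(x), P(y), P(z)
]
loss = log_loss(y_true, y_pred, labels=["x", "y", "z"])

# Same quantity by hand: mean of -log(probability given to the true class).
manual = -(math.log(0.7) + math.log(0.6)) / 2
print(loss, manual)
```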

To solve this, convert the float labels into strings, both in the ground truth and in labels:

log_loss(['6.0'], [[0., 0., 0., 0., 0.28571429, 0.14285714,  0., 0.57142857, 0.]], 
                labels=list(le.classes_.astype(str)))

# 34.53877639491069
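Why the result is so large: the true class "6.0" is the last column, and its predicted probability is 0. log_loss clips probabilities away from 0 before taking the log. The sketch below assumes a clipping constant of eps=1e-15, the default in the sklearn version this answer appears to have used; newer releases may use a different constant, so the exact number can differ.

```python
import math

classes = ["2.5", "3.0", "3.5", "3.8", "4.0", "4.5", "5.0", "5.5", "6.0"]
y_pred = [0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]
eps = 1e-15  # assumption: clipping constant of the sklearn version used above

p_true = y_pred[classes.index("6.0")]          # probability of the true class: 0.0
p_clipped = min(max(p_true, eps), 1 - eps)     # clip so that log(0) is avoided
loss = -math.log(p_clipped)                    # equals 15 * ln(10)
print(loss)
```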

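The problem: your labels are non-integer floats (2.5, 3.5, 3.8, ...). sklearn treats such values as a continuous (regression) target rather than as class labels, so unique_labels raises "Unknown label type". In sklearn, numeric class labels must be integer-valued; floats such as 6.0 are only accepted when every value is a whole number.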
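To check how sklearn classifies a set of labels, you can call type_of_target from sklearn.utils.multiclass, the helper that unique_labels uses to determine the label_type seen in the traceback. A short sketch comparing the label sets discussed here:

```python
from sklearn.utils.multiclass import type_of_target

float_labels = [2.5, 3.0, 3.5, 3.8, 4.0, 4.5, 5.0, 5.5, 6.0]

t_float = type_of_target(float_labels)                      # non-integer floats
t_int = type_of_target([0, 1, 2, 3, 4, 5, 6, 7, 8])         # integers
t_whole = type_of_target([2.0, 3.0, 6.0])                   # floats, but whole numbers
t_str = type_of_target([str(v) for v in float_labels])      # string labels

print(t_float)  # continuous -> rejected by log_loss
print(t_int)    # multiclass
print(t_whole)  # multiclass
print(t_str)    # multiclass
```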

Here is a working example with integer labels:

log_loss([6], [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]],
         labels=[0, 1, 2, 3, 4, 5, 6, 7, 8])
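Since the question already fits a LabelEncoder, another option is to keep the labels numeric by encoding them to the integer indices 0..n_classes-1 with le.transform. This sketch assumes the columns of y_pred are in the same order as le.classes_ (sorted numerically). It also avoids a pitfall of the string workaround: strings sort lexicographically, so a label such as "10.0" would sort before "2.5", and the columns of y_pred would then have to follow that order.

```python
import math

import numpy as np
from sklearn import preprocessing
from sklearn.metrics import log_loss

le = preprocessing.LabelEncoder()
le.fit([2.5, 3.0, 3.5, 3.8, 4.0, 4.5, 5.0, 5.5, 6.0])

# Assumed: one column per class, in the order of le.classes_.
y_pred = [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]]

# Map the float ground truth to its integer class index (6.0 -> 8).
y_true_encoded = le.transform([6.0])
labels_encoded = np.arange(len(le.classes_))  # 0..8, same order as le.classes_

loss_int = log_loss(y_true_encoded, y_pred, labels=labels_encoded)

# For comparison: the string-label workaround from the answer.
loss_str = log_loss(["6.0"], y_pred, labels=list(le.classes_.astype(str)))
print(loss_int, loss_str)
```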

Here is another problematic case, where a single non-integer float among the labels is enough to trigger the error:

log_loss([6], [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]],
         labels=[0, 1.1, 2, 3, 4, 5, 6, 7, 8])

# ValueError: Unknown label type: ([0, 1.1, 2, 3, 4, 5, 6, 7, 8],)