Unknown label type error in sklearn logloss

I am trying to calculate the scikit-learn log loss but keep getting a ValueError. How can I resolve it? The code is simple: fit a LabelEncoder to the array of labels, then call sklearn's log_loss with three arguments: the ground truth, the predicted probability of each class, and the list of labels.

from sklearn import preprocessing
from sklearn.metrics import log_loss

le = preprocessing.LabelEncoder()
le.fit([2.5, 3.0, 3.5, 3.8, 4.0, 4.5, 5.0, 5.5, 6.0])
print(le.classes_)
log_loss([6.0], [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]],
         labels=list(le.classes_))

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
C:\Users\PRANAV~1\AppData\Local\Temp/ipykernel_25368/2311544075.py in <module>
----> 1 log_loss([6.0], [[0.,         0.,         0.,         0.,         0.28571429, 0.14285714,
      2   0.,         0.57142857, 0.        ]], labels=list(le.classes_))

~\AppData\Roaming\Python\Python39\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\AppData\Roaming\Python\Python39\site-packages\sklearn\metrics\_classification.py in log_loss(y_true, y_pred, eps, normalize, sample_weight, labels)
   2233 
   2234     if labels is not None:
-> 2235         lb.fit(labels)
   2236     else:
   2237         lb.fit(y_true)

~\AppData\Roaming\Python\Python39\site-packages\sklearn\preprocessing\_label.py in fit(self, y)
    295 
    296         self.sparse_input_ = sp.issparse(y)
--> 297         self.classes_ = unique_labels(y)
    298         return self
    299 

~\AppData\Roaming\Python\Python39\site-packages\sklearn\utils\multiclass.py in unique_labels(*ys)
     96     _unique_labels = _FN_UNIQUE_LABELS.get(label_type, None)
     97     if not _unique_labels:
---> 98         raise ValueError("Unknown label type: %s" % repr(ys))
     99 
    100     ys_labels = set(chain.from_iterable(_unique_labels(y) for y in ys))

ValueError: Unknown label type: ([2.5, 3.0, 3.5, 3.8, 4.0, 4.5, 5.0, 5.5, 6.0],)


Solution:

The log_loss call itself is fine; what is not valid is the label values you pass to it.

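log_loss expects two input arguments, y_true and y_pred: the ground-truth (correct) labels for n_samples samples and the predicted probabilities, as returned by a classifier's predict_proba method, respectively. The optional labels argument lists every possible class, which is needed when y_true does not contain all of them.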
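As a rough illustration of what log_loss computes from these inputs, here is a minimal NumPy sketch. The helper `manual_log_loss` is hypothetical, not part of scikit-learn. It assumes the probability columns are already aligned with the sorted labels and clips with eps=1e-15, the default in older scikit-learn releases; newer releases clip with a different epsilon, so the exact number for a zero probability may differ. It also ignores any further normalization scikit-learn may apply.

```python
import numpy as np


def manual_log_loss(y_true_idx, y_pred, eps=1e-15):
    """Hypothetical helper: mean negative log-probability of the true class.

    y_true_idx: integer column index of the true class for each sample.
    eps: assumed clipping constant (older scikit-learn default) so that
    log(0) is never evaluated.
    """
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    rows = np.arange(len(y_true_idx))
    # Pick the probability of the true class in each row, then average -log(p).
    return float(-np.mean(np.log(y_pred[rows, y_true_idx])))


probs = [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]]
# 6.0 is the last of the nine sorted classes, so its column index is 8.
print(manual_log_loss([8], probs))  # about 34.54, i.e. -log(1e-15)
```

The value is large because the model gives the true class probability 0, which only the clipping keeps finite.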

To solve this, convert the non-integer float labels into strings, both in y_true and in labels:

log_loss(['6.0'], [[0., 0., 0., 0., 0.28571429, 0.14285714,  0., 0.57142857, 0.]], 
                labels=list(le.classes_.astype(str)))

# 34.53877639491069
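One caveat with the string workaround: the log_loss documentation says the columns of y_pred are assumed to be ordered alphabetically, as LabelBinarizer orders them, and strings sort character by character. For this question's labels, which all have one digit before the decimal point, that matches the numeric order, but not in general. A quick check, using a made-up label set for the counterexample:

```python
from sklearn.preprocessing import LabelBinarizer

# Made-up labels: as strings, "10.0" sorts before "2.5".
made_up = ["2.5", "6.0", "10.0"]
print(LabelBinarizer().fit(made_up).classes_)  # ['10.0' '2.5' '6.0']

# The question's labels, converted to strings, keep their numeric order.
question_labels = [2.5, 3.0, 3.5, 3.8, 4.0, 4.5, 5.0, 5.5, 6.0]
as_str = [str(x) for x in question_labels]
print(sorted(as_str) == as_str)  # True
```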

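The problem: your labels are floats with non-integer values, and this breaks the function. For classification, scikit-learn expects numeric labels to be integers (floats with integral values such as 6.0 are also accepted); a float array containing values like 2.5 is treated as a continuous (regression) target, which log_loss rejects with "Unknown label type".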
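You can see the distinction with sklearn.utils.multiclass.type_of_target, the helper that unique_labels uses to classify a target. A list of floats containing any non-integral value is reported as "continuous", which is not one of the types (binary, multiclass, multilabel-indicator) that unique_labels accepts; integer-valued floats and strings are reported as "multiclass".

```python
from sklearn.utils.multiclass import type_of_target

# The question's labels: 2.5, 3.5, ... are not integral, so "continuous".
question_type = type_of_target([2.5, 3.0, 3.5, 3.8, 4.0, 4.5, 5.0, 5.5, 6.0])
# Floats that are all integral are treated as class labels.
integral_type = type_of_target([2.0, 3.0, 6.0])
# A single non-integral value is enough to make the whole list "continuous".
mixed_type = type_of_target([0, 1.1, 2, 3, 4, 5, 6, 7, 8])
# String labels are fine.
string_type = type_of_target(["2.5", "3.0", "6.0"])

print(question_type, integral_type, mixed_type, string_type)
# continuous multiclass continuous multiclass
```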

The same call works with integer labels:

log_loss([6], [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]],
         labels=[0, 1, 2, 3, 4, 5, 6, 7, 8])
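Since the question already fits a LabelEncoder, a sketch of a cleaner fix is to let it map the float labels to integer codes and pass those codes to log_loss. LabelEncoder.fit does not check the target type, which is why it accepted the floats in the first place. This assumes the columns of y_pred are in the same sorted order as le.classes_.

```python
import numpy as np
from sklearn.metrics import log_loss
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit([2.5, 3.0, 3.5, 3.8, 4.0, 4.5, 5.0, 5.5, 6.0])

# Map the float labels to integer codes 0..8 (classes_ is sorted numerically).
y_true = le.transform([6.0])              # array([8])
all_labels = np.arange(len(le.classes_))  # one code per column of y_pred

probs = [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]]
loss = log_loss(y_true, probs, labels=all_labels)
print(loss)
```

The loss is large here because the model assigns probability 0 to the true class, 6.0 (code 8).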

A single non-integer float among the labels is enough to trigger the error:

log_loss([6], [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]],
         labels=[0, 1.1, 2, 3, 4, 5, 6, 7, 8])

# ValueError: Unknown label type: ([0, 1.1, 2, 3, 4, 5, 6, 7, 8],)