How does sklearn calculate AUC for random forest and why it is different when using different functions?

I start with the example given for ROC Curve with Visualization API:

import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
y = y == 2

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rfc = RandomForestClassifier(n_estimators=10, random_state=42)
rfc.fit(X_train, y_train)
ax = plt.gca()
rfc_disp = RocCurveDisplay.from_estimator(rfc, X_test, y_test, ax=ax, alpha=0.8)
print(rfc_disp.roc_auc)

with the answer 0.9823232323232323.

Following this immediately with


from sklearn.metrics import roc_auc_score
y_pred = rfc.predict(X_test)
auc = roc_auc_score(y_test, y_pred)
print(auc)

I obtain 0.928030303030303, which is manifestly different.
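The lower number is what you get when the ROC curve is built from hard 0/1 labels rather than continuous scores. A minimal sketch (toy data, not the wine dataset) shows why: with only two distinct "score" values, roc_curve can place the threshold in only two useful positions, so the curve degenerates to three points and two line segments.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Toy ground truth and hard labels, as predict() would return them.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

# With binary "scores" there are only three ROC points:
# (0, 0), one interior point, and (1, 1).
fpr, tpr, thresholds = roc_curve(y_true, y_pred)
print(fpr)  # [0.         0.33333333 1.        ]
print(tpr)  # [0.         0.66666667 1.        ]

# The AUC is then just the area of two trapezoids.
print(roc_auc_score(y_true, y_pred))  # 0.6666...
```

This is why the label-based AUC differs from the probability-based one: it summarizes a single operating point, not the full ranking of the test samples.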

Interestingly, I obtain the same result with the ROC Curve visualization API if I use the predicted labels:

rfc_disp1 = RocCurveDisplay.from_predictions(y_test, y_pred)
print(rfc_disp1.roc_auc)

However, the area under the displayed curve, computed by trapezoid integration, does reproduce the former result:

import numpy as np
I = np.sum(np.diff(rfc_disp.fpr) * (rfc_disp.tpr[1:] + rfc_disp.tpr[:-1])/2.)
print(I)

What is the reason for this discrepancy? I assume it is related to how the two functions calculate the AUC (perhaps a different way of smoothing the curve?). This brings me to a more general question: how is the ROC curve obtained for a random forest in sklearn? What parameter/threshold is varied to obtain the different predictions? Are these just the scores of the separate trees of the forest?

Solution:

You should use predict_proba for AUC: roc_auc_score expects continuous scores that rank the samples, not the hard 0/1 class labels returned by predict. RocCurveDisplay.from_estimator uses predict_proba internally, which is why it reports the higher value, while from_predictions only sees whatever array you pass it.

try this one:

from sklearn.metrics import roc_auc_score
auc = roc_auc_score(y_test, rfc.predict_proba(X_test)[:, 1])
print(auc)
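To answer the more general question: the thresholds are swept over the predicted probabilities themselves, not over separate trees. For a RandomForestClassifier, predict_proba averages the per-tree class probabilities, so with fully grown trees the scores are typically multiples of 1/n_estimators, and roc_curve places one threshold at each distinct score. A sketch reproducing the question's setup (same seeds, so it should match the reported 0.9823… on the same sklearn version):

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
y = y == 2  # binary target: is the wine class 2?

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rfc = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_train, y_train)

# Continuous scores: the averaged per-tree probability of the positive class.
proba = rfc.predict_proba(X_test)[:, 1]

auc = roc_auc_score(y_test, proba)
print(auc)  # matches RocCurveDisplay.from_estimator's value

# One ROC threshold per distinct probability value; with 10 fully grown
# trees these are typically multiples of 0.1.
fpr, tpr, thresholds = roc_curve(y_test, proba)
print(thresholds)
```

So the ROC curve for a random forest is obtained exactly as for any probabilistic classifier: sort the test samples by predicted probability and lower the decision threshold step by step, one step per distinct score.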