I am training a binary classifier and I want to know the AUC value for its performance on a test set. I thought there were two similar ways to do this: 1) pass the test set via the eval_set parameter and then read the corresponding AUC value for each boosting round from model.evals_result(); 2) after training, make a prediction for the test set and calculate the AUC for that prediction. I had thought these methods should produce similar values, but the latter method (calculating the AUC of a prediction) consistently produces much lower values. Can you help me understand what is going on? I must have misunderstood the function of eval_set.
Here is a fully reproducible example using a Kaggle dataset (available here):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import RocCurveDisplay, roc_curve, auc
from xgboost import XGBClassifier # xgboost version 1.7.6
import matplotlib.pyplot as plt
import numpy as np  # needed for np.arange in the plotting code below
# Data available on kaggle here https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009/
data = pd.read_csv('winequality-red.csv')
data.head()
# Separate targets
X = data.drop('quality', axis=1)
y = data['quality'].map(lambda x: 1 if x >= 7 else 0)  # wine quality >= 7 is good, the rest is not good
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create model and fit
params = {
    'eval_metric': 'auc',
    'objective': 'binary:logistic'
}
model = XGBClassifier(**params)
model.fit(
    X_train,
    y_train,
    eval_set=[(X_test, y_test)]
)
First I visualize the AUC metrics resulting from evaluating the test set provided in eval_set:
results = model.evals_result()
plt.plot(np.arange(0, 100), results['validation_0']['auc'])
plt.title("AUC from eval_set")
plt.xlabel("Estimator (boosting round)")
plt.ylabel("AUC")
Next, I make a prediction on the same test set, get the AUC, and visualize the ROC curve:
test_predictions = model.predict(X_test)
fpr, tpr, thresholds = roc_curve(y_true=y_test, y_score=test_predictions, pos_label=1)
roc_auc = auc(fpr, tpr)
display = RocCurveDisplay(roc_auc=roc_auc, fpr=fpr, tpr=tpr)
display.plot()
As you can see, the AUC value of the prediction is 0.81, which is lower than any AUC calculated from evaluating the same test set in eval_set. How have I misunderstood the two methods? Thanks, XGBoost is new to me and I appreciate your advice.
Solution:
XGBoost's evals_result() reports AUC values computed from the predicted probabilities (the same values predict_proba returns), which is what your first graph shows. By calling predict, you are getting the predicted class labels instead of the predicted probabilities, hence the difference you are observing. You should use predict_proba instead of predict:
test_probabs = model.predict_proba(X_test)[:, 1]  # probability of the positive class
fpr, tpr, thresholds = roc_curve(y_true=y_test, y_score=test_probabs, pos_label=1)
roc_auc = auc(fpr, tpr)
display = RocCurveDisplay(roc_auc=roc_auc, fpr=fpr, tpr=tpr)
display.plot()
Output: (ROC curve plotted from the predicted probabilities)
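If you want to convince yourself that the two methods agree once probabilities are used, a quick cross-check is to compare the last AUC entry from evals_result() with sklearn's roc_auc_score. The snippet below is a minimal sketch that assumes the model, results, X_test and y_test objects from the question are still in scope; the label-based AUC is included only to show where the lower 0.81 figure comes from.
from sklearn.metrics import roc_auc_score
# AUC that XGBoost reported for the last boosting round on the eval_set
eval_set_auc = results['validation_0']['auc'][-1]
# AUC computed from the predicted probabilities of the positive class
proba_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
# AUC computed from the hard class labels (the lower value from the question)
label_auc = roc_auc_score(y_test, model.predict(X_test))
print(f"eval_set AUC:         {eval_set_auc:.4f}")
print(f"predict_proba AUC:    {proba_auc:.4f}")
print(f"predict (labels) AUC: {label_auc:.4f}")
The first two numbers should match (AUC is rank-based, so probabilities and XGBoost's internal margin scores give the same value), while the third will be noticeably lower.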