Why does xgboost prediction have lower AUC than evaluation of same data in eval_set?

I am training a binary classifier and I want to know its AUC on a test set. I thought there were two equivalent ways to do this: 1) pass the test set to the eval_set parameter and read the corresponding AUC value for each boosting round from model.evals_result(); 2) after training, make a prediction for the test set and calculate the AUC of that prediction. I expected these methods to produce similar values, but the latter (calculating the AUC of a prediction) consistently produces much lower ones. Can you help me understand what is going on? I must have misunderstood the function of eval_set.

Here is a fully reproducible example using a Kaggle dataset (linked in the code comment below):

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import RocCurveDisplay, roc_curve, auc
from xgboost import XGBClassifier  # xgboost version 1.7.6
import matplotlib.pyplot as plt

# Data available on kaggle here https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009/
data = pd.read_csv('winequality-red.csv')
data.head()

# Separate targets
X = data.drop('quality', axis=1)
y = data['quality'].map(lambda x: 1 if x >= 7 else 0)  # wine quality >= 7 is good, the rest is not

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create model and fit
params = {
    'eval_metric':'auc',
    'objective':'binary:logistic'
}
model = XGBClassifier(**params)
model.fit(
    X_train, 
    y_train,
    eval_set=[(X_test, y_test)]
)

First, I visualize the AUC values produced by evaluating the test set provided in eval_set:


results = model.evals_result()
auc_history = results['validation_0']['auc']  # one AUC value per boosting round
plt.plot(np.arange(len(auc_history)), auc_history)
plt.title("AUC from eval_set")
plt.xlabel("Estimator (boosting round)")
plt.ylabel("AUC")

[Plot: AUC from eval_set at each boosting round]
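For reference, the AUC at the final boosting round can be printed directly from the same results dictionary:

# AUC recorded by eval_set at the last boosting round
print(results['validation_0']['auc'][-1])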

Next, I make a prediction on the same test set, get the AUC, and visualize the ROC curve:

test_predictions = model.predict(X_test)
fpr, tpr, thresholds = roc_curve(y_true=y_test, y_score=test_predictions, pos_label=1)
roc_auc = auc(fpr, tpr)
display = RocCurveDisplay(roc_auc=roc_auc, fpr=fpr, tpr=tpr)
display.plot()

[Plot: ROC curve computed from predict() class labels, with AUC = 0.81]

As you can see, the AUC of the prediction is 0.81, which is lower than every AUC calculated from evaluating the same test set in eval_set. How have I misunderstood the two methods? Thanks; xgboost is new to me and I appreciate your advice.

Solution:

XGBoost’s evals_result() computes the AUC values in your first graph from the predicted probabilities, i.e. the same values predict_proba returns. By using predict, you are getting hard 0/1 class labels instead of probabilities; a ROC curve built from binary labels has only one real operating point (the 0.5 decision threshold), so its AUC is systematically lower. Hence the difference you are observing.
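As a quick check (a minimal sketch, assuming xgboost's default 0.5 decision threshold for binary classification), you can verify that predict is just predict_proba with a hard cutoff:

import numpy as np

labels = model.predict(X_test)             # hard 0/1 class labels
probs = model.predict_proba(X_test)[:, 1]  # probability of the positive class
# predict() should agree with thresholding the probabilities at 0.5
print(np.array_equal(labels, (probs > 0.5).astype(int)))  # expected: True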

You should use predict_proba instead of predict:

test_probabs = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_true=y_test, y_score=test_probabs, pos_label=1)
roc_auc = auc(fpr, tpr)
display = RocCurveDisplay(roc_auc=roc_auc, fpr=fpr, tpr=tpr)
display.plot()
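As a sanity check (using the evals_result() structure shown earlier), the AUC computed from predict_proba should now match the value recorded for the final boosting round in eval_set; sklearn's roc_auc_score gives the same number in one line:

from sklearn.metrics import roc_auc_score

# One-liner AUC from the predicted probabilities
print(roc_auc_score(y_test, test_probabs))
# AUC recorded by eval_set at the last boosting round
print(results['validation_0']['auc'][-1])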

Output:

[Plot: ROC curve computed from predict_proba probabilities, with a higher AUC than before]
