
ROC Auc

The ROC-AUC Curve is a graphical tool that evaluates the performance of binary classification models by plotting the True Positive Rate against the False Positive Rate at various thresholds. The AUC value, ranging from 0 to 1, quantifies the model's ability to distinguish between classes, with higher values indicating better performance. This metric is particularly useful in business contexts such as customer churn prediction, fraud detection, medical diagnosis, and targeted marketing.

Uploaded by

Lipika Sharma
Copyright
© All Rights Reserved

ROC-AUC Curve: Overview and Explanation

The ROC (Receiver Operating Characteristic) curve is a graphical
representation of the performance of a binary classification model. It plots
the True Positive Rate (TPR) against the False Positive Rate (FPR) at
various threshold settings. The AUC (Area Under the Curve) is the area
under this ROC curve and is a numerical measure of how well the model
distinguishes between classes.

Key Terminology:

1. True Positive Rate (TPR) / Sensitivity / Recall: The proportion of
actual positives that are correctly identified by the model.

TPR = \frac{TP}{TP + FN}

Where:

o TP = True Positives

o FN = False Negatives

2. False Positive Rate (FPR): The proportion of actual negatives that
are incorrectly identified as positive by the model.

FPR = \frac{FP}{FP + TN}

Where:

o FP = False Positives

o TN = True Negatives

3. AUC (Area Under the Curve): The area under the ROC curve, equal to
the probability that the model scores a randomly chosen positive
instance higher than a randomly chosen negative instance.
The AUC value ranges from 0 to 1:

o AUC = 1: Perfect classifier.

o AUC = 0.5: Random classifier, no discriminative ability.

o AUC < 0.5: Worse than random classifier.
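The ranking interpretation of AUC can be checked directly on a small example. The sketch below is a minimal illustration, not a library implementation: `pairwise_auc` is a hypothetical helper name, the labels and scores are made-up values, and tied scores are assumed to count as half a win.

```python
from itertools import product

def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive
    instance receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = positive) and model scores
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.65, 0.7, 0.5]
print(pairwise_auc(labels, scores))  # 4 of the 6 pairs are ranked correctly
```

Comparing every pair is quadratic in the number of instances, so this form is only practical for small illustrations.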

ROC Curve Construction:

To plot the ROC curve:

• Vary the decision threshold from 0 to 1.

• At each threshold, label every instance whose predicted score is at
or above the threshold as positive, then calculate the TPR and FPR.

• Plot the points (FPR, TPR) to generate the curve.
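The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: `roc_points` is a hypothetical helper, the labels and scores are made up, and an instance is treated as positive when its score is at or above the threshold. Instead of sweeping a fine grid from 0 to 1, the sketch uses each distinct score as a threshold, since the predictions only change at those values.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from the strictest threshold to the loosest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Start above every score (nothing predicted positive), then use each
    # distinct score as a threshold, in descending order.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical data: 3 positives and 3 negatives
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve always starts at (0, 0) and ends at (1, 1); a good model climbs toward the top-left corner before moving right.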


Example:

Imagine you have a binary classifier that predicts whether a customer will
churn (leave the service, which we treat as "1") or not churn (stay, treated
as "0").

• True Positives (TP): Customers who churned and the model
correctly predicted churn.

• False Positives (FP): Customers who didn’t churn but the model
incorrectly predicted churn.

• True Negatives (TN): Customers who didn’t churn and the model
correctly predicted no churn.

• False Negatives (FN): Customers who churned but the model
incorrectly predicted no churn.
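As a minimal sketch of how these four counts turn into the two rates, the snippet below tallies them from actual and predicted labels. The function names and the eight-customer data are hypothetical, chosen only for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = churn, 0 = no churn)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """TPR = TP / (TP + FN), FPR = FP / (FP + TN)."""
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical outcomes for eight customers
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
tpr, fpr = rates(tp, fp, tn, fn)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```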

Step-by-Step Example:

• Model predicts probabilities for each instance in the test set (for
example, for customer churn, the probability that a customer will
churn).

• Vary the threshold for predicting "1" (churn), from 0 to 1.

o At each threshold, calculate TPR and FPR.

• Plot these values (FPR, TPR) on the ROC curve.

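If your model gives the following churn probabilities for a set of
customers (predicted labels shown for a threshold of 0.5):

• Customer A: 0.85 (Predicted: Churn)

• Customer B: 0.45 (Predicted: No Churn)

• Customer C: 0.60 (Predicted: Churn)

• Customer D: 0.20 (Predicted: No Churn)

Raising or lowering the threshold changes which customers are predicted
to churn. Comparing each set of predictions with the actual outcomes gives
one (FPR, TPR) point per threshold.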
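To make this concrete, the sketch below reuses the four scores above. The text does not say which customers actually churned, so the outcomes here are an assumption made purely for illustration (A and B churned, C and D stayed), and `rates_at` is a hypothetical helper.

```python
# Scores from the example above
scores = {"A": 0.85, "B": 0.45, "C": 0.60, "D": 0.20}
# ASSUMED outcomes, not given in the text: 1 = churned, 0 = stayed
actual = {"A": 1, "B": 1, "C": 0, "D": 0}

def rates_at(threshold):
    """Return (FPR, TPR) when customers scoring >= threshold are flagged."""
    tp = sum(1 for c in scores if scores[c] >= threshold and actual[c] == 1)
    fp = sum(1 for c in scores if scores[c] >= threshold and actual[c] == 0)
    n_pos = sum(actual.values())
    n_neg = len(actual) - n_pos
    return fp / n_neg, tp / n_pos

for t in (0.3, 0.5, 0.7):
    fpr, tpr = rates_at(t)
    print(f"threshold={t:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Under these assumed outcomes, a threshold of 0.3 catches both churners at the cost of one false alarm, while 0.7 raises no false alarms but misses Customer B.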

Business Use Cases of ROC-AUC Curve:

1. Customer Churn Prediction:

o Problem: A telecom company wants to predict which
customers are likely to leave (churn). Using a machine
learning model, they can classify customers as churn or not
churn.

o Why ROC-AUC: The ROC-AUC will help evaluate how well the
model differentiates between customers who will churn vs.
those who won’t, across various thresholds. If the AUC is high,
the model does a good job of identifying at-risk customers.

2. Fraud Detection:

o Problem: In financial transactions, the goal is to detect
fraudulent transactions. A model is trained to classify
transactions as fraudulent (1) or not (0).

o Why ROC-AUC: In fraud detection, false positives (incorrectly
classifying a legitimate transaction as fraudulent) can be
costly, and so can false negatives (missed fraud). The ROC
curve shows the trade-off between TPR and FPR at every
threshold, and the AUC summarizes the model's overall ranking
performance independent of any single threshold.

3. Medical Diagnosis:

o Problem: A model is developed to detect whether a patient
has a particular disease (binary classification: sick vs. not
sick).

o Why ROC-AUC: Medical decisions often involve different
thresholds for risk tolerance. The ROC-AUC helps doctors
assess how well the model discriminates between healthy and
sick patients across all threshold levels, while the ROC curve
shows the cost of each choice (e.g., lowering the threshold to
catch more true cases when a missed diagnosis has severe
consequences, at the price of more false positives).

4. Marketing and Targeting:

o Problem: A company builds a model to predict which
customers are likely to respond to a marketing campaign
(binary: will respond or will not respond).

o Why ROC-AUC: The model's ability to differentiate between
responders and non-responders can be crucial in targeting the
right customers. The ROC-AUC provides insight into how well
the model can distinguish between these groups, helping
marketers optimize their targeting.
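Several of these use cases come down to choosing an operating threshold once the model is trained. One possible approach, sketched below under stated assumptions, is to pick the highest threshold that still reaches a required sensitivity and then read off the resulting false positive rate. The function name, the data, and the 75% sensitivity target are all hypothetical.

```python
def threshold_for_sensitivity(labels, scores, min_tpr):
    """Return (threshold, TPR, FPR) for the highest threshold whose
    TPR is at least min_tpr; scores >= threshold count as positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        if tp / n_pos >= min_tpr:
            return t, tp / n_pos, fp / n_neg
    raise ValueError("min_tpr must be at most 1.0")

# Hypothetical data: 1 = sick / fraud / churn, 0 = otherwise
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.80, 0.70, 0.55, 0.50, 0.30, 0.25, 0.10]

threshold, tpr, fpr = threshold_for_sensitivity(labels, scores, min_tpr=0.75)
print(f"threshold={threshold}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Demanding a higher sensitivity pushes the threshold down and the false positive rate up, which is the trade-off the ROC curve makes visible.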

Interpreting ROC-AUC in Business Context:

• High AUC (close to 1): A model that does an excellent job in
distinguishing between the positive and negative classes. This is
desirable in critical use cases like fraud detection or medical
diagnosis.

• AUC ~ 0.5: The model has no discriminatory ability, indicating
random guessing. This typically calls for improvements in the model
or data.

• AUC < 0.5: A model worse than random guessing: it tends to rank
negatives above positives. This is problematic in any business
setting and often points to a labeling or pipeline error, since
inverting the model's scores would yield an AUC above 0.5.

Example Code (Python):

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Generate synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2,
                           random_state=42)

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Train a RandomForest model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict probabilities on the test set (column 1 = positive class)
y_probs = model.predict_proba(X_test)[:, 1]

# Compute ROC curve and AUC
fpr, tpr, thresholds = roc_curve(y_test, y_probs)
roc_auc = roc_auc_score(y_test, y_probs)

# Plot the ROC curve
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2,
         label=f'ROC curve (AUC = {roc_auc:.2f})')
# Diagonal reference line: a random classifier (AUC = 0.5)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()
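The example above gets the AUC from `roc_auc_score`. To show what "area under the curve" means numerically, the sketch below applies the trapezoidal rule to a list of (FPR, TPR) points by hand. `trapezoid_auc` is a hypothetical helper, and the four points are made up rather than taken from the model above.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a piecewise-linear ROC curve via the trapezoidal rule.
    Assumes fpr is sorted in non-decreasing order."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2.0
        area += width * mean_height
    return area

# Hypothetical ROC points, from (0, 0) to (1, 1)
fpr_pts = [0.0, 0.0, 0.5, 1.0]
tpr_pts = [0.0, 0.5, 1.0, 1.0]
print(trapezoid_auc(fpr_pts, tpr_pts))  # 0.875
```

Passing the `fpr` and `tpr` arrays returned by `roc_curve` to the same function should agree closely with `roc_auc_score`, up to floating-point error.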

Conclusion:

The ROC-AUC curve is a powerful tool for evaluating the performance of
binary classification models. It provides insights into how well the model
distinguishes between positive and negative classes across different
thresholds. For businesses, it can be crucial for making informed
decisions, especially in high-stakes areas like fraud detection, medical
diagnosis, and customer churn prediction.
