ROC AUC
Key Terminology:
1. TPR (True Positive Rate), also called sensitivity or recall:
TPR = TP / (TP + FN)
Where:
o TP = True Positives
o FN = False Negatives
2. FPR (False Positive Rate):
FPR = FP / (FP + TN)
Where:
o FP = False Positives
o TN = True Negatives
3. AUC (Area Under the Curve): The area under the ROC curve,
representing the likelihood that the model ranks a randomly chosen
positive instance higher than a randomly chosen negative instance.
The AUC value ranges from 0 to 1 (interpretation guidelines appear
later in this section).
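As a quick check of these formulas, here is a minimal sketch that computes TPR and FPR from hypothetical confusion-matrix counts (all numbers are made up for illustration):

# Hypothetical counts for a churn model evaluated on 300 customers
tp, fn = 80, 20    # actual churners: 80 caught, 20 missed
fp, tn = 30, 170   # actual non-churners: 30 false alarms, 170 correct

tpr = tp / (tp + fn)  # True Positive Rate: 80 / 100 = 0.80
fpr = fp / (fp + tn)  # False Positive Rate: 30 / 200 = 0.15
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")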
Example Scenario:
Imagine you have a binary classifier that predicts whether a customer
will churn (leave the service, labeled 1) or not churn (stay, labeled 0).
In this setting, the four possible outcomes are:
True Positives (TP): Customers who churned and the model correctly
predicted churn.
False Negatives (FN): Customers who churned but the model incorrectly
predicted no churn.
False Positives (FP): Customers who didn’t churn but the model
incorrectly predicted churn.
True Negatives (TN): Customers who didn’t churn and the model
correctly predicted no churn.
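These four outcomes are exactly the cells of the confusion matrix. A minimal sketch of counting them with scikit-learn, using made-up labels and predictions:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual churn labels (hypothetical)
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # model's hard 0/1 predictions

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")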
Step-by-Step Example:
1. The model predicts a probability for each instance in the test set
(for customer churn, the probability that a customer will churn).
2. A decision threshold converts each probability into a hard
prediction: scores at or above the threshold are labeled churn, scores
below it no churn.
3. For each threshold, count TP, FP, TN, and FN, then compute the TPR
and FPR.
By sweeping the threshold from 1 down to 0, you can plot the resulting
(FPR, TPR) points; connecting them traces the ROC curve, as the sketch
below shows.
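To make the threshold sweep concrete, here is a small sketch, with made-up scores, showing how scikit-learn's roc_curve returns one (FPR, TPR) point for each distinct score threshold:

import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1])                 # actual labels (hypothetical)
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])   # predicted churn probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)
for t, f, r in zip(thresholds, fpr, tpr):
    print(f"threshold={t:.2f}  FPR={f:.2f}  TPR={r:.2f}")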
Use Cases:
1. Customer Churn Prediction:
o Why ROC-AUC: The ROC-AUC will help evaluate how well the
model differentiates between customers who will churn vs.
those who won’t, across various thresholds. If the AUC is high,
the model does a good job of identifying at-risk customers.
2. Fraud Detection:
o Why ROC-AUC: Fraudulent transactions are rare, so accuracy alone
is misleading; ROC-AUC measures how well the model ranks
fraudulent transactions above legitimate ones across all thresholds.
3. Medical Diagnosis:
o Why ROC-AUC: It shows how well a test separates patients who
have a condition from those who don’t, letting practitioners choose
a threshold that balances missed diagnoses (FN) against false
alarms (FP).
Interpreting AUC:
AUC close to 1: The model separates churners from non-churners
almost perfectly.
AUC = 0.5: The model performs no better than random guessing.
AUC < 0.5: A model worse than random guessing, which would be
problematic in any business setting, as it means the model is
actively making wrong predictions.
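The ranking interpretation of AUC from the terminology section can be verified directly: AUC equals the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as half. A sketch with hypothetical scores:

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0])
y_score = np.array([0.9, 0.6, 0.4, 0.5, 0.3, 0.1])  # hypothetical probabilities

# Fraction of positive/negative pairs ranked correctly (ties count half)
pos, neg = y_score[y_true == 1], y_score[y_true == 0]
pairs = pos[:, None] - neg[None, :]
manual_auc = (pairs > 0).mean() + 0.5 * (pairs == 0).mean()
print(manual_auc, roc_auc_score(y_true, y_score))  # the two values match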
Python Code Example:
The snippet below trains the classifier, scores the test set, and plots
the ROC curve. It assumes X_train, X_test, y_train, y_test have already
been prepared (e.g., from a churn dataset via train_test_split).

import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, auc

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_scores = model.predict_proba(X_test)[:, 1]  # P(churn) per test instance
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
plt.figure()
plt.plot(fpr, tpr, label=f'ROC curve (AUC = {auc(fpr, tpr):.2f})')
plt.plot([0, 1], [0, 1], linestyle='--')  # chance (AUC = 0.5) baseline
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.legend(loc='lower right')
plt.show()
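If only the scalar AUC is needed, roc_auc_score computes it directly from the true labels and predicted probabilities (continuing with the y_test and y_scores variables from the snippet above):

from sklearn.metrics import roc_auc_score

print(f"ROC-AUC: {roc_auc_score(y_test, y_scores):.3f}")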
Conclusion:
ROC-AUC summarizes, in a single threshold-independent number, how well a
classifier ranks positive instances (churners, fraudulent transactions,
sick patients) above negative ones. An AUC near 1 indicates strong
separation, 0.5 indicates random guessing, and values below 0.5 signal a
model that is actively misranking. This makes ROC-AUC a robust default
metric for comparing binary classifiers, especially on imbalanced
business problems.