Model Explainability
1. Overview
Model Explainability refers to the process of making machine learning (ML) or deep learning
models' predictions transparent and understandable. The objective is to provide insights into how a
model makes decisions, identify which features are influencing the results, and ensure that the
model's behavior is interpretable by humans. This is critical for trust, transparency, debugging, and
compliance, particularly in domains where decision-making has significant consequences, such
as finance, healthcare, and telecom.
The use case for model explainability spans a wide range of model types, including classical
machine learning models (e.g., Logistic Regression, Random Forests) and more complex deep
learning models (e.g., CNNs, RNNs, Transformers). Each model type requires different
explainability techniques, and understanding the differences is crucial for building an effective,
transparent model pipeline.
2. Objective
• Objective: To create a unified framework for explaining predictions across various types of
models, from classical machine learning to deep learning models, ensuring stakeholders
can trust, validate, and improve the models based on clear insights.
• Classical Machine Learning Models: These include models like Linear Regression,
Logistic Regression, Decision Trees, Random Forests, Gradient Boosting Machines
(GBM), and Support Vector Machines (SVM).
• Deep Learning Models: These include Convolutional Neural Networks (CNNs),
Recurrent Neural Networks (RNNs), Transformers, Autoencoders, and Deep
Reinforcement Learning.
Each of these models requires different techniques to explain their predictions, which we will cover
next.
Explainability for Classical Machine Learning Models
Global Explainability:
• Feature Importance: Shows which features most influence the model's predictions.
o For tree-based models (e.g., Random Forest, XGBoost), tools like SHAP (SHapley
Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations)
are commonly used to compute feature importance.
o For linear models (e.g., Logistic Regression), simply looking at the coefficients of
the model provides a basic form of explainability, though this is less comprehensive
for complex data.
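A minimal sketch of both approaches, assuming a generic tabular dataset (the scikit-learn breast cancer dataset, model choices, and variable names below are stand-ins, not prescribed by this document):

```python
# Sketch: global feature importance for a tree ensemble and a linear model.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer(as_frame=True)   # stand-in tabular dataset
X, y = data.data, data.target

# Tree-based model: impurity-based feature importances
rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)
rf_importance = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print("Random Forest feature importance:\n", rf_importance.head(10))

# Linear model: coefficients as a basic form of explainability
lr = LogisticRegression(max_iter=5000).fit(X, y)
lr_coefficients = pd.Series(lr.coef_[0], index=X.columns).sort_values(key=abs, ascending=False)
print("Logistic Regression coefficients (sorted by magnitude):\n", lr_coefficients.head(10))
```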
Local Explainability:
• LIME: Used to explain individual predictions by approximating the model locally with an
interpretable surrogate model. LIME works by perturbing the input data and observing how
the model responds, creating interpretable local explanations.
• SHAP: SHAP values provide both global and local feature importance by allocating a
“contribution” score to each feature based on how much it changes the model's prediction.
SHAP can be visualized using summary plots, force plots, and dependence plots.
Model Transparency:
• Partial Dependence Plots (PDP) and Accumulated Local Effects (ALE): These plots visualize how a feature affects the model's predicted outcome, averaged across the dataset; ALE plots remain reliable when features are correlated, where PDPs can be misleading.
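As a sketch, scikit-learn's PartialDependenceDisplay (one of several PDP implementations; the dataset, model, and feature names below are illustrative) can produce these plots directly from a fitted estimator:

```python
# Sketch: partial dependence of the predicted outcome on individual features.
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target
model = GradientBoostingClassifier(random_state=42).fit(X, y)

# Average effect of each listed feature on the model's prediction
PartialDependenceDisplay.from_estimator(model, X, features=["mean radius", "mean texture"])
plt.tight_layout()
plt.show()
```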
Explainability for Deep Learning Models
Global Explainability:
• Attention Maps: For RNNs and Transformers, attention mechanisms can be used to show
which parts of the input the model is focusing on while making predictions (e.g., in NLP
tasks). Attention heatmaps help visualize this.
• Layer-wise Relevance Propagation (LRP): LRP is used in CNNs and fully connected
networks to attribute the importance of each neuron or layer in the network. It traces how
the model’s output is influenced by the input through the layers.
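A minimal sketch of extracting attention weights from a Transformer for a heatmap; the Hugging Face transformers library and the bert-base-uncased checkpoint are illustrative choices, not mandated by this document:

```python
# Sketch: extracting attention weights from a Transformer for visualization.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

text = "The customer cancelled the subscription after the price increase."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len)
last_layer_attention = outputs.attentions[-1][0]   # (num_heads, seq_len, seq_len)
avg_attention = last_layer_attention.mean(dim=0)   # average over attention heads
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# How much the [CLS] position attends to each token (row 0 of the attention matrix)
for token, weight in zip(tokens, avg_attention[0]):
    print(f"{token:>15s}  {weight.item():.3f}")
```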
Local Explainability:
• Feature Attribution: For deep learning models, feature attribution methods such as Integrated Gradients and SHAP can be applied to individual predictions, although they are computationally expensive.
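A hedged sketch of Integrated Gradients on an individual prediction, using the Captum library as one possible implementation; the toy network and random input below are placeholders for a real model and example:

```python
# Sketch: Integrated Gradients attribution for a single prediction with Captum.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Toy network standing in for a trained deep learning model
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

inputs = torch.randn(1, 10)      # one example with 10 features (placeholder)
baseline = torch.zeros(1, 10)    # "neutral" reference input

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(
    inputs, baselines=baseline, target=1, return_convergence_delta=True
)

print("Per-feature attributions:", attributions.detach().numpy().round(3))
print("Convergence delta:", delta.item())
```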
Model Transparency:
• Surrogate Models: For complex deep learning models, simpler surrogate models such as decision trees or linear models can be trained to approximate the predictions of the original model and explain them in a more interpretable way.
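A minimal sketch of the surrogate-model idea: fit an interpretable tree on the black-box model's own predictions and check how faithfully it mimics them (the small MLP and dataset below stand in for a real deep network and its data):

```python
# Sketch: approximating a black-box model with an interpretable surrogate decision tree.
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# "Black-box" model standing in for a deep network
black_box = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42)
black_box.fit(X, y)

# Train the surrogate on the black box's predictions, not on the true labels
surrogate = DecisionTreeClassifier(max_depth=3, random_state=42)
surrogate.fit(X, black_box.predict(X))

# How faithfully does the surrogate mimic the black box?
fidelity = surrogate.score(X, black_box.predict(X))
print(f"Surrogate fidelity to the black box: {fidelity:.2%}")
print(export_text(surrogate, feature_names=list(X.columns)))
```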
Tools for Model Explainability
1. XAITK
The Explainable AI Toolkit (XAITK) is a comprehensive suite of tools designed to aid users,
developers, and researchers in understanding and analyzing complex machine learning models.
• Analytics Tools: Features tools like the After Action Review for AI (AARfAI), which enhances
domain experts’ ability to systematically analyze AI’s reasoning processes.
• Bayesian Teaching for XAI: Incorporates a human-centered framework based on cognitive
science, applicable in various domains like image classification and medical diagnosis.
• Counterfactual Explanations: Provides frameworks for generating counterfactual
explanations, particularly useful in enhancing human-machine teaming.
• Datasets with Multimodal Explanations: Offers datasets for activity recognition and visual
question answering, complete with multimodal explanations.
• Misinformation Detection: Includes research tools for understanding and combating the
spread of misinformation through XAI-assisted platforms.
• Natural Language Explanations and Psychological Models: Provides methods for generating
natural language explanations for image classification and technical reports on explanatory
reasoning models.
2. SHAP
SHAP (Shapley Additive Explanations) is a method widely used in machine learning and AI for
interpreting predictions of ML models. It stands out as a versatile and popular tool in the domain of
explainable AI (XAI), offering insights into the predictions of various models.
• Shapley Values: Measure the average marginal contribution of a feature in a dataset across all
possible combinations.
• Marginal Contribution Calculation: Evaluating all possible combinations or ‘coalitions’ a
feature can participate in within a dataset.
• Interpreting Complex Models: SHAP effectively handles models with a large number of
features, including discrete and continuous variables.
• Application: It can be applied to any model type and works by distributing the “credit” for a
model’s output among the features. This technique uses Shapley values from cooperative
game theory.
• Use Case: SHAP is effective for classical machine learning models and neural networks,
providing both local and global explanations.
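A minimal sketch of SHAP on a tree-based classifier (the dataset and model are illustrative; note that the shape of the returned SHAP values varies slightly across SHAP versions and model types):

```python
# Sketch: global and local explanations with SHAP for a tree-based model.
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target
model = GradientBoostingClassifier(random_state=42).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)          # per-feature contributions (log-odds units)

# Global view: which features matter most across the whole dataset
shap.summary_plot(shap_values, X)

# Local view: why the model scored the first row the way it did
base_value = np.ravel(explainer.expected_value)[0]
shap.force_plot(base_value, shap_values[0, :], X.iloc[0, :], matplotlib=True)
```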
3. LIME
Local Interpretable Model-agnostic Explanations (LIME) is a tool used in the field of explainable AI
(XAI) to provide understandable explanations for the predictions made by complex machine
learning models.
• Model-Agnostic Capability: LIME can be applied to any machine learning model, regardless of
its internal workings or complexity.
• Local Explanation: LIME focuses on providing explanations for individual predictions, making
the insights highly specific and relevant to the given instance.
• Interpretable Proxy Models: LIME generates simpler models (like linear models) that
approximate the complex model’s behavior around the prediction to be explained.
• Feature Importance: LIME provides quantitative measures of the impact of each feature on the
prediction, known as feature importance scores.
• Customization and Configuration: Users can configure and tune various aspects of LIME,
such as the choice of the surrogate model and the sampling strategy.
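A minimal sketch of LIME explaining one prediction of a tabular classifier (the dataset and model are chosen only for illustration):

```python
# Sketch: explaining a single prediction of a tabular classifier with LIME.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target
model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

explainer = LimeTabularExplainer(
    X.values,
    feature_names=list(X.columns),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain one instance: LIME perturbs it and fits a local interpretable surrogate
explanation = explainer.explain_instance(X.values[0], model.predict_proba, num_features=5)
for feature, weight in explanation.as_list():
    print(f"{feature:40s} {weight:+.3f}")
```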
4. ELI5
ELI5, short for “Explain Like I’m 5,” is a Python library designed for visualizing and debugging
machine learning models, providing a unified API to explain and interpret predictions from various
models.
• Unified API for ML Model Explanation: Offers a consistent and user-friendly API to interpret
and debug a wide range of machine learning models.
• Visualization and Debugging: Provides tools for visualizing machine learning models, making
it easier to understand and debug them. It also allows visualization of features impacting model
predictions.
• Built-in Support for Multiple ML Frameworks: Integrates seamlessly with several major
machine learning frameworks and packages.
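A minimal sketch of ELI5 on a linear classifier (the dataset and model are illustrative; format_as_text is used here for script output, while show_weights and show_prediction render HTML in notebooks):

```python
# Sketch: global weights and a single-prediction explanation with ELI5.
import eli5
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target
model = LogisticRegression(max_iter=5000).fit(X, y)

# Global view: which features carry the most weight in the model
print(eli5.format_as_text(eli5.explain_weights(model, feature_names=list(X.columns))))

# Local view: per-feature contributions to one prediction
print(eli5.format_as_text(
    eli5.explain_prediction(model, X.values[0], feature_names=list(X.columns))
))
```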
5. InterpretML
InterpretML is an open-source Python package from Microsoft that brings glassbox models (such as the Explainable Boosting Machine, an interpretable gradient-boosting variant) and black-box explanation techniques together under a single API, with interactive visualizations for exploring both global and local explanations.
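A minimal sketch of InterpretML's glassbox workflow with an Explainable Boosting Machine (the dataset is illustrative; show() opens an interactive visualization in a notebook or browser):

```python
# Sketch: glassbox explanations with InterpretML's Explainable Boosting Machine.
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)

# Global explanation: overall term importances and learned shape functions
show(ebm.explain_global())

# Local explanation: per-feature contributions for the first five predictions
show(ebm.explain_local(X.iloc[:5], y.iloc[:5]))
```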
6. Skater
Skater is an open-source, model-agnostic, unified Python framework for model explainability and interpretability. It lets data scientists build interpretability into machine learning systems for real-world use cases.
Skater approaches explainability both globally (inference based on a complete dataset) and locally (inference based on individual predictions). It supports deep neural networks, tree algorithms, and scalable Bayes.
7. What-if Tool
The What-If Tool (WIT), developed by the TensorFlow team, is an interactive, no-code interface for visualizing datasets and models in TensorFlow, giving a better understanding of model outcomes. In addition to TensorFlow models, the What-If Tool also supports XGBoost and Scikit-Learn models.
Once a model has been deployed, its performance can be viewed on a dataset in the What-If tool.
Additionally, you can slice the dataset by features and compare performance across those slices.
Then you can identify subsets of data where the model performs best or worst. This can be very
helpful for ML fairness investigations.