IMPLEMENT ETHICAL AND UNBIASED
ALGORITHMS
INTRODUCTION
Potential problems with AI if not built and used responsibly:
Bias and unfairness: AI models can inherit and replicate bias/discrimination from
training data, especially against minority populations.
Socio-economic inequality: If left uncontrolled, AI can have a negative impact on
the distribution of resources and rights in society.
Invasion of privacy: Collecting and processing personal data for AI training, if not
properly governed, may violate privacy.
Market fluctuations: AI breakthroughs can cause economic shocks and strong
market volatility.
With these potential implications, Responsible Artificial Intelligence (RAI) is
proposed to build and operate AI reliably, without causing harm. RAI should
achieve the following goals:
Make decisions based on individual merit, without discrimination.
Transparent and unbiased, with explainability.
Protect privacy and data security.
Comply with ethical and legal rules.
The next section introduces the four main aspects of RAI:
Fairness: Ensures the AI does not discriminate based on identifying characteristics
such as gender, race, religion, etc.
Explainability: Provides transparency about how AI operates and makes decisions.
Accountability: Monitors the entire model lifecycle, from deployment through
adaptation to changing data.
Data/model privacy: Protects the confidentiality and integrity of the data and
models used for AI.
FAIRNESS AND PROXY FEATURES
Introduction to the concept of fairness in Machine Learning
This section explains why the issue of fairness becomes important in ML. When
we don't have enough historical data to train the model from scratch, we are forced
to use transfer learning techniques - applying knowledge from another task. This
risks introducing existing biases and prejudices into the new model, causing
undesirable discriminatory results against certain user groups.
Basic concepts and parameters
Favorable/unfavorable outcomes: How the output of a prediction model is
classified as favorable or unfavorable.
Privileged/unprivileged class: The concept of historically advantaged (privileged)
and disadvantaged (unprivileged) classes defined by characteristics such as
gender, race, or economic status.
Independent features and protected features: Classify input features based on
whether they contain sensitive information (gender, race,...) or not.
Confusion Matrix and evaluation indicators
This section presents how to evaluate the performance of a classification model
using the Confusion Matrix, defining the True Positive, True Negative, False
Positive, and False Negative cases. From there, Recall (Sensitivity), Specificity,
and the False Positive/False Negative Rates are introduced to measure how often
the model predicts cases correctly or incorrectly.
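These rates follow directly from the four confusion-matrix counts. A minimal sketch (the function name and the counts are illustrative, not from the book):

```python
def confusion_rates(tp, fp, tn, fn):
    """Compute common evaluation rates from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),       # True Positive Rate (sensitivity)
        "specificity": tn / (tn + fp),  # True Negative Rate
        "fpr": fp / (fp + tn),          # False Positive Rate
        "fnr": fn / (fn + tp),          # False Negative Rate
    }

rates = confusion_rates(tp=40, fp=10, tn=35, fn=15)
# recall = 40/55 ≈ 0.727, specificity = 35/45 ≈ 0.778
```

Note that the False Positive Rate is simply 1 − Specificity, and the False Negative Rate is 1 − Recall.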
Fairness concepts and metrics
Fairness is defined by the relationship between the protected characteristic and
the model's predictions compared with actual outcomes. For example, if women (an
unprivileged class) have a higher rate of job rejection than men, this is a case of
inequity. Metrics such as Predictive Parity, Equal Opportunity, Predictive
Equality, Demographic Parity, and Average Odds Difference are used to quantify
the level of fairness.
Proxy Features
This is the bulk of the chapter, explaining what proxy features are and why they
are dangerous. Proxies are independent features that seem harmless but are
correlated with protected characteristics such as gender and race. Therefore, even
when protected features are removed, proxies can still introduce bias and
discrimination into the models that inherit them. The chapter gives many real-life
examples of proxies, such as zip codes, shopping patterns, and annual income.
Proxy detection methods
To handle the proxy problem, it is first necessary to detect and identify proxies in
the input data. The chapter introduces four main methods:
Linear Regression and Variance Inflation Factor (VIF): Uses a regression model
to check the degree of multicollinearity between pairs of features.
Information Value: Evaluates the relevance of a feature to the label based on
statistical information.
Cosine Similarity: Uses a similarity measure to check how alike pairs of features
are.
Mutual Information: Calculates the amount of shared information between pairs
of features, even when their relationship is nonlinear.
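As an illustration of the cosine similarity method, a sketch with hypothetical feature vectors (`zip_code_risk` and `race_encoded` are invented for this example):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# A seemingly harmless feature vs. an encoded protected feature
zip_code_risk = [0.9, 0.8, 0.2, 0.1, 0.85]
race_encoded  = [1.0, 1.0, 0.0, 0.0, 1.0]
sim = cosine_similarity(zip_code_risk, race_encoded)
# A similarity close to 1 flags zip_code_risk as a potential proxy
```

In practice a threshold (for example 0.8 or 0.9) would be chosen to decide when a feature is treated as a proxy.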
BIAS IN DATA
Chapter 3 - Bias in Data
The concept of data bias and the importance of handling this issue:
Data bias refers to systematic errors or skews in data that lead to inaccurate or
unfair results.
Bias can occur during data collection when the data is not fully representative of
the population under study.
If left uncorrected, data bias causes AI models trained on that data to
systematically inherit and replicate those biases.
The chapter gives two typical examples of data bias:
Amazon's candidate assessment tool: While developing an AI tool to evaluate
candidate resumes in 2018, Amazon discovered the tool was biased against women
because it had learned from past hiring decisions that favored men.
SEPTA security system in Philadelphia: Security algorithms at SEPTA, trained on
historical data, tended to associate people of color with criminal behavior,
creating a risk of discrimination and racial bias.
The next section introduces two metrics to evaluate the degree of bias in the data:
Statistical Parity Difference (SPD):
SPD measures the difference in rates of favorable outcomes between two groups
(e.g., privileged and unprivileged).
Formula: SPD = P(Ŷ=1 | D=privileged) − P(Ŷ=1 | D=unprivileged)
where Ŷ is the predicted label and D is the protected feature.
The closer the SPD value is to 0, the fairer the data.
Disparate Impact Ratio (DIR):
DIR is the ratio of the probability of a favorable outcome for the unprivileged
group to that of the privileged group.
Formula: DIR = P(Ŷ=1 | D=unprivileged) / P(Ŷ=1 | D=privileged)
A DIR between 0.8 and 1.25 is generally considered acceptable (the lower bound
follows the four-fifths rule).
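Both metrics reduce to counting favorable outcomes per group. A sketch with invented hiring data (1 = hired, "M" treated as the privileged group):

```python
def spd_dir(outcomes, groups, privileged):
    """Statistical Parity Difference and Disparate Impact Ratio.
    outcomes: predicted labels (1 = favorable); groups: protected attribute."""
    priv   = [y for y, g in zip(outcomes, groups) if g == privileged]
    unpriv = [y for y, g in zip(outcomes, groups) if g != privileged]
    p_priv   = sum(priv) / len(priv)       # P(Y_hat=1 | privileged)
    p_unpriv = sum(unpriv) / len(unpriv)   # P(Y_hat=1 | unprivileged)
    return p_priv - p_unpriv, p_unpriv / p_priv

outcomes = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
groups   = ["M", "M", "M", "M", "M", "F", "F", "F", "F", "F"]
spd, dir_ = spd_dir(outcomes, groups, privileged="M")
# p_priv = 4/5 = 0.8, p_unpriv = 1/5 = 0.2 → SPD = 0.6, DIR = 0.25
```

Here DIR = 0.25 falls far below 0.8, so this hypothetical dataset would be flagged as biased.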
The chapter also provides examples of specific calculations of SPD and DIR with
hypothetical data. The final section emphasizes the importance of identifying and
eliminating bias early in the AI project development process.
EXPLAINABILITY
The introduction emphasizes the importance of explainability in AI:
Explainability is an essential part of how we understand and trust AI, from how a
machine works to explaining human behavior.
The birth and development of Explainable Artificial Intelligence (XAI) meets the
need for transparency and accountability of AI systems.
The chapter presents three main approaches to building XAI:
Local surrogate models: Build simpler models to explain the results of the
original complex model.
Inherently explainable models (glass-box models): For example, Linear
Regression, Decision Trees, and Generalized Additive Models.
Sensitivity analysis: Check how the output changes when the input is changed.
Next, the chapter details specific interpretation techniques belonging to the three
approaches above, divided into two groups:
Explanation based on characteristics:
Information Value plots
Partial Dependence Plots
Accumulated Local Effects plots
Sensitivity Analysis
Model-based explanation:
Split and Compare Quantiles
Global Explanation using SHAP
Local Explanation using Force plot, Decision plot, Waterfall plot
Morris Sensitivity Analysis
For each technique, the chapter clearly explains the operating principle, presents
the formula/algorithm, advantages and disadvantages, and comes with vivid
illustrative examples with images and charts.
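For one concrete case from the list: for a linear model with independent features, SHAP values have an exact closed form, phi_i = w_i · (x_i − E[x_i]), and the contributions sum to f(x) − f(E[x]). A sketch with invented weights and inputs (not the book's example):

```python
def linear_shap(weights, x, baseline_means):
    """Exact SHAP values for a linear model with independent features:
    phi_i = w_i * (x_i - E[x_i])."""
    return [w * (xi - mu) for w, xi, mu in zip(weights, x, baseline_means)]

weights        = [2.0, -1.0, 0.5]   # linear model coefficients
x              = [3.0, 2.0, 4.0]    # the instance being explained
baseline_means = [1.0, 1.0, 2.0]    # feature means over the training data
phi = linear_shap(weights, x, baseline_means)
# phi = [4.0, -1.0, 1.0]; contributions sum to f(x) - f(E[x]) = 4.0
```

For nonlinear models, libraries such as SHAP approximate these values; the force, decision, and waterfall plots mentioned above are visualizations of exactly these per-feature contributions.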
The chapter then introduces two highly interpretable approaches: Generalized
Additive Models and Counterfactual Explanations. These emerging techniques
help build transparent AI solutions right from the design stage.
REMOVING BIAS FROM ML MODEL
The introduction explains why it is not a good idea to simply remove features
related to protected classes (like gender, race, religion) to address bias in the
model:
Doing so can result in the loss of a lot of valuable information, causing the model
to lose its predictive ability.
Even if protected features are removed, there may still exist proxy features that
carry information about the protected class, causing bias to still be present.
In some cases, removing these features can even increase bias rather than reduce it.
The next chapter introduces two main methods for removing bias from machine
learning models:
Reweighting
Idea: Adjust the statistical distribution of the training data set by assigning
different weights to the data samples.
Weight calculation formula: W(y,s) = P(Y=y) / P(Y=y|S=s) where Y is the label, S
is the protected feature.
Training loss function: L'(ŷ, y, s) = W(y, s) * L(ŷ, y)
This method is effective in shifting class boundaries, improving fairness.
Clearly illustrated with data distribution graphs.
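The weight formula above can be sketched directly from empirical frequencies (the toy labels and groups are invented for illustration):

```python
from collections import Counter

def reweighting(labels, protected):
    """W(y, s) = P(Y=y) / P(Y=y | S=s), estimated from the training data."""
    n = len(labels)
    count_y  = Counter(labels)             # marginal counts of each label
    count_s  = Counter(protected)          # counts per protected group
    count_ys = Counter(zip(labels, protected))  # joint counts
    weights = {}
    for (y, s), n_ys in count_ys.items():
        p_y_given_s = n_ys / count_s[s]
        weights[(y, s)] = (count_y[y] / n) / p_y_given_s
    return weights

labels    = [1, 1, 1, 0, 1, 0, 0, 0]
protected = ["M", "M", "M", "M", "F", "F", "F", "F"]
w = reweighting(labels, protected)
# P(Y=1) = 0.5; P(Y=1|S="M") = 0.75 → W(1,"M") ≈ 0.667
# P(Y=1|S="F") = 0.25 → W(1,"F") = 2.0
```

Samples from the group that is under-represented among favorable labels receive weights above 1, so the weighted loss pulls the decision boundary toward fairness.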
Additive Counterfactual Fairness
Idea: Normalize the input to remove the component related to the protected class,
keeping only the fair "merit" part.
Algorithm: Uses a simple model (usually linear regression) to predict the "bias" in
the input feature based on the protected class. Then subtract this prediction from
the original input to get the normalized input.
The method can even handle the case where the protected attribute is continuous.
The steps of the algorithm are illustrated in detail.
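The residualization step can be sketched with a closed-form simple linear regression (the income data and attribute encoding are invented; the book's algorithm may differ in detail):

```python
def residualize(feature, protected):
    """Remove the component of `feature` explained by `protected` using
    simple linear regression, keeping only the fair residual ("merit")."""
    n = len(feature)
    mean_s = sum(protected) / n
    mean_f = sum(feature) / n
    cov = sum((s - mean_s) * (f - mean_f) for s, f in zip(protected, feature))
    var = sum((s - mean_s) ** 2 for s in protected)
    slope = cov / var
    intercept = mean_f - slope * mean_s
    # Subtract the predicted "bias" component from the original input
    return [f - (intercept + slope * s) for s, f in zip(protected, feature)]

income    = [70, 72, 68, 50, 52, 48]   # hypothetical proxy-laden feature
protected = [1, 1, 1, 0, 0, 0]         # binary protected attribute
fair_income = residualize(income, protected)
# The residuals no longer correlate with the protected attribute
```

Because the regression accepts any numeric predictor, the same subtraction works when the protected attribute is continuous rather than binary.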
At the end of the chapter, a metric for evaluating model bias is introduced,
Counterfactual Unfairness (CUF):
CUF = E[(M(X, S) − M(X, ¬S))²]
where M is the model, X is the data, and S is the protected class (¬S is its
counterfactual value).
CUF measures the difference between a model's outputs for the same input but
different protected classes.
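A sketch of the CUF computation for a binary protected attribute (the biased model is a toy stand-in, not from the book):

```python
def cuf(model, X, S):
    """Counterfactual Unfairness: mean squared change in model output
    when only the (binary) protected attribute is flipped."""
    diffs = [(model(x, s) - model(x, 1 - s)) ** 2 for x, s in zip(X, S)]
    return sum(diffs) / len(diffs)

# Toy model that leaks the protected attribute into its score
def biased_model(x, s):
    return 0.5 * x + 0.2 * s

X = [1.0, 2.0, 3.0]
S = [0, 1, 0]
unfairness = cuf(biased_model, X, S)  # ≈ 0.04; a fair model would give 0
```

A model that ignores S entirely yields CUF = 0, which is the target after debiasing.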
REMOVING BIAS FROM ML OUTPUT
The introduction explains the problem to be solved:
The models in use may carry bias from the historical data used for training.
An approach is needed to address the problem of bias after the model makes a
prediction.
This approach should not rely on feature design techniques or model optimization.
It should work with any model, be easy to interpret, and have minimal impact on
accuracy.
The method proposed in this chapter is called Reject Option Classifier (ROC).
Overview of ROC:
Binary classification algorithms produce probability scores between 0 and 1.
The decision threshold is usually set at 0.5.
ROC introduces a reject option (R) for uncertain predictions.
Distinguish between hard decisions (predictions far from the threshold) and soft
decisions (predictions near the threshold).
Introduces the concept of Critical Region (θ) around a decision threshold, which
specifies the distance from the threshold. θ is an important parameter to reduce
bias.
When to use ROC:
Suitable for any probabilistic classifier for binary classification or continuous
output problems.
No modifications to the learning algorithm or data preprocessing are required.
Provides high control and explainability for bias reduction.
ROC parameters:
Decision threshold: Usually 0.5 but can be adjusted based on data distribution.
Theta (θ): Determines the critical region around the decision threshold. Separate
values can be defined to promote the unprivileged group (θd) and penalize the
privileged group (θa).
How ROC works:
Divide the predictions into three regions: favorable results, unfavorable results,
and the rejection region (R).
For predictions in region R, ROC:
Pushes the unprivileged group above the threshold (favorable outcome)
Pushes the privileged group below the threshold (unfavorable outcome)
This applies only to the "soft" probabilities inside the critical region.
The effect of ROC processing on different protected features is also presented,
accompanied by a table of accuracy and fairness indices before and after applying
ROC.
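The critical-region rule can be sketched for a single prediction (the threshold and θ values are illustrative; the book allows separate θd and θa):

```python
def reject_option_classify(prob, group, privileged,
                           threshold=0.5, theta=0.1):
    """Reject Option Classifier post-processing for one prediction.
    Inside the critical region [threshold - theta, threshold + theta],
    unprivileged samples receive the favorable label and privileged samples
    the unfavorable one; outside it, the usual threshold applies."""
    if abs(prob - threshold) <= theta:        # "soft" decision → region R
        return 0 if group == privileged else 1
    return 1 if prob > threshold else 0       # "hard" decision, unchanged

# Hypothetical scores near the 0.5 threshold
a = reject_option_classify(0.55, "F", privileged="M")  # 1 (promoted)
b = reject_option_classify(0.55, "M", privileged="M")  # 0 (demoted)
c = reject_option_classify(0.90, "M", privileged="M")  # 1 (unchanged)
```

Widening θ flips more predictions, increasing fairness at the cost of accuracy, which is exactly the trade-off the optimization guide below tunes.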
The chapter also offers two ways to handle when there are multiple protected
characteristics:
Process each feature sequentially in order of priority
Create a composite feature that combines many individual features
Finally, the ROC optimization guide:
Optimize the critical region (θ) to balance fairness and accuracy
Optimize decision threshold (different from 0.5)
ACCOUNTABILITY IN AI
Model monitoring covers the techniques and activities used to identify any data or
concept drift that may impact model performance over time. This is considered
the most important stage after model development.
The chapter explains two main concepts:
Data drift: Is a change in the distribution of input (P(X)) or output (P(Y)) data.
Concept drift: Is a change in the relationship between input and output, i.e. P(X|Y)
or P(Y|X).
The main methods and techniques for detecting differences are introduced:
Jensen-Shannon divergence: Measures the distance between two distributions.
Wasserstein distance: Measures the minimum amount of work required to
transform one distribution into another.
Stability Indices: Use the Population Stability Index (PSI) and Characteristic
Stability Index (CSI) to evaluate the consistency between training data and
production data.
Kolmogorov-Smirnov Test: Widely used in financial services to detect concept
drift for binary classification problems.
Brier Score: Simple; tracks the accuracy of a model's probabilistic predictions to
detect deterioration over time.
Early Drift Detection Method (EDDM): Tracks the average and maximum distance
between classification errors; suitable for binary classification.
Hierarchical Linear Four Rates (HLFR): Monitors rates such as the True Positive
Rate and True Negative Rate to evaluate model performance.
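As one example from the list, the PSI can be sketched as follows (the binning scheme and data are illustrative; production implementations commonly use decile bins derived from the training distribution):

```python
import math

def psi(expected, actual, bins=4):
    """Population Stability Index between a training sample ("expected")
    and a production sample ("actual") of one feature."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def fractions(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        # Small floor avoids log(0) for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
prod  = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
drift = psi(train, prod)  # 0.0 — identical distributions
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift worth an alert.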
The chapter also emphasizes the importance of monitoring model performance and
setting appropriate alert thresholds. If the threshold is not met, tuning, retraining,
or replacing the model may be necessary.
DATA AND MODEL PRIVACY
Introduction:
Data security refers to the protection of sensitive or personal information from
unauthorized access, use or disclosure.
Model security refers to protecting machine learning models and associated data
from unauthorized access or tampering.
The goals of the attack can be: identify and leak user data/model parameters, cause
misclassification of target objects, reduce model accuracy.
Basic techniques:
Hashing: Hides data but has disadvantages with high-dimensional data, does not
hide relationships, and is unsuitable for data discovery.
K-anonymity, L-diversity, and T-closeness: L-diversity and T-closeness improve
on k-anonymity by ensuring diversity of sensitive values within each group, but
all of these schemes can still be broken.
Differential Privacy:
Ensure confidentiality when sharing information about a group of individuals by
describing patterns within the group without revealing specific details about each
record.
The main idea is to add controlled noise to the data to prevent precise identification
of individuals but still maintain overall statistical properties.
Sensitivity is a central concept in the design of differentially private algorithms,
measuring the maximum change in output when a single record is changed.
The privacy budget (epsilon) determines the level of privacy; a smaller value
provides stronger privacy.
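These two concepts combine in the Laplace mechanism, a standard way to add epsilon-calibrated noise to a numeric query; a sketch for a counting query (the dataset size is invented):

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Add Laplace noise with scale = sensitivity / epsilon.
    A smaller privacy budget (epsilon) means more noise, stronger privacy."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5
    # Inverse-CDF sampling of the Laplace distribution
    noise = -scale * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))
    return true_value + noise

# Counting query: adding or removing one person changes the count
# by at most 1, so the sensitivity is 1
random.seed(0)
noisy_count = laplace_mechanism(true_value=100, sensitivity=1, epsilon=0.5)
```

The released noisy count hides whether any single individual is present, while large-scale statistics remain approximately correct.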
Exponential Mechanism:
Used to add noise to queries whose outputs are non-numeric.
Introduces randomness by sometimes returning a sub-optimal value.
Always returns a value that belongs to the original domain.
Differentially Private ML Algorithms:
Add noise to the model's objective function, output, or weights.
For tree-based algorithms, noise is added to the information gain computation.
Federated Learning:
Each node in the system has its own data set.
Global model weights are computed by aggregating the locally trained updates
from all nodes.
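The aggregation step can be sketched in the style of FedAvg, weighting each node's locally trained parameters by its dataset size (a simplification of a real federated training loop, not the book's exact algorithm):

```python
def federated_average(local_weights, local_sizes):
    """FedAvg-style aggregation: average each parameter across nodes,
    weighted by the size of each node's local dataset."""
    total = sum(local_sizes)
    n_params = len(local_weights[0])
    return [
        sum(w[i] * n for w, n in zip(local_weights, local_sizes)) / total
        for i in range(n_params)
    ]

# Three nodes, each with its own data and locally trained weights
node_weights = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
node_sizes   = [100, 100, 200]
global_weights = federated_average(node_weights, node_sizes)
# [0.3, 0.7]: (0.2*100 + 0.4*100 + 0.3*200) / 400 = 0.3, and likewise 0.7
```

The raw data never leaves each node; only the weight vectors are shared, which is what gives federated learning its privacy benefit.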
Conclusion:
Personal information can easily be extracted from datasets that lack privacy
protections, so strong safeguards are required.
Parameters must be chosen carefully so that excessive noise does not cause
problems such as degraded accuracy or unfair outcomes.
Data privacy is important in machine learning because it protects user data and
reduces model bias.
Applying a privacy strategy early in the model-building process improves the
model's resilience to attacks.