
MODULE 2: TRANSPARENCY AND FAIRNESS IN MACHINE LEARNING
INTRODUCTION
Despite the promise shown by machine learning research, there is a growing awareness that it also raises novel challenges and concerns related to fairness and equitability. Policymakers and regulators have highlighted the potentially discriminatory impact of autonomous models and the danger that they inadvertently reinforce bias and racism. Moreover, the autonomous nature of the decision making allows plausible deniability for the developers and corporations deploying these models, which are often complex black boxes that provide little or no explanation for their decisions.

Fig 1. Diversity in end-users (Image Source: https://www.rocketspace.com/corporate-innovation/why-diversity-and-inclusion-driving-innovation-is-a-matter-of-life-and-death)

TRANSPARENCY AND FAIRNESS IN MACHINE LEARNING


Most computer programmers and ML researchers (like us) do not have sufficient exposure to the diversity present in the target user community. What are often perceived as outliers in our data are real people who simply differ from the norm (women, gender-fluid people, people of color, and people with disabilities, to name a few).

ML researchers focus on specific evaluation metrics to test the performance of an algorithm. However, the predictions generated by an algorithm should also be fair to different demographic groups (for example, because engineering schools have more male students, an automated algorithm may learn to use gender as a differentiating factor when making admission decisions). Most current models also lack accountability: for any decision taken automatically (and algorithmically), the model should be able to explain that decision on demand (transparency).
Now the questions to ask here are:
1) When can we blindly trust an algorithm to make a decision for us?
2) What are the different checks which should be in place for such autonomous decision
making?
3) Do we trust the people who develop algorithms to be bias-free? If not, the algorithm could
be biased too.

DISCRIMINATION IN ML
Safiya U. Noble, in her book “Algorithms of Oppression: How Search Engines Reinforce Racism,”
points out numerous examples of search engines being insensitive and biased towards minorities
and people of color. The author worked in the tech industry and offers constructive criticism on
how to remedy such biases in search engines.

Fig 2. Anti-black autocomplete recommendations from a search engine (Source: Safiya U. Noble. Algorithms of
Oppression: How Search Engines Reinforce Racism)

The author also shows how Google’s image search results for “beautiful” have a strong bias toward blonde, white women (Figure 3).
Fig 3. Google’s image search results for ‘beautiful’ (Source: Safiya U. Noble. Algorithms of Oppression: How
Search Engines Reinforce Racism)

Similarly, Google came under heavy criticism after its new Photos app categorized photos in one of the most racist ways possible. You can read about it here: https://www.theverge.com/2015/7/1/8880363/google-apologizes-photos-app-tags-two-black-people-gorillas

Fig 4. Google Image Tagging

There are several other examples of bias in AI. Amazon’s recruiting algorithm learned to systematically downgrade female applicants for technical roles in the company. You can read about it here:
https://phys.org/news/2018-11-amazon-sexist-hiring-algorithm-human.html
The potential for harm skyrockets when we use black-box ML models without any human oversight. We have already seen the example of the automated CV selector; other application domains include college admissions and criminal sentencing.

Fig 5. Criminal Sentencing

This article (https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing) shows how black people are more likely to be identified as high-risk individuals by automated algorithms predicting future crimes.

GUIDELINES FOR AUTONOMOUS, ALGORITHMIC DECISION-MAKING
Responsibility
Responsible development should help address the adverse individual or societal effects of the
system. The developer should be aware of the affordances offered, the possible implications, and
the likelihood of misuse. There should also be designated auditors to identify and remedy any
potential issues.

Accountability
Any person who has some degree of control to cause harm through the use of an algorithm, or to prevent such harm, should be held accountable for any wrongful application of the algorithm. Accountability should extend to providing compensation or accepting legal ramifications should the algorithm harm any individual or group of individuals.
Who is accountable for ML algorithm behavior?
1) Developers, who must design algorithms so that oversight authorities can verify they meet pre-defined rules (“procedural regularity”)?
2) Data providers?
3) Regulators, who determine the scope of oversight (e.g., require describing and explaining
failures in ML systems)?
A possible solution would be to create an infrastructure to oversee algorithm decision-makers.
You can read about Uber’s self-driving car accident here: https://towardsdatascience.com/another-
self-driving-car-accident-another-ai-development-lesson-b2ce3dbb4444

Explainability
The algorithmic decisions should be explainable to the end-user and other stakeholders in simple, non-technical terms. The system should, on demand, be able to explain the steps undertaken and the logic behind a decision, the alternative actions that were available but not pursued, and the variables that most likely influenced the outcome.

Fairness
A fair algorithm should not perpetuate bias in the decisions or discriminate against end-users based
on their gender, nationality, color, sexual orientation, religion, or other demographic factors. You
can find 21 definitions of fairness here: https://www.youtube.com/watch?v=jIXIuYdnyyk
Fairness is hard to assess objectively and is, therefore, a complex problem. There are no best
practices (apart from the moral compass, of course) to guide the evaluation. The ideas of fairness
are case-specific and may vary for every situation. Therefore, we need to include experts,
researchers, and scholars from ethics, social work, law, politics, and computer science to develop
the best practices. The end goal should be to develop a bias-free system which is explainable,
transparent, and non-discriminatory towards end-users.

SOLUTIONS
How to make fair algorithms?
1) Pre-processing
a. Modify the training data
b. Do not remove outliers unless they are data entry errors
c. Oversample the minority classes
2) Optimization at training
a. Algorithm
i. modify the objective function to penalize unfairness (e.g., add a regularization term)
ii. focus on metrics like TPR (true positive rate) instead of accuracy alone
iii. prejudice removal
b. Features
i. remove those that reflect bias, e.g., gender, race, age, education, sexual orientation, etc.
3) Post-process predictions
a. Counterfactual check: test the impact of modifying a single sensitive feature (see the sketch after this list)
b. Qualitative analysis of results, with a focus on explaining the results
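The counterfactual check in 3(a) can be made concrete in a few lines of code. The sketch below is a minimal illustration only, assuming a hypothetical trained binary classifier `model` with a scikit-learn-style predict method and a pandas DataFrame `X` containing a binary-coded sensitive column such as "gender"; it flips that single column and measures how many predictions change.

import numpy as np
import pandas as pd

def counterfactual_flip_rate(model, X: pd.DataFrame, sensitive_col: str) -> float:
    """Share of individuals whose prediction changes when only the
    sensitive feature is flipped (hypothetical helper, not a library API)."""
    original_preds = model.predict(X)

    # Build a counterfactual copy where only the sensitive attribute is changed.
    X_cf = X.copy()
    X_cf[sensitive_col] = 1 - X_cf[sensitive_col]  # assumes a binary 0/1 encoding

    counterfactual_preds = model.predict(X_cf)

    # A model that ignores the sensitive feature entirely would return 0.0 here.
    return float(np.mean(original_preds != counterfactual_preds))

# Example usage (assuming a trained classifier `clf` and test data `X_test` exist):
# flip_rate = counterfactual_flip_rate(clf, X_test, sensitive_col="gender")
# print(f"{flip_rate:.1%} of decisions change when gender alone is flipped")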

Accountability
1) Develop legislation to hold developers, corporations, and application owners accountable.
2) Create infrastructure to oversee algorithm decision-makers.
3) Educate stakeholders and algorithm developers about potential issues.

Transparency
The system should be able to explain:
1) Why was the decision made?
2) What were the alternatives, and why were those rejected?
3) How does the algorithm define success (or criteria for success)?
4) How does the algorithm define failures (or criteria for failures)?
5) How does the system recognize and correct errors?
The developers/researchers should be able to answer all the above questions in addition to:
1) Why should end users trust the system?
2) Can they take randomly picked instances and explain the decisions which the system made?
Another solution would be to develop a collection of explainable models (Explainable AI, or XAI) which show performance comparable to black-box models.

Fig 6. ML models and Explainability [1]

Fig 7. How decision trees work (Explainable) [2]

Fig 8. How deep neural networks work (Hard to explain) [3]

[1] https://www.cc.gatech.edu/~alanwags/DLAI2016/(Gunning)%20IJCAI-16%20DLAI%20WS.pdf
[2] https://dataaspirant.com/how-decision-tree-algorithm-works/
[3] https://towardsdatascience.com/build-your-first-deep-learning-classifier-using-tensorflow-dog-breed-example-964ed0689430
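To make the contrast between Fig 7 and Fig 8 concrete, here is a minimal sketch (an illustration added for this module, using a toy dataset) that trains a small decision tree with scikit-learn and prints its decision rules as plain text, the kind of step-by-step explanation a deep neural network cannot readily provide.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Train a small, interpretable decision tree on a toy dataset.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# export_text prints the learned if/then rules, so every prediction
# can be traced back to a short chain of human-readable conditions.
print(export_text(tree, feature_names=["sepal length", "sepal width",
                                       "petal length", "petal width"]))

The printed rules directly answer the "explain the steps undertaken" requirement listed under Explainability above.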
HOW TO DEAL WITH UNBALANCED DATASETS?
What is an unbalanced dataset?
After our Problem Solving with Data part-1 (Fall 2020), you are aware that statistical models are
used for predicting continuous values (as linear regression does) or output classes (when the
output is categorical and not continuous). It is highly probable that in the dataset, the target or
predicted variable contains more instances of a particular output category than others (for
example, more males than females for gender). For example, if we predict cyberbullying using
Instagram images, almost 99% of instances will say, “No cyberbullying.” Here, Cyberbullying is the
binary target variable. We will use 0 to denote “No Cyberbullying” and 1 for “Cyberbullying.” “No
Cyberbullying” is the majority class as instances of cyberbullying are rare in our data.
“Cyberbullying” is the minority class as we have 1% of observations (or instances) belonging to it.
Similar observations could be made for multiclass predictions, and there could be multiple minority
classes in the collection.
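As a quick way to check whether a dataset is unbalanced, the snippet below (an illustrative sketch, assuming a hypothetical pandas DataFrame with a binary "cyberbullying" column coded 0/1) simply counts how often each class appears.

import pandas as pd

# Hypothetical dataset: 990 "No cyberbullying" (0) and 10 "Cyberbullying" (1) posts.
df = pd.DataFrame({"cyberbullying": [0] * 990 + [1] * 10})

# value_counts(normalize=True) gives the share of each class.
class_shares = df["cyberbullying"].value_counts(normalize=True)
print(class_shares)  # 0 -> 0.99 (majority), 1 -> 0.01 (minority)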

What are the challenges of working with unbalanced data?


To predict the output class, a model needs to be trained on the dataset. However, for models trained on unbalanced datasets, the accuracy may look high at first, but the results fail to generalize to unseen data. During training, the algorithm receives more examples from the majority class and learns a biased interpretation of the data. Continuing the previous example, the model learns that it can predict every instance as “No cyberbullying” and still achieve an accuracy of 99%. This is known as the accuracy paradox.

Accuracy Paradox
The underlying class distribution of an unbalanced dataset inflates the accuracy of the models. By predicting every (or almost every) instance as the majority class, the model shows high accuracy but defeats the purpose of classification (for example, it will never detect any case of cyberbullying). The paradox can be mitigated by selecting alternative metrics (e.g., recall) instead of accuracy.

Model 1:
Cyberbullying Actual=yes Actual=no
Predicted “yes” 0 (TP) 0 (FP)
Predicted “No” 50 (FN) 950 (TN)
TP = True Positive, FP = False Positive, TN = True Negative, FN = False Negative
Accuracy = 950/1000 = 0.95 (or 95%)
Precision = TP/(TP+FP) = 0/(0+0), which is undefined (conventionally reported as 0)
Recall = TP/(TP+FN) = 0
Model 2:
Cyberbullying Actual=yes Actual=no
Predicted “yes” 25 (TP) 100 (FP)
Predicted “No” 25 (FN) 850 (TN)

Accuracy = 875/1000 = 0.875 (or 87.5%)


Precision = TP/(TP+FP) = 25/(25+100) = 0.2
Recall = TP/(TP+FN) = 25/(25+25) = 0.5

Model 3:
Cyberbullying Actual=yes Actual=no
Predicted “yes” 50 (TP) 200 (FP)
Predicted “No” 0 (FN) 750 (TN)

Accuracy = 800/1000 = 0.8 (or 80%)


Precision = TP/(TP+FP) = 50/(50+200) = 0.2
Recall = TP/(TP+FN) = 50/(50+0) = 1
The bottom line: if it is crucial to identify the minority class, a good choice is recall (also known as the true positive rate, or TPR).
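The numbers above can be reproduced in a few lines. The sketch below is a minimal check, assuming the three confusion matrices exactly as tabulated; it computes accuracy, precision, and recall for each model (treating an undefined precision as 0).

# Each model is given as (TP, FP, FN, TN), taken from the tables above.
models = {
    "Model 1": (0, 0, 50, 950),
    "Model 2": (25, 100, 25, 850),
    "Model 3": (50, 200, 0, 750),
}

for name, (tp, fp, fn, tn) in models.items():
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0  # undefined -> 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    print(f"{name}: accuracy={accuracy:.3f}, precision={precision:.2f}, recall={recall:.2f}")

# Output:
# Model 1: accuracy=0.950, precision=0.00, recall=0.00
# Model 2: accuracy=0.875, precision=0.20, recall=0.50
# Model 3: accuracy=0.800, precision=0.20, recall=1.00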

Steps to solve the problem


Collect more data
Can you collect more data? A larger dataset with a more uniform distribution is always the desired
option. Alternatively, you can also try combining different datasets with similar features. This
approach works as long as the final dataset obtained is not unbalanced.

Use proper evaluation metric


As shown by the accuracy paradox, the choice of metric should depend on the problem at hand. If the dataset is unbalanced, it is better to use a combination of metrics to capture the full story. Some of the options are the confusion matrix, precision, recall, F-score, Cohen's kappa, and ROC curves / AUC.
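As one way to report several of these metrics at once (an illustrative sketch, assuming arrays of true and predicted labels named y_true and y_pred), scikit-learn bundles most of them:

from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score)

# Hypothetical labels for a 10-example binary problem (1 = minority class).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]

print(confusion_matrix(y_true, y_pred))       # counts of TN, FP, FN, TP
print(classification_report(y_true, y_pred))  # precision, recall, F1 per class

# ROC AUC normally takes scores or probabilities; hard labels are reused
# here only for illustration.
print(roc_auc_score(y_true, y_pred))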

Resampling the dataset


The goal is to reduce the ratio of majority to minority classes (for example, getting the ratio 99:1 to
something closer to 50:50). There are two ways of doing that:

Undersampling
You can randomly select observations from the majority class(es) to obtain a desired ratio (50:50 or 60:40 for binary classification); for multiclass prediction, the ratios between classes should be as close to 1 as possible. This can be done by random sampling without replacement or by simply discarding observations from the majority class(es). Another option is to first cluster the majority class into groups and then pick observations from each group to maintain the desired ratio.
Undersampling is preferred when working with a big dataset with a large number of instances.

Oversampling
You can oversample the minority class by adding synthetic or artificially generated instances of the
minority class based on the available data. Some algorithms that allow for this include Variational
Autoencoders (VAE), SMOTE (Synthetic Minority Over-sampling Technique), or MSMOTE (Modified
Synthetic Minority Over-sampling Technique).

Try alternate models


It may not be a good idea to use the same model for every problem; after all, one size does not fit all. Researchers use a collection, or ensemble, of algorithms for prediction because each comes with its own merits and shortcomings. It has been observed that decision trees work well with unbalanced datasets, and some decision tree ensembles (e.g., Random Forest) average over multiple trees and are more resistant to class bias and overfitting.
You could also use penalized classification models that penalize false negatives (i.e., missed minority-class instances) more heavily during training. This way, the model prioritizes identifying the instances belonging to the minority class.
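One common way to apply such a penalty is through class weights. The sketch below is a minimal illustration using scikit-learn's class_weight parameter on a synthetically generated unbalanced dataset; the data and model choice are assumptions for the example, not part of the module.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical unbalanced dataset: roughly 95% class 0 and 5% class 1.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# class_weight="balanced" reweights errors inversely to class frequency,
# so mistakes on the rare class cost more during training.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X, y)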

Try novel approaches


You can always get more creative and try new approaches. You can find some of them here:
Quora: https://www.quora.com/In-classification-how-do-you-handle-an-unbalanced-training-set
Reddit: https://www.reddit.com/r/MachineLearning/comments/12evgi/classification_when_80_of_my_training_set_is_of/

LIBRARIES
In Python, one of the best options is the imbalanced-learn (imblearn) package, which implements
oversampling (from imblearn.over_sampling import SMOTE) and undersampling (from
imblearn.under_sampling import RandomUnderSampler).
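A minimal usage sketch of these two classes is shown below (illustrative only; it assumes a synthetically generated feature matrix X and label vector y and the default parameters of imbalanced-learn's fit_resample API):

from collections import Counter

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

# Hypothetical unbalanced dataset: roughly 99% class 0 and 1% class 1.
X, y = make_classification(n_samples=5000, weights=[0.99, 0.01], random_state=0)
print("original:", Counter(y))

# Oversample the minority class with synthetic examples (SMOTE).
X_over, y_over = SMOTE(random_state=0).fit_resample(X, y)
print("after SMOTE:", Counter(y_over))

# Or randomly drop majority-class examples instead (undersampling).
X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
print("after undersampling:", Counter(y_under))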
You can find more details in the API documentation:
Imbalanced Learn: https://imbalanced-learn.readthedocs.io/en/stable/index.html#
SMOTE: https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.over_sampling.SMOTE.html
RandomUnderSampler: https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.under_sampling.RandomUnderSampler.html
READINGS
Mandatory
1) Why is fairness in algorithmic decision making important? Introduction:
https://fairmlbook.org/introduction.html
2) Ethics guideline for trustworthy AI: https://ec.europa.eu/digital-single-
market/en/news/draft-ethics-guidelines-trustworthy-ai
3) Algorithmic fairness is as hard as causation: http://joshualoftus.com/post/algorithmic-
fairness-is-as-hard-as-causation/

Optional
1) Facebook’s control over your newsfeed:
https://www.forbes.com/sites/gregorymcneal/2014/06/28/facebook-manipulated-user-
news-feeds-to-create-emotional-contagion/?sh=70c795bc39dc
2) Interpretable Machine Learning: https://www.h2o.ai/blog/what-is-your-ai-thinking-part-
1/
3) Managing risks in machine learning: https://www.oreilly.com/radar/managing-risk-in-
machine-learning/
4) DARPA’s program on explainable AI: https://www.darpa.mil/program/explainable-
artificial-intelligence
5) FAT conference 2020: https://dl.acm.org/doi/proceedings/10.1145/3351095
6) FATE: https://www.microsoft.com/en-us/research/theme/fate/
7) Fairness in Machine Learning:
https://fairmlbook.org/
https://fairmlclass.github.io/
https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb
