Module2_FATE
MACHINE LEARNING
INTRODUCTION
Despite the promise shown by machine learning research, there is a growing awareness that it also
raises novel challenges and concerns related to fairness and equity. Policymakers and regulators
have highlighted the potentially discriminatory impact of autonomous models and the danger that
they inadvertently reinforce bias and racism. In addition, the autonomous nature of the decision
making offers plausible deniability to the developers and corporations deploying these models,
which are often complex black boxes that provide little or no explanation for their decisions.
DISCRIMINATION IN ML
Safiya U. Noble, in her book “Algorithms of Oppression: How Search Engines Reinforce Racism,”
points out numerous examples of search engines being insensitive and biased towards minorities
and people of color. The author worked in the tech industry and offers constructive criticism on
how to remedy such biases in search engines.
Fig 2. Anti-black autocomplete recommendations from a search engine (Source: Safiya U. Noble. Algorithms of
Oppression: How Search Engines Reinforce Racism)
The author also shows how Google’s image search results for ‘beautiful’ have a strong bias toward
blonde, white girls (Figure 3).
Fig 3. Google’s image search results for ‘beautiful’ (Source: Safiya U. Noble. Algorithms of Oppression: How
Search Engines Reinforce Racism)
Similarly, Google came under fire after its new Photos app categorized photos in one of the most
racist ways possible, tagging a photo of two Black people as “gorillas.” You can read about it here:
https://www.theverge.com/2015/7/1/8880363/google-apologizes-photos-app-tags-two-black-people-gorillas
There are several other examples of bias in AI. Amazon’s hiring algorithm learned to systematically
downgrade female applicants for technical roles at the company. You can read about it here:
https://phys.org/news/2018-11-amazon-sexist-hiring-algorithm-human.html
The potential harm skyrockets when we use black-box ML models without any human oversight. We
have already seen the example of an automated CV selector; other application domains include
college admissions and criminal sentencing.
Accountability
Any person who has some degree of control over an algorithm that can cause harm, or who can
prevent that harm, should be held accountable for any wrongful application of the algorithm.
Accountability should extend to providing compensation or accepting legal ramifications should the
algorithm harm any individual or group of individuals.
Who is accountable for ML algorithm behavior?
1) Developers, who must design algorithms so that oversight authorities can verify they follow
pre-defined rules (“procedural regularity”)?
2) Data providers?
3) Regulators, who determine the scope of oversight (e.g., require describing and explaining
failures in ML systems)?
A possible solution would be to create an infrastructure to oversee algorithm decision-makers.
You can read about Uber’s self-driving car accident here: https://towardsdatascience.com/another-
self-driving-car-accident-another-ai-development-lesson-b2ce3dbb4444
Explainability
Algorithmic decisions should be explainable to end users and other stakeholders in simple,
non-technical terms. On demand, the system should be able to explain the steps it took and the
logic behind the decision, the alternative actions that were available but not pursued, and the
variables that most likely influenced the outcome.
Fairness
A fair algorithm should not perpetuate bias in the decisions or discriminate against end-users based
on their gender, nationality, color, sexual orientation, religion, or other demographic factors. You
can find 21 definitions of fairness here: https://www.youtube.com/watch?v=jIXIuYdnyyk
Fairness is hard to assess objectively and is, therefore, a complex problem. There are no best
practices (apart from the moral compass, of course) to guide the evaluation. The ideas of fairness
are case-specific and may vary for every situation. Therefore, we need to include experts,
researchers, and scholars from ethics, social work, law, politics, and computer science to develop
the best practices. The end goal should be to develop a bias-free system which is explainable,
transparent, and non-discriminatory towards end-users.
SOLUTIONS
How to make fair algorithms?
1) Pre-processing:
a. Modify the training data
b. Do not remove outliers if they are not data entry errors
c. Oversample minority classes
2) Optimization at training
a. Algorithm
i. modify the objective function to penalize unfairness (e.g., add a regularization term)
ii. focus on metrics like the true positive rate (TPR) instead of accuracy alone
iii. prejudice removal
b. Features
i. remove those that reflect bias, e.g., gender, race, age, education, sexual
orientation, etc.
3) Post-process predictions
a. Counterfactual assumption: check the impact of modifying a single feature (see the sketch after this list)
b. Qualitative analysis of results with a focus on explaining results
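As a minimal illustration of the counterfactual check in item 3(a), here is a hedged sketch in Python. It assumes a fitted scikit-learn-style classifier clf and a pandas DataFrame X whose binary 'gender' column is the sensitive feature; both names are hypothetical stand-ins, not part of the module.

# Sketch: counterfactual check on a single sensitive feature.
# clf, X, and the 'gender' column are hypothetical stand-ins.
import numpy as np
import pandas as pd

def counterfactual_flip_rate(clf, X: pd.DataFrame, feature: str = "gender") -> float:
    """Fraction of individuals whose prediction changes when only `feature` is flipped."""
    original = clf.predict(X)
    X_cf = X.copy()
    X_cf[feature] = 1 - X_cf[feature]  # flip a binary (0/1) sensitive attribute
    return float(np.mean(original != clf.predict(X_cf)))

A flip rate close to 0 suggests that the decision does not hinge on the sensitive feature alone; a high rate is a red flag that calls for the qualitative analysis in 3(b).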
Accountability
1) Develop legislation to hold developers, corporations, and application owners accountable.
2) Create infrastructure to oversee algorithm decision-makers.
3) Educate stakeholders and algorithm developers about potential issues.
Transparency
The system should be able to explain:
1) Why was the decision made?
2) What were the alternatives, and why were those rejected?
3) How does the algorithm define success (or criteria for success)?
4) How does the algorithm define failures (or criteria for failures)?
5) How does the system recognize and correct errors?
The developers/researchers should be able to answer all the above questions in addition to:
1) Why should end users trust the system?
2) Can they take randomly picked instances and explain the decisions the system made for them?
Another solution would be to develop a collection of explainable models (Explainable AI, or XAI)1
that show performance comparable to black-box models.
Fig 7. How decision trees work (Explainable)2
1 https://www.cc.gatech.edu/~alanwags/DLAI2016/(Gunning)%20IJCAI-16%20DLAI%20WS.pdf
2 https://dataaspirant.com/how-decision-tree-algorithm-works/
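To make "explainable" concrete, the sketch below (not from the module) fits a small decision tree with scikit-learn and prints its learned rules as readable threshold statements; the iris dataset is used only as a stand-in.

# Sketch: printing a decision tree's learned rules in plain language.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# export_text renders each decision path as nested threshold rules that a
# non-technical stakeholder can follow step by step.
print(export_text(tree, feature_names=list(iris.feature_names)))

A deep neural network, by contrast, offers no comparably readable account of an individual decision, which is why it is treated as a black box.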
HOW TO DEAL WITH UNBALANCED DATASETS?
What is an unbalanced dataset?
From Problem Solving with Data Part 1 (Fall 2020), you know that statistical models are used either
to predict continuous values (as linear regression does) or to predict output classes (when the
output is categorical rather than continuous). It is quite common for the target (predicted)
variable in a dataset to contain more instances of one output category than of the others (e.g.,
more males than females for gender). For example, if we predict cyberbullying using
Instagram images, almost 99% of instances will say, “No cyberbullying.” Here, Cyberbullying is the
binary target variable. We will use 0 to denote “No Cyberbullying” and 1 for “Cyberbullying.” “No
Cyberbullying” is the majority class as instances of cyberbullying are rare in our data.
“Cyberbullying” is the minority class as we have 1% of observations (or instances) belonging to it.
Similar observations could be made for multiclass predictions, and there could be multiple minority
classes in the collection.
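A quick way to check whether a dataset is unbalanced is to look at the class proportions of the target variable. The sketch below uses pandas on a toy column mirroring the 99:1 cyberbullying split described above; the column name is illustrative.

# Sketch: inspecting the class distribution of a target column (requires pandas).
import pandas as pd

# Toy target: 990 instances of "No Cyberbullying" (0) and 10 of "Cyberbullying" (1).
df = pd.DataFrame({"cyberbullying": [0] * 990 + [1] * 10})

# value_counts(normalize=True) returns the proportion of each class.
print(df["cyberbullying"].value_counts(normalize=True))  # 0 -> 0.99, 1 -> 0.01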
Accuracy Paradox
The underlying class distribution of the unbalanced dataset affects the accuracy of the models. By
predicting every (or almost every) instance as the majority class, the model shows high accuracy,
but this defeats the purpose of classification (for example, such a model will never detect any case
of cyberbullying). The paradox can be mitigated by selecting alternative metrics (e.g., recall)
instead of accuracy.
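The short sketch below (assuming scikit-learn is installed) reproduces the situation in Model 1: a classifier that always predicts the majority class reaches 95% accuracy yet has zero precision and recall.

# Sketch: why accuracy is misleading on unbalanced data.
from sklearn.metrics import accuracy_score, precision_score, recall_score

# 1000 instances: 50 cyberbullying (1) and 950 not (0), as in Model 1.
y_true = [1] * 50 + [0] * 950
y_pred = [0] * 1000  # a "model" that always predicts the majority class

print(accuracy_score(y_true, y_pred))                    # 0.95
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0 (no positive predictions)
print(recall_score(y_true, y_pred))                      # 0.0 (misses every real case)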
Model 1:
Cyberbullying Actual=yes Actual=no
Predicted “yes” 0 (TP) 0 (FP)
Predicted “No” 50 (FN) 950 (TN)
TP = True Positive, FP = False Positive, TN = True Negative, FN = False Negative
Accuracy = 950/1000 = 0.95 (or 95%)
Precision = TP/(TP+FP) = 0/(0+0) = undefined (the model makes no positive predictions)
Recall = TP/(TP+FN) = 0/(0+50) = 0
Model 2:
Cyberbullying Actual=yes Actual=no
Predicted “yes” 25 (TP) 100 (FP)
Predicted “No” 25 (FN) 850 (TN)
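Computed from the table above:
Accuracy = (25+850)/1000 = 0.875 (or 87.5%)
Precision = TP/(TP+FP) = 25/(25+100) = 0.20
Recall = TP/(TP+FN) = 25/(25+25) = 0.50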
Model 3:
Cyberbullying Actual=yes Actual=no
Predicted “yes” 50 (TP) 200 (FP)
Predicted “No” 0 (FN) 750 (TN)
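Computed from the table above:
Accuracy = (50+750)/1000 = 0.80 (or 80%)
Precision = TP/(TP+FP) = 50/(50+200) = 0.20
Recall = TP/(TP+FN) = 50/(50+0) = 1.00
Note that Model 1 has the highest accuracy yet never detects cyberbullying, while Model 3 has the lowest accuracy but perfect recall; this is the accuracy paradox in action.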
Undersampling
You can randomly select observations from the dataset to obtain a desired ratio (50:50 or 60:40 for
binary classification). For multiclass prediction, the ratio of all classes should be as close to 1 as
possible. You can either perform random sampling without replacement or discard observations
from the majority class(es). Another option would be to perform clustering initially to obtain
groups and then picking observations from each group to maintain the desired ratio.
Preferred when working with a big dataset with a large number of instances.
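Below is a minimal sketch of manual random undersampling with pandas; the DataFrame and its 'cyberbullying' column are hypothetical stand-ins.

# Sketch: manual random undersampling of the majority class (requires pandas).
import pandas as pd

# Toy data: 990 majority (0) rows and 10 minority (1) rows.
df = pd.DataFrame({"feature": range(1000), "cyberbullying": [0] * 990 + [1] * 10})
minority = df[df["cyberbullying"] == 1]
majority = df[df["cyberbullying"] == 0]

# Sample (without replacement) as many majority rows as there are minority rows -> 50:50 ratio.
majority_down = majority.sample(n=len(minority), replace=False, random_state=42)
balanced = pd.concat([minority, majority_down]).sample(frac=1, random_state=42)  # shuffle

print(balanced["cyberbullying"].value_counts())  # 10 rows of each class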
Oversampling
You can oversample the minority class by adding synthetic or artificially generated instances of the
minority class based on the available data. Techniques for generating such instances include
Variational Autoencoders (VAE), SMOTE (Synthetic Minority Over-sampling Technique), and MSMOTE
(Modified Synthetic Minority Over-sampling Technique).
LIBRARIES
In Python, one of the best options is the imbalanced-learn (imblearn) package, which implements
oversampling (from imblearn.over_sampling import SMOTE) and undersampling (from
imblearn.under_sampling import RandomUnderSampler).
You can find more details in the API documentation:
Imbalanced Learn: https://imbalanced-learn.readthedocs.io/en/stable/index.html#
SMOTE: https://imbalanced-
learn.readthedocs.io/en/stable/generated/imblearn.over_sampling.SMOTE.html
RandomUnderSampler: https://imbalanced-
learn.readthedocs.io/en/stable/generated/imblearn.under_sampling.RandomUnderSampler.html
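Putting the two strategies together, here is a hedged sketch that applies SMOTE and RandomUnderSampler to a synthetic 99:1 dataset; the dataset and parameters are illustrative, not from the module.

# Sketch: rebalancing a toy dataset with imbalanced-learn (requires imblearn and scikit-learn).
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Synthetic binary dataset with roughly a 99:1 class split, mimicking the cyberbullying example.
X, y = make_classification(n_samples=10000, weights=[0.99, 0.01], random_state=42)
print("Original:", Counter(y))

# Oversampling: SMOTE adds synthetic minority instances until the classes are balanced.
X_over, y_over = SMOTE(random_state=42).fit_resample(X, y)
print("After SMOTE:", Counter(y_over))

# Undersampling: RandomUnderSampler discards majority instances until the classes are balanced.
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X, y)
print("After undersampling:", Counter(y_under))

Resampling should be applied only to the training split; the test set must keep its original class distribution so that evaluation reflects real-world conditions.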
READINGS
Mandatory
1) Why is fairness in algorithmic decision making important? (Introduction):
https://fairmlbook.org/introduction.html
2) Ethics guidelines for trustworthy AI: https://ec.europa.eu/digital-single-
market/en/news/draft-ethics-guidelines-trustworthy-ai
3) Algorithmic fairness is as hard as causation: http://joshualoftus.com/post/algorithmic-
fairness-is-as-hard-as-causation/
Optional
1) Facebook’s control over your newsfeed:
https://www.forbes.com/sites/gregorymcneal/2014/06/28/facebook-manipulated-user-
news-feeds-to-create-emotional-contagion/?sh=70c795bc39dc
2) Interpretable Machine Learning: https://www.h2o.ai/blog/what-is-your-ai-thinking-part-
1/
3) Managing risks in machine learning: https://www.oreilly.com/radar/managing-risk-in-
machine-learning/
4) DARPA’s program on explainable AI: https://www.darpa.mil/program/explainable-
artificial-intelligence
5) FAT conference 2020: https://dl.acm.org/doi/proceedings/10.1145/3351095
6) FATE: https://www.microsoft.com/en-us/research/theme/fate/
7) Fairness in Machine Learning:
https://fairmlbook.org/
https://fairmlclass.github.io/
https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb