Unit 6. Ethical Issues in Data Science
Chapter 6
Ethics in Data Science
• Data ethics encompasses the moral obligations involved in collecting, storing, protecting
and using data, which is often sensitive, personal, emotional or behavioral.
• It studies and evaluates the moral problems related to data (including
generation, recording, curation, processing, sharing and use), algorithms
(including AI, machine learning and robotics) and related practices.
• We learn about ethical problems that occur during the usage of data.
• Modern technologies generate and make use of huge amounts of data
collected from a wide variety of sources.
• Almost every human activity and behavior is transformed and
translated into data to build products, support decisions, improve services,
etc.
Ethics in Data Science (contd.)
• Advanced technologies like machine learning and AI have brought
many innovations that make human life better, for example the use of
ML to diagnose a disease.
• Data ethics is a central concern for anyone who works with or handles
data, such as data analysts, data scientists or any other IT professional.
Ethical Concerns
Data Ethics concerns during:
1. Data Collection
• Data is collected through various techniques like surveys, web scraping, social media,
etc.
• What data needs to be collected, and what are the privacy concerns related to that
data? This is the utmost concern.
• Data can be stolen, shared or downloaded in violation of the terms of service (TOS) governing it.
• Data from secondary sources or secondary use of data should be well taken care of.
• Because data collection can be repetitious, time-consuming, and tedious there is a
temptation to underestimate its importance.
• Those responsible for collecting data must be adequately trained and motivated.
• They should employ methods that limit or eliminate the effect of bias.
• They should keep records of what was done by whom and when.
Ethical Concerns (contd.)
2. Data Storage
• Third party storage - Is it safe? Is it secure? Who can access it? What is the
mechanism of authentication and authorization?
• Hardware security - Accessibility to the computers or machines to access
data. Accessibility and portability of storage devices. Can it be moved? Can it
be accessed easily? Where is it stored?
• Data Security - What does the contract say? What is the level of privacy or
severity?
Ethical Concerns (contd.)
3. Data Usage, Sharing and Reproducibility
• How is public data used? Who can access it? Can it be used or reused for
different research? Can it be reproduced?
• What are the terms of use for public data?
• Can the data be shared? How is it shared?
Ethical Concerns (contd.)
4. Re-identification and Consent
• Accessing, through Google, something published for public use.
• Permission to reuse the data.
• What is the privacy level of data?
5. Data Security
Limiting Access
• Locked Paper Records Offices
• Limiting access to Paper or Electronic records to appropriate personnel
• Password Protection of electronic records
• Defined privileges for electronic data users
• Firewalls to prevent outside access
• Regular Backups and proper archiving
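Two of the controls listed above (password protection of electronic records and defined privileges for data users) can be sketched in a few lines of code. The snippet below is a minimal, hypothetical illustration only, not production-grade security; the role names and actions are assumptions made for the example.

```python
import hashlib, hmac, os

def hash_password(password, salt=None):
    """Store only a salted hash of a password, never the plain text."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, stored_digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, stored_digest)

# Defined privileges: each role is granted only the actions it needs (illustrative roles).
PRIVILEGES = {"analyst": {"read"}, "curator": {"read", "write"}, "admin": {"read", "write", "delete"}}

def is_allowed(role, action):
    return action in PRIVILEGES.get(role, set())

salt, stored = hash_password("s3cret-passphrase")
print(verify_password("s3cret-passphrase", salt, stored))  # True
print(is_allowed("analyst", "delete"))                     # False
```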
Bias and Fairness in Data
Bias
• A machine learning model should produce fair results.
• An ML model trained on data about human behavior can itself exhibit biased behaviors
and tendencies.
• Cognitive biases are an obstacle when trying to interpret information.
• They can easily skew results.
• They are innate tendencies.
Fairness
• A model should treat people, groups and communities equally irrespective of caste,
religion, gender, income level, education level, etc.
• It should be free from unnecessary and undue weighting toward certain groups or viewpoints.
Common Biases
In Group Favoritism and Outgroup Negativity
In Group Favoritism
• Also called ingroup love
• Tendency to give preferential treatment to the group one belongs to.
• Very likely to occur during data collection.
• Also likely to occur during data filtering or removing irrelevant data.
• Highly impacts when data diversity is needed.
Outgroup Negativity
• Also known as outgroup hate.
• Tendency to dislike the behavior, activities or the people themselves who do not belong
to one's own group.
• Very likely to occur during data collection.
• The collected data is likely to cover mostly the negative aspects of the outgroup community.
Common Biases (contd.)
Fundamental Attribution Error
• Tendency to attribute situational activities or behaviors to intrinsic
qualities of someone's character.
• These judgments are based on observed patterns and are very likely to occur during
data collection.
• This feeds negative data to the machine learning model, resulting in biased
conclusions.
Negativity Bias
• Tendency to emphasize negative experiences over positive ones.
• This is very likely to occur during decision making.
• Negative preconceptions about society may lead one to expect negative conclusions from
data science projects.
Common Biases (contd.)
Stereotyping
• The tendency to expect certain characteristics or behaviors without having
actual information.
• The expectation is set prior to exploration.
• This is likely to occur during data wrangling and exploratory data analysis.
Bandwagon Effect
• Tendency to follow others because:
• some top-ranked researchers or other people did it, or
• everyone is doing it, i.e. following the mass.
• Likely to occur during data collection, e.g. the same sort of data is collected based on
previously collected data or research.
• Some might expect the same result that others have inferred.
Common Biases (contd.)
Bias Blind Spot
• Our tendency not to see our own personal biases.
• Biases in our personal blind spot are likely to be ignored or remain unnoticed.
• Likely to occur anywhere from data collection to result analysis in the data science process.
Addressing Bias
• Addressing bias in data science is an extremely complex topic and, most
importantly, there are no universal solutions or silver bullets.
• Before any data scientist can work on the mitigation of biases, we need to
define fairness in the context of our business problem, using approaches such as
the following.
• As a running example, imagine you want to design an ML system to process
mortgage loan applications, and only a small fraction of the applications are
by women.
Addressing Biases (contd.)
1. Group unaware selection
• It is a preventive measure.
• This is the process of preventing bias by eliminating the factor that is
likely to cause it.
• For example, avoid collecting gender to avoid bias by gender.
2. Adjusted group threshold
• Adjust for biased and unbalanced data.
• Because historic biases, e.g. work history and childcare responsibilities,
make women appear less loan-worthy than men, we use different approval
thresholds per group (a sketch of both approaches follows below).
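A minimal sketch of both ideas on made-up loan-application data. The column names ("gender", "score") and the threshold values are illustrative assumptions, not figures from the slides.

```python
applicants = [
    {"gender": "F", "score": 0.58},
    {"gender": "M", "score": 0.61},
    {"gender": "F", "score": 0.72},
    {"gender": "M", "score": 0.55},
]

# 1. Group-unaware selection: drop the sensitive attribute before modelling.
features_only = [{k: v for k, v in row.items() if k != "gender"} for row in applicants]

# 2. Adjusted group threshold: compensate for historic bias with per-group cut-offs.
THRESHOLDS = {"F": 0.55, "M": 0.60}  # illustrative values only

def approve(row):
    return row["score"] >= THRESHOLDS[row["gender"]]

print(features_only)
print([approve(r) for r in applicants])  # [True, True, True, False]
```

Note that group-unaware selection alone may not remove bias if other collected features act as proxies for the sensitive attribute.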
Addressing Biases (contd.)
3. Demographic Parity
• The output of the machine learning model should not depend on
sensitive demographic attributes like gender, race, ethnicity, education
level, etc.
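One way to make this concrete is to compare the model's selection (approval) rate across groups; demographic parity holds when the rates match. The sketch below uses made-up predictions purely for illustration.

```python
predictions = [  # (group, model decision: 1 = approved, 0 = rejected)
    ("F", 1), ("F", 0), ("F", 1),
    ("M", 1), ("M", 1), ("M", 0), ("M", 1),
]

def selection_rate(group):
    decisions = [d for g, d in predictions if g == group]
    return sum(decisions) / len(decisions)

rates = {g: selection_rate(g) for g in ("F", "M")}
print(rates)                                      # e.g. {'F': 0.67, 'M': 0.75}
print(max(rates.values()) - min(rates.values()))  # parity gap; 0 would mean parity holds
```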
Addressing Biases (contd.)
4. Equal Opportunity
• Equal opportunity fairness ensures that the proportion of people who
should be selected by the model ("positives") that are correctly selected by
the model is the same for each group. We refer to this proportion as the
true positive rate (TPR) or sensitivity of the model.
• A doctor uses a tool to identify patients in need of extra care, who could be
at risk for developing serious medical conditions. (This tool is used only to
supplement the doctor's practice, as a second opinion.) It is designed to
have a high TPR that is equal for each demographic group.
• Provide equal opportunity to the diverse population.
• Be fair in how groups are represented in sampling and treatment.
• E.g. the representation of men and women should be the same when granting loans in a
bank.
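The check described above can be written directly: among people whose true label is positive, compute the fraction the model selects (the TPR) separately for each group; equal opportunity asks these rates to match. The labels and predictions below are made up for illustration.

```python
records = [  # (group, true label, model prediction)
    ("F", 1, 1), ("F", 1, 0), ("F", 0, 0),
    ("M", 1, 1), ("M", 1, 1), ("M", 0, 1),
]

def true_positive_rate(group):
    positives = [p for g, y, p in records if g == group and y == 1]
    return sum(positives) / len(positives)

# Unequal TPRs signal that one group's deserving cases are missed more often.
print({g: true_positive_rate(g) for g in ("F", "M")})  # {'F': 0.5, 'M': 1.0}
```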
Addressing Biases (contd.)
5. Precision Parity
• Tune the output of the model so that it treats groups equally.
• Men and women should get equal salaries for the same position. If a
machine learning model suggests a lower salary for women than for
men in the same post, then the model should be tuned so that both
have similar earnings.
• When building an ML model, keep de-biasing in mind.
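As a complement to the slide's framing, precision parity is commonly formalized as equal precision across groups: among the people the model selects, the share whose true outcome is positive should match. The sketch below, on made-up data, shows how such a check might look; the metric definition is a standard one, but the data and numbers are illustrative assumptions.

```python
records = [  # (group, true label, model prediction)
    ("F", 1, 1), ("F", 0, 1), ("F", 1, 1), ("F", 1, 0),
    ("M", 1, 1), ("M", 1, 1), ("M", 1, 1), ("M", 0, 0),
]

def precision(group):
    selected = [y for g, y, p in records if g == group and p == 1]
    return sum(selected) / len(selected)

# Equal precision across groups means selected members are equally likely
# to truly deserve the positive outcome, regardless of group.
print({g: precision(g) for g in ("F", "M")})  # a large gap signals the model needs tuning
```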