Week 7
Analytics
Data life cycle
“Data Life Cycle refers to the journey of data from its inception
to its eventual destruction.”
Phases:
● Plan: Decide what kind of data is needed, how it will be
managed, and who will be responsible for it.
● Capture: Collect data from a variety of sources.
● Manage: Care for, store, and maintain the data.
● Analyze: Use the data to solve problems, make
decisions, and support business goals.
● Archive: Keep relevant data stored for long-term and
future reference.
● Destroy: Remove data from storage and delete any
shared copies of the data.
01
Data Bias
Data bias is a type of error that systematically skews
results in a certain direction.
Sampling Bias
Sampling bias is when a sample isn't representative of the population as a
whole.
● Choose the sample at random to avoid sampling bias.
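As a rough illustration of randomizing the sample, the sketch below contrasts a convenience sample with a simple random sample. The population of customer IDs and the helper name `simple_random_sample` are hypothetical and exist only for this example.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Return k items drawn uniformly at random, without replacement."""
    rng = random.Random(seed)  # seeded only so the example is reproducible
    return rng.sample(population, k)

# Hypothetical population: 1,000 customer IDs.
customer_ids = list(range(1000))

# Biased approach: taking the first 100 IDs over-represents early customers.
convenience_sample = customer_ids[:100]

# Randomized approach: every ID has the same chance of being selected.
random_sample = simple_random_sample(customer_ids, 100, seed=42)
```

The convenience sample can only contain IDs 0 to 99, while the random sample can draw from the whole population.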
Observer Bias
Observer bias, sometimes referred to as experimenter bias or research bias,
is the tendency for different people to observe things differently.
How to minimize observer bias
Interpretation Bias
The tendency to always interpret ambiguous situations in a positive or
negative way.
● Interpretation bias can lead two people to see or hear exactly the same
thing and interpret it in different ways, because they have different
backgrounds and experiences.
Confirmation Bias
The tendency to search for or interpret information in a way that confirms
preexisting beliefs.
● People see what they want to see.
02
Good Data
Quality Standard
Following the "R-O-C-C-C" method gives you an organized strategy for locating
and selecting acceptable data sources, which may improve your decision-making
and analysis.
R O C C C: Reliable, Original, Comprehensive, Current, Cited
Data ethics covers six aspects:
● Ownership
● Transaction transparency
● Consent
● Currency
● Privacy
● Openness
Ownership:
Who owns the data?
It isn't the organization that invested time and money collecting, storing,
processing, and analyzing it. Individuals own the raw data they provide, and
they have primary control over how it is used, processed, and shared.
Transaction Transparency:
The idea that all data processing activities and algorithms should be completely
explainable and understood by the individual who provides their data.
Consent:
This is an individual's right to know explicit details about how and why their data
will be used before agreeing to provide it.
They should know the answers to questions like:
● Why is the data being collected?
● How will it be used?
● How long will it be stored?
Currency:
Individuals should be aware of financial transactions resulting from the use of their
personal data and the scale of these transactions. If your data is helping to fund a
company's efforts, you should know what those efforts are all about and be given
the opportunity to opt out.
Privacy:
Privacy means preserving a data subject's information and activity any time a data
transaction occurs.
This means someone like you or me should have protection from unauthorized
access to our private data, freedom from inappropriate use of our data, the right to
inspect, update, or correct our data, the ability to consent to the use of our data,
and the legal right to access our data.
Openness:
In the context of data, openness means free access to, use of, and sharing
of data.
Data Anonymization
Data anonymization is the process of protecting people's private or sensitive data
by eliminating personally identifiable information, such as:
● Telephone numbers
● Names
● License plates and license numbers
● Social security numbers
● IP addresses
● Medical records
● Email addresses
● Photographs
● Account numbers
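A minimal sketch of the idea, assuming records are stored as dictionaries: fields that identify a person are dropped before the data is shared. The field names in `PII_FIELDS` and the sample record are hypothetical.

```python
# Hypothetical set of field names treated as personally identifiable.
PII_FIELDS = {"name", "phone", "email", "ssn", "ip_address", "account_number"}

def strip_pii(record, pii_fields=PII_FIELDS):
    """Return a copy of the record without the listed PII fields."""
    return {key: value for key, value in record.items() if key not in pii_fields}

# Made-up record used only for illustration.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "phone": "555-0100",
    "age_group": "30-39",
    "purchase_total": 84.50,
}

anonymized = strip_pii(record)
print(anonymized)  # {'age_group': '30-39', 'purchase_total': 84.5}
```

Removing direct identifiers is only a first step. Combinations of the remaining fields can sometimes still point to one person, so real projects usually need further review.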
Balancing Data Security and Access
● Encryption
● Tokenization
● Version Control
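To make tokenization more concrete, the sketch below replaces a sensitive value with a random stand-in and keeps the mapping in a separate store. The `TokenVault` class is a hypothetical, in-memory toy and not a production design.

```python
import secrets

class TokenVault:
    """Toy in-memory vault mapping random tokens to original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, unrelated to the value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        # Only code with access to the vault can recover the original value.
        return self._token_to_value[token]

vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")  # made-up card number
```

Analysts can work with `token` in place of the real value, and only systems that hold the vault can call `detokenize` to get the original back.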
04
Exploratory Data
Analysis
EDA
Exploratory Data Analysis (EDA) is an approach to analysis that investigates
data sets, identifies general patterns, and summarizes their main
characteristics, often using data visualization methods.
● Distribution of Data
● Graphical Representations
● Outlier Detection
● Correlation Analysis
● Handling Missing Values
● Summary Statistics
● Testing Assumptions
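Two of the steps above, handling missing values and outlier detection, can be sketched with the standard library. Dropping missing entries and the 1.5 × IQR rule are just one common choice each, and the data is made up for illustration.

```python
import statistics

def drop_missing(values):
    """Remove None entries, one simple way of handling missing values."""
    return [v for v in values if v is not None]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with two missing entries and one extreme value.
raw = [12, 15, None, 14, 13, 16, 15, 95, None, 14]
clean = drop_missing(raw)
outliers = iqr_outliers(clean)
print(outliers)  # [95]
```

Whether to drop, fill, or flag missing values and outliers depends on the question being asked. The sketch only shows how they can be detected.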
Univariate Analysis:
Focuses on analyzing a single variable at a time.
● Purpose: To understand the variable's distribution, central tendency, and
spread.
● Techniques:
○ Descriptive statistics (mean, median, mode, variance, standard
deviation).
○ Visualizations (histograms, bar charts).
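The descriptive statistics listed above can be computed with Python's `statistics` module. This is a small sketch on a made-up list of scores.

```python
import statistics

# Hypothetical single variable: exam scores.
scores = [70, 85, 85, 90, 60, 75, 85, 95]

summary = {
    "mean": statistics.mean(scores),          # central tendency
    "median": statistics.median(scores),      # middle value
    "mode": statistics.mode(scores),          # most frequent value
    "variance": statistics.variance(scores),  # sample variance
    "std_dev": statistics.stdev(scores),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

`variance` and `stdev` use the sample formulas (dividing by n - 1). The population versions are `pvariance` and `pstdev`.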
Bivariate Analysis:
Examines the relationship between two variables.
● Purpose: To understand how one variable affects or is associated with another.
● Techniques:
○ Correlation coefficients (Pearson, Spearman).
○ Cross-tabulations and contingency tables.
○ Visualizations (line plots, scatter plots, pair plots).
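As an illustration of the two correlation coefficients named above, the sketch below computes Pearson correlation directly and Spearman correlation as the Pearson correlation of the ranks. The helper names and the data are hypothetical, and the ranking helper assumes there are no tied values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: co-variation divided by the product of spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank values from 1 to n (assumes no ties, to keep the sketch short)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up paired observations.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 61, 70, 90]

r_pearson = pearson(hours_studied, exam_score)
r_spearman = spearman(hours_studied, exam_score)
```

Here the scores always rise with hours studied, so the Spearman value is 1, while the Pearson value is slightly lower because the increase is not perfectly linear.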
Multivariate Analysis