Exploratory Data Analysis unit 2
Understanding your data through visualization and statistics
Presented By:
Prof. Tithirupa Tapaswini
Assistant Professor
Department of CSE
Medicaps University Indore
What is EDA?
Exploratory Data Analysis (EDA) is an approach to analyzing and understanding
data sets with the goal of discovering patterns, identifying anomalies, testing
hypotheses, and checking assumptions. It involves summarizing the main
characteristics of the data, often using visual methods.
Here are some key components of EDA:
1. Descriptive Statistics: Summarizing data through measures such as mean, median, standard deviation, and variance.
2. Data Visualization: Using graphs and plots to visualize distributions, relationships, and trends.
3. Data Cleaning: Identifying and addressing missing values, outliers, and inconsistencies in the data.
4. Univariate Analysis: Examining individual variables to understand their distribution and central tendency.
5. Bivariate and Multivariate Analysis: Exploring relationships between two or more variables to identify correlations and patterns.
6. Feature Engineering: Creating new features or modifying existing ones to improve the quality of the data or make it more suitable for modeling.
EDA is often a preliminary step before more formal statistical analysis or machine
learning, helping to guide the analysis and ensure that any modeling is based on a
sound understanding of the data.
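In practice these components are usually carried out with a data-analysis library. The sketch below is illustrative only, assuming pandas and matplotlib; the file name "data.csv" and its columns are hypothetical placeholders.

```python
# Minimal EDA pass touching the components listed above (illustrative sketch;
# "data.csv" and its columns are hypothetical).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")

# Descriptive statistics: mean, std, quartiles for numeric columns
print(df.describe())

# Data cleaning check: count missing values per column
print(df.isna().sum())

# Univariate analysis: histogram of each numeric column
df.hist(bins=30, figsize=(10, 6))
plt.tight_layout()
plt.show()

# Bivariate/multivariate analysis: pairwise correlations between numeric columns
print(df.corr(numeric_only=True))
```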
Assumptions in EDA:
The gamut of scientific and engineering experimentation is virtually limitless. In this sea of diversity, is there any common basis that allows the analyst to systematically and validly arrive at supportable, repeatable research conclusions?
Underlying Assumptions:
There are four assumptions that typically underlie all measurement processes; namely, that the data from the process at hand "behave like":
1. random drawings;
2. from a fixed distribution;
3. with the distribution having fixed location;
4. with the distribution having fixed variation.
Understanding and addressing the underlying assumptions in Exploratory Data
Analysis (EDA) is crucial for several reasons:
1. Validity of Analysis:
Importance: Assumptions help ensure that the methods and techniques used
during EDA are valid and appropriate for the data. If assumptions are violated, the
results of the analysis may be misleading or incorrect.
Example: If you assume data independence but your observations are correlated
(e.g., time series data), statistical tests that assume independence might produce
unreliable results.
2. Data Preparation:
Importance: Addressing assumptions helps in cleaning and preparing data effectively. This includes handling missing values, correcting errors, and transforming variables to meet assumptions.
Example: If the data is not normally distributed, you might need to apply transformations or use non-parametric methods (a quick check for this, and for the examples in points 4 and 5, is sketched in the code after this list).
3. Accuracy of Insights:
Importance: Assumptions impact the accuracy and reliability of the insights derived from EDA. If the assumptions are not met, the insights may be flawed or incomplete.
Example: In linear regression, if the relationship between variables is not linear, the model's predictions and interpretations could be inaccurate.
4. Detecting Problems:
Importance: Checking assumptions helps in identifying and diagnosing potential problems or limitations in the data or analysis process.
Example: Detecting multicollinearity (high correlation between independent variables) can prevent issues in regression analysis, such as inflated standard errors.
5. Improving Model Performance:
Importance: Properly addressing assumptions can improve the performance of statistical models and machine learning algorithms. It ensures that models are well suited to the data, which enhances their predictive accuracy and generalizability.
Example: Addressing heteroscedasticity (non-constant variance of errors) in regression analysis can lead to more reliable coefficient estimates and confidence intervals.
6. Communicating Results:
Importance: Being aware of and communicating the assumptions made during EDA
helps stakeholders understand the context and limitations of the analysis. This
transparency builds trust in the findings and helps in making informed decisions.
Example: Clearly stating that the data was assumed to be normally distributed
helps stakeholders understand the basis of the statistical tests used and their
appropriateness.
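The checks mentioned in the examples above (normality, multicollinearity, heteroscedasticity) can each be done in a few lines. The sketch below is a minimal illustration on synthetic data, assuming numpy, pandas, and scipy; the variable names and data are made up and the checks are informal, not prescriptive tests.

```python
# Quick, informal checks for the assumptions discussed above (illustrative sketch
# on synthetic data; all names are hypothetical).
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
})
df["x3"] = 0.95 * df["x1"] + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
df["y"] = 2 * df["x1"] - df["x2"] + rng.normal(size=200)

# Normality check (point 2): Shapiro-Wilk test; a small p-value suggests non-normality
stat, p = stats.shapiro(df["y"])
print(f"Shapiro-Wilk p-value for y: {p:.3f}")

# Multicollinearity check (point 4): pairwise correlations between predictors
print(df[["x1", "x2", "x3"]].corr())

# Heteroscedasticity check (point 5): least-squares fit, then see whether the
# residual spread is related to the fitted values
X = np.column_stack([np.ones(len(df)), df[["x1", "x2"]].to_numpy()])
beta, *_ = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)
fitted = X @ beta
resid = df["y"].to_numpy() - fitted
print("corr(|residual|, fitted):", np.corrcoef(np.abs(resid), fitted)[0, 1])
```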
Graphical Techniques: On the other hand, there is a large collection of statistical tools that we generally refer to as graphical techniques.
These include:
scatterplots
histograms
probability plots
residual plots
box plots and block plots
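Several of these plots can be produced with a few lines of matplotlib and scipy. The sketch below uses synthetic data invented purely for illustration.

```python
# Illustrative sketch of some of the graphical techniques listed above,
# using synthetic data.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 3 + 0.5 * x + rng.normal(scale=1.0, size=100)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
axes[0, 0].scatter(x, y)
axes[0, 0].set_title("Scatterplot")
axes[0, 1].hist(y, bins=20)
axes[0, 1].set_title("Histogram")
stats.probplot(y, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title("Normal probability plot")
axes[1, 1].boxplot(y)
axes[1, 1].set_title("Box plot")
plt.tight_layout()
plt.show()
```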
An EDA/Graphics Example
Data:
X        Y
10.00    8.04
8.00     6.95
13.00    7.58
9.00     8.81
11.00    8.33
14.00    9.96
6.00     7.24
4.00     4.26
12.00    10.84
7.00     4.82
5.00     5.68
The goal of the analysis is to compute summary statistics and determine the best linear fit for Y as a function of X; the results might be given as:
N = 11
Mean of X = 9.0
Mean of Y = 7.5
Intercept = 3
Slope = 0.5
Residual standard deviation = 1.237
Correlation = 0.816
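The quoted figures can be reproduced with numpy and scipy. The sketch below uses the 11-point data set tabulated above; the residual standard deviation is computed with n - 2 degrees of freedom.

```python
# Reproducing the summary statistics and linear fit for the example data (sketch).
import numpy as np
from scipy import stats

x = np.array([10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0])
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

print("N =", len(x))
print("Mean of X =", x.mean())
print("Mean of Y =", y.mean())

fit = stats.linregress(x, y)          # least-squares line y = intercept + slope * x
print("Intercept =", fit.intercept)
print("Slope =", fit.slope)
print("Correlation =", fit.rvalue)

resid = y - (fit.intercept + fit.slope * x)
print("Residual std dev =", np.sqrt(np.sum(resid**2) / (len(x) - 2)))
```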
The above quantitative analysis, although valuable, gives us only limited insight into the data.
Scatter Plot: In contrast, a simple scatter plot of the data suggests the following:
1. The data set "behaves like" a linear curve with some scatter;
2. there is no justification for a more complicated model (e.g., quadratic);
3. there are no outliers;
4. the vertical spread of the data appears to be of equal height irrespective of the
X-value; this indicates that the data are equally-precise throughout and so a
"regular" (that is, equi-weighted) fit is appropriate.
This kind of characterization of the data serves as the core for getting insight into, and a feel for, the data.
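A matplotlib sketch of the scatter plot described above, with the stated fitted line overlaid:

```python
# Scatter plot of the example data with the fitted line y = 3 + 0.5x (sketch).
import numpy as np
import matplotlib.pyplot as plt

x = np.array([10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0])
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

plt.scatter(x, y)
plt.plot(np.sort(x), 3 + 0.5 * np.sort(x), color="red", label="y = 3 + 0.5x")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.show()
```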
The same summary statistics can be computed for four quite different data sets (the well-known Anscombe quartet) and come out nearly identical; only the scatter plots reveal the differences. Conclusions from the scatter plots are:
1. data set 1 is clearly linear with some scatter.
2. data set 2 is clearly quadratic.
3. data set 3 clearly has an outlier.
4. data set 4 is obviously the victim of a poor experimental design with a
single point far removed from the bulk of the data "wagging the dog".
Techniques for Testing Assumptions
The following EDA techniques are simple, efficient, and powerful for the
routine testing of underlying assumptions:
1. run sequence plot (Yi versus i)
2. lag plot (Yi versus Yi-1)
3. histogram (counts versus subgroups of Y)
4. normal probability plot (ordered Y versus theoretical ordered Y)
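These four plots are commonly drawn together as a single "4-plot". The sketch below is illustrative, using matplotlib and scipy on synthetic, well-behaved data; with real data, y would be the measured response in run order.

```python
# A 4-plot sketch: run sequence plot, lag plot, histogram, and
# normal probability plot, shown on synthetic data.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
y = rng.normal(loc=10.0, scale=1.0, size=200)   # a well-behaved process

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# 1. Run sequence plot: Yi versus i (checks fixed location and fixed variation)
axes[0, 0].plot(np.arange(1, len(y) + 1), y, marker=".", linestyle="-")
axes[0, 0].set_title("Run sequence plot")

# 2. Lag plot: Yi versus Yi-1 (checks randomness)
axes[0, 1].scatter(y[:-1], y[1:], s=10)
axes[0, 1].set_title("Lag plot")

# 3. Histogram (checks the shape of the distribution)
axes[1, 0].hist(y, bins=20)
axes[1, 0].set_title("Histogram")

# 4. Normal probability plot (checks normality)
stats.probplot(y, dist="norm", plot=axes[1, 1])
axes[1, 1].set_title("Normal probability plot")

plt.tight_layout()
plt.show()
```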
Sample Plot: Assumptions Do Not Hold
If one or more of the four underlying assumptions does not hold, then this will show up in the corresponding plots of the 4-plot.
Interpretation of 4-Plot
The four EDA plots discussed on the previous page are used to
test the underlying assumptions:
1. Fixed Location: If the fixed location assumption holds, then
the run sequence plot will be flat and non-drifting.
2. Fixed Variation: If the fixed variation assumption holds, then the vertical spread in the run sequence plot will be approximately the same over the entire horizontal axis.
3. Randomness: If the randomness assumption holds, then the lag plot will be structureless and random.
4. Fixed Distribution: If the fixed distribution assumption holds (in particular, if a fixed normal distribution holds), then the histogram will be bell-shaped and the normal probability plot will be linear.
If all four of the assumptions hold, then the process is said definitionally to
be "in statistical control".
Consequences:
The following sections discuss in more detail the
consequences of invalid assumptions:
1. Consequences of non-randomness
2. Consequences of non-fixed location parameter
3. Consequences of non-fixed variation
4. Consequences related to distributional assumptions
Consequences of Non-Randomness
The randomness assumption is the most critical but the least tested.
If the randomness assumption does not hold, then
1. All of the usual statistical tests are invalid.
2. The calculated uncertainties for commonly used statistics
become meaningless.
3. The calculated minimal sample size required for a pre-
specified tolerance becomes meaningless.
4. The simple model: y = constant + error becomes invalid.
5. The parameter estimates become suspect and non-supportable.
Autocorrelation:
A common problem associated with non-randomness is autocorrelation. Autocorrelation is the correlation between Yt and Yt-k, where k is an integer that defines the lag for the autocorrelation. That is, autocorrelation is a time-dependent non-randomness.
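A lag-k autocorrelation can be computed directly as the correlation between the series and a shifted copy of itself. The sketch below is illustrative, using numpy on synthetic data; the helper name autocorrelation is hypothetical.

```python
# Computing the lag-k autocorrelation of a series (illustrative sketch).
import numpy as np

def autocorrelation(y, k=1):
    """Correlation between Y_t and Y_{t-k} for a 1-D series y."""
    y = np.asarray(y, dtype=float)
    return np.corrcoef(y[k:], y[:-k])[0, 1]

rng = np.random.default_rng(3)
white_noise = rng.normal(size=500)
drifting = np.cumsum(rng.normal(size=500))   # random walk: strongly autocorrelated

print("lag-1 autocorrelation, white noise:", autocorrelation(white_noise, 1))
print("lag-1 autocorrelation, random walk:", autocorrelation(drifting, 1))
```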
Consequences of Non-Fixed Location
If the run sequence plot does not support the assumption of fixed location, then:
1. The location may be drifting.
2. The single location estimate may be meaningless (if the process is drifting).
3. The choice of location estimator (e.g., the sample mean) may be sub-optimal.