EDA Unit-2
What is EDA?
Approach
Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to
1. maximize insight into a data set;
2. uncover underlying structure;
3. extract important variables;
4. detect outliers and anomalies;
5. test underlying assumptions;
6. develop parsimonious models; and
7. determine optimal factor settings.
Focus
The EDA approach is precisely that--an approach--not a set of techniques, but an attitude/philosophy about how a data analysis should be
carried out.
Most EDA techniques are graphical in nature, with a few quantitative techniques. The reason for the
heavy reliance on graphics is that by its very nature the main role of EDA is to open-mindedly explore,
and graphics gives the analyst unparalleled power to do so, enticing the data to reveal its structural
secrets, and being always ready to gain some new, often unsuspected, insight into the data. In
combination with the natural pattern-recognition capabilities that we all possess, graphics provides,
of course, unparalleled power to carry this out. The particular graphical techniques employed in EDA
are often quite simple, consisting of various techniques of:
Plotting the raw data (such as data traces, histograms, bihistograms, probability plots, lag plots, block
plots, and Youden plots).
Plotting simple statistics such as mean plots, standard deviation plots, box plots, and main effects plots
of the raw data.
There are three popular approaches to data analysis:
1. Classical
2. Exploratory (EDA)
3. Bayesian
These three approaches are similar in that they all start with a general science/engineering problem
and all yield science/engineering conclusions. The difference is the sequence and focus of the
intermediate steps.
For classical analysis, the sequence is
Problem => Data => Model => Analysis => Conclusions
For EDA, the sequence is
Problem => Data => Analysis => Model => Conclusions
For Bayesian analysis, the sequence is
Problem => Data => Model => Prior Distribution => Analysis => Conclusions
Thus for classical analysis, the data collection is followed by the imposition of a model (normality,
linearity, etc.), and the analysis, estimation, and testing that follow are focused on the parameters of that
model. For EDA, the data collection is not followed by a model imposition; rather it is followed
immediately by analysis with a goal of inferring what model would be appropriate. Finally, for a
Bayesian analysis, the analyst attempts to incorporate scientific/engineering knowledge/expertise into the
analysis by imposing a data-independent distribution on the parameters of the selected model; the
analysis thus consists of formally combining both the prior distribution on the parameters and the
collected data to jointly make inferences and/or test assumptions about the model parameters.
In the real world, data analysts freely mix elements of all of the above three approaches (and other
approaches). The above distinctions were made to emphasize the major differences among the three
approaches.
1. Model
Classical The classical approach imposes models (both deterministic and probabilistic) on the data.
Deterministic models include, for example, regression models and analysis of variance (ANOVA)
models. The most common probabilistic model assumes that the errors about the deterministic model
are normally distributed--this assumption affects the validity of the ANOVA F tests.
Exploratory The EDA approach does not impose deterministic or probabilistic models on the data;
instead, it allows the data to suggest admissible models that best fit the data.
2. Focus
Classical The two approaches differ substantially in focus. For classical analysis, the focus is on
the model--estimating parameters of the model and generating predicted values from the model.
Exploratory For exploratory data analysis, the focus is on the data--its structure, outliers, and
models suggested by the data.
3. Techniques
Classical Classical techniques are generally quantitative in nature. They include ANOVA, t tests,
chi-squared tests, and F tests.
Exploratory EDA techniques are generally graphical. They include scatter plots, character plots,
box plots, histograms, bihistograms, probability plots, residual plots, and mean plots.
4. Rigor
Classical Classical techniques serve as the probabilistic foundation of science and engineering;
the most important characteristic of classical techniques is that they are rigorous,
formal, and "objective".
Exploratory EDA techniques do not share in that rigor or formality. EDA techniques make up for
that lack of rigor by being very suggestive, indicative, and insightful about what the
appropriate model should be. EDA techniques are subjective and depend on interpretation, which may
differ from analyst to analyst, although experienced analysts commonly arrive at identical
conclusions.
5. Data Treatment
Classical Classical estimation techniques have the characteristic of taking all of the data and
mapping the data into a few numbers ("estimates"). This is both a virtue and a vice.
The virtue is that these few numbers focus on important characteristics (location,
variation, etc.) of the population. The vice is that concentrating on these few
characteristics can filter out other characteristics (skewness, tail length,
autocorrelation, etc.) of the same population. In this sense there is a loss of
information due to this "filtering" process.
Exploratory The EDA approach, on the other hand, often makes use of (and shows) all of the
available data. In this sense there is no corresponding loss of information.
6. Assumptions
Classical The "good news" of the classical approach is that tests based on classical techniques
are usually very sensitive--that is, if atrue shift in location, say, has occurred, such
tests frequently have the power to detect such a shift and to conclude that such a shift
is "statistically significant". The "bad news" is that classical tests depend on
underlying assumptions (e.g., normality), and hence the validity of the test
conclusions becomes dependent on the validity of the underlying assumptions. Worse
yet, the exact underlying assumptions may be unknown to the analyst, or if known,
untested. Thus the validity of the scientific conclusions becomes intrinsicallylinked to
the validity of the underlying assumptions. In practice, if such assumptions are
unknown or untested, the validity of the scientific conclusions becomes suspect.
Exploratory Many EDA techniques make little or no assumptions--they present and show the
data--all of the data--as is, with fewerencumbering assumptions.
How Does Exploratory Data Analysis Differ from
Summary Analysis?
Summary A summary analysis is simply a numeric reduction of a historical data set. It is quite
passive. Its focus is in the past. Quite commonly, its purpose is to simply arrive at a
few key statistics (for example, mean and standard deviation) which may then either
replace the data set or be added to the data set in the form of a summary table.
Exploratory In contrast, EDA has as its broadest goal the desire to gain insight into the
engineering/scientific process behind the data. Whereas summary statistics are passive
and historical, EDA is active and futuristic. In an attempt to "understand" the process
and improve it in the future, EDA uses the data as a "window" to peer into the heart
of the process that generated the data. There is an archival role in the research and
manufacturing world for summary statistics, but there is an enormously larger role for
the EDA approach.
Insight implies detecting and uncovering underlying structure in the data. Such underlying structure
may not be encapsulated in the list of items above; such items serve as the specific targets of an
analysis, but the real insight and "feel" for a data set comes as the analyst judiciously probes and
explores the various subtleties of the data. The "feel" for the data comes almost exclusively from the
application of various graphical techniques, the collection of which serves as the window into the
essence of the data. Graphics are irreplaceable--there are no quantitative analogues that will give the
same insight as well-chosen graphics.
To get a "feel" for the data, it is not enough for the analyst toknow what is in the data; the analyst
also must know what isnot in the data, and the only way to do that is to draw on our own human
pattern-recognition and comparative abilities in the context of a series of judicious graphical
techniques applied to the data.
To get a "feel" for the data, it is not enough for the analyst toknow what is in the data; the analyst
also must know what isnot in the data, and the only way to do that is to draw on our own human
pattern-recognition and comparative abilities in the context of a series of judicious graphical
techniques applied to the data.
1. Quantitative
Quantitative techniques are the set of statistical procedures that yield numeric or tabular output.
Examples of quantitative techniques include:
hypothesis testing
analysis of variance
point estimates and confidence intervals
least squares regression
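A minimal sketch of a few of these quantitative procedures, assuming Python with numpy and scipy (no software is named in the original notes) and using simulated data purely for illustration:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.arange(30, dtype=float)
y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)  # illustrative data

# Hypothesis test: is the mean of y equal to a hypothesized value of 10?
t_stat, p_value = stats.ttest_1samp(y, popmean=10.0)

# Point estimate and 95% confidence interval for the mean of y
mean_y = y.mean()
ci_low, ci_high = stats.t.interval(0.95, len(y) - 1, loc=mean_y, scale=stats.sem(y))

# One-way analysis of variance across three illustrative groups
f_stat, p_anova = stats.f_oneway(y[:10], y[10:20], y[20:])

# Least squares regression of y on x
fit = stats.linregress(x, y)

print(t_stat, p_value, mean_y, (ci_low, ci_high), f_stat, fit.slope, fit.intercept)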
2. Graphical
On the other hand, there is a large collection of statistical tools that we generally refer to as graphical
techniques. These include:
scatterplots
histograms
probability plots
residual plots
box plots
block plots
The EDA approach relies heavily on these and similar graphical techniques. Graphical procedures are
not just tools that we could use in an EDA context, they are tools that we must use. Such graphical tools
are the shortest path to gaining insight into a data set in terms of:
testing assumptions
model selection
model validation
estimator selection
relationship identification
factor effect determination
outlier detection
If one is not using statistical graphics, then one is forfeiting insight into one or more aspects of the
underlying structure of the data.
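A minimal sketch of several of the graphical tools listed above, again assuming numpy and matplotlib with simulated data:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)  # illustrative data

coef = np.polyfit(x, y, 1)            # least squares line for the residual plot
resid = y - np.polyval(coef, x)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].scatter(x, y)              # scatter plot: relationship identification
axes[0, 0].set_title("Scatter plot")
axes[0, 1].hist(y, bins=15)           # histogram: distributional shape, testing assumptions
axes[0, 1].set_title("Histogram")
axes[1, 0].scatter(x, resid)          # residual plot: model validation
axes[1, 0].set_title("Residual plot")
axes[1, 1].boxplot(y)                 # box plot: outlier detection
axes[1, 1].set_title("Box plot")
plt.tight_layout()
plt.show()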
An EDA/Graphics Example
Data:
X Y
10.00 8.04
8.00 6.95
13.00 7.58
9.00 8.81
11.00 8.33
14.00 9.96
6.00 7.24
4.00 4.26
12.00 10.84
7.00 4.82
5.00 5.68
If the goal of the analysis is to compute summary statistics plus determine the best linear fit for Y as a
function of X, the results might be given as:
N = 11
Mean of X = 9.0
Mean of Y = 7.5
Intercept = 3
Slope = 0.5
Residual standard deviation = 1.237
Correlation = 0.816
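The summary values above can be reproduced directly from the data table; a minimal sketch, assuming numpy and scipy (the notes do not name any particular software):

import numpy as np
from scipy import stats

x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

fit = stats.linregress(x, y)
resid = y - (fit.intercept + fit.slope * x)
resid_sd = np.sqrt(np.sum(resid**2) / (len(y) - 2))  # residual SD, n - 2 degrees of freedom

print(len(y), x.mean(), y.mean())   # N = 11, mean of X = 9.0, mean of Y = 7.5
print(fit.intercept, fit.slope)     # intercept = 3, slope = 0.5 (approximately)
print(resid_sd, fit.rvalue)         # residual standard deviation = 1.237, correlation = 0.816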
The above quantitative analysis, although valuable, gives us only limited insight into the data.
Scatter Plot
In contrast, a simple scatter plot of the data suggests the following:
1. The data set "behaves like" a linear curve with some scatter;
2. there is no justification for a more complicated model (e.g., quadratic);
3. there are no outliers;
4. the vertical spread of the data appears to be of equal height irrespective of the X-value; this indicates
that the data are equally precise throughout, and so a "regular" (that is, equi-weighted) fit is appropriate.
This kind of characterization for the data serves as the core for getting insight/feel for the data. Such
insight/feel does not come from the quantitative statistics; on the contrary, calculations of
quantitative statistics such as intercept and slope should be subsequent to the characterization and
will make sense only if the characterization is true. To illustrate the loss of information that results
when the graphics insight step is skipped, consider the following three data sets.
X2 Y2 X3 Y3 X4 Y4
10.00 9.14 10.00 7.46 8.00 6.58
8.00 8.14 8.00 6.77 8.00 5.76
13.00 8.74 13.00 12.74 8.00 7.71
9.00 8.77 9.00 7.11 8.00 8.84
11.00 9.26 11.00 7.81 8.00 8.47
14.00 8.10 14.00 8.84 8.00 7.04
6.00 6.13 6.00 6.08 8.00 5.25
4.00 3.10 4.00 5.39 19.00 12.50
12.00 9.13 12.00 8.15 8.00 5.56
7.00 7.26 7.00 6.42 8.00 7.91
5.00 4.74 5.00 5.73 8.00 6.89
Conclusions from the scatter plots are:
1. data set 1 is clearly linear with some scatter;
2. data set 2 is clearly quadratic;
3. data set 3 clearly has an outlier;
4. data set 4 is obviously the victim of a poor experimental design with a single point far removed
from the bulk of the data "wagging the dog".
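A minimal sketch, assuming matplotlib, that draws the four scatter plots from the data tables above (note that X2 and X3 are identical to X):

import matplotlib.pyplot as plt

x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

panels = [(x1, y1, "Data set 1: linear with scatter"),
          (x1, y2, "Data set 2: quadratic"),
          (x1, y3, "Data set 3: outlier"),
          (x4, y4, "Data set 4: single influential point")]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, (x, y, title) in zip(axes.flat, panels):
    ax.scatter(x, y)
    ax.set_title(title)
plt.tight_layout()
plt.show()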
Importance:
These points are exactly the substance that provide and define "insight" and "feel" for a data set.
They are the goals and the fruits of an open exploratory data analysis (EDA) approach to the data.
Quantitative statistics are not wrong per se, but they are incomplete. They are incomplete because
they are numeric summaries which in the summarization operation do a good job of focusing on a
particular aspect of the data (e.g., location, intercept, slope, degree of relatedness, etc.) by judiciously
reducing the data to a few numbers. Doing so also filters the data, necessarily omitting and screening
out other sometimes crucial information in the focusing operation. Quantitative statistics focus but
also filter; and filtering is exactly what makes the quantitative approach incomplete at best and
misleading at worst. The estimated intercepts (= 3) and slopes (= 0.5) for data sets 2, 3, and 4 are
misleading because the estimation is done in the context of an assumed linear model and that
linearity assumption is the fatal flaw in this analysis. The EDA approach of deliberately postponing
the model selection until further along in the analysis has many rewards, not the least of which is the
ultimate convergence to a much-improved model and the formulation of valid and supportable
scientific and engineering conclusions.
EDA Assumptions:
The gamut of scientific and engineering experimentation is virtually limitless. In this sea of diversity
is there any common basis that allows the analyst to systematically and validly arrive at supportable,
repeatable research conclusions? Fortunately, there is such a basis and it is rooted in the fact that
every measurement process, however complicated, has certain underlying assumptions. This section
deals with what those assumptions are, why they are important, how to go about testing them, and
what the consequences are if the assumptions do not hold.
Underlying Assumptions
There are four assumptions that typically underlie all measurement processes; namely, that the data
from the process at hand "behave like":
1. random drawings;
2. from a fixed distribution;
3. with the distribution having fixed location; and
4. with the distribution having fixed variation.
The "fixed location" referred to in item 3 above differs for different problem types. The simplest
problem type is univariate; that is, a single variable.
For the univariate problem, the general model
response = deterministic component + random component
becomes
response = constant + error
For this case, the "fixed location" is simply the unknown constant. We can thus imagine the process at
hand to be operating under constant conditions that produce a single column of data with the
properties that
the data are uncorrelated with one another;
the random component has a fixed distribution;
the deterministic component consists of only a constant; and
the random component has fixed variation.
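Written out with symbols chosen here purely for illustration (they do not appear in the original notes), this univariate model is
Yi = C + Ei,   i = 1, 2, ..., N
where C is the unknown constant (the fixed location) and the errors Ei are uncorrelated draws from a single fixed distribution with fixed variation.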
Importance
Predictability is an all-important goal in science and engineering. If the four underlying assumptions
hold, then we have achieved probabilistic predictability--the ability to make probability statements
not only about the process in the past, but also about the process in the future. In short, such
processes are said to be "in statistical control".
Moreover, if the four assumptions are valid, then the process is amenable to the generation of valid
scientific and engineering conclusions. If the four assumptions are not valid, then the process is
drifting (with respect to location, variation, or distribution), unpredictable, and out of control. A
simple characterization of such processes by a location estimate, a variation estimate, or a
distribution "estimate" inevitably leads to engineering conclusions that are not valid, are not
supportable (scientifically or legally), and which are not repeatable in the laboratory.
Techniques for Testing Assumptions
The following EDA techniques are simple, efficient, and powerful for the routine testing of
underlying assumptions:
1. run sequence plot (Yi versus i)
2. lag plot (Yi versus Yi-1)
3. histogram (counts versus subgroups of Y)
4. normal probability plot (ordered Y versus theoretical ordered Y)
The four EDA plots can be juxtaposed for a quick look at the characteristics of the data. The plots
below are ordered as follows:
1. Run sequence plot - upper left
2. Lag plot - upper right
3. Histogram - lower left
4. Normal probability plot - lower right.
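A minimal sketch of such a 4-plot, assuming matplotlib and scipy and using simulated, normally distributed data in place of real process measurements:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=1.0, size=200)    # stand-in for the measured process

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].plot(np.arange(1, y.size + 1), y)     # run sequence plot: Yi versus i
axes[0, 0].set_title("Run sequence plot")
axes[0, 1].scatter(y[:-1], y[1:])                # lag plot: Yi versus Yi-1
axes[0, 1].set_title("Lag plot")
axes[1, 0].hist(y, bins=20)                      # histogram
axes[1, 0].set_title("Histogram")
stats.probplot(y, dist="norm", plot=axes[1, 1])  # normal probability plot
axes[1, 1].set_title("Normal probability plot")
plt.tight_layout()
plt.show()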
This 4-plot reveals a process that has fixed location, fixed variation, is random, apparently has a fixed approximately normal distribution,
and has no outliers.
Sample Plot: Assumptions Do Not Hold
If one or more of the four underlying assumptions do not hold, then this will show up in the various plots, as demonstrated in the
following example.
This 4-plot reveals a process that has fixed location, fixed variation, is non-random (oscillatory), has a
non-normal, U-shaped distribution, and has several outliers.
Interpretation of 4-Plot
The four EDA plots discussed above are used to test the underlying assumptions:
1. Fixed Location: If the fixed location assumption holds, then the run sequence plot will be flat and non-
drifting.
2. Fixed Variation: If the fixed variation assumption holds, then the vertical spread in the run sequence plot
will be approximately the same over the entire horizontal axis.
3. Randomness: If the randomness assumption holds, then the lag plot will be structureless and random.
4. Fixed Distribution: If the fixed distribution assumption holds, in particular if the fixed normal distribution
assumption holds, then
1. the histogram will be bell-shaped, and
2. the normal probability plot will be linear.
Conversely, the underlying assumptions are tested using the EDA plots:
Run Sequence Plot: If the run sequence plot is flat and non-drifting, the fixed-location assumption holds. If
the run sequence plot has a vertical spread that is about the same over the entire plot, then the fixed-variation
assumption holds.
Lag Plot: If the lag plot is structureless, then the randomness assumption holds.
Histogram: If the histogram is bell-shaped, the underlying distribution is symmetric and perhaps
approximately normal.
Normal Probability Plot: If the normal probability plot is linear, the underlying distribution is approximately
normal.
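The linearity of the normal probability plot can also be checked numerically: scipy's probplot returns the correlation coefficient of the least-squares line through the plotted points, and a value close to 1 indicates an approximately linear plot. A minimal sketch, with simulated data standing in for real measurements:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.normal(size=100)   # illustrative data

# (osm, osr) are the theoretical and ordered sample quantiles;
# r is the correlation coefficient of the fitted line through the plot.
(osm, osr), (slope, intercept, r) = stats.probplot(y, dist="norm")
print(r)   # near 1 for approximately normal data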
If all four of the assumptions hold, then the process is said definitionally to be "in statistical control".
Consequences
The primary goal is to have correct, validated, and complete scientific/engineering conclusions flowing from
the analysis. This usually includes intermediate goals such as the derivation of a good-fitting model and the
computation of realistic parameter estimates. It should always include the ultimate goal of an understanding
and a "feel" for "what makes the process tick". There is no more powerful catalyst for discovery than the
bringing together of an experienced/expert scientist/engineer and a data set ripe with intriguing "anomalies"
and characteristics.
The following sections discuss in more detail the consequences of invalid assumptions:
1. Consequences of non-randomness
2. Consequences of non-fixed location parameter
3. Consequences of non-fixed variation
4. Consequences related to distributional assumptions
Consequences of Non-Randomness
There are four underlying assumptions:
1. randomness;
2. fixed location;
3. fixed variation; and
4. fixed distribution.
The randomness assumption is the most critical but the least tested.
Consequences of Non-Randomness
If the randomness assumption does not hold, then
1. All of the usual statistical tests are invalid.
2. The calculated uncertainties for commonly used statistics become meaningless.
3. The calculated minimal sample size required for a pre-specified tolerance becomes meaningless.
4. The simple model: y = constant + error becomes invalid.
5. The parameter estimates become suspect and non-supportable.
One specific and common type of non-randomness is autocorrelation. Autocorrelation is the correlation
between Yt and Yt-k, where k is an integer that defines the lag for the autocorrelation. That is,
autocorrelation is a time dependent non-randomness.
This means that the value of the current point is highly dependent on the previous point if k = 1 (or k points
ago if k is not 1). Autocorrelation is typically detected via an autocorrelation plot or a lag plot. If the data are
not random due to autocorrelation, then
1. Adjacent data values may be related.
2. There may not be n independent snapshots of the phenomenon under study.
3. There may be undetected "junk" outliers.
4. There may be undetected "information-rich" outliers.
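A minimal sketch, assuming numpy and using a simulated autocorrelated series purely for illustration, of estimating the lag-k autocorrelation directly from the definition above:

import numpy as np

rng = np.random.default_rng(3)
y = np.empty(200)
y[0] = 0.0
for t in range(1, y.size):               # each value depends on the previous one
    y[t] = 0.8 * y[t - 1] + rng.normal()

def autocorr(y, k):
    """Sample autocorrelation between Yt and Yt-k."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    return np.sum(d[k:] * d[:-k]) / np.sum(d**2)

print(autocorr(y, 1))   # well away from 0 here, so the data are not random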
Consequences of Non-Fixed Location Parameter
If the run sequence plot does not support the assumption of fixed location, then
1. The location may be drifting.
2. The single location estimate may be meaningless (if the process is drifting).
3. The choice of location estimator (e.g., the sample mean) may be sub-optimal.