Exploratory Data Analysis types
Definition
• In statistics, Exploratory Data Analysis is an approach to
analyzing data sets to summarize their main
characteristics, often using statistical graphics and
other data visualization methods.
• EDA employs a variety of techniques (mostly graphical) to:
– maximize insight into a data set;
– uncover underlying structure;
– extract important variables;
– detect outliers and anomalies;
– test underlying assumptions;
– develop parsimonious models; and
– determine optimal factor settings.
https://en.wikipedia.org/wiki/Exploratory_data_analysis
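One common way to flag candidate outliers in numeric data is Tukey's rule. It flags values lying more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sketch below uses only Python's standard library. The function name `iqr_outliers` and the sample data are illustrative. The 1.5 multiplier is a convention, not a fixed rule.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # With n=4, statistics.quantiles returns the three quartile cut points;
    # method="inclusive" interpolates between the observed data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one suspiciously large value.
data = [12, 14, 14, 15, 16, 17, 18, 19, 21, 58]
print(iqr_outliers(data))
```

Flagged values are candidates for inspection, not automatic deletions. Whether a value like 58 is a recording error or a genuine extreme depends on the data.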
Steps in Data Exploration
• Identification of variables and data types
• Analyzing the basic metrics
• Non-Graphical Univariate Analysis
• Graphical Univariate Analysis
• Bivariate Analysis
• Variable transformations
• Missing value treatment
• Outlier treatment
• Correlation Analysis
• Dimensionality Reduction
https://towardsai.net/p/data-analysis/exploratory-data-analysis-in-python-ebdf643a33f6
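The missing value treatment step can be sketched as follows. The sketch assumes records are Python dicts in which `None` marks a missing entry. Median imputation is only one simple option; others include dropping incomplete rows or model-based imputation. The helper names `count_missing` and `impute_median` are hypothetical.

```python
import statistics

# Hypothetical records: None marks a missing value.
rows = [
    {"id": 1, "age": 34, "group": "A"},
    {"id": 2, "age": None, "group": "B"},
    {"id": 3, "age": 29, "group": None},
    {"id": 4, "age": 41, "group": "A"},
]

def count_missing(rows):
    """Count the None entries in each column."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            counts[col] = counts.get(col, 0) + (val is None)
    return counts

def impute_median(rows, col):
    """Return new rows with missing values in `col` replaced by the median of the observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = statistics.median(observed)
    # Build new dicts so the original records stay unchanged.
    return [{**r, col: fill if r[col] is None else r[col]} for r in rows]

print(count_missing(rows))
print(impute_median(rows, "age"))
```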
Typical Data format
• Generally stored in CSV format as a rectangular array, with
– one row per experimental subject
– one column for the subject identifier
– one column for each outcome variable
– one column for each explanatory variable
• Each column contains the numeric values for a
particular quantitative variable or the levels
for a categorical variable.
https://www.stat.cmu.edu/~hseltman/309/Book/chapter4.pdf
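Below is a minimal sketch of reading such a rectangular CSV file with Python's `csv` module and making a first guess at each column's type. The column names and values are hypothetical. The rule "every entry parses as a number, so the column is quantitative" is a simplifying assumption.

```python
import csv
import io

# Hypothetical file contents: one row per subject, one column for the
# subject identifier, one for the outcome, one per explanatory variable.
CSV_TEXT = """id,outcome,age,treatment
1,5.2,34,drug
2,4.8,29,placebo
3,6.1,41,drug
"""

def load_columns(text):
    """Read a rectangular CSV into a dict mapping column name -> list of strings."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = {name: [] for name in header}
    # Line 1 is the header, so data rows start at line 2.
    for line_no, record in enumerate(reader, start=2):
        if len(record) != len(header):
            raise ValueError(f"line {line_no} has {len(record)} fields, expected {len(header)}")
        for name, value in zip(header, record):
            columns[name].append(value)
    return columns

def is_quantitative(values):
    """Treat a column as quantitative if every entry parses as a number."""
    try:
        for v in values:
            float(v)
    except ValueError:
        return False
    return True

cols = load_columns(CSV_TEXT)
for name, values in cols.items():
    print(name, "quantitative" if is_quantitative(values) else "categorical")
```

Note that `id` is reported as quantitative because its values are numeric. Identifier columns, and categories coded as numbers, still need a human decision.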
Types of EDA
• The four types of EDA are:
– univariate non-graphical
– multivariate non-graphical
– univariate graphical
– multivariate graphical
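Univariate non-graphical EDA typically means sample statistics for a quantitative variable and a frequency tabulation for a categorical one. The standard-library sketch below uses hypothetical data and helper names. Quartiles come from `statistics.quantiles(..., method="inclusive")`; other quartile definitions give slightly different values.

```python
import statistics
from collections import Counter

def summarize_quantitative(values):
    """Sample size, center, spread, and five-number summary of a numeric variable."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
    }

def tabulate(values):
    """Count and proportion of each level of a categorical variable, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return {level: (count, count / total) for level, count in counts.most_common()}

# Hypothetical data for six subjects.
ages = [34, 29, 41, 38, 29, 45]
groups = ["drug", "placebo", "drug", "drug", "placebo", "drug"]
print(summarize_quantitative(ages))
print(tabulate(groups))
```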
Univariate graphical EDA
• Categorical data
– Bar plot of the tabulation of the data
– Pie chart
• Quantitative data
– Histogram
– Stem and leaf plot
– Box plot
– Quantile-normal plot
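Graphical summaries are normally drawn with a plotting library, but a stem-and-leaf plot is simple enough to build as text. This sketch assumes non-negative integer data, with the tens digit as the stem and the units digit as the leaf. The function name and data are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers (stem = tens, leaf = units)."""
    if any(not isinstance(v, int) or v < 0 for v in values):
        raise ValueError("this sketch handles non-negative integers only")
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return lines

# Hypothetical data.
data = [12, 15, 21, 23, 23, 27, 34, 38, 41, 62]
for line in stem_and_leaf(data):
    print(line)
```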
Multivariate non-graphical EDA
• Cross-tabulation
– a two-way table built by using the levels of one variable as
column headings and the levels of the other variable as row
headings, then filling in the count of subjects that share each
pair of levels.
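Cross-tabulation can be sketched with `collections.Counter` by counting each (row level, column level) pair. The variables `sex` and `response` are hypothetical example data, the helper name `cross_tab` is illustrative, and levels are sorted alphabetically for display.

```python
from collections import Counter

def cross_tab(row_var, col_var):
    """Two-way table of counts: rows are levels of row_var, columns are levels of col_var."""
    if len(row_var) != len(col_var):
        raise ValueError("both variables need one value per subject")
    counts = Counter(zip(row_var, col_var))
    row_levels = sorted(set(row_var))
    col_levels = sorted(set(col_var))
    # Counter returns 0 for pairs of levels that never occur together.
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

# Hypothetical data for seven subjects.
sex = ["F", "M", "F", "F", "M", "M", "F"]
response = ["yes", "no", "no", "yes", "yes", "no", "yes"]
rows, cols, table = cross_tab(sex, response)
print("", *cols, sep="\t")
for level, counts in zip(rows, table):
    print(level, *counts, sep="\t")
```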
• Correlation
– for two quantitative variables, a correlation coefficient
(most commonly Pearson's r, which ranges from -1 to +1)
summarizes the strength and direction of their linear
association; for several variables, the pairwise coefficients
are arranged in a correlation matrix.
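Pearson's correlation coefficient divides the sum of cross-products of deviations from the two means by the square root of the product of the two sums of squared deviations. The sketch below computes it directly and arranges the pairwise coefficients in a matrix. The helper names are hypothetical. Python 3.10+ also provides `statistics.correlation` for the pairwise case.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations (equals (n - 1) times the sample covariance).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict mapping name -> numeric list."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m = correlation_matrix({"x": x, "y": y})
print(round(m["x"]["y"], 3))
```

A constant variable makes the denominator zero, so its correlation is undefined. Also, r only captures linear association; a strong curved relationship can still give r near 0.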
Multivariate graphical EDA
• Scatter plot – for two quantitative variables
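A scatter plot is usually drawn with a plotting library (for example, matplotlib's `pyplot.scatter`). To stay dependency-free, this sketch maps each point onto a character grid instead. The grid size, the `text_scatter` name, and the example data are assumptions for illustration.

```python
def text_scatter(x, y, width=40, height=12, mark="*"):
    """Place each (x, y) point on a character grid; y increases upward."""
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero for constant data
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for a, b in zip(x, y):
        # Scale each coordinate to a grid index; row 0 is the top line.
        col = round((a - x_min) / x_span * (width - 1))
        row = height - 1 - round((b - y_min) / y_span * (height - 1))
        grid[row][col] = mark
    return ["".join(r) for r in grid]

# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 74]
lines = text_scatter(hours, score, width=20, height=8)
for line in lines:
    print("|" + line)
```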