4. Visualization
- Exploratory Data Analysis (EDA)
- Descriptive data summarization
EDA
- Why EDA: data should first be explored without assumptions about
  probabilistic models, error distributions, relationships between the
  variables, etc.
- Purpose:
  - discover what the data can tell us about the phenomena we are
    investigating
  - visualize the underlying process that generated it
- For example:
  - Is the distribution that generated the data symmetric or skewed?
  - Does it have one mode or many modes?
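The two example questions can be checked numerically before any plot is drawn. The following is a minimal sketch, not a prescribed method: it assumes the moment coefficient of skewness (m3 / m2^1.5) as the symmetry measure and an equal-width text histogram for spotting modes. The function names and the two small data sets are made up for illustration.

```python
from statistics import mean


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (about 0 if symmetric)."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5  # undefined (division by zero) if all values are equal


def text_histogram(values, bins=5):
    """Return (bin_start, count) pairs for equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:  # all values identical: a single bin holds everything
        return [(lo, len(values))]
    counts = [0] * bins
    for v in values:
        # the maximum value is placed in the last bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, c) for i, c in enumerate(counts)]


# Illustrative data only (not from the slides).
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 5, 9, 20]

for name, data in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    print(name, "skewness =", round(skewness(data), 3))
    for start, count in text_histogram(data):
        print(f"  {start:6.2f} | {'#' * count}")
```

A skewness near zero suggests symmetry, a positive value suggests a long right tail, and the number of separate peaks in the printed bars gives a rough idea of the number of modes. The bin count affects how many peaks appear, so it is worth trying more than one value.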
- histograms, and
- scatter plots.
- Helpful for the visual inspection of data, which is useful for data
  preprocessing. The first three of these show univariate distributions
  (i.e., data for one attribute), while scatter plots show bivariate
  distributions (i.e., involving two attributes).
- A quantile plot is a simple and effective way to have a first look at
  a univariate data distribution.
- First, it displays all of the data for the given attribute (allowing the user
- Histogram
- Boxplot
- Quantile plot: each value x_i is paired with f_i, indicating that
  approximately 100 f_i % of the data are ≤ x_i
- Quantile-quantile (q-q) plot: graphs the quantiles of one univariate
  distribution against the corresponding quantiles of another
- Scatter plot: each pair of values is treated as a pair of coordinates
  and plotted as a point in the plane
- Loess (local regression) curve: adds a smooth curve to a scatter plot
  to provide better perception of the pattern of dependence
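The quantile and q-q plot definitions above can be made concrete by computing the points that would be plotted. This is a sketch under stated assumptions: the slides do not say how f_i is computed, so the code assumes the common convention f_i = (i - 0.5) / N for the i-th smallest of N values, and it uses linear interpolation when the two samples differ in size. All function names are hypothetical.

```python
def quantile_plot_points(values):
    """Pair each sorted value x_i with f_i = (i - 0.5) / N (assumed convention)."""
    xs = sorted(values)
    n = len(xs)
    return [((i - 0.5) / n, x) for i, x in enumerate(xs, start=1)]


def quantile_at(points, f):
    """Linearly interpolate the value at fraction f from quantile-plot points."""
    if f <= points[0][0]:
        return points[0][1]
    if f >= points[-1][0]:
        return points[-1][1]
    for (f0, x0), (f1, x1) in zip(points, points[1:]):
        if f0 <= f <= f1:
            return x0 + (x1 - x0) * (f - f0) / (f1 - f0)


def qq_points(sample_a, sample_b):
    """Return (quantile of a, quantile of b) pairs for a q-q plot.

    Quantiles are taken at the f_i values of the smaller sample; the larger
    sample is interpolated at those same fractions.
    """
    pa = quantile_plot_points(sample_a)
    pb = quantile_plot_points(sample_b)
    if len(pa) <= len(pb):
        return [(x, quantile_at(pb, f)) for f, x in pa]
    return [(quantile_at(pa, f), y) for f, y in pb]


# Illustrative data only (not from the slides).
print(quantile_plot_points([30, 10, 40, 20]))
print(qq_points([1, 2], [10, 20, 30, 40]))
```

Plotting the first list with f_i on the horizontal axis gives the quantile plot. Plotting the second list gives the q-q plot, where points lying close to a straight line suggest that the two distributions have a similar shape.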