Unit - III Univariate Analysis
Introduction to Single Variable: Distributions and Variables - Numerical Summaries of Level and Spread - Scaling and Standardizing - Inequality - Smoothing Time Series.
UNIVARIATE ANALYSIS
• Univariate analysis is a basic technique for analysing statistical data.
• "Uni" means "one": the data contains just one variable.
• For example, consider a survey of a classroom.
• The analyst might want to count the number of boys and girls in the room.
• The data here describes a single variable (gender), and the analysis reports how many observations take each value.
• The main objective of univariate analysis is to describe the data and find patterns in it.
• This is done by looking at measures such as the mean, median, mode, standard deviation, and other measures of dispersion.
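The classroom example can be sketched in Python with the standard library. The student records and the test marks below are hypothetical values invented for illustration; only the idea of describing one variable at a time comes from the text above.

```python
from collections import Counter
import statistics

# Hypothetical classroom survey: one categorical variable recorded per student.
genders = ["boy", "girl", "girl", "boy", "girl", "girl", "boy", "girl"]
gender_counts = Counter(genders)               # how many students fall in each category
most_common_gender = statistics.mode(genders)  # the mode also works for categories

# Hypothetical numeric variable for the same class: test marks out of 100.
marks = [62, 75, 75, 81, 58, 90, 75, 68]
marks_mean = statistics.mean(marks)      # level: arithmetic average
marks_median = statistics.median(marks)  # level: middle of the sorted values
marks_mode = statistics.mode(marks)      # level: most frequent value
marks_stdev = statistics.pstdev(marks)   # spread: population standard deviation

print(gender_counts)
print(marks_mean, marks_median, marks_mode, round(marks_stdev, 2))
```

Each summary looks at a single variable in isolation, which is what makes the analysis univariate.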
DISTRIBUTION AND VARIABLES
VARIABLES ON HOUSEHOLD SURVEY
• Mean
• Median
• Mode
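LEVEL (CENTRAL TENDENCY)
Mean
• The mean is the average of all values in the data.
• Example data (ages): [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]
• The 75th percentile of these values is 43, so about 75% of the values are 43 or younger.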
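As a minimal sketch, the three measures of level (mean, median, mode) can be computed for the sample values listed above with Python's standard `statistics` module. The variable names are illustrative, and the list assumes the value split across two lines in the original is 61.

```python
import statistics

# Sample values from the example above (they appear to be ages).
ages = [5, 31, 43, 48, 50, 41, 7, 11, 15, 39, 80, 82, 32, 2, 8, 6, 25, 36, 27, 61, 31]

mean_age = statistics.mean(ages)      # sum of the values divided by their count
median_age = statistics.median(ages)  # middle value of the sorted list
mode_age = statistics.mode(ages)      # most frequent value

print(round(mean_age, 2), median_age, mode_age)  # prints: 32.38 31 31
```

The mean (about 32.4) sits slightly above the median (31) because a few large values such as 80 and 82 pull the average upward.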
QUARTILES
• Quartiles describe both the center and the spread of the data, which makes them especially useful for skewed data.
• Quartiles are the values that separate the sorted data into four equal parts. Together with the minimum and maximum, they form the five-number summary:
• Minimum
• 25th percentile (lower quartile)
• 50th percentile (median)
• 75th percentile (upper quartile)
• 100th percentile (maximum)
• The quartiles (Q0, Q1, Q2, Q3, Q4) are the values that separate each quarter.
• Between Q0 and Q1 are the 25% lowest values in the data. Between Q1 and Q2 are the next 25%. And so on.
• Q0 is the smallest value in the data.
• Q1 is the value separating the first quarter from the second quarter of the data.
• Q2 is the middle value (median), separating the bottom half from the top half.
• Q3 is the value separating the third quarter from the fourth quarter.
• Q4 is the largest value in the data.
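The quartiles Q0 to Q4 can be sketched for the ages sample from the earlier example using `statistics.quantiles`. The choice of `method="inclusive"` is an assumption made here: it treats the sample as containing its own minimum and maximum, and other conventions can give slightly different values for Q1 and Q3.

```python
import statistics

# Ages sample from the earlier example.
ages = [5, 31, 43, 48, 50, 41, 7, 11, 15, 39, 80, 82, 32, 2, 8, 6, 25, 36, 27, 61, 31]

q0 = min(ages)  # smallest value
# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")
q4 = max(ages)  # largest value

five_number_summary = (q0, q1, q2, q3, q4)
print(five_number_summary)
```

With this convention Q2 equals the median of the sample, and Q3 equals 43, matching the 75th percentile mentioned for this data.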
• A boxplot is a good way to plot the five-number summary and explore the data set.
• The bottom end of the boxplot represents the minimum; the bottom edge of the box is the lower quartile; the line inside the box is the median; the top edge of the box is the upper quartile; and the top end represents the maximum.
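The numbers a boxplot draws can be computed without a plotting library. The sketch below assumes the common 1.5 x IQR rule for the whiskers, which is not stated in the text above: under that rule the whiskers stop at the most extreme values inside the fences, and anything beyond is drawn as an outlier. When there are no outliers, the whisker ends coincide with the minimum and maximum, as described above.

```python
import statistics

# Ages sample from the earlier example.
ages = [5, 31, 43, 48, 50, 41, 7, 11, 15, 39, 80, 82, 32, 2, 8, 6, 25, 36, 27, 61, 31]

q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1                 # interquartile range: the height of the box

# Assumed convention: fences at 1.5 * IQR beyond the box edges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in ages if x < lower_fence or x > upper_fence]
inside = [x for x in ages if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("box:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

For this sample no value falls outside the fences, so the whiskers reach the minimum (2) and the maximum (82).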
PROPORTION
• A proportion is often expressed as a percentage. It is the share of observations in the data set that satisfy some condition.
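As a small illustration, the proportion of observations meeting a condition is the count of matching observations divided by the total count. The condition used here (age of 43 or less) is chosen only as an example, applied to the ages sample from earlier.

```python
# Ages sample from the earlier example.
ages = [5, 31, 43, 48, 50, 41, 7, 11, 15, 39, 80, 82, 32, 2, 8, 6, 25, 36, 27, 61, 31]

# Observations that satisfy the (illustrative) condition: age of 43 or less.
at_most_43 = [x for x in ages if x <= 43]

proportion = len(at_most_43) / len(ages)  # a value between 0 and 1
percentage = 100 * proportion             # the same share expressed in percent

print(f"{len(at_most_43)} of {len(ages)} values ({percentage:.1f}%) are 43 or less")
```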
CORRELATION
• Correlation measures the strength and direction of the association between two quantitative variables. It ranges between -1 and 1.
• A positive correlation means that one variable tends to increase as the other variable increases.
• A negative correlation means that one variable tends to decrease as the other increases.
• When the correlation is zero, there is no linear association between the two variables.
• The closer the result is to either extreme (-1 or 1), the stronger the association between the two variables.
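The text above does not name a specific coefficient; the sketch below assumes the Pearson correlation coefficient, the most common choice for two quantitative variables. The function name `pearson_r` and the study-hours data are hypothetical and used only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied and two sets of test scores.
hours = [1, 2, 3, 4, 5]
score_up = [52, 55, 61, 64, 70]    # rises with hours: positive correlation
score_down = [70, 64, 61, 55, 52]  # falls with hours: negative correlation

r_pos = pearson_r(hours, score_up)
r_neg = pearson_r(hours, score_down)
print(round(r_pos, 3), round(r_neg, 3))
```

Both results lie close to the extremes, indicating a strong association in each direction; a value near zero would indicate little or no linear association.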