
Unit - III Univariate Analysis

Unit 3 focuses on univariate analysis, which examines a single variable to identify patterns through numerical summaries such as mean, median, and mode. It discusses techniques for visualizing data distributions, including bar charts, pie charts, and histograms, as well as measures of spread like range and standard deviation. Additionally, the unit covers the conversion of interval variables to ordinal variables and the importance of quartiles and percentiles in data analysis.

Uploaded by

mk4997320

UNIT - 3

UNIVARIATE ANALYSIS
UNIT III UNIVARIATE ANALYSIS
Introduction to Single variable:
Distributions and Variables - Numerical
Summaries of Level and Spread - Scaling
and Standardizing – Inequality - Smoothing
Time Series.
UNIVARIATE ANALYSIS
• Univariate analysis is the most basic technique for analyzing statistical data.
• "Uni" means one: the data contains just one variable.
• For example, consider a survey of a classroom.
• The analyst might want to count the number of boys and girls in the room.
• The data here concerns a single variable, and the analysis simply counts how often each of its values occurs.
• The main objective of univariate analysis is to describe the data in order to find patterns in it.
• This is done by looking at the mean, median, mode, standard deviation, and other measures of dispersion.
DISTRIBUTION AND VARIABLES?
VARIABLES ON HOUSEHOLD SURVEY
• 1 - Hardly drink at all
• 3 - Drink a moderate amount
• 5 - Drink ...
REDUCING THE NUMBER OF DIGITS
• 134, 121, 167 - Two varying digits
• 0.034, 0.045, 0.062 - Two varying digits
• 0.67, 1.31, 0.92 - Three varying digits
• There are two techniques for reducing the number of digits:
1. Rounding (8.47 becomes 8.5)
2. Cutting off / truncating (899.945 becomes 899)
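The two techniques can be sketched in Python as follows. This is a minimal illustration using the two values from the slide; the helper name `cut_off` is hypothetical. Note that Python's built-in `round` resolves exact ties to the nearest even digit, which may differ from the rounding rule taught in class.

```python
import math

def cut_off(value, decimals=0):
    """Truncate: drop every digit after the given decimal place."""
    factor = 10 ** decimals
    return math.trunc(value * factor) / factor

# Rounding moves to the nearest value at the chosen precision.
rounded = round(8.47, 1)         # 8.5

# Cutting off simply discards the trailing digits.
truncated = cut_off(899.945)     # 899.0

# The two techniques can disagree on the same number.
print(round(899.945), cut_off(899.945))   # 900 versus 899.0
print(round(8.47, 1), cut_off(8.47, 1))   # 8.5 versus 8.4
```

Rounding keeps the result as close as possible to the original value, while cutting off is quicker to do by hand but always moves positive values downward.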
BAR CHARTS AND PIE CHARTS
• Bar charts and pie charts visualize how a variable is distributed across our cases.
• How can a nominal or ordinal variable, such as the drinking classification, be represented pictorially?
1. Bar chart
2. Pie chart
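Both chart types are drawn from the same frequency table: a bar chart plots the count in each category, and a pie chart plots each category's share of the total. The sketch below builds that table with the standard library and prints a text-only bar chart. The `responses` list is made-up data, not taken from the survey.

```python
from collections import Counter

# Hypothetical responses for an ordinal drinking classification.
responses = ["none", "moderate", "heavy", "moderate", "none",
             "moderate", "moderate", "heavy", "none", "moderate"]

counts = Counter(responses)                  # bar heights
total = sum(counts.values())
shares = {category: count / total            # pie slice sizes
          for category, count in counts.items()}

# Print the categories in their natural order as a text bar chart.
for category in ["none", "moderate", "heavy"]:
    bar = "#" * counts[category]
    print(f"{category:<9}{bar} {counts[category]} ({shares[category]:.0%})")
```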
FEATURES VISIBLE IN HISTOGRAMS
• A histogram allows inspection of four important aspects of any distribution:
1. Level
2. Spread
3. Shape
4. Outliers
• Level - What are typical values in the distribution?
• Spread - Do the values differ much from one another?
• Shape - Is the distribution flat or peaked?
• Outliers - Are there any unusual values?
• Unimodal - Distributions with one peak
• Bimodal - Distributions with two peaks
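A histogram groups an interval-level variable into bins of equal width and counts the cases in each bin. The sketch below prints a text histogram for made-up values; the data and the bin width of 5 are assumptions chosen so that two peaks and one unusual value are visible.

```python
from collections import Counter

# Hypothetical measurements with two clusters and one extreme value.
values = [2, 3, 3, 4, 4, 4, 5, 5, 6, 12, 13, 13, 14, 14, 14, 15, 30]
bin_width = 5

# Map every value to the lower edge of its bin, then count per bin.
bins = Counter((v // bin_width) * bin_width for v in values)

# Print every bin, including empty ones, so gaps stay visible.
for start in range(0, max(values) + bin_width, bin_width):
    label = f"{start}-{start + bin_width - 1}"
    print(f"{label:>6} | {'#' * bins[start]}")
```

Reading the output from top to bottom shows the level (where most bars sit), the spread (how many bins are occupied), the shape (two peaks, so bimodal) and the outlier (the isolated bar at 30-34).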


FROM INTERVAL LEVEL TO ORDINAL
LEVEL VARIABLES - RECODING
• A variable can be recorded in a survey at an interval level and later recoded into ordered categories.
• For example, the maximum recommended weekly intake of alcohol is 21 units for men and 14 units for women.
• For men, this interval variable is converted to an ordinal variable as:
• No alcohol drunk (none)
• 1 - 21 units drunk (moderate drinking)
• Over 21 units drunk (heavy drinking)
FROM INTERVAL LEVEL TO ORDINAL
LEVEL VARIABLES - RECODING
• For women, this interval variable is converted to an ordinal variable as:
• No alcohol drunk (none)
• 1 - 14 units drunk (moderate drinking)
• Over 14 units drunk (heavy drinking)
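The conversion of weekly units into ordered categories can be sketched as a small function. The limits of 21 and 14 units come from the slides; the function name `recode_drinking`, the `"men"`/`"women"` keys, and the choice to treat any amount above zero (including fractions below 1 unit) as moderate are assumptions made for illustration.

```python
# Weekly limits from the slides, keyed by a hypothetical sex label.
WEEKLY_LIMIT = {"men": 21, "women": 14}

def recode_drinking(units, sex):
    """Convert interval-level weekly units into an ordinal category."""
    limit = WEEKLY_LIMIT[sex]
    if units <= 0:
        return "none"
    if units <= limit:       # assumption: anything above 0 up to the limit
        return "moderate"
    return "heavy"

# The same intake can fall into different categories for men and women.
for units in (0, 10, 15, 25):
    print(units, recode_drinking(units, "men"), recode_drinking(units, "women"))
```

Recoding discards detail (15 and 21 units both become "moderate" for men), which is the price paid for a simpler ordinal variable.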


NUMERICAL SUMMARIES OF
LEVEL AND SPREAD IN
UNIVARIATE ANALYSIS
• In univariate analysis, numerical summaries are used to
describe the distribution of a single variable.
• Two important aspects of the distribution are its:
• Central tendency (or level)
• Variability (or spread)
LEVEL (CENTRAL TENDENCY)
MEASURES

• Mean

• Median

• Mode
LEVEL (CENTRAL TENDENCY) MEASURES
Mean
• The mean is the average of all values in the dataset.
• It is calculated by summing up all values and dividing by the number of values.
• The mean represents the center of the data.
LEVEL (CENTRAL TENDENCY) MEASURES
Median
• The median is the middle value of a dataset when it is sorted in numerical order.
• It separates the higher half of the data from the lower half and is less sensitive to extreme values (outliers) than the mean.
• For an odd number of observations: The median is the middle value.
• For an even number of observations: The median is the average of the two middle values.
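The difference between the mean and the median is easiest to see when one extreme value is added. The sketch below uses Python's `statistics` module on made-up numbers; the variable names and data are illustrative only.

```python
from statistics import mean, median

values = [20, 22, 25, 27, 30]       # hypothetical data, odd count
with_outlier = values + [500]       # even count, one extreme value

# Odd count: the median is the single middle value of the sorted data.
print(mean(values), median(values))

# Even count: the median is the average of the two middle values (25 and 27).
print(mean(with_outlier), median(with_outlier))
```

Adding the value 500 moves the mean from 24.8 to 104, while the median only moves from 25 to 26, which is why the median is preferred for data with outliers.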
LEVEL (CENTRAL TENDENCY) MEASURES
Mode
• The mode is the value that appears most frequently in the dataset.
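A short sketch of finding the mode with Python's `statistics` module. The data are made up; `multimode` is used in the second case because a dataset can have more than one most frequent value.

```python
from statistics import mode, multimode

shoe_sizes = [7, 8, 8, 9, 8, 10, 9]      # hypothetical data
most_common = mode(shoe_sizes)           # 8 appears three times

# When two values tie for most frequent, multimode returns all of them.
tied = multimode([1, 1, 2, 2, 3])

print(most_common, tied)
```

Unlike the mean and median, the mode also works for nominal variables such as favourite colour, because it only requires counting.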
SPREAD (VARIABILITY)
MEASURES
• Range
• Interquartile Range (IQR)
• Variance
• Standard Deviation
SPREAD (VARIABILITY) MEASURES
Range
• The range is the difference between the maximum and minimum values in the dataset.
• It provides a simple measure of the spread of the data.
SPREAD (VARIABILITY) MEASURES
Interquartile Range (IQR)
• The IQR is the range between the first quartile (Q1) and the third quartile (Q3).
• It is a measure of the dispersion of the middle 50% of the data and is less affected by outliers than the range.
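The sketch below compares the range and the IQR on two made-up datasets that differ only in their largest value. Quartiles are computed with `statistics.quantiles` using `method="inclusive"`; this is one of several quartile conventions and is an assumption here, since other conventions can give slightly different values.

```python
from statistics import quantiles

def value_range(data):
    """Difference between the largest and smallest value."""
    return max(data) - min(data)

def iqr(data):
    """Q3 minus Q1, using the inclusive quartile convention."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

typical = [2, 4, 4, 5, 7, 9, 10, 12, 13]        # hypothetical data
with_outlier = [2, 4, 4, 5, 7, 9, 10, 12, 40]   # largest value replaced

print(value_range(typical), iqr(typical))
print(value_range(with_outlier), iqr(with_outlier))
```

The single extreme value raises the range from 11 to 38, while the IQR stays at 6 because the middle half of the data has not changed.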
SPREAD (VARIABILITY) MEASURES
Variance
• Variance measures how much each number in the dataset differs from the mean.
• It involves squaring the differences from the mean, summing these squares, and dividing by the number of observations.
SPREAD (VARIABILITY) MEASURES
Standard Deviation
• Standard deviation is the square root of the variance.
• It provides a measure of the typical distance between each data point and the mean, expressed in the same units as the data.
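The variance and standard deviation steps can be written out directly. The sketch below divides by the number of observations, as the slides describe (the population form), and checks the result against `statistics.pvariance` and `statistics.pstdev`. The data are made up. A sample estimate would divide by one less than the number of observations instead.

```python
import math
from statistics import pvariance, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data, mean is 5

n = len(data)
mean = sum(data) / n

# Square each difference from the mean, sum, and divide by n.
squared_diffs = [(x - mean) ** 2 for x in data]
variance = sum(squared_diffs) / n

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(variance, std_dev)                 # 4.0 2.0
print(pvariance(data), pstdev(data))     # library equivalents
```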
PERCENTILE
• The percentile of a data point is the percent of data that is equal to or less than that data point.
• It is useful for describing where a data point stands within the data set.
• If the percentile is close to zero, then the observation is one of the smallest.
• If the percentile is close to 100, then the data point is one of the largest in the data set.
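Following the definition above, the percentile of a data point can be computed by counting the values that are equal to or less than it. The function name `percentile_of` and the scores are hypothetical.

```python
def percentile_of(data, value):
    """Percent of observations that are equal to or less than value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]   # hypothetical data

print(percentile_of(scores, 55))    # one of the smallest observations
print(percentile_of(scores, 75))    # middle of the data
print(percentile_of(scores, 100))   # the largest observation
```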
PERCENTILE
ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]
What is the 75th percentile?
• The answer is 43, meaning that about 75% of the people are 43 or younger.
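The answer can be checked with Python's standard library. The slide does not say which interpolation rule it uses; the sketch assumes the "inclusive" method of `statistics.quantiles`, which reproduces 43. Other conventions, including that function's default, give a somewhat different value for the same data.

```python
from statistics import quantiles

ages = [5, 31, 43, 48, 50, 41, 7, 11, 15, 39, 80,
        82, 32, 2, 8, 6, 25, 36, 27, 61, 31]

# Cutting into 4 parts returns [Q1, Q2, Q3]; Q3 is the 75th percentile.
p75 = quantiles(ages, n=4, method="inclusive")[2]

# The default ("exclusive") method uses a different interpolation rule.
p75_default = quantiles(ages, n=4)[2]

# Share of people who are 43 or younger, as a cross-check.
share_at_or_below = sum(1 for a in ages if a <= p75) / len(ages)

print(p75, p75_default, round(share_at_or_below, 3))
```

Sixteen of the 21 ages are 43 or less, which is about 76%, so the "75%" in the statement is approximate for a dataset this small.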
QUARTILES
• Quartiles describe both the center and the spread of the data, and they are especially useful for skewed data. Quartiles are values that separate the data into four equal parts.
• Minimum
• 25th percentile (lower quartile)
• 50th percentile (median)
• 75th percentile (upper quartile)
• 100th percentile (maximum)
QUARTILES
• The quartiles (Q0,Q1,Q2,Q3,Q4) are
the values that separate each
quarter.
• Between Q0 and Q1 are the 25%
lowest values in the data. Between
Q1 and Q2 are the next 25%. And so
on.
• Q0 is the smallest value in the data.
• Q1 is the value separating the first
quarter from the second quarter of
the data.
• Q2 is the middle value (median),
separating the bottom from the top
half.
• Q3 is the value separating the third
quarter from the fourth quarter.
• Q4 is the largest value in the data.
• A boxplot is one good way to plot the five-number summary and explore the data set.
• The bottom end of the boxplot represents the minimum; the first horizontal line represents the lower quartile; the line inside the box is the median; the next line is the upper quartile; and the top is the maximum.
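The five values that a boxplot draws can be computed as below, reusing the ages from the percentile example. The function name `five_number_summary` is hypothetical, and the inclusive quartile method is an assumption; other quartile conventions may shift Q1 and Q3 slightly.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (Q0, Q1, Q2, Q3, Q4): min, quartiles, and max."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

ages = [5, 31, 43, 48, 50, 41, 7, 11, 15, 39, 80,
        82, 32, 2, 8, 6, 25, 36, 27, 61, 31]

summary = five_number_summary(ages)
for name, value in zip(["Q0 (min)", "Q1", "Q2 (median)", "Q3", "Q4 (max)"], summary):
    print(f"{name:<12}{value}")
```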
PROPORTION
• A proportion, often expressed as a percentage, is the share of observations in the data set that satisfy some requirement.
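A proportion is a count divided by the total number of observations. The sketch below reuses the ages from the percentile example; the requirement "younger than 18" is an arbitrary choice for illustration.

```python
ages = [5, 31, 43, 48, 50, 41, 7, 11, 15, 39, 80,
        82, 32, 2, 8, 6, 25, 36, 27, 61, 31]

# Count the observations that satisfy the requirement.
matching = sum(1 for age in ages if age < 18)

# Divide by the total number of observations.
proportion = matching / len(ages)

print(f"{matching} of {len(ages)} people are under 18 ({proportion:.1%})")
```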
CORRELATION
• Correlation describes the strength and direction of the association between two quantitative variables. It ranges between -1 and 1.
• A positive correlation means that one variable tends to increase as the other increases.
• A negative correlation means that one variable tends to decrease as the other increases.
• When the correlation is zero, there is no linear association between the variables.
• The closer the result is to either extreme (-1 or 1), the stronger the association between the two variables.
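The slides do not name a specific coefficient; the sketch below assumes the Pearson correlation coefficient, which is the most common choice for two quantitative variables. The function name `pearson_r` and all data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sum((x - mean_x) ** 2 for x in xs)
    dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(dev_x * dev_y)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]        # increases with x
falling = [10, 8, 6, 4, 2]       # decreases as x increases

centered = [-2, -1, 0, 1, 2]
u_shaped = [4, 1, 0, 1, 4]       # depends on x, but not linearly

print(pearson_r(x, rising), pearson_r(x, falling), pearson_r(centered, u_shaped))
```

The last pair has a correlation of zero even though the second variable is fully determined by the first, which shows that a zero correlation rules out only a linear association.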
