Statistic Part 2
Measure of Dispersion
Dispersion measures the spread or variability of data
1. Range
2. Quartiles
3. Interquartile Range
4. Variance
5. Standard Deviation
[Figure: Curve A, Curve B and Curve C drawn around a common point labelled "Mean (A,B,C)"]
1. Range
• Difference between the highest and the lowest observed values in a
dataset
• Easy to understand and find
• Usefulness as a dispersion measure is limited – only 2 values are
considered
• Heavily influenced by extreme values
• Range values may change from one sample to another
• For data with an open-ended class, the range cannot be computed
Example
Values: 22, 49, 78, 6, 78, 76, 44, 90, 18, 63, 49, 62
Max    Min    Range
90     6      84
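The range calculation for the example values can be sketched in a few lines of Python. The function name `value_range` is only an illustrative choice, not something from the slides.

```python
# Values taken from the range example.
values = [22, 49, 78, 6, 78, 76, 44, 90, 18, 63, 49, 62]

def value_range(data):
    """Return (highest, lowest, range) for a list of numbers."""
    highest = max(data)
    lowest = min(data)
    return highest, lowest, highest - lowest

highest, lowest, spread = value_range(values)
print(highest, lowest, spread)  # 90 6 84
```

Only two of the twelve values (90 and 6) influence the result, which is why the range is so sensitive to extreme values.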
2. Quartiles
• Division of data into 4 segments according to the distribution of values
• The width of the four quartiles need not be the same
• Each part contains 25% data
• Quartiles are the highest values in each of the 4 parts
• Formula to calculate the quartiles:
Q1 = [(n+1)/4]th value, lower quartile
Q2 = [(n+1)/2]th value, middle quartile
Q3 = [3(n+1)/4]th value, upper quartile
[Figure: sorted data running from the lowest observation to the highest observation, split into four parts Q1, Q2, Q3 and Q4]

S.No   Data
1      10
2      11
3      14
4      16
5      17
6      18
7      19
8      21
9      21
10     23
11     24
12     26
13     29
14     30
15     32
16     33
17     34
18     35
19     36
20     37
21     39
22     40
23     42
24     45

Quartile       Value    Interpretation
1st quartile   18.75    25% values are <= 18.75
2nd quartile   27.5     50% values are <= 27.5
3rd quartile   35.25    75% values are <= 35.25
4th quartile   45       100% values are <= 45

Excel quartile calculation formula
1st quartile = QUARTILE(<range>,1)
2nd quartile = QUARTILE(<range>,2)
3rd quartile = QUARTILE(<range>,3)
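As a rough cross-check of the quartile table, the sketch below uses Python's standard `statistics.quantiles` on the 24 data values. It assumes Python 3.8 or later. The "exclusive" method interpolates at the k(n+1)/4 positions given in the formula, while the "inclusive" method appears to be the one that reproduces the values in the table (18.75, 27.5, 35.25), so the two conventions differ slightly for Q1 and Q3.

```python
from statistics import quantiles

# The 24 observations from the quartile table.
data = [10, 11, 14, 16, 17, 18, 19, 21, 21, 23, 24, 26,
        29, 30, 32, 33, 34, 35, 36, 37, 39, 40, 42, 45]

# Positions k(n+1)/4, as in the quartile formula.
q_exclusive = quantiles(data, n=4, method="exclusive")

# Positions k(n-1)/4 + 1, which match the values in the table.
q_inclusive = quantiles(data, n=4, method="inclusive")

print(q_exclusive)  # [18.25, 27.5, 35.75]
print(q_inclusive)  # [18.75, 27.5, 35.25]
```

Which convention to use is a matter of choice; the important point is to state it, since different tools default to different rules.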
3. Interquartile Range
• Approximately measures how far from the median we must go on either side to include one-half of the data
• IQR is the difference between the values of the first and third quartiles
[Figure: number line marking Q1, Q2 (median) and Q3, with the interquartile range spanning Q1 to Q3]
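A minimal sketch of the IQR calculation, reusing the 24 observations from the quartile example. It assumes the "inclusive" quartile convention of `statistics.quantiles` (Python 3.8+), which gives Q1 = 18.75 and Q3 = 35.25 for this data; another convention would give a slightly different IQR.

```python
from statistics import quantiles

# The 24 observations from the quartile example.
data = [10, 11, 14, 16, 17, 18, 19, 21, 21, 23, 24, 26,
        29, 30, 32, 33, 34, 35, 36, 37, 39, 40, 42, 45]

q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle half of the data

print(q1, q3, iqr)  # 18.75 35.25 16.5
```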
4. Variance
• Average of the squared deviations of the observations from the mean
• Every population / sample has variance
• Represented by the symbol σ²
• Formula to calculate variance
σ² = ∑(x - μ)² / N
• σ² : population variance
• x : observed value
• μ : population mean
• N : total number of items in population
• Units of variance are squares of units of data, e.g. squared miles, squared rupees etc.
• Not intuitively clear or easy to interpret, because the units are squared
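The population variance formula can be sketched directly in Python. The data set `heights` is hypothetical and chosen only so that the arithmetic is easy to follow; `statistics.pvariance` from the standard library is printed alongside as a cross-check.

```python
from statistics import pvariance

def population_variance(values):
    """Average of the squared deviations from the mean (divides by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

# Hypothetical data: mean is 5, squared deviations sum to 32.
heights = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(heights))  # 4.0
print(pvariance(heights))            # same value from the standard library
```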
5. Standard Deviation
• Square root of the average of the squared distances of the observations from the mean
• Represented by the symbol σ
• Formula to calculate Standard Deviation:
σ = √σ² = √( ∑(x - μ)² / N )
• Units of SD are the same as the units of the data
• SD enables us to determine, with high accuracy, where the values of a frequency distribution lie in relation to the mean
For a bell-shaped (normal) distribution:
• About 68% data lies within ±1 SD from the mean
• About 95% data lies within ±2 SD from the mean
• About 99.7% data lies within ±3 SD from the mean
[Figure: bell curve with the horizontal axis marked μ - 3σ, μ - 2σ, μ - σ, μ, μ + σ, μ + 2σ, μ + 3σ and bands for ±1σ, ±2σ and ±3σ]
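The 68 / 95 / 99 percentages can be checked numerically, assuming the data follow a normal distribution. The sketch below uses `statistics.NormalDist` (Python 3.8+); the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a normal value lies within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
```

For data that are strongly skewed or heavy-tailed these percentages are only a rough guide.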
Difference between Standard Deviation and Variance
Standard deviation and variance are both measures of variability in a distribution,
but they differ in a few ways:
• Definition
Variance is the average of the squared differences between each data point and the
mean. Standard deviation is the square root of the variance.
• Units
Standard deviation is expressed in the same units as the original data, such as
minutes or meters. Variance is expressed in squared units, such as meters squared.
• Interpretation
Standard deviation measures how far apart the numbers in a data set are. A small
standard deviation means the data is tightly grouped around the mean, while a
larger standard deviation means the data is more spread out. Variance gives a value
to how much the numbers in a data set vary from the mean. A significant variance
means the data points are far away from the mean.
• In practice, standard deviation is probably preferred over variance because it has
the same units as the data. Variance is more often used in the background, such as
in theory or deriving something else.
You’re interested in calculating the standard deviation
of the exam scores of a national standardized test to see
if many people scored close to the mean or not. Use the
following dataset.
Test Taker Score
1 20
2 40
3 60
4 60
5 75
6 80
7 70
8 65
9 70
10 90
• In order to solve for the standard deviation, we have to
follow the formula given earlier. Take a look at
the solution below.
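Test Taker   Score (x)   x - μ   (x - μ)²
1            20          -43     1849
2            40          -23     529
3            60          -3      9
4            60          -3      9
5            75          12      144
6            80          17      289
7            70          7       49
8            65          2       4
9            70          7       49
10           90          27      729
Mean (μ) = 63            Sum of (x - μ)² = 3660
Variance: σ² = 3660 / 10 = 366
Standard deviation: σ = √366 ≈ 19.13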
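The worked solution can be reproduced step by step in Python. This is a sketch that follows the population formula (dividing by N = 10); if the ten scores were treated as a sample, the divisor would be n - 1 instead and the result would be slightly larger.

```python
from math import sqrt
from statistics import pstdev

# Exam scores of the ten test takers.
scores = [20, 40, 60, 60, 75, 80, 70, 65, 70, 90]

mean = sum(scores) / len(scores)                      # 63.0
squared_deviations = [(x - mean) ** 2 for x in scores]
sum_of_squares = sum(squared_deviations)              # 3660.0
variance = sum_of_squares / len(scores)               # 366.0
sd = sqrt(variance)

print(mean, sum_of_squares, variance, round(sd, 2))   # 63.0 3660.0 366.0 19.13
print(pstdev(scores))  # standard-library cross-check of the population SD
```

A standard deviation of about 19 points around a mean of 63 suggests the scores are fairly spread out rather than tightly clustered near the mean.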
III. Measure of Association
Measures the relationship (degree and strength) between two variables that are linearly
related
1. Covariance
2. Correlation
3. Coefficient of Variation
1. Covariance (+ -)
• Covariance is the joint variability of two random variables
• Measures the direction / sign of relationship only (+ or -) and not the
strength
• How X and Y variables are linearly associated, working in tandem
Eg: Weight lifter training time vs Sprinter training time
• Weight lifter trains more and lifts more weight (+)
• Sprinter trains more and runs in less time (-)
• Covariance is measured as positive, negative or zero
Positive: indicates a direct (increasing) linear relationship
X up - Y up
X down - Y down
Negative: indicates an inverse (decreasing) linear relationship
X up - Y down
X down - Y up
• Covariance can be any number and is not restricted to the range 0 to 1
• Formula
• Sample Cov(x, y) = Σ(x - x̄)(y - ȳ) / (n - 1)
• Population Cov(x, y) = Σ(x - x̄)(y - ȳ) / n
where x and y are the 2 random variables, and x̄ and ȳ are the means of the 2 random variables
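The sample and population covariance formulas can be sketched as one small function. The training data below are hypothetical numbers invented to mirror the weight lifter and sprinter example; they are not from the slides.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(x, y, sample=True):
    """Covariance of paired observations x and y.

    sample=True divides by n - 1, sample=False divides by n.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (n - 1 if sample else n)

# Hypothetical data for illustration only.
training_hours = [1, 2, 3, 4]
weight_lifted = [2, 4, 6, 8]   # rises with training: positive covariance
sprint_time = [8, 6, 4, 2]     # falls with training: negative covariance

print(covariance(training_hours, weight_lifted))         # sample, positive
print(covariance(training_hours, weight_lifted, False))  # population: 2.5
print(covariance(training_hours, sprint_time))           # sample, negative
```

The sign carries the information here; the magnitude depends on the units of both variables, so it says little about strength.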
2. Correlation
• Measures the degree to which one variable is linearly related to the other
• 2 measures are used to describe correlation
Coefficient of Correlation (r)
• -1 ≤ r < 0 : inverse relationship -> X increases, Y decreases
• 0 < r ≤ 1 : direct relationship -> X increases, Y increases
• Measures the strength and direction
• Formula (Karl Pearson's Coefficient of Correlation / Product moment)
r = covariance of x and y / ((SD of x) * (SD of y))
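A minimal sketch of Pearson's r as covariance divided by the product of the two standard deviations. It uses the population forms (dividing by n) for both; using the sample forms throughout gives the same r, because the divisors cancel. The data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariance / (SD of x * SD of y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    sd_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    sd_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    return cov / (sd_x * sd_y)

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y_direct = [2, 1, 4, 3, 5]    # tends to rise with x
y_inverse = [5, 4, 3, 2, 1]   # falls exactly linearly with x

print(round(pearson_r(x, y_direct), 3))   # 0.8
print(round(pearson_r(x, y_inverse), 3))  # -1.0
```

Unlike covariance, r is unit-free and always lies between -1 and 1, which is what makes it usable as a measure of strength.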
Total Sum of Squares (SST)
SST = ∑(yᵢ − ȳ)²
Where:
• yᵢ – observed dependent variable
• ȳ – mean of the dependent variable
The SST tells us how close sample values are to the mean. As the SST
increases, so does the variability of the data.
Example of Calculating the SST for a Sample with Low Variability
Calculate the SST for the following data:
{1, 2, 3}
Step 1: The mean of the sample can be calculated by adding up the values in the sample (1
+ 2 + 3) and dividing this sum by the number of values (3). Thus, the mean of this sample is:
ȳ = (1 + 2 + 3)/3 = 6/3 = 2
Step 2: Subtract the calculated mean from each value, and square each difference.
(1 − 2)² = (−1)² = 1
(2 − 2)² = 0² = 0
(3 − 2)² = 1² = 1
Step 3: Sum the squared differences.
SST = 1 + 0 + 1 = 2
Thus, the total sum of squares for the data {1, 2, 3} is 2.
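The three steps can be sketched as a short Python function; the name `total_sum_of_squares` is an illustrative choice.

```python
def total_sum_of_squares(values):
    """Sum of squared differences between each value and the mean."""
    y_bar = sum(values) / len(values)              # Step 1: mean
    return sum((y - y_bar) ** 2 for y in values)   # Steps 2 and 3

print(total_sum_of_squares([1, 2, 3]))  # 2.0
```

Dividing this SST by the number of values gives the population variance, which ties the SST back to the dispersion measures covered earlier.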
What is the Absolute Deviation Formula?
M = Σ |xᵢ – x̄| / n
where,
M is the average absolute deviation,
x̄ is the mean of data set,
Σ |xᵢ – x̄| is the summation of absolute deviations from mean,
n is the number of values in data set.
What Is Average Deviation Formula?
The formula for average deviation is utilized to determine
how much individual observations differ from the mean of
a data set. Presented below is the formula for computing
the average deviation across n observations:
Average deviation = Σ |xᵢ – x̄| / n
For example, for the values 2, 6, 7, 4, 1 (mean = 4):
= [|4 – 2| + |4 – 6| + |4 – 7| + |4 – 4| + |4 – 1|]/5
= (2 + 2 + 3 + 0 + 3)/5
= 10/5
= 2
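The same calculation can be sketched in Python. The five values (2, 6, 7, 4, 1) are read off the worked computation above, and the function name `average_deviation` is illustrative.

```python
def average_deviation(values):
    """Mean of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

print(average_deviation([2, 6, 7, 4, 1]))  # 2.0
```

The absolute value matters: without it, the positive and negative deviations from the mean cancel and the sum is always zero.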