Chapter 1
Numerical Descriptive Measures
Book Chapter 3
Slide 1
Flow of Session
Measures of Central Tendency: Mean, Median, Mode
Covered in MOP
Slide 3
Define Variables (Measurement Scales)
Collect Data (Data Sources)
Organizing and Visualizing (Tables & Charts): Frequency Tables, Contingency Tables
Slide 4
Summary Definitions
▪ The shape is the pattern of the distribution of values from the lowest value
to the highest value.
Slide 5
Measures of Central Tendency: The Mean
◼ The arithmetic mean (often just called the “mean”) is the most common
measure of central tendency.
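$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
where n is the sample size and $X_1, X_2, \ldots, X_n$ are the observed values.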
Slide 6
Measures of Central Tendency: The Mean (con’t)
[Figure: two data sets plotted on an axis from 11 to 20]
Data 11, 12, 13, 14, 15: Mean = (11 + 12 + 13 + 14 + 15)/5 = 65/5 = 13
Data 11, 12, 13, 14, 20: Mean = (11 + 12 + 13 + 14 + 20)/5 = 70/5 = 14
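A minimal Python sketch of the mean formula above, applied to the two five-value data sets on this slide. The function name `arithmetic_mean` is illustrative, not from the source; it divides the sum of the values by n, so replacing 15 with the extreme value 20 moves the mean from 13 to 14.

```python
def arithmetic_mean(values):
    """Sum of the observed values divided by the sample size n."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

first = [11, 12, 13, 14, 15]
second = [11, 12, 13, 14, 20]  # same values, but 15 replaced by an extreme value 20

print(arithmetic_mean(first))   # 13.0
print(arithmetic_mean(second))  # 14.0
```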
Slide 7
Measures of Central Tendency: The Median
◼ In an ordered array, the median is the “middle” number (50% above, 50%
below).
[Figure: two data sets plotted on an axis from 11 to 20]
Median = 13 (first data set)   Median = 13 (second data set)
◼ The location of the median when the values are in numerical order (smallest to largest):
$$\text{Median position} = \frac{n + 1}{2} \text{ position in the ordered data}$$
◼ If the number of values is odd, the median is the middle number.
◼ If the number of values is even, the median is the average of the two middle numbers.
Note that $\frac{n + 1}{2}$ is not the value of the median, only the position of the median in the ranked data.
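The (n + 1)/2 position rule can be sketched in Python as follows. `median_by_position` is a hypothetical helper; the eight-value list is the sample used in the standard deviation example of this chapter, where n is even, so the two middle values (15 and 17) are averaged.

```python
def median_by_position(values):
    """Locate the median with the (n + 1) / 2 rule on the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (n + 1) / 2            # a position in the ranked data, not the median itself
    if position.is_integer():         # n is odd: take the middle value
        return ranked[int(position) - 1]
    lower = int(position)             # n is even: average the two middle values
    return (ranked[lower - 1] + ranked[lower]) / 2

print(median_by_position([11, 12, 13, 14, 20]))              # 13
print(median_by_position([10, 12, 14, 15, 17, 18, 18, 24]))  # 16.0
```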
Slide 9
Measures of Central Tendency: The Mode
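◼ The mode is the most frequently observed value. A data set may have no mode or several modes.
[Figure: two data sets, one plotted on an axis from 0 to 14 and one on an axis from 0 to 6]
Mode = 9 (first data set)   No mode (second data set)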
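A small Python sketch of finding the mode with `collections.Counter`. The data sets are hypothetical stand-ins for the plotted ones, and we assume the convention that a data set in which no value repeats has no mode; ties for the highest count return several modes.

```python
from collections import Counter

def modes(values):
    """Return every most frequently observed value; [] means 'no mode'.

    Assumption: if no value occurs more than once, the data set has no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:                  # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 3, 9, 9, 9, 12, 14]))  # [9]  (hypothetical data)
print(modes([0, 1, 2, 3, 4, 5, 6]))    # []   (no value repeats)
```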
Slide 10
Measures of Central Tendency: Review Example
Slide 11
Measures of Central Tendency: Which Measure to Choose?
▪ The median is often used because it is not sensitive to extreme values. For example, median home prices are commonly reported for a region, since a few very expensive homes would pull the mean upward but barely move the median.
▪ In many situations it makes sense to report both the mean and the median.
Slide 12
Measures of Central Tendency: Summary
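Central Tendency
▪ Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
▪ Median: middle value in the ordered array
▪ Mode: most frequently observed value
▪ Geometric mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$, for the rate of change of a variable over time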
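The geometric mean is typically applied to rates of change by first converting each rate R into a growth factor 1 + R. That conversion and the investment figures below are assumptions for illustration, since the slide gives only the formula for $\bar{X}_G$. The sketch contrasts it with the arithmetic mean of the same rates.

```python
import math

def geometric_mean(values):
    """X_G = (X1 * X2 * ... * Xn) ** (1/n); all values must be positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs positive values")
    return math.prod(values) ** (1 / len(values))

# Hypothetical investment: falls by 50% in year 1, then rises by 100% in year 2.
growth_factors = [0.5, 2.0]   # 1 + rate of change for each year
arithmetic_rate = sum(f - 1 for f in growth_factors) / len(growth_factors)
geometric_rate = geometric_mean(growth_factors) - 1

print(arithmetic_rate)  # 0.25: suggests an average gain of 25% per year
print(geometric_rate)   # 0.0: the investment ends where it started
```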
Slide 13
Measures of Variation
[Figure: two distributions with the same center but different variation]
Slide 14
Measures of Variation: The Range
▪ Simplest measure of variation.
▪ Difference between the largest and the smallest values:
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
Example:
[Figure: data plotted on an axis from 0 to 14]
Range = 13 - 1 = 12
Slide 15
Measures of Variation: Why The Range Can Be Misleading
[Figure: two data sets plotted on an axis from 7 to 12]
Range = 12 - 7 = 5 for both data sets, even though their values are distributed differently between 7 and 12.
▪ Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
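A short Python sketch of the range, rebuilding the two 25-value lists above. The helper name `value_range` is illustrative (it avoids shadowing Python's built-in `range`); a single value of 120 raises the range from 4 to 119 while the other 24 values stay the same.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)

base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]   # first list above (25 values)
with_outlier = base[:-1] + [120]               # same list with the 5 replaced by 120

print(value_range(base))          # 4
print(value_range(with_outlier))  # 119
```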
Slide 16
Measures of Variation: The Sample Variance
◼ Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
◼ Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
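The two formulas above translate directly into Python. This is a sketch with illustrative function names; it reuses the two data sets from the mean slide to show that the single extreme value 20 raises the sample variance from 2.5 to 12.5.

```python
import math

def sample_variance(values):
    """S^2 = sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """S = square root of the sample variance."""
    return math.sqrt(sample_variance(values))

print(sample_variance([11, 12, 13, 14, 15]))  # 2.5
print(sample_variance([11, 12, 13, 14, 20]))  # 12.5
```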
Slide 18
Measures of Variation: The Sample Standard Deviation
Slide 19
Measures of Variation: Sample Standard Deviation
Calculation Example
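Sample data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16
$$S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}} = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$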
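The hand calculation above can be checked step by step in Python; the final line compares the result with the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
mean = sum(data) / n                      # 16.0

deviations = [x - mean for x in data]     # X_i - X-bar
squared = [d ** 2 for d in deviations]    # (X_i - X-bar)^2
s = math.sqrt(sum(squared) / (n - 1))     # divide by n - 1, then take the square root

print(deviations)    # [-6.0, -4.0, -2.0, -1.0, 1.0, 2.0, 2.0, 8.0]
print(sum(squared))  # 130.0
print(round(s, 4))   # 4.3095

assert math.isclose(s, statistics.stdev(data))  # same n - 1 definition
```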
Data A: [Figure: dot plot on an axis from 11 to 21]
Mean = 15.5, S = 3.338
Slide 21
Measures of Variation: Comparing Standard Deviations
Slide 22
Measures of Variation: Summary Characteristics
▪ The more the data are spread out, the greater the range, variance, and
standard deviation.
▪ The more the data are concentrated, the smaller the range, variance, and
standard deviation.
▪ If the values are all the same (no variation), all these measures will be zero.
Slide 23
Session 2
Slide 24
Exploring Numerical Data Using Quartiles
◼ Constructing a boxplot.
Slide 25
Quartile Measures
◼ Quartiles split the ranked data into 4 segments with an equal number of
values per segment.
[Diagram: the ranked data split by Q1, Q2 and Q3 into four segments, each holding 25% of the values]
◼ The first quartile, Q1, is the value for which 25% of the
values are smaller and 75% are larger.
◼ Q2 is the same as the median (50% of the values are
smaller and 50% are larger).
◼ Only 25% of the values are greater than the third quartile.
Slide 26
Quartile Measures: Locating Quartiles
Slide 27
Quartile Measures: Calculation Rules
◼ If the result is a whole number, then use the value at that ranked position.
◼ If the result is a fractional half (e.g., 2.5, 7.5, 8.5), then average the two corresponding data values.
◼ If the result is neither a whole number nor a fractional half, then round the result to the nearest integer to find the ranked position.
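A Python sketch of the calculation rules, assuming the commonly used ranked positions (n + 1)/4 for Q1, (n + 1)/2 for Q2 (the median rule given earlier) and 3(n + 1)/4 for Q3; the slide that locates the quartiles does not show these positions, so treat them as an assumption. The helper `quartile` is hypothetical, and the data are the eight values from the standard deviation example.

```python
def quartile(values, which):
    """Return Q1, Q2 or Q3 (which = 1, 2, 3) from ranked positions.

    Assumed position: which * (n + 1) / 4 in the ranked data.
    """
    if which not in (1, 2, 3):
        raise ValueError("which must be 1, 2 or 3")
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two values")
    position = which * (n + 1) / 4
    whole = int(position)
    fraction = position - whole
    if fraction == 0:                     # whole number: use that ranked position
        return ranked[whole - 1]
    if fraction == 0.5:                   # fractional half: average the two values
        return (ranked[whole - 1] + ranked[whole]) / 2
    return ranked[round(position) - 1]    # otherwise: round to the nearest position

sample = [10, 12, 14, 15, 17, 18, 18, 24]
q1, q2, q3 = (quartile(sample, k) for k in (1, 2, 3))
print(q1, q2, q3)  # 12 16.0 18
print(q3 - q1)     # IQR = 6
```

Here Q1 sits at position 2.25 (rounded to 2), Q2 at 4.5 (average of the 4th and 5th values) and Q3 at 6.75 (rounded to 7), so each of the three rules is exercised once.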
Slide 28
Calculating The Quartiles: Example
◼ The IQR is Q3 – Q1 and measures the spread in the middle 50% of the data.
◼ The IQR is also called the midspread because it covers the middle 50% of
the data.
◼ Measures like Q1, Q3, and IQR that are not influenced by outliers are called
resistant measures.
Slide 30
Calculating The Interquartile Range
Example:
Minimum X = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, Maximum X = 70
(each of the four segments between these values holds 25% of the data)
Interquartile range = 57 – 30 = 27
Slide 31
The Five Number Summary
▪ The five numbers that help describe the center, spread and shape of data
are:
▪ Xsmallest.
▪ First Quartile (Q1).
▪ Median (Q2).
▪ Third Quartile (Q3).
▪ Xlargest.
Slide 32
Five Number Summary and The Boxplot
DCOVA
◼ The boxplot: a graphical display of the data based on the five-number summary.
Slide 33
Five Number Summary: Shape of Boxplots
◼ If data are symmetric around the median, then the box and central line are
centered between the endpoints.
Slide 34
Distribution Shape and The Boxplot
[Figure: three boxplots, each with Q1, Q2 and Q3 marked]
Slide 35
Session 2
Slide 36
Box Plot: Key Points
Slide 37
Slide 38
Five-Number Summary
◼ Example: Apartment Rents
Lowest Value = 425 First Quartile = 445
Median = 475
Third Quartile = 525 Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
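The five-number summary for the rent data can be reproduced with the standard library's `statistics.quantiles`. Its default `exclusive` method interpolates between neighboring ranked values instead of applying the rounding rules given earlier; for this data set the ranked values on both sides of each quartile position are equal, so it returns the same Q1, median and Q3 as the slide.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

# "exclusive" uses (n + 1)-based positions and interpolates between neighbors.
q1, median, q3 = statistics.quantiles(rents, n=4, method="exclusive")
five_number = (min(rents), q1, median, q3, max(rents))
print(five_number)  # (425, 445.0, 475.0, 525.0, 615)
```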
Slide 39
Box Plot
Example: Apartment Rents
• A box is drawn with its ends located at the first and
third quartiles.
[Figure: box drawn from Q1 = 445 to Q3 = 525 on an axis from 400 to 625, with the median Q2 = 475 marked]
Slide 40
Box Plot
Limits are located (not drawn) using the interquartile range (IQR).
Slide 41
Box Plot
◼ Example: Apartment Rents
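• IQR = Q3 - Q1 = 525 - 445 = 80.
• The lower limit is located 1.5(IQR) below Q1: 445 - 1.5(80) = 325.
• The upper limit is located 1.5(IQR) above Q3: 525 + 1.5(80) = 645.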
Slide 42
Box Plot
◼ Example: Apartment Rents
• Whiskers (dashed lines) are drawn from the ends of the box to the
smallest and largest data values inside the limits.
[Figure: completed box plot on an axis from 400 to 625]
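The limit and whisker rules for the rent box plot can be sketched in Python. The quartiles are taken from the five-number summary slide, and the upper limit Q3 + 1.5(IQR) follows the usual convention that mirrors the lower-limit rule. With IQR = 80 no rent falls outside the limits, so the whiskers run to the smallest and largest rents.

```python
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

q1, q3 = 445, 525                      # quartiles from the five-number summary
iqr = q3 - q1                          # 80
lower_limit = q1 - 1.5 * iqr           # 325.0: located, not drawn
upper_limit = q3 + 1.5 * iqr           # 645.0

inside = [x for x in rents if lower_limit <= x <= upper_limit]
whiskers = (min(inside), max(inside))  # whiskers end at actual data values
outliers = [x for x in rents if x < lower_limit or x > upper_limit]

print(iqr, lower_limit, upper_limit)   # 80 325.0 645.0
print(whiskers)                        # (425, 615)
print(outliers)                        # []
```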
◼ Scatter plots let you visually examine the relationship between two numerical variables. Two quantitative measures of such relationships are:
◼ The Covariance.
◼ The Coefficient of Correlation.
Slide 44
The Covariance
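◼ The covariance measures the direction of the linear relationship between two numerical variables (X and Y). Its size depends on the units of X and Y, so it does not by itself show how strong the relationship is.
$$\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$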
Slide 45
Interpreting Covariance
◼ Covariance between two variables:
cov(X,Y) > 0 X and Y tend to move in the same direction.
cov(X,Y) < 0 X and Y tend to move in opposite directions.
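A minimal Python sketch of the covariance formula, using small hypothetical paired data sets (not from the source). It shows the sign rule above, and the last line shows why the size of the covariance depends on units: multiplying Y by 100 multiplies the covariance by 100.

```python
def sample_covariance(xs, ys):
    """cov(X, Y) = sum((X_i - X-bar) * (Y_i - Y-bar)) / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]            # hypothetical paired data
y_same = [2, 4, 5, 4, 5]       # tends to rise as x rises
y_opposite = [5, 4, 5, 4, 2]   # tends to fall as x rises

print(sample_covariance(x, y_same))                        # 1.5
print(sample_covariance(x, y_opposite))                    # -1.5
print(sample_covariance(x, [100 * v for v in y_same]))     # 150.0
```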
Slide 46
Coefficient of Correlation
◼ Measures the relative strength of the linear relationship between two
numerical variables.
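◼ Sample coefficient of correlation:
$$r = \frac{\text{cov}(X, Y)}{S_X S_Y}$$
Where,
$$\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}, \quad S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \quad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$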
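The correlation formula can be sketched in Python with small hypothetical paired data; `correlation` is an illustrative helper, not a library function. Unlike the covariance, r does not change when Y is rescaled, and points on a perfectly straight decreasing line give r = -1.

```python
import math

def correlation(xs, ys):
    """r = cov(X, Y) / (S_X * S_Y), all with n - 1 denominators."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)

x = [1, 2, 3, 4, 5]        # hypothetical paired data
y = [2, 4, 5, 4, 5]

print(round(correlation(x, y), 4))                        # 0.7746
print(round(correlation(x, [100 * v for v in y]), 4))     # 0.7746: unaffected by units
print(round(correlation(x, [10 - 2 * v for v in x]), 4))  # -1.0: perfect negative line
```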
Slide 48
Scatter Plots of Sample Data with Various Coefficients of Correlation
[Figure: scatter plots of Y against X for r = -1, r = -0.6, r = +1, r = +0.3 and r = 0]
Slide 49
Session 3
Slide 50
Slide 51
APPLICATION OF MEASURES OF CENTRAL TENDENCY AND DISPERSION
Slide 52
The Coefficient of Variation
$$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$$
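A short Python sketch of the coefficient of variation, applied to the eight-value sample from the standard deviation example (mean 16, S ≈ 4.3095). Because the CV divides S by the mean, it is unit-free, which is what lets it compare variation between data sets with very different means, such as stock prices. The helper name is illustrative.

```python
import math

def coefficient_of_variation(values):
    """CV = (S / X-bar) * 100, in percent, using the sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is 0")
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return s / mean * 100

sample = [10, 12, 14, 15, 17, 18, 18, 24]
print(round(coefficient_of_variation(sample), 2))  # 26.93

# Rescaling the data (for example, changing units) leaves the CV unchanged.
cents = [100 * x for x in sample]
print(math.isclose(coefficient_of_variation(cents), coefficient_of_variation(sample)))  # True
```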
Slide 53
Measures of Variation: Comparing Coefficients of Variation
◼ Stock A:
◼ Mean price last year = $50.
▪ The Z-score is the number of standard deviations a data value is from the mean.
▪ A data value is considered an extreme outlier if its Z-score is less than -3.0 or
greater than +3.0.
▪ The larger the absolute value of the Z-score, the farther the data value is from the
mean.
Slide 56
Locating Extreme Outliers: Z-Score
$$Z = \frac{X - \bar{X}}{S}$$
Where:
X represents the data value
$\bar{X}$ is the sample mean
S is the sample standard deviation
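The Z-score rule can be sketched in Python. This applies it to the 25 values from the range example that include 120; `z_scores` is an illustrative helper using the sample mean and sample standard deviation, as in the formula above. Only 120 lies more than 3 standard deviations from the mean.

```python
import math

def z_scores(values):
    """Z = (X - X-bar) / S for each value, with the sample standard deviation S."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / s for x in values]

data = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]
scores = z_scores(data)
extreme_outliers = [x for x, z in zip(data, scores) if z < -3.0 or z > 3.0]

print(extreme_outliers)       # [120]
print(round(max(scores), 2))  # 4.8
```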
Slide 57
Locating Extreme Outliers: Z-Score
Slide 58
Numerical Descriptive Measures for a Population
Slide 59
Numerical Descriptive Measures for a Population:
The mean µ
◼ The population mean is the sum of the values in the population
divided by the population size, N.
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
Where μ = population mean
N = population size
Xi = ith value of the variable X
Slide 60
Numerical Descriptive Measures For A Population:
The Variance σ2
◼ Average of squared deviations of values from the mean.
◼ Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
◼ Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
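To contrast the population and sample formulas, the Python sketch below treats the eight values from the standard deviation example as if they were a complete population (an assumption for illustration). Dividing by N gives σ² = 16.25, compared with S² = 130/7 ≈ 18.57 when the same values are treated as a sample; `statistics.pvariance` and `statistics.variance` implement the two versions.

```python
import math
import statistics

population = [10, 12, 14, 15, 17, 18, 18, 24]  # assumption: treated as a whole population
N = len(population)
mu = sum(population) / N                       # population mean
ss = sum((x - mu) ** 2 for x in population)    # sum of squared deviations = 130
sigma_sq = ss / N                              # divide by N for a population
sigma = math.sqrt(sigma_sq)

print(mu, sigma_sq, round(sigma, 4))  # 16.0 16.25 4.0311
print(round(ss / (N - 1), 4))         # 18.5714: the sample variance of the same values

assert math.isclose(sigma_sq, statistics.pvariance(population))
assert math.isclose(ss / (N - 1), statistics.variance(population))
```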
Slide 62
Sample statistics versus population parameters
Measure               Population Parameter   Sample Statistic
Mean                  μ                      X̄
Variance              σ²                     S²
Standard Deviation    σ                      S
Slide 63
The Empirical Rule
◼ The empirical rule approximates the variation of data in a
symmetric mound-shaped distribution.
◼ Approximately 68% of the data in a symmetric mound-shaped distribution lie within 1 standard deviation of the mean, or µ ± 1σ.
Slide 64
The Empirical Rule
◼ Approximately 95% of the data in a symmetric mound-shaped distribution lie within two standard deviations of the mean, or µ ± 2σ.
◼ Approximately 99.7% of the data lie within three standard deviations of the mean, or µ ± 3σ.
Slide 65
Using the Empirical Rule
▪ Suppose that Math SAT scores are bell-shaped with a mean of 500 and a standard deviation of 90. Then:
▪ Approximately 68% of all test takers scored between 410 and 590 (500 ± 90).
▪ Approximately 95% of all test takers scored between 320 and 680 (500 ± 180).
▪ Approximately 99.7% of all test takers scored between 230 and 770 (500 ± 270).
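The empirical rule is an approximation. The sketch below assumes the SAT scores follow an exactly normal distribution (modeled with the standard library's `statistics.NormalDist`) and compares the rule's rounded percentages with the normal-model proportions for the three intervals above.

```python
from statistics import NormalDist

mean, sd = 500, 90                     # Math SAT example from the slide
model = NormalDist(mu=mean, sigma=sd)  # assumption: scores are exactly normal

results = []
for k, rule_percent in ((1, 68), (2, 95), (3, 99.7)):
    low, high = mean - k * sd, mean + k * sd
    share = model.cdf(high) - model.cdf(low)  # model proportion within mean ± k standard deviations
    results.append((low, high, share))
    print(f"{low} to {high}: rule ~{rule_percent}%, normal model {share:.2%}")
```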
Slide 66
Chebyshev’s Rule
◼ Regardless of how the data are distributed, at least (1 − 1/k²) × 100% of the values will fall within k standard deviations of the mean (for k > 1).
◼ Examples:
At least                        Within
(1 − 1/2²) × 100% = 75%         k = 2 (μ ± 2σ)
(1 − 1/3²) × 100% = 88.89%      k = 3 (μ ± 3σ)
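Chebyshev's bound is easy to compute, and it can be checked on data that are far from bell-shaped. The sketch below uses the skewed 25-value list from the range example, treated as a population (an assumption) so that μ and σ use N; `chebyshev_min_share` is an illustrative name.

```python
import math

def chebyshev_min_share(k):
    """Lower bound on the share of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is stated for k > 1")
    return 1 - 1 / k ** 2

data = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]
N = len(data)
mu = sum(data) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / N)

for k in (2, 3):
    share = sum(abs(x - mu) <= k * sigma for x in data) / N  # observed share within mu ± k*sigma
    print(k, round(chebyshev_min_share(k), 4), share)
    assert share >= chebyshev_min_share(k)                   # the bound holds for any data
```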
Slide 67
Pitfalls in Numerical Descriptive Measures
◼ Data analysis is objective:
◼ Should report the summary measures that best describe and
communicate the important aspects of the data set.
Slide 68
Ethical Considerations
Numerical descriptive measures:
Slide 69
Chapter Summary
In this chapter we have discussed:
◼ Describing the properties of central tendency, variation, and
shape in numerical variables.
◼ Constructing and interpreting a boxplot.
◼ Computing descriptive summary measures for a population.
◼ Calculating the covariance and the coefficient of correlation.
Slide 70
Lunch Time
Slide 71
Shape of a Distribution
◼ Describes how data are distributed.
◼ Two useful shape related statistics are:
◼ Skewness:
◼ Measures the extent to which data values are not symmetrical.
◼ Kurtosis:
◼ Kurtosis measures the peakedness of the curve of the distribution—that is, how
sharply the curve rises approaching the center of the distribution.
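The slides do not give formulas for these statistics. The Python sketch below assumes the adjusted sample formulas used by Excel's `SKEW` and `KURT` functions, where kurtosis is reported as excess kurtosis so that a bell shape scores about 0, matching the kurtosis slide. The helper names are illustrative.

```python
import math

def _z_values(values):
    """Standardize with the sample mean and sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / s for x in values]

def skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(z**3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    return n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in _z_values(values))

def excess_kurtosis(values):
    """Adjusted sample excess kurtosis; about 0 for a bell shape."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    fourth = sum(z ** 4 for z in _z_values(values))
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

even_spread = list(range(1, 11))                         # 1..10: symmetric and flat
long_right_tail = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(abs(skewness(even_spread)) < 1e-9)  # True: symmetric
print(skewness(long_right_tail) > 0)      # True: right-skewed
print(excess_kurtosis(even_spread) < 0)   # True: flatter than bell-shaped
```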
Slide 72
Shape of Distribution
Slide 73
Shape of a Distribution (Skewness)
Left-skewed: skewness statistic < 0
Symmetric: skewness statistic = 0
Right-skewed: skewness statistic > 0
Slide 74
Shape of a Distribution (Kurtosis)
Kurtosis measures how sharply the curve rises approaching the center of the distribution.
▪ Sharper peak than bell-shaped (Kurtosis > 0)
▪ Bell-shaped (Kurtosis = 0)
▪ Flatter than bell-shaped (Kurtosis < 0)
Slide 75