Lecture 3

Uploaded by

Ly Khánh
Copyright
© © All Rights Reserved
Lecture 3.

Chapter 5
Numerical descriptive measures
5.1 Measure of central location
5.2 Measure of variability
5.3 Measure of relative standing and box plots
5.4 Approximate descriptive measures for
grouped data
Optional reading: 5.5 & 5.6

5.1 Measures of central location:


Arithmetic Mean (or Average)
This is the most popular and useful measure
of central location.

Mean = (Sum of measurements) / (Number of measurements)

Sample mean (n = sample size):
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Population mean (N = population size):
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Example 5.1, page 131
The mean of the sample of 10 measurements (waiting times for a bus),
14, 8, 12, 13, 12, 6, 19, 7, 11, 8, is given by
$$\bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{14 + 8 + 12 + \cdots + 11 + 8}{10} = 11.0$$
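As a quick check of Example 5.1, the sample mean can be computed directly from its definition. This is a minimal sketch in Python using the standard `statistics` module; the variable names are illustrative and do not come from the lecture.

```python
from statistics import mean

# Waiting times for a bus (Example 5.1), n = 10 measurements
waiting_times = [14, 8, 12, 13, 12, 6, 19, 7, 11, 8]

# Sample mean from the definition: sum of measurements / number of measurements
x_bar = sum(waiting_times) / len(waiting_times)

print(x_bar)                # 11.0
print(mean(waiting_times))  # the standard library gives the same value
```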
Example 4.1 (contd.), page 85
Suppose the telephone bills of Example 4.1 represent a population
of measurements. The population mean is
$$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{196.65 + 468.75 + \cdots + 270.90 + 196.65}{200} = 238.015$$

Example 5.4, page 134


When many of the measurements have the same value, the
measurements can be summarized in a frequency table.
Waist size (cm), xi:              70  75  77  80  82  85  90  100
Number of pairs of trousers, fi:   2   1   2   1   1   5   1    1

The mean is strongly affected by extreme values, called
'outliers'. E.g. as soon as one very large pair of trousers
(say, of size 200 cm) is added to the sample, the average
waist size rises from 81.9 cm to 89.7 cm!
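The effect of the outlier can be reproduced from the frequency table. The sketch below assumes the data are stored as a Python dictionary mapping each waist size to its frequency; the helper name `weighted_mean` is hypothetical.

```python
# Frequency table from Example 5.4: waist size (cm) -> number of pairs of trousers
freq = {70: 2, 75: 1, 77: 2, 80: 1, 82: 1, 85: 5, 90: 1, 100: 1}

def weighted_mean(table):
    """Mean of data given as {value: frequency}."""
    n = sum(table.values())
    total = sum(x * f for x, f in table.items())
    return total / n

mean_before = weighted_mean(freq)

# Add one extreme observation (a 200 cm pair) and recompute
with_outlier = dict(freq)
with_outlier[200] = 1
mean_after = weighted_mean(with_outlier)

print(round(mean_before, 1))  # 81.9
print(round(mean_after, 1))   # 89.7
```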
Median: another measure of central location
Median is the value that falls in the middle when the
measurements are arranged in order of magnitude.

Example 5.2, page 132 (odd number of observations)
Seven employee salaries were recorded (in $1000): 49, 52, 47, 53, 51, 47, 50.
Find the median salary.
First, sort the salaries in order. Then, locate the value in the middle.
47, 47, 49, 50, 51, 52, 53
Median = 50

Example 5.3, page 132 (even number of observations)
Suppose the director's salary of $130000 was added to the group recorded before.
Find the median salary.
First, sort the salaries. Then, locate the values in the middle.
There are two middle values!
47, 47, 49, 50, 51, 52, 53, 130
Median = (50 + 51)/2 = 50.5
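The two cases (odd and even number of observations) can be sketched as a small Python function. The function below is an illustration written for these notes, not part of the textbook.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

salaries = [49, 52, 47, 53, 51, 47, 50]  # Example 5.2, in $1000
print(median(salaries))                   # 50
print(median(salaries + [130]))           # 50.5 (Example 5.3)
```

Adding the director's salary moves the median only from 50 to 50.5, while the mean of the same data moves from about 49.9 to about 59.9.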

Mode: another commonly used measure of central location

The mode of a set of observations is the value that
occurs most frequently. A set of data may have
one mode or two or more modes.

Example 5.4, page 134

Waist size (cm), xi:              70  75  77  80  82  85  90  100
Number of pairs of trousers, fi:   2   1   2   1   1   5   1    1

The mean is 81.9 cm and the median is (x7 + x8)/2 = (82 + 85)/2 = 83.5 cm.
The mode of this data set is 85 cm, which is larger than both
the median (83.5) and the mean (81.9).

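Because a data set may have more than one mode, a sketch of the computation should return every value that attains the highest frequency. The helper `modes` below is hypothetical and the second data set is a made-up toy example.

```python
from collections import Counter

# Frequency table from Example 5.4: waist size (cm) -> number of pairs of trousers
freq = Counter({70: 2, 75: 1, 77: 2, 80: 1, 82: 1, 85: 5, 90: 1, 100: 1})

def modes(counter):
    """Return every value that attains the highest frequency (there may be several)."""
    top = max(counter.values())
    return sorted(x for x, f in counter.items() if f == top)

print(modes(freq))                       # [85]
print(modes(Counter([1, 1, 2, 2, 3])))   # [1, 2], a bimodal toy example
```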
Example 4.1 (contd.), page 85
For large data sets, the modal class is much more
relevant than a single-value mode. The modal class
is the class with the highest frequency. There may be
one modal class or two or more modal classes.

[Figure: histogram of the telephone bills; x-axis: Bin, y-axis: Frequency]

In Example 4.1, the modal class is [200, 250], with the
highest frequency, 60.

Example 5.6, page 136

Excel output (Marks):
Mean                 73.98
Standard Error       2.1502163
Median               81
Mode                 84
Standard Deviation   21.502163
Sample Variance      462.34303
Kurtosis             0.3936606
Skewness             -1.073098
Range                89
Minimum              11
Maximum              100
Sum                  7398
Count                100

Note: If your data is multi-modal, then Excel prints the smallest one or N/A.

The mean provides information about the overall performance level
of the class. It can serve as a tool for making comparisons with other
classes and/or other exams.

The median indicates that half of the class received a grade below 81%,
and half of the class received a grade above 81%.

The mode must be used when data is nominal. If marks are classified by
letter grade (say, A, B, C, D, F), the frequency of each grade can be
calculated, and the modal grade is the one with the highest frequency.

Excel Histogram for Example 5.6

Bin     Frequency
10      0
20      3
30      2
40      6
50      6
60      5
70      10
80      16
90      28
100     24
More    0

[Figure: histogram of the marks; x-axis: Bin (10 to 100, More), y-axis: Frequency (0 to 30)]

The histogram is skewed to the left.
The modal class is [80, 90), with the highest frequency, 28.
If the distribution is negatively skewed, then typically Mean < Median < Mode.
Here: 73.98 < 81 < 84.

5.2 Measures of Variability (about the mean): Range, variance …

Range = Largest observation – Smallest observation

The variance of a population of N measurements x1, x2, …, xN
having a mean μ is defined as
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

The variance of a sample of n measurements x1, x2, …, xn
having a mean x̄ is defined as
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$
or
$$s^2 = \frac{n}{n-1}\left(\overline{x^2} - \bar{x}^2\right)$$
where $\overline{x^2}$ is the mean of the squared measurements.

The population standard deviation is $\sigma = \sqrt{\sigma^2}$.
The sample standard deviation is $s = \sqrt{s^2}$.

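The three forms of the sample variance should agree on any data set. The sketch below checks this on the bus waiting times of Example 5.1; the resulting value (about 15.33) is computed here and is not quoted in the lecture.

```python
from statistics import variance

x = [14, 8, 12, 13, 12, 6, 19, 7, 11, 8]  # bus waiting times from Example 5.1
n = len(x)
x_bar = sum(x) / n

# Definitional form: sum of squared deviations from the mean, divided by n - 1
s2_def = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# Shortcut form: uses only the sum of x and the sum of x squared
s2_short = (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)

# Form based on the mean of the squares and the square of the mean
s2_alt = n / (n - 1) * (sum(xi ** 2 for xi in x) / n - x_bar ** 2)

print(round(s2_def, 2), round(s2_short, 2), round(s2_alt, 2))  # 15.33 15.33 15.33
print(round(variance(x), 2))                                   # 15.33 (standard library)
```

The shortcut forms need only one pass over the data, but with large values they can lose precision through cancellation, so the definitional form is the safer default in code.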
Example 5.8, page 149
Rates of return over the past 10 years for two unit trusts are shown below.
Which one has a higher level of risk?
Trust A: 12.3, -2.2, 24.9, 1.3, 37.6, 46.9, 28.4, 9.2, 7.1, 34.5
Trust B: 15.1, 0.2, 9.4, 15.2, 30.8, 28.3, 21.2, 13.7, 1.7, 14.4

Let us use the Excel printout that is run from the
'Descriptive statistics' sub-menu:

                     Trust A        Trust B
Mean                 20             15
Standard Error       5.29471435     3.152353618
Median               18.6           14.75
Mode                 #N/A           #N/A
Standard Deviation   16.7433569     9.968617423
Sample Variance      280.34         99.37333333
Kurtosis             -1.3419311     -0.46393926
Skewness             0.21697141     0.106952106
Range                49.1           30.6
Minimum              -2.2           0.2
Maximum              46.9           30.8
Sum                  200            150
Count                10             10

Trust A should be considered riskier because its standard
deviation is larger.

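Several lines of the Excel printout can be reproduced with Python's standard `statistics` module. This is a sketch under the assumption that the sample (n – 1) formulas are wanted, as in the printout; the helper `describe` is hypothetical.

```python
from statistics import mean, median, stdev, variance

trust_a = [12.3, -2.2, 24.9, 1.3, 37.6, 46.9, 28.4, 9.2, 7.1, 34.5]
trust_b = [15.1, 0.2, 9.4, 15.2, 30.8, 28.3, 21.2, 13.7, 1.7, 14.4]

def describe(returns):
    """A few of the measures that appear in the Excel printout."""
    return {
        "mean": mean(returns),
        "median": median(returns),
        "stdev": stdev(returns),        # sample standard deviation (n - 1)
        "variance": variance(returns),  # sample variance (n - 1)
        "range": max(returns) - min(returns),
    }

stats_a = describe(trust_a)
stats_b = describe(trust_b)
for name, stats in (("Trust A", stats_a), ("Trust B", stats_b)):
    print(name, {k: round(v, 2) for k, v in stats.items()})

# The trust with the larger standard deviation is the riskier one
riskier = "Trust A" if stats_a["stdev"] > stats_b["stdev"] else "Trust B"
print("Higher standard deviation:", riskier)  # Trust A
```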

Interpreting Standard Deviation
when the histogram is bell-shaped (mound-shaped)

Example 5.9, page 151

A statistician wants to describe the way returns on
investment are distributed: the mean return = 10%, the
standard deviation of the return = 3% and the histogram is
bell-shaped.
How can the statistician use the mean and the standard
deviation to describe the distribution?
- Approximately 68% of the returns lie between
  [x̄ − s, x̄ + s] = [7%, 13%]
- Approximately 95% of the returns lie between
  [x̄ − 2s, x̄ + 2s] = [4%, 16%]
- Approximately 99.7% of the returns lie between
  [x̄ − 3s, x̄ + 3s] = [1%, 19%]

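The three intervals of Example 5.9 follow mechanically from the mean and the standard deviation. A minimal sketch, valid only under the assumption that the histogram is bell-shaped; the function name `empirical_rule` is hypothetical.

```python
def empirical_rule(mean, sd):
    """Intervals mean +/- k*sd for k = 1, 2, 3 with the approximate share of
    bell-shaped data that falls inside each interval."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {k: (mean - k * sd, mean + k * sd, coverage[k]) for k in (1, 2, 3)}

# Example 5.9: mean return 10%, standard deviation 3%
for k, (low, high, share) in empirical_rule(10, 3).items():
    print(f"k={k}: about {share} of returns in [{low}%, {high}%]")
```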
Example 5.9, page 151 (contd.): other conclusions
• By the empirical rule, approximately 95% of the area
  under a mound-shaped histogram lies between
  x̄ − 2s and x̄ + 2s.

[Figure: mound-shaped histogram with 95% of the area marked between x̄ − 2s and x̄ + 2s]

• About 95% of all the measurements fall within two
  standard deviations of the mean, [4%, 16%] (in
  fact, 96 out of 100 call durations fall in this
  range: 96%).
• The range = 16.72 – 3.41 = 13.31
• s ≈ range / 4 ≈ 3.3 (in fact s = 3)
Example 5.10, page 152: study yourself

Interpreting Standard Deviation: Chebyshev's Theorem
• Given any set of measurements and a number k
  (greater than 1), the fraction of these
  measurements that lie within k standard deviations
  around the mean is at least 1 – 1/k².
  For k = 2: 1 – 1/2² = 3/4 or 75%.
  For k = 3: 1 – 1/3² = 8/9 or 89%.
• This theorem is valid for any set of measurements
  (sample, population) of any shape (not only for
  bell-shaped populations).

k   Interval            Chebyshev       Empirical rule
1   x̄ − s, x̄ + s        n/a             approx 68%
2   x̄ − 2s, x̄ + 2s      at least 75%    approx 95%
3   x̄ − 3s, x̄ + 3s      at least 89%    approx 100%

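The Chebyshev column of the table can be generated from the bound 1 – 1/k². A minimal sketch; the function name `chebyshev_bound` is hypothetical.

```python
def chebyshev_bound(k):
    """Minimum fraction of measurements within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is informative only for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, f"at least {chebyshev_bound(k):.0%}")  # 75% for k = 2, 89% for k = 3
```

Unlike the empirical rule, this bound makes no assumption about the shape of the histogram, which is why it is so much weaker.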
A measure of variability: Coefficient of Variation

Sample coefficient of variation:
$$cv = \frac{s}{\bar{x}}$$
Population coefficient of variation:
$$CV = \frac{\sigma}{\mu}$$

This coefficient provides a proportionate measure
of variation. A standard deviation of 10 may be
perceived as large when the mean value is 100,
but only moderately large when the mean value
is 500.

Example 5.11, page 154 (Example 5.8, contd.):
cvA = sA / x̄A = 0.837, cvB = sB / x̄B = 0.665.
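The two coefficients of Example 5.11 can be recomputed from the raw returns of Example 5.8. A sketch using the standard `statistics` module; the helper `coefficient_of_variation` is hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """Sample cv = s / x-bar; only meaningful when the mean is not zero."""
    return stdev(sample) / mean(sample)

trust_a = [12.3, -2.2, 24.9, 1.3, 37.6, 46.9, 28.4, 9.2, 7.1, 34.5]
trust_b = [15.1, 0.2, 9.4, 15.2, 30.8, 28.3, 21.2, 13.7, 1.7, 14.4]

print(round(coefficient_of_variation(trust_a), 3))  # 0.837
print(round(coefficient_of_variation(trust_b), 3))  # 0.665
```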

5.3 Measures of Relative Standing and Box Plots
Measures of relative standing are designed to
provide information about the position of
particular values relative to the entire data set.

Percentile: the pth percentile is the value for
which p% of values are less than that value and
(100 – p)% are greater than that value.

Suppose you scored in the 60th percentile on the
UMAT. That means 60% of the other scores were
below yours, while 40% of scores were above
yours.

Commonly Used Percentiles…
First (lower) decile = 10th percentile
First (lower) quartile, Q1 = 25th percentile
Second (middle) quartile, Q2 = 50th percentile
Third (upper) quartile, Q3 = 75th percentile
Ninth (upper) decile = 90th percentile

Location of Percentiles
For n sorted observations, the Pth percentile is located at
$$L_P = (n + 1)\frac{P}{100}$$

Example 5.12, page 160

Calculate the 25th, 50th, and 75th percentile of the
data: 5, 12, 17, 10, 38, 19, 13, 5, 14, 27.
After sorting the data we have
5, 5, 10, 12, 13, 14, 17, 19, 27, 38.
$$L_{25} = (10 + 1)\frac{25}{100} = 2.75$$

The 2.75th location lies three quarters of the way from the
2nd observation (5) to the 3rd observation (10), so
P25 = 5 + (0.75)(10 – 5) = 8.75

Similarly, we can have: P50 = 13.5 and P75 = 21

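The location rule (n + 1)P/100 with linear interpolation between neighbouring observations can be sketched as follows. The function `percentile` is written for illustration; the clamping at the two ends of the data is an assumption added here, since the example does not cover locations outside 1..n.

```python
def percentile(data, p):
    """p-th percentile using location L = (n + 1) * p / 100 and linear interpolation."""
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * p / 100
    k = int(loc)      # whole part: index (1-based) of the lower neighbour
    frac = loc - k    # fractional part: how far to move towards the upper neighbour
    if k < 1:         # assumption: clamp to the smallest observation
        return xs[0]
    if k >= n:        # assumption: clamp to the largest observation
        return xs[-1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

data = [5, 12, 17, 10, 38, 19, 13, 5, 14, 27]  # Example 5.12
print(percentile(data, 25))  # 8.75
print(percentile(data, 50))  # 13.5
print(percentile(data, 75))  # 21.0
```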
Quartiles and Variability

• Quartiles can provide an idea about the
  shape of a histogram.

[Figure: two histograms with Q1, Q2 and Q3 marked; left panel: positively skewed histogram, right panel: negatively skewed histogram]

Interquartile Range…
The quartiles can be used to create another
measure of variability, the interquartile range,
which is defined as follows:

Interquartile Range = Q3 – Q1

The interquartile range measures the spread of the
middle 50% of the observations.

Large values of this statistic mean that the 1st and
3rd quartiles are far apart, indicating a high level of
variability.

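As an illustration, the interquartile range of the Example 5.12 data can be obtained with `statistics.quantiles` from the Python standard library. Its `method="exclusive"` option places the quartiles at the (n + 1)P/100 locations, the same convention as Example 5.12. The value 12.25 is computed here and is not quoted in the lecture.

```python
from statistics import quantiles

data = [5, 12, 17, 10, 38, 19, 13, 5, 14, 27]  # Example 5.12

# n=4 asks for the three cut points that split the data into quarters
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(q1, q2, q3)  # 8.75 13.5 21.0
print(iqr)         # 12.25
```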
Box Plots
A box plot is a pictorial display that graphs
five main descriptive measures of the
measurement set:
• L – The largest measurement
• Q3 – The upper quartile
• Q2 – The median
• Q1 – The lower quartile
• S – The smallest measurement

An adjustment to this general description of a box plot may
be needed in the presence of outliers. See the next example.

[Figure: box plot with S, Q1, Q2, Q3 and L marked]

Box Plots
The box plot is a technique that graphs five
statistics:
• the minimum and maximum observations, and
• the first, second, and third quartiles.

[Figure: box plot with a whisker on each side of the box; maximum whisker length 1.5*(Q3–Q1)]

Box Plots

• The lines extending to the left and right
  are called whiskers.
• The whiskers extend outward to the
  smaller of 1.5 times the interquartile range
  or to the most extreme point that is not an outlier.
• Any points that lie outside the whiskers are
  called outliers.

Example, page 162: Share value of 11 stocks

S = 0.9, Q1 = 4.3, Q2 = 5.3, Q3 = 9.0, L = 25.5
(largest value that is not an outlier: 11.4)

[Figure: box plot of the share values with the fences marked at -2.75 and 16.05]

IQR = Q3 – Q1 = 9.0 – 4.3 = 4.7

Fences = {Q1 – 1.5(IQR), Q3 + 1.5(IQR)} = {-2.75, 16.05}

Any value outside the interval (-2.75, 16.05) is an outlier.

The only outlier is 25.5. Therefore, the whiskers, emanating
from each end of the box (Q1 & Q3), will extend to the two
extreme values that are not outliers: 0.9 and 11.4.

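The fence calculation of this example can be sketched in a few lines. Only the summary values shown on the slide are used, because the full list of 11 share values is not given; the function name `fences` is hypothetical.

```python
def fences(q1, q3, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = fences(4.3, 9.0)
print(round(lower, 2), round(upper, 2))  # -2.75 16.05

# Classify the three summary values shown on the slide
for value in (0.9, 11.4, 25.5):
    status = "outlier" if value < lower or value > upper else "inside the fences"
    print(value, status)
```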
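Interpreting the box plot results

S = 0.9, Q1 = 4.3, Q2 = 5.3, Q3 = 9.0, L = 25.5

[Figure: box plot from 0.9 to 25.5; 25% of the observations lie between S and Q1, 50% between Q1 and Q3, and 25% between Q3 and L]

The distribution is positively skewed: the upper 25% of the
observations (9.0 to 25.5) are spread over a much wider
interval than the lower 25% (0.9 to 4.3).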

5.4 Approximating Descriptive Measures for Grouped Data

• Approximating descriptive measures for
  grouped data may be needed when
  approximate values are sufficient, or when
  only secondary grouped data are available.

$$\bar{x} \approx \frac{\sum_{i=1}^{k} f_i m_i}{n}$$
$$s^2 \approx \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i m_i^2 - \frac{\left(\sum_{i=1}^{k} f_i m_i\right)^2}{n}\right]$$

where
k = number of classes
m_i = midpoint of class i
f_i = frequency of class i (the number of measurements in class i)
n = f1 + f2 + … + fk
f_i m_i is approximately equal to the sum of the measurements in class i.

Example 5.15, page 169
Approximate the mean and standard deviation of the
telephone call durations represented by the
frequency distribution.

Class i   Class limits   Frequency fi   Midpoint mi   fi mi    fi mi²
1         2–5            3              3.5           10.5     36.75
2         5–8            6              6.5           39.0     253.5
3         8–11           8              9.5           76.0     722.0
–         –              –              –             –        –
6         17–20          2              18.5          37.0     684.5
Total                    n = 30                       312.0    3,751.5

$$\bar{x} \approx \frac{\sum_{i=1}^{6} f_i m_i}{30} = \frac{312.0}{30} = 10.4$$
$$s^2 \approx \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i m_i^2 - \frac{\left(\sum_{i=1}^{k} f_i m_i\right)^2}{n}\right] = \frac{1}{29}\left[3{,}751.5 - \frac{312^2}{30}\right] = 17.47$$
so s ≈ √17.47 ≈ 4.18.

Real values: x̄ = 10.26 and s² = 18.40

[Figure: histogram of the call durations; x-axis: class limits 2, 5, 8, 11, 14, 17, 20, More; y-axis: frequency (0 to 10)]
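The grouped-data approximation can be sketched in Python. Note a loud assumption: rows 4 and 5 of the table are not shown on the slide. The frequencies 7 and 4 used below are inferred from the column totals (n = 30, Σ fi mi = 312.0, Σ fi mi² = 3,751.5) and the class limits 11, 14 and 17 on the histogram axis; they are not quoted from the lecture.

```python
# (class limits, frequency). Rows 4 and 5 are NOT shown on the slide:
# their frequencies (7 and 4) are an assumption inferred from the column totals.
classes = [((2, 5), 3), ((5, 8), 6), ((8, 11), 8),
           ((11, 14), 7), ((14, 17), 4), ((17, 20), 2)]

n = sum(f for _, f in classes)
midpoints = [(lo + hi) / 2 for (lo, hi), _ in classes]
sum_fm = sum(f * m for (_, f), m in zip(classes, midpoints))
sum_fm2 = sum(f * m ** 2 for (_, f), m in zip(classes, midpoints))

# Approximate mean and sample variance from the class midpoints
mean_approx = sum_fm / n
var_approx = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)

print(n, sum_fm, sum_fm2)            # 30 312.0 3751.5
print(round(mean_approx, 2))         # 10.4
print(round(var_approx, 2))          # 17.47
print(round(var_approx ** 0.5, 2))   # 4.18
```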

Summary: page 189

Home assignment:

- Section 5.1 Exercises pages 139-140: 5.6, 5.10

- Section 5.2 Exercises pages 155-157: 5.29, 5.40

- Section 5.3 Exercises page 167: 5.63

- Section 5.4 Exercises page 170: 5.74
