Chapter 3 Numerical Technique

Chapter 3
Descriptive Statistics: Numerical Methods
4.1 Measures of Central Location
The central data point reflects the locations of all the actual data points.
How?
With one data point, clearly the central location is at the point itself.
With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).
If a third data point appears in the center, the measure of central location will remain in the center, but if the third data point appears on the left-hand side of the midrange, it should “pull” the central location to the left.
As more and more data points are added, the central location moves (left and right) as required in order to reflect the effects of all the points.
The Arithmetic Mean (average)
This is the most popular and useful measure of central location.
\[ \text{Mean} = \frac{\text{Sum of the measurements}}{\text{Number of measurements}} \]
The Arithmetic Mean
Sample mean (n = sample size):
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Population mean (N = population size):
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
The Arithmetic Mean
Example 1
Find the mean rate of return for a portfolio equally invested in five stocks having the following annual rates of return: 11.2%, 8.07%, 5.55%, 13.7%, 21%.
Solution
\[ \bar{x} = \frac{11.2 + 8.07 + 5.55 + 13.7 + 21}{5} = \frac{59.52}{5} = 11.904\% \]
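The calculation in Example 1 can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is an illustrative choice, not something from the slides.

```python
def arithmetic_mean(values):
    """Return the sum of the measurements divided by their number."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Annual rates of return from Example 1, in percent
returns = [11.2, 8.07, 5.55, 13.7, 21]
mean_return = arithmetic_mean(returns)
print(f"{mean_return:.3f}%")  # prints 11.904%
```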
Two workers make the same product.
Worker 1 spends 2 minutes making one product.
Worker 2 spends 6 minutes making one product.
Find the average time the two workers take to make one product if:
a. They both work for 6 hours.
b. Each worker works for half of the working day.
c. Each worker makes half of the total products.
Find the average price, quantity, and exchange rate of the export product of company X.
Month                     Jan      Feb      Mar
Price (USD/t)             180      186      185
Quantity (t)              2200     1800     2000
Exchange rate USD/VND     20000    20500    21000
Geometric mean
A specialized measure, used to find the average growth rate, or rate of change, of a variable over time.
Example:
The number of students attending the music
class last Tuesday was 160. This Tuesday, the
number is expected to increase by 15%.
How many of them are likely to attend this
Tuesday?
Geometric mean
◼ The number of students likely to attend this Tuesday:
Number of students = 160 × (100 + 15)% = 160 × (1 + 0.15) = 184 (students)
◼ Growth rate/rate of change? 15% or 0.15
Geometric mean
Formula:
- Step 1: Express the rate of change (R) as (1 + R)
- Step 2: Calculate the geometric mean using the formula:
\[ R_g = \sqrt[n]{(1 + R_1)(1 + R_2)\cdots(1 + R_n)} - 1 \]
Example
Year               2000   2001   2002   2003   2004   2005   2006   2007   2008   2009
Growth rate (%)      10     25     15     10     10     10     10     15     25     15
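Example
Average growth rate (10% occurs five times, 15% three times, and 25% twice):
\[ R_g = \sqrt[5+3+2]{1.10^{5} \times 1.15^{3} \times 1.25^{2}} - 1 = 0.14 \approx 14\% \]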
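The two steps of the geometric mean formula can be sketched in Python as follows. The function name `geometric_mean_rate` is hypothetical, and the positivity check on (1 + R) is an added safeguard rather than part of the slides.

```python
import math


def geometric_mean_rate(rates):
    """Average rate of change: n-th root of the product of (1 + R_i), minus 1."""
    if not rates:
        raise ValueError("at least one rate is required")
    factors = [1 + r for r in rates]          # Step 1: express each R as (1 + R)
    if any(f <= 0 for f in factors):
        raise ValueError("each factor (1 + R) must be positive")
    # Step 2: n-th root of the product, minus 1
    return math.prod(factors) ** (1 / len(factors)) - 1


# Growth rates for 2000-2009 from the table, written as decimals
rates = [0.10, 0.25, 0.15, 0.10, 0.10, 0.10, 0.10, 0.15, 0.25, 0.15]
rg = geometric_mean_rate(rates)
print(f"Average growth rate: {rg:.1%}")
```

Note that the arithmetic mean of the same ten rates is 14.5%, slightly higher than the geometric mean; the geometric mean is the single constant rate that reproduces the total growth over the period.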
Example
◼ The number of employees in a small bank over the period 2000-2006 is presented in the table below:
Year               2000   2001   2002   2003   2004   2005   2006
No of employees     200    220    250    262    284    300    312
What is the average rate of change in the number of employees?
Example
Year               2000   2001    2002    2003    2004    2005   2006
No of employees     200    220     250     262     284     300    312
(1+R)                 -    1.1   1.136   1.048   1.084   1.056   1.04
Example
◼ The average rate of change:
\[ R_g = \sqrt[6]{1.1 \times 1.136 \times 1.048 \times 1.084 \times 1.056 \times 1.04} - 1 = 0.077 \approx 7.7\% \]
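As a sketch of the same calculation starting from the raw employee counts, the snippet below derives the (1 + R) factors and then takes the sixth root of their product. It also shows, as an aside not stated on the slides, that the product of the factors collapses to the ratio of the last count to the first. Variable names are illustrative.

```python
employees = [200, 220, 250, 262, 284, 300, 312]  # 2000 through 2006

# Year-over-year factors (1 + R): each count divided by the previous one
factors = [curr / prev for prev, curr in zip(employees, employees[1:])]
n = len(factors)  # six changes across seven years

product = 1.0
for f in factors:
    product *= f
rg_from_factors = product ** (1 / n) - 1

# The intermediate counts cancel, so only the end points matter
rg_from_endpoints = (employees[-1] / employees[0]) ** (1 / n) - 1

print(f"{rg_from_factors:.3f} {rg_from_endpoints:.3f}")
```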
Characteristics of the mean
◼A representative of a data set

◼Takes every single value into account so it is
likely to be affected by extreme values
◼Used to compare different-sized data sets.
The Median
The median of a set of measurements is the value that
falls in the middle when the measurements are
arranged in order of magnitude.
When determining the median, pay attention to the number of observations (k).
‘k’ is odd:
Median = the number at the ((k+1)/2)th location of the ordered array.
‘k’ is even:
Median = the average of the two numbers in the middle (the numbers at the (k/2)th and the [(k/2)+1]th locations of the ordered array).
The Median
Example 2
The salaries of seven employees were recorded (in 1000s): 28, 60, 26, 32, 30, 26, 29.
Find the median salary.
Odd number of observations
Ordered array: 26, 26, 28, 29, 30, 32, 60
There are seven salaries (k = 7). The ((k+1)/2)th salary of the ordered array is the number at the ((7+1)/2)th = 4th location.
The median is 29.
Suppose an additional salary of $31,000 is added to the group of salaries recorded before. Find the median salary.
Even number of observations
Ordered array: 26, 26, 28, 29, 30, 31, 32, 60
There are eight salaries (k = 8). The two salaries in the middle are 29 (in the (k/2)th = 4th location) and 30 (in the [(k/2)+1]th = 5th location).
The median is the average of these two numbers, 29.5.
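The odd/even rule can be sketched in Python as below. The function name `median_of` is an illustrative choice; Python's standard library also offers `statistics.median`, which follows the same rule.

```python
def median_of(values):
    """Median by the odd/even rule applied to the ordered array."""
    ordered = sorted(values)
    k = len(ordered)
    if k == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = k // 2
    if k % 2 == 1:
        # k odd: the value at the (k+1)/2-th location (index `middle`, 0-based)
        return ordered[middle]
    # k even: average of the values at the (k/2)-th and (k/2 + 1)-th locations
    return (ordered[middle - 1] + ordered[middle]) / 2


salaries = [28, 60, 26, 32, 30, 26, 29]   # in 1000s
print(median_of(salaries))                # prints 29
print(median_of(salaries + [31]))         # prints 29.5
```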
The Mode
The mode of a set of measurements is the value that occurs most frequently.
A set of data may have one mode (or modal class), or two or more modes.
For large data sets, the modal class is much more relevant than a single-value mode.
The Mode
Example 3
▪ The manager of a men’s clothing store observes the waist
size (in inches) of trousers sold last week: 31, 34, 36, 33, 28,
34, 30, 34, 32, 40.
▪ The mode of this data set is 34 in.
This information seems to be valuable (for example, for the design of a new display in the store), much more than “the median is 33.5 in.”
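Example 3 can be reproduced with Python's standard `statistics` module. This is a small sketch; `multimode` returns a list because, as noted earlier, a data set may have two or more modes.

```python
from statistics import median, multimode

# Waist sizes (in inches) of trousers sold last week
waists = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]

modes = multimode(waists)        # every value tied for the highest frequency
median_waist = median(waists)

print(modes)          # prints [34]
print(median_waist)   # prints 33.5
```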
Relationship among Mean, Median, and Mode
If a distribution is symmetrical, the mean, median and mode coincide.
If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ.
A positively skewed distribution (“skewed to the right”): Mode < Median < Mean
A negatively skewed distribution (“skewed to the left”): Mean < Median < Mode
Using the Mean, Median, and Mode
The mean is very sensitive to extreme values, but is used in most statistical analyses.
The median is not affected by extreme values, yet it does not reflect all the values included in the data set, but rather the location of the observation in the middle.
The mode should be used mainly for categorical data.
A bank deposit of $1,000 made 4 years ago was worth $1,200
after the first year, $1,200 after the second year, $1,500 after
the third year, and $2,000 today.
a. Compute the annual rates of return.
b. Compute the mean and median of the rates of return.
c. Compute the geometric mean.
d. Discuss whether the mean, median, or geometric mean is
the best measure of the performance of the investment.
4.2 Measures of Variability
Measures of central location fail to tell the whole
story about the distribution.
A question of interest still remains unanswered: How much are the values of a given set spread out around the mean value?
Think of a sample portfolio composed of three stocks:
100 shares, ARR = 10%
200 shares, ARR = 15%
100 shares, ARR = 20%
A central measure of this portfolio’s ARR is 15%.
Now observe the following portfolio:
100 shares, ARR = 5%
200 shares, ARR = 15%
100 shares, ARR = 25%
A central measure of this portfolio’s ARR is 15% too.
Considering the average ARR only, the two portfolios are equal. But are they really?
Is the dispersion (variability) of ARR the same for the two portfolios?
The dispersion is as important as the central location.
The Range
The range of a set of measurements is the difference between the largest and smallest measurements.
Its major advantage is the ease with which it can be computed.
Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points.
But how do all the measurements spread out? The range cannot assist in answering this question.
Deviation measure dispersion?
Consider two small populations:
A: 8, 9, 10, 11, 12
B: 4, 7, 10, 13, 16
The mean of both populations is 10, but the measurements in B are more dispersed than those in A. A measure of dispersion should agree with this observation.
Can the sum of deviations from the mean be a good measure of dispersion?
A: 8 - 10 = -2, 9 - 10 = -1, 10 - 10 = 0, 11 - 10 = +1, 12 - 10 = +2; Sum = 0
B: 4 - 10 = -6, 7 - 10 = -3, 10 - 10 = 0, 13 - 10 = +3, 16 - 10 = +6; Sum = 0
The Variance
This measure reflects the dispersion of all the measurement values.
The variance of a population of N measurements \( x_1, x_2, \ldots, x_N \) having a mean \( \mu \) is defined as
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
The variance of a sample of n measurements \( x_1, x_2, \ldots, x_n \) having a mean \( \bar{x} \) is defined as
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
The Variance
The sum of deviations is zero for both populations; therefore, it is not a good measure of dispersion, since clearly their dispersion is not equal.
The Variance
Let us calculate the variance of the two populations:
\[ \sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2 \]
\[ \sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18 \]
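The contrast between the sum of deviations and the variance for populations A and B can be verified with a short sketch. The helper names below are illustrative and not taken from the slides.

```python
def sum_of_deviations(values):
    """Sum of (x - mean); always zero up to rounding, whatever the spread."""
    mu = sum(values) / len(values)
    return sum(x - mu for x in values)


def population_variance(values):
    """Average squared deviation from the population mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


A = [8, 9, 10, 11, 12]
B = [4, 7, 10, 13, 16]

for name, data in (("A", A), ("B", B)):
    print(name, sum_of_deviations(data), population_variance(data))
# prints: A 0.0 2.0
#         B 0.0 18.0
```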
The Variance
Example 6
Find the variance of the following set of numbers, representing annual rates of return for a group of mutual funds. Assume the set is (i) a sample, (ii) a population: -2, 4, 5, 6.9, 10
Solution
\[ \bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{-2 + 4 + 5 + 6.9 + 10}{5} = \frac{23.9}{5} = 4.78 \]
Assuming a sample:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{5 - 1}\left[(-2 - 4.78)^2 + (4 - 4.78)^2 + \cdots + (10 - 4.78)^2\right] = 19.59 \ \text{percent}^2 \]
The Variance
Example 6 - solution continued
Assuming a population (with \( \mu = 4.78 \) and \( N = 5 \)):
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{1}{5}\left[(-2 - 4.78)^2 + (4 - 4.78)^2 + \cdots + (10 - 4.78)^2\right] = 15.6736 \ \text{percent}^2 \]
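Both answers to Example 6 can be checked with Python's standard `statistics` module, where `variance` divides by n - 1 (sample) and `pvariance` divides by N (population). This is a verification sketch only.

```python
import statistics

# Annual rates of return (percent) for the mutual funds in Example 6
returns = [-2, 4, 5, 6.9, 10]

mean_return = statistics.mean(returns)
sample_var = statistics.variance(returns)        # divides by n - 1
population_var = statistics.pvariance(returns)   # divides by N

print(f"mean = {mean_return:.2f}")
print(f"sample variance = {sample_var:.4f}")
print(f"population variance = {population_var:.4f}")
```

The sample variance is always the larger of the two for the same data, because the same sum of squared deviations is divided by n - 1 rather than N.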
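Variance…
You have to calculate the sample mean (x-bar) in order to calculate the sample variance.
There is a short-cut formulation:
\[ s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] \]
Sample: 17, 15, 23, 7, 9, 13
Sample Mean: \( \bar{x} = 84/6 = 14 \)
Sample Variance: \( s^2 = 166/5 = 33.2 \)
Sample Variance (shortcut method): \( s^2 = (1342 - 84^2/6)/5 = 33.2 \)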
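The sketch below computes the sample variance of this sample two ways: from the definition, and from the standard shortcut that uses only the sum of the values and the sum of their squares. The shortcut shown is the common textbook form and is assumed here to be the one the slide refers to.

```python
sample = [17, 15, 23, 7, 9, 13]
n = len(sample)

# Definition: squared deviations from the sample mean, divided by n - 1
mean = sum(sample) / n
var_definition = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Shortcut (assumed form): no need to compute the mean first
sum_x = sum(sample)
sum_x_squared = sum(x * x for x in sample)
var_shortcut = (sum_x_squared - sum_x ** 2 / n) / (n - 1)

print(mean, var_definition, var_shortcut)  # prints 14.0 33.2 33.2
```

The shortcut is convenient by hand, but in floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definitional form is usually the safer choice in code.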
Standard Deviation
The standard deviation of a set of measurements is the square root of the variance of the set.
Sample standard deviation: \( s = \sqrt{s^2} \)
Population standard deviation: \( \sigma = \sqrt{\sigma^2} \)
Standard Deviation
The daily percentage of defective items in two weeks of production (10 working days) was calculated for two production lines.
Which line provides good items more consistently?
Line 1: 8.3, 6.2, 20.9, 2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05
Line 2: 12.1, 2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, 1.3, 11.4
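One way to approach this question is to compare the standard deviations of the two lines. The sketch below assumes the ten days are treated as a sample (so the divisor is n - 1) and takes "more consistent" to mean "smaller standard deviation"; both are interpretations rather than statements from the slides.

```python
import statistics

line1 = [8.3, 6.2, 20.9, 2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05]
line2 = [12.1, 2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, 1.3, 11.4]

# Sample standard deviation = square root of the sample variance
s1 = statistics.stdev(line1)
s2 = statistics.stdev(line2)

for name, data, s in (("Line 1", line1, s1), ("Line 2", line2, s2)):
    print(f"{name}: mean = {statistics.mean(data):.2f}%, s = {s:.2f}")

more_consistent = "Line 1" if s1 < s2 else "Line 2"
print("Smaller spread:", more_consistent)
```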
The Empirical Rule for a Bell-Shaped Data Set
Approximately 68% of all observations fall within one standard deviation of the mean.
Approximately 95% of all observations fall within two standard deviations of the mean.
Approximately 99.7% of all observations fall within three standard deviations of the mean.
The Chebyshev Theorem - Describing Any Data Set
The proportion of observations in any sample that lie within k standard deviations of the mean is at least \( 1 - 1/k^2 \) for any k > 1.
This theorem is valid for any set of measurements (sample, population) of any shape!
k    Interval                                  Minimum %
2    \( \bar{x} - 2s,\ \bar{x} + 2s \)         at least 75% (\( 1 - 1/2^2 \))
3    \( \bar{x} - 3s,\ \bar{x} + 3s \)         at least 89% (\( 1 - 1/3^2 \))
4    \( \bar{x} - 4s,\ \bar{x} + 4s \)         at least 94% (\( 1 - 1/4^2 \))
Suppose that the mean and standard deviation of
last year’s midterm test marks are 70 and 5,
respectively.
If the histogram is bell-shaped then we know that
approximately 68% of the marks fell between 65
and 75, approximately 95% of the marks fell
between 60 and 80, and approximately 99.7% of
the marks fell between 55 and 85.
If the histogram is not at all bell-shaped we can say
that at least 75% of the marks fell between 60 and
80, and at least 88.9% of the marks fell between 55
and 85. (We can use other values of k.)
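The intervals and percentages in the midterm example can be generated with a small sketch that applies both rules side by side. The function names and the `EMPIRICAL` lookup table are illustrative choices.

```python
# Approximate proportions for a bell-shaped histogram (Empirical Rule)
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}


def chebyshev_lower_bound(k):
    """Minimum proportion within k standard deviations, valid for any shape."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def k_sigma_interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return (mean - k * sd, mean + k * sd)


mean, sd = 70, 5  # last year's midterm marks
for k in (1, 2, 3):
    low, high = k_sigma_interval(mean, sd, k)
    line = f"{low} to {high}: bell-shaped about {EMPIRICAL[k]:.1%}"
    if k > 1:
        line += f", any shape at least {chebyshev_lower_bound(k):.1%}"
    print(line)
```

Chebyshev's bound is much weaker than the Empirical Rule because it has to hold for every possible shape; for k = 1 it gives no information at all, which is why the theorem is stated for k > 1.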
4.3 Measures of Relative Location
and Box Plots
Percentile
The pth percentile of a set of measurements is the value for which
• At most p% of the measurements are less than that value
• At most (100-p)% of all the measurements are greater than that value.
Example
Suppose your score is the 60th percentile of a SAT test. Then 60% of all the scores lie below your score, and 40% lie above it.
Commonly used percentiles
First (lower) decile = 10th percentile
First (lower) quartile, Q1 = 25th percentile
Median = 50th percentile
Third quartile, Q3 = 75th percentile
Ninth (upper) decile = 90th percentile
For example, 10% of the measurements lie below the lower decile and 90% lie above it; 25% lie below the lower quartile and 75% lie above it; 50% lie below the median and 50% lie above it; and so on.
Determining Percentiles and their Location
Find the location of any percentile using the formula
\[ L_P = (n + 1)\frac{P}{100} \]
where \( L_P \) is the location of the Pth percentile.
Determining Percentiles and their Location
Example: Compute the 25th percentile of the following data set: 2.7, 3.1, 5.2, 6.2, 8.3, 20.9, 24.4, 30.05, 33.6, 42.9
▪ Finding the location of the 25th percentile:
\[ L_{25} = (n + 1)\frac{P}{100} = (10 + 1)\frac{25}{100} = 2.75 \]
▪ Finding the value of the 25th percentile:
The 25th percentile is located at location 2.75, that is, at .75 of the distance from the 2nd value (3.1) to the 3rd value (5.2). Therefore,
P25 = 3.1 + .75(5.2 – 3.1) = 4.675
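The location formula and the interpolation step can be sketched as a small Python function. The name `percentile` is illustrative, and the clamping of locations that fall below 1 or above n is an added assumption, since the slides do not say how to treat those cases.

```python
def percentile(values, p):
    """Pth percentile using the location L_P = (n + 1) * P / 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the data set is empty")
    location = (n + 1) * p / 100          # 1-based and possibly fractional
    # Assumption (not from the slides): clamp locations outside 1..n
    location = min(max(location, 1), n)
    position = int(location)              # whole part of the location
    fraction = location - position        # how far toward the next value
    if position == n:
        return ordered[-1]
    below, above = ordered[position - 1], ordered[position]
    return below + fraction * (above - below)


data = [2.7, 3.1, 5.2, 6.2, 8.3, 20.9, 24.4, 30.05, 33.6, 42.9]
p25 = percentile(data, 25)
print(round(p25, 3))  # prints 4.675
```

Other software may use a different interpolation convention, so percentiles computed elsewhere can differ slightly from the values this formula gives.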
Quartiles and Variability
Quartiles can provide an idea about the shape of a histogram.
[Figure: positions of Q1, Q2, and Q3 on a positively skewed histogram and on a negatively skewed histogram]
Inter-quartile Range
This is a measure of the spread of the middle 50% of the observations.
A large value indicates a large spread of the observations.
Interquartile range = Q3 – Q1
Box Plot
A box plot is a pictorial display that provides the main descriptive measures of the measurement set:
• L - the largest measurement
• Q3 - the upper quartile
• Q2 - the median
• Q1 - the lower quartile
• S - the smallest measurement
An outlier is defined as any value that is more than 1.5(Q3 – Q1) away from the box.
[Figure: box plot marking S, Q1, Q2, Q3, and L, with a whisker of length at most 1.5(Q3 – Q1) on each side of the box]
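The quantities a box plot displays, and the outlier rule, can be sketched as follows using the seven salaries from Example 2. The quartiles here are taken at location (n + 1)P/100 with linear interpolation; the helper names are illustrative, and the compact `value_at_location` helper assumes the location falls between 1 and n.

```python
def value_at_location(ordered, location):
    """Value at a 1-based, possibly fractional, location in an ordered list."""
    position = int(location)
    fraction = location - position
    if position >= len(ordered):
        return ordered[-1]
    return ordered[position - 1] + fraction * (ordered[position] - ordered[position - 1])


def box_plot_summary(values):
    ordered = sorted(values)
    n = len(ordered)
    q1, q2, q3 = (value_at_location(ordered, (n + 1) * p / 100) for p in (25, 50, 75))
    reach = 1.5 * (q3 - q1)               # maximum whisker length
    outliers = [x for x in ordered if x < q1 - reach or x > q3 + reach]
    return {"S": ordered[0], "Q1": q1, "Q2": q2, "Q3": q3,
            "L": ordered[-1], "IQR": q3 - q1, "outliers": outliers}


salaries = [28, 60, 26, 32, 30, 26, 29]   # in 1000s, from Example 2
summary = box_plot_summary(salaries)
print(summary["Q1"], summary["Q2"], summary["Q3"], summary["outliers"])
```

With these salaries the interquartile range is 6, so any value more than 9 away from the box (below 17 or above 41) counts as an outlier, which flags the salary of 60.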
1. Determine the first, second, and third quartiles of the following data.
14.7 17.7 15.9 12.2 10.0 14.7 10.5 14.1 13.9 15.3 18.5 13.9 15.1
2. Identify the interquartile range from the following data.
5 8 14 6 21 11 9 10 18 2
A set of data whose histogram is bell shaped has a
standard deviation and mean of 4 and 50, respectively.
Approximately what proportion of observations
a. are less than 46?
b. are less than 58?
c. are greater than 54?
Standard deviation and mean of a data set were
30 and 120, respectively. What can you say about
the proportions of observations that lie between:
a. 150 and 90
b. 60 and 180
c. 30 and 210
Choose: 500,000 residents pay $350
Place   No of residents (mill)   Mean ($)   Mode ($)   Median ($)   STD
A       2.50                     450        75         87           75
B       1.75                     385        97         109          52
C       0.95                     367        358        360          18
D       0.98                     365        310        340          20
E       1.35                     353        348        352          10
