Chapter 3 Numerical Technique


Chapter 3

Descriptive Statistics: Numerical Methods
4.1 Measures of Central Location
The central data point reflects the locations of all the actual data points.
How?
With one data point, the central location is clearly at the point itself.
With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).
If a third data point appears in the center of the midrange, the measure of central location should remain in the center, but…
if the third data point appears on the left-hand side of the midrange, it will “pull” the central location to the left.
As more and more data points are added, the central location moves (left and right) as required to reflect the effects of all the points.
The Arithmetic Mean (average)
This is the most popular and useful measure of central location.

Mean = Sum of the measurements / Number of measurements
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n is the sample size
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where N is the population size
Example 1
Find the mean rate of return for a portfolio equally invested in five stocks having the following annual rates of return: 11.2%, 8.07%, 5.55%, 13.7%, 21%.

Solution
$\bar{x} = \frac{11.2 + 8.07 + 5.55 + 13.7 + 21}{5} = \frac{59.52}{5} = 11.904\%$
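A minimal Python sketch of the mean formula applied to Example 1. The helper name `arithmetic_mean` is our own; the standard library's `statistics.mean` would return the same value.

```python
def arithmetic_mean(values):
    """Sum of the measurements divided by the number of measurements."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Annual rates of return (%) from Example 1
returns = [11.2, 8.07, 5.55, 13.7, 21]
mean_return = arithmetic_mean(returns)
print(f"Mean rate of return: {mean_return:.3f}%")
```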
Two workers make the same product. Worker 1 spends 2 minutes making one product; worker 2 spends 6 minutes making one product. Find the average time to make one product for the two workers if:
a. they both work for 6 hours;
b. each worker works half of the working day;
c. each worker makes half of the total products.
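One way to check answers to this exercise is to go back to the definition: average time per product = total working time / total number of products. The sketch below assumes times are in minutes per product; the helper names are hypothetical, and the 8-hour working day and 100 products per worker are illustrative assumptions, not values from the exercise.

```python
# Average time per product = total working time / total number of products.
# Hypothetical helper names; times are in minutes per product.

def avg_time_equal_working_time(minutes_per_product, minutes_worked):
    """Cases a and b: every worker works the same number of minutes."""
    total_time = minutes_worked * len(minutes_per_product)
    total_products = sum(minutes_worked / t for t in minutes_per_product)
    return total_time / total_products

def avg_time_equal_output(minutes_per_product, products_each):
    """Case c: every worker makes the same number of products."""
    total_time = sum(products_each * t for t in minutes_per_product)
    total_products = products_each * len(minutes_per_product)
    return total_time / total_products

times = [2, 6]  # worker 1: 2 min per product, worker 2: 6 min per product

print("a:", avg_time_equal_working_time(times, 6 * 60))  # both work 6 hours
print("b:", avg_time_equal_working_time(times, 4 * 60))  # assumes an 8-hour working day
print("c:", avg_time_equal_output(times, 100))           # assumes 100 products each
```

The results for b and c do not depend on the assumed day length or product count, because both cancel out of the ratio.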
Find the average price, quantity, and exchange rate of the export product of company X.

Month                      Jan     Feb     Mar
Price (USD/t)              180     186     185
Quantity (t)               2200    1800    2000
Exchange rate (USD/VND)    20000   20500   21000
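A possible way to compute these averages, sketched under stated assumptions: the average quantity is the simple monthly mean, the average price is weighted by quantity (total USD revenue / total tonnes), and the average exchange rate is weighted by each month's USD revenue (total VND / total USD), reading "USD/VND 20000" as 20,000 VND per USD. These weighting choices are our interpretation, not given in the exercise.

```python
# Monthly export data for company X (Jan, Feb, Mar)
price = [180, 186, 185]          # USD per tonne
quantity = [2200, 1800, 2000]    # tonnes
rate = [20000, 20500, 21000]     # assumed: VND per USD

# Average quantity per month: simple arithmetic mean
avg_quantity = sum(quantity) / len(quantity)

# Average price: total revenue (USD) / total quantity (t), a quantity-weighted mean
revenue_usd = [p * q for p, q in zip(price, quantity)]
avg_price = sum(revenue_usd) / sum(quantity)

# Average exchange rate: total revenue in VND / total revenue in USD
# (assumption: each month's USD revenue is converted at that month's rate)
revenue_vnd = [r * u for r, u in zip(rate, revenue_usd)]
avg_rate = sum(revenue_vnd) / sum(revenue_usd)

print(f"Average quantity: {avg_quantity:.0f} t/month")
print(f"Average price: {avg_price:.2f} USD/t")
print(f"Average exchange rate: {avg_rate:.2f} VND/USD")
```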
Geometric mean
A specialized measure used to find the average growth rate, or rate of change, of a variable over time.
Example:
The number of students attending the music class last Tuesday was 160. This Tuesday, the number is expected to increase by 15%. How many of them are likely to attend this Tuesday?
◼ The number of students likely to attend this Tuesday:
Number of students = 160 × (100 + 15)% = 160 × (1 + 0.15) = 184 (students)
◼ Growth rate (rate of change): 15%, or 0.15
Formula:
- Step 1: Express each rate of change (R) as (1 + R).
- Step 2: Calculate the geometric mean using the formula:

$R_g = \sqrt[n]{(1+R_1)(1+R_2)\cdots(1+R_n)} - 1$
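The two steps can be sketched in Python as follows. The function name is our own, and rates are entered as decimals (0.15 for 15%); `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean_rate(rates):
    """Average rate of change R_g from period rates R_1..R_n given as decimals.

    Step 1: express each rate as a growth factor (1 + R).
    Step 2: take the n-th root of the product of the factors, then subtract 1.
    """
    if not rates:
        raise ValueError("need at least one rate")
    factors = [1 + r for r in rates]
    if any(f <= 0 for f in factors):
        raise ValueError("every growth factor 1 + R must be positive")
    product = math.prod(factors)
    return product ** (1 / len(factors)) - 1

# Usage: 10% growth followed by 25% growth
print(geometric_mean_rate([0.10, 0.25]))
```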


Example
Year              2000  2001  2002  2003  2004  2005  2006  2007  2008  2009
Growth rate (%)   10    25    15    10    10    10    10    15    25    15
Average growth rate (10% occurs 5 times, 15% occurs 3 times, and 25% occurs twice):

$R_g = \sqrt[5+3+2]{1.10^5 \times 1.15^3 \times 1.25^2} - 1 \approx 0.144 \approx 14\%$
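When a rate repeats, it enters the product with its frequency as the exponent. A short sketch that groups the ten yearly rates from the table with `collections.Counter` and applies the formula; the variable names are our own.

```python
import math
from collections import Counter

# Growth rates (%) for 2000-2009 from the table above
growth_pct = [10, 25, 15, 10, 10, 10, 10, 15, 25, 15]

# Group identical rates: {10: 5, 15: 3, 25: 2}, the exponents in the formula
counts = Counter(growth_pct)
n = sum(counts.values())

# Product of (1 + R)^frequency, then the n-th root, minus 1
product = math.prod((1 + pct / 100) ** freq for pct, freq in counts.items())
r_g = product ** (1 / n) - 1
print(f"Average growth rate: {r_g:.4f} (about {r_g:.0%})")
```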
Example
◼ The number of employees in a small bank over the period 2000-2006 is presented in the table below:

Year               2000   2001   2002   2003   2004   2005   2006
No. of employees   200    220    250    262    284    300    312

What is the average rate of change in the number of employees?
Year               2000   2001   2002    2003    2004    2005    2006
No. of employees   200    220    250     262     284     300     312
(1 + R)            -      1.1    1.136   1.048   1.084   1.056   1.04

where (1 + R) = number of employees in a year / number of employees in the previous year.
◼ The average rate of change (six year-to-year changes, so n = 6):

$R_g = \sqrt[6]{1.1 \times 1.136 \times 1.048 \times 1.084 \times 1.056 \times 1.04} - 1 \approx 0.077 \approx 7.7\%$
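The same calculation, sketched in Python from the employee counts. Because each factor is this year's count divided by last year's, the product telescopes to 312 / 200, so the result depends only on the first and last values; the sketch checks both routes.

```python
employees = [200, 220, 250, 262, 284, 300, 312]  # 2000-2006

# Year-over-year growth factors (1 + R) = this year / previous year
factors = [curr / prev for prev, curr in zip(employees, employees[1:])]
print([round(f, 3) for f in factors])

n = len(factors)  # 6 year-to-year changes
product = 1.0
for f in factors:
    product *= f
r_g = product ** (1 / n) - 1

# The product telescopes to last / first, so R_g only depends on the end points
r_g_direct = (employees[-1] / employees[0]) ** (1 / n) - 1
print(f"Average rate of change: {r_g:.4f} (direct: {r_g_direct:.4f})")
```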
Characteristics of the mean

◼ A representative of a data set
◼ Takes every single value into account, so it is likely to be affected by extreme values
◼ Used to compare different-sized data sets
The Median
The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.
When determining the median, pay attention to the number of observations (k).
‘k’ is odd:
Median = the number at the [(k+1)/2]th location of the ordered array.
‘k’ is even:
Median = the average of the two numbers in the middle (the numbers at the (k/2)th and the [(k/2)+1]th locations of the ordered array).
Example 2
Odd number of observations
The salaries of seven employees were recorded (in $1000s): 28, 60, 26, 32, 30, 26, 29. Find the median salary.
Ordered array: 26, 26, 28, 29, 30, 32, 60
There are seven salaries (k = 7). The median is the salary at the (k+1)/2 = (7+1)/2 = 4th location of the ordered array.
The median is 29.

Even number of observations
Suppose an additional salary of $31,000 is added to the group of salaries recorded before. Find the median salary.
Ordered array: 26, 26, 28, 29, 30, 31, 32, 60
There are eight salaries (k = 8). The two salaries in the middle are 29 (at the k/2 = 4th location) and 30 (at the (k/2)+1 = 5th location).
The median is their average, 29.5.
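The location rule for the median translates directly into code. A minimal sketch (our own `median` helper; the standard library's `statistics.median` follows the same rule) that reproduces both parts of Example 2:

```python
def median(values):
    """Median by the location rule: (k+1)/2 for odd k; average of k/2 and k/2+1 for even k."""
    ordered = sorted(values)
    k = len(ordered)
    if k == 0:
        raise ValueError("the median of an empty data set is undefined")
    if k % 2 == 1:
        return ordered[(k + 1) // 2 - 1]  # 1-based location -> 0-based index
    return (ordered[k // 2 - 1] + ordered[k // 2]) / 2

salaries = [28, 60, 26, 32, 30, 26, 29]  # in $1000s
print(median(salaries))          # odd k = 7
print(median(salaries + [31]))   # even k = 8
```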
The Mode
The mode of a set of measurements is the value that occurs most frequently.
A set of data may have one mode (or modal class), or two or more modes.
For large data sets, the modal class is much more relevant than a single-value mode.
Example 3
▪ The manager of a men’s clothing store observes the waist size (in inches) of trousers sold last week: 31, 34, 36, 33, 28, 34, 30, 34, 32, 40.
▪ The mode of this data set is 34 in.
This information seems to be valuable (for example, for the design of a new display in the store), much more than “the median is 33.5 in.”
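A quick check of Example 3 with Python's `statistics` module (Python 3.8+ for `multimode`). `multimode` returns every value tied for the highest frequency, so it also covers data sets with two or more modes.

```python
from statistics import median, multimode

waist_sizes = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]  # inches

modes = multimode(waist_sizes)   # all most-frequent values
mid = median(waist_sizes)        # average of the 5th and 6th ordered values
print("Mode(s):", modes)
print("Median:", mid)
```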
Relationship among Mean, Median, and Mode
If a distribution is symmetrical, the mean, median, and mode coincide.
If a distribution is not symmetrical and is skewed to the left or to the right, the three measures differ.
[Figure: a positively skewed distribution (“skewed to the right”), with mode < median < mean, and a negatively skewed distribution (“skewed to the left”), with mean < median < mode.]
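As a small illustration (our own choice of data, reusing the salaries of Example 2), a single large value produces a long right tail and the ordering expected for positive skew:

```python
from statistics import mean, median, mode

salaries = [28, 60, 26, 32, 30, 26, 29]  # Example 2, in $1000s; 60 is an extreme value

m, md, mo = mean(salaries), median(salaries), mode(salaries)
print(f"mode = {mo}, median = {md}, mean = {m}")
# The extreme value pulls the mean above the median,
# giving mode < median < mean, as in a positively skewed data set.
```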
Using the Mean, Median, and Mode
The mean is very sensitive to extreme values, yet it is used in most statistical analyses.
The median is not affected by extreme values; however, it does not reflect all the values included in the data set, but rather the location of the observation in the middle.
The mode should be used mainly for categorical data.
A bank deposit of $1,000 made 4 years ago was worth $1,200
after the first year, $1,200 after the second year, $1,500 after
the third year, and $2,000 today.
a. Compute the annual rates of return.
b. Compute the mean and median of the rates of return.
c. Compute the geometric mean.
d. Discuss whether the mean, median, or geometric mean is
the best measure of the performance of the investment.

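A sketch for checking parts a to c of the bank deposit exercise, assuming each annual rate of return is (value at year end / value at year start) − 1; the variable names are our own.

```python
from statistics import mean, median

values = [1000, 1200, 1200, 1500, 2000]  # deposit value at the start and after each year

# a. Annual rate of return for each year: (end value / start value) - 1
rates = [end / start - 1 for start, end in zip(values, values[1:])]

# b. Arithmetic mean and median of the annual rates
mean_rate = mean(rates)
median_rate = median(rates)

# c. Geometric mean: n-th root of the product of (1 + R), minus 1
growth = 1.0
for r in rates:
    growth *= 1 + r
geo_rate = growth ** (1 / len(rates)) - 1

print([round(r, 4) for r in rates])
print(f"mean = {mean_rate:.4f}, median = {median_rate:.4f}, geometric = {geo_rate:.4f}")
```

As a check on part c, 1,000 × (1 + R_g)^4 gives back the final $2,000.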
4.2 Measures of Variability
Measures of central location fail to tell the whole story about the distribution.
A question of interest still remains unanswered:
How much are the values of a given set spread out around the mean value?
Think of a sample portfolio composed of three stocks:
100 shares with an annual rate of return (ARR) of 10%, 200 shares with ARR = 15%, and 100 shares with ARR = 20%.
A central measure for this portfolio’s ARR is 15%.

Now observe the following portfolio:
100 shares with ARR = 5%, 200 shares with ARR = 15%, and 100 shares with ARR = 25%.
A central measure of this portfolio’s ARR is 15% too.
Considering the average ARR only, the two portfolios are equal. But are they really?
Is the dispersion (variability) of ARR the same for the two portfolios?
The dispersion is as important as the central location.
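A small sketch comparing the two portfolios: the share-weighted mean ARR and the spread between the lowest and highest ARR. Weighting by number of shares assumes every share carries the same weight (for example, equal share prices); that assumption and the helper names are ours.

```python
# Each portfolio: list of (number of shares, ARR in %)
portfolio_1 = [(100, 10), (200, 15), (100, 20)]
portfolio_2 = [(100, 5), (200, 15), (100, 25)]

def weighted_mean_arr(portfolio):
    """Share-weighted mean ARR: sum(shares * ARR) / total shares (assumes equal share weights)."""
    total_shares = sum(shares for shares, _ in portfolio)
    return sum(shares * arr for shares, arr in portfolio) / total_shares

def arr_spread(portfolio):
    """Largest ARR minus smallest ARR in the portfolio."""
    arrs = [arr for _, arr in portfolio]
    return max(arrs) - min(arrs)

for name, p in [("Portfolio 1", portfolio_1), ("Portfolio 2", portfolio_2)]:
    print(f"{name}: mean ARR = {weighted_mean_arr(p)}%, spread = {arr_spread(p)} points")
```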
The Range
The range of a set of measurements is the difference between the largest and smallest measurements:
Range = Largest measurement − Smallest measurement
Its major advantage is the ease with which it can be computed.
Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points. But how do all the measurements spread out? The range cannot assist in answering this question.
[Figure: the range drawn as the distance between the smallest and the largest measurement, with unknown values (“? ? ?”) in between.]
Can deviations measure dispersion?
Consider two small populations:
A: 8, 9, 10, 11, 12
B: 4, 7, 10, 13, 16
The mean of both populations is 10...
...but the measurements in B are more dispersed than those in A. A measure of dispersion should agree with this observation.
Can the sum of deviations from the mean be a good measure of dispersion?
A: (8 − 10) + (9 − 10) + (10 − 10) + (11 − 10) + (12 − 10) = −2 − 1 + 0 + 1 + 2 = 0
B: (4 − 10) + (7 − 10) + (10 − 10) + (13 − 10) + (16 − 10) = −6 − 3 + 0 + 3 + 6 = 0
The Variance
This measure reflects the dispersion of all the measurement values.
The variance of a population of N measurements $x_1, x_2, \ldots, x_N$ having a mean $\mu$ is defined as
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
The variance of a sample of n measurements $x_1, x_2, \ldots, x_n$ having a mean $\bar{x}$ is defined as
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
The sum of deviations is zero for both populations; therefore, it is not a good measure of dispersion, since their dispersion is clearly not equal.
Let us calculate the variance of the two populations:
$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$
$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$
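A sketch contrasting the sum of deviations with the population variance for A and B. `population_variance` is our own helper implementing the definition; `statistics.pvariance` is the standard-library equivalent used as a cross-check.

```python
from statistics import pvariance

def population_variance(pop):
    """sigma^2 = sum((x_i - mu)^2) / N."""
    mu = sum(pop) / len(pop)
    return sum((x - mu) ** 2 for x in pop) / len(pop)

population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]

for name, pop in [("A", population_a), ("B", population_b)]:
    mu = sum(pop) / len(pop)
    print(name,
          sum(x - mu for x in pop),   # sum of deviations: 0 for both populations
          population_variance(pop),   # by the definition
          pvariance(pop))             # standard-library cross-check
```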
Example 6
Find the variance of the following set of numbers, representing annual rates of return for a group of mutual funds. Assume the set is (i) a sample, (ii) a population: −2, 4, 5, 6.9, 10.
Solution
$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{-2 + 4 + 5 + 6.9 + 10}{5} = \frac{23.9}{5} = 4.78$

(i) Assuming a sample:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{1}{5-1}\left[(-2 - 4.78)^2 + (4 - 4.78)^2 + \cdots + (10 - 4.78)^2\right] = 19.59\ (\text{percent})^2$

(ii) Assuming a population (here $\mu = 4.78$):
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{1}{5}\left[(-2 - 4.78)^2 + (4 - 4.78)^2 + \cdots + (10 - 4.78)^2\right] = 15.6736\ (\text{percent})^2$
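Example 6 in Python: the only difference between the two answers is the divisor (n − 1 for a sample, N for a population). `statistics.variance` and `statistics.pvariance` implement those two divisors and serve as a cross-check.

```python
from statistics import pvariance, variance

returns = [-2, 4, 5, 6.9, 10]  # annual rates of return (%)

n = len(returns)
x_bar = sum(returns) / n
ss = sum((x - x_bar) ** 2 for x in returns)  # sum of squared deviations

s2 = ss / (n - 1)   # (i) treated as a sample: divide by n - 1
sigma2 = ss / n     # (ii) treated as a population: divide by N
print(f"mean = {x_bar:.2f}, s^2 = {s2:.4f}, sigma^2 = {sigma2:.4f}")
print(variance(returns), pvariance(returns))  # standard-library cross-check
```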
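Variance…
Using the definition, you have to calculate the sample mean ($\bar{x}$) before you can calculate the sample variance.
There is a shortcut formulation that needs only the sum of the measurements and the sum of their squares:
$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$

Sample: 17, 15, 23, 7, 9, 13
Sample mean: $\bar{x} = \frac{84}{6} = 14$
Sample variance: $s^2 = \frac{(17-14)^2 + (15-14)^2 + (23-14)^2 + (7-14)^2 + (9-14)^2 + (13-14)^2}{6-1} = \frac{166}{5} = 33.2$
Sample variance (shortcut method): $s^2 = \frac{1}{6-1}\left[1342 - \frac{84^2}{6}\right] = \frac{1342 - 1176}{5} = 33.2$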
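Both routes for the sample variance, sketched in Python for the sample 17, 15, 23, 7, 9, 13. The shortcut needs only running sums of x and x², so it can be computed in a single pass without knowing the mean in advance.

```python
sample = [17, 15, 23, 7, 9, 13]
n = len(sample)

# Definitional formula: needs the mean first
x_bar = sum(sample) / n
s2_definition = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Shortcut formula: only needs sum(x) and sum(x^2)
sum_x = sum(sample)
sum_x2 = sum(x * x for x in sample)
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(x_bar, s2_definition, s2_shortcut)
```

One design caution: with large values that are close together, the subtraction in the shortcut can lose floating-point precision, so the definitional form is the safer choice in software.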
Standard Deviation
The standard deviation of a set of measurements is the square root of the set variance.
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
The daily percentages of defective items in two weeks of production (10 working days) were calculated for two production lines.
Which line provides good items more consistently?

Line 1: 8.3, 6.2, 20.9, 2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05

Line 2: 12.1, 2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, 1.3, 11.4
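A sketch for comparing the two lines, assuming the 10 working days are treated as a sample (so `statistics.stdev` divides by n − 1) and that "more consistently" means a smaller standard deviation.

```python
from statistics import mean, stdev

line_1 = [8.3, 6.2, 20.9, 2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05]
line_2 = [12.1, 2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, 1.3, 11.4]

for name, data in [("Line 1", line_1), ("Line 2", line_2)]:
    # Sample standard deviation (divides by n - 1)
    print(f"{name}: mean = {mean(data):.3f}%, s = {stdev(data):.2f}%")

s_1, s_2 = stdev(line_1), stdev(line_2)
more_consistent = "Line 1" if s_1 < s_2 else "Line 2"
print("Smaller standard deviation (more consistent):", more_consistent)
```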
The Empirical Rule for a Bell-Shaped Data Set
Approximately 68% of all observations fall within one standard deviation of the mean.
Approximately 95% of all observations fall within two standard deviations of the mean.
Approximately 99.7% of all observations fall within three standard deviations of the mean.
The Chebyshev Theorem: Describing Any Data Set
The proportion of observations in any sample that lie within k standard deviations of the mean is at least 1 − 1/k² for any k > 1.
This theorem is valid for any set of measurements (sample or population) of any shape!

k   Interval              Minimum %
2   (x̄ − 2s, x̄ + 2s)    at least 75% (1 − 1/2²)
3   (x̄ − 3s, x̄ + 3s)    at least 89% (1 − 1/3²)
4   (x̄ − 4s, x̄ + 4s)    at least 94% (1 − 1/4²)
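The minimum percentages in the table follow from the bound 1 − 1/k². A minimal sketch (the function name is our own) that regenerates them:

```python
def chebyshev_min_proportion(k):
    """Lower bound on the share of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the theorem as stated applies to k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_min_proportion(k):.1%}")
```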
Suppose that the mean and standard deviation of
last year’s midterm test marks are 70 and 5,
respectively.
If the histogram is bell-shaped then we know that
approximately 68% of the marks fell between 65
and 75, approximately 95% of the marks fell
between 60 and 80, and approximately 99.7% of
the marks fell between 55 and 85.
If the histogram is not at all bell-shaped we can say
that at least 75% of the marks fell between 60 and
80, and at least 88.9% of the marks fell between 55
and 85. (We can use other values of k.)
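The intervals in the midterm example are mean ± k·(standard deviation). A short sketch (helper name our own) that prints each interval with the empirical-rule figure and, for k > 1, the Chebyshev bound:

```python
def k_sd_interval(mean, sd, k):
    """Interval mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd

mean_mark, sd_mark = 70, 5

# Approximate coverage under the empirical rule (bell-shaped histograms only)
empirical = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (1, 2, 3):
    low, high = k_sd_interval(mean_mark, sd_mark, k)
    line = f"k = {k}: [{low}, {high}]  bell-shaped: about {empirical[k]:.1%}"
    if k > 1:
        # Chebyshev's lower bound holds for a histogram of any shape
        line += f", any shape: at least {1 - 1 / k ** 2:.1%}"
    print(line)
```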
4.3 Measures of Relative Location and Box Plots
Percentile
The pth percentile of a set of measurements is the value for which
• at most p% of the measurements are less than that value, and
• at most (100 − p)% of all the measurements are greater than that value.
Example
Suppose your score is the 60th percentile of an SAT test. Then 60% of all the scores lie below your score, and 40% lie above it.
A demonstration of commonly used percentiles
First (lower) decile = 10th percentile
First (lower) quartile, Q1 = 25th percentile
Median = 50th percentile
Third quartile, Q3 = 75th percentile
Ninth (upper) decile = 90th percentile
[Figure: 10% of the observations lie below the lower decile and 90% above it; 25% lie below the lower quartile and 75% above it; 50% lie below the median and 50% above it; and so on.]
Determining Percentiles and Their Location
Find the location of any percentile using the formula
$L_P = (n + 1)\frac{P}{100}$
where $L_P$ is the location of the Pth percentile.
Example: Compute the 25th percentile of the following data set: 2.7, 3.1, 5.2, 6.2, 8.3, 20.9, 24.4, 30.05, 33.6, 42.9
▪ Finding the location of the 25th percentile:
$L_{25} = (10 + 1)\frac{25}{100} = 2.75$
▪ Finding the value of the 25th percentile:
The 25th percentile is located at location 2.75, that is, 0.75 of the distance from 3.1 (the 2nd value) to 5.2 (the 3rd value). Therefore,
$P_{25} = 3.1 + 0.75(5.2 - 3.1) = 4.675$
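A sketch of the location formula with linear interpolation between neighbouring ordered values. The clamping of locations below 1 or above n to the smallest or largest value is our own assumption; the slides do not cover those cases. For these quartiles, `statistics.quantiles` with its default `method="exclusive"` uses the same (n + 1) convention and is included as a cross-check.

```python
from statistics import quantiles

def percentile(data, p):
    """P-th percentile at location L_P = (n + 1) * P / 100, interpolating linearly."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    loc = (n + 1) * p / 100
    # Assumption: locations outside [1, n] are clamped to the smallest / largest value
    if loc <= 1:
        return xs[0]
    if loc >= n:
        return xs[-1]
    whole = int(loc)       # 1-based location of the lower neighbour
    frac = loc - whole     # fraction of the distance to the next value
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])

data = [2.7, 3.1, 5.2, 6.2, 8.3, 20.9, 24.4, 30.05, 33.6, 42.9]
p25 = percentile(data, 25)
print(f"P25 = {p25:.3f}")
print(quantiles(data, n=4))  # Q1, Q2, Q3 from the standard library
```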
Quartiles and Variability
Quartiles can provide an idea about the shape of a histogram.
[Figure: positions of Q1, Q2, and Q3 on a positively skewed histogram and on a negatively skewed histogram.]
Interquartile Range
This is a measure of the spread of the middle 50% of the observations.
A large value indicates a large spread of the observations.

Interquartile range = Q3 − Q1
Box Plot
A box plot is a pictorial display that provides the main descriptive measures of the measurement set:
• L: the largest measurement
• Q3: the upper quartile
• Q2: the median
• Q1: the lower quartile
• S: the smallest measurement
An outlier is defined as any value that is more than 1.5(Q3 − Q1) away from the box.
[Figure: a box from Q1 to Q3 with the median Q2 inside, whiskers toward S and L, and a distance of 1.5(Q3 − Q1) marked on each side of the box.]
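A sketch of the numbers behind a box plot, applied to the salaries of Example 2 as an illustration. `box_plot_summary` is a hypothetical helper; quartiles come from `statistics.quantiles` with its default `method="exclusive"`, which places them at the (n + 1)·P/100 locations used above, and outliers follow the 1.5(Q3 − Q1) rule.

```python
from statistics import quantiles

def box_plot_summary(data):
    """Five-number summary (S, Q1, Q2, Q3, L), IQR, and 1.5 * IQR outliers (hypothetical helper)."""
    q1, q2, q3 = quantiles(data, n=4)   # default method="exclusive"
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr        # more than 1.5 * IQR below the box
    upper_limit = q3 + 1.5 * iqr        # more than 1.5 * IQR above the box
    outliers = sorted(x for x in data if x < lower_limit or x > upper_limit)
    return {"S": min(data), "Q1": q1, "Q2": q2, "Q3": q3, "L": max(data),
            "IQR": iqr, "outliers": outliers}

salaries = [28, 60, 26, 32, 30, 26, 29]  # Example 2, in $1000s
summary = box_plot_summary(salaries)
print(summary)
```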
1. Determine the first, second, and third quartiles of the following data.
14.7 17.7 15.9 12.2 10.0 14.7 10.5 14.1 13.9 15.3 18.5 13.9 15.1

2. Identify the interquartile range from the following data.
5 8 14 6 21 11 9 10 18 2
A set of data whose histogram is bell shaped has a
standard deviation and mean of 4 and 50, respectively.
Approximately what proportion of observations
a. are less than 46?
b. are less than 58?
c. are greater than 54?

Standard deviation and mean of a data set were
30 and 120, respectively. What can you say about
the proportions of observations that lie between:
a. 150 and 90
b. 60 and 180
c. 30 and 210

Choose: 500,000 residents pay $350

Place   No. of residents (mill)   Mean ($)   Median ($)   Mode ($)   STD
A       2.50                      450        75           87         75
B       1.75                      385        97           109        52
C       0.95                      367        358          360        18
D       0.98                      365        310          340        20
E       1.35                      353        348          352        10