
Notes

Uploaded by

sithaarthun-wp21

BAMS1424 BASIC STATISTICAL METHODS FOR SCIENTIFIC ANALYSIS

Statistics
Descriptive statistics, probability, probability distributions, estimation and
hypothesis testing, further topics.

Operations Research
Linear programming, stock control.

Reference Books

Statistics
1. Roxy Peck, Tom Short, Chris Olsen. 2020. Introduction to statistics and data
analysis, 6th Edition, Cengage.
2. Neil A. Weiss. 2017. Introductory Statistics, 10th Edition, Pearson.

Operations Research
1. Hamdy A. Taha. 2017. Operations Research: An Introduction, 10th Edition,
Pearson.
2. Michael W. Carter, Camille C. Price, Ghaith Rabadi. 2019. Operations
Research: A Practical Introduction, 2nd Edition, CRC Press, Boca Raton, FL.

An Introduction to Operations Research

Operations Research is the application of the scientific method to complex
problems arising in the direction and management of large systems of men,
machines, materials and money in industry, business, government and
defence.

Briefly, managers expect operations researchers to analyze
managerial problems which involve the operations of systems, to gather
essential data, to interpret those data, to build one or more models, to
manipulate and experiment with those models and finally to predict and make
recommendations about future operations.

Operations research is a quantitative technique which involves the following
steps:
1. Define the problem.
2. Collect relevant data.
3. Model construction.
4. Model solution.
5. Model validation and sensitivity analysis.
6. Interpretation of results and implications.
7. Decision making, implementation and modification.

Linear Programming (L.P.)
Linear programming is a special case of mathematical programming in which the
objective function (e.g. a profit function to be maximized or a cost function
to be minimized) and the constraints (resource limitations) can be expressed
as linear mathematical relationships.

Example of L.P. (maximization problem)

Consider a company which manufactures 2 kinds of cloth (A and B) and uses 3
different colours of wool. The material required to make a unit length of each
type of cloth and the total amount of wool of each colour that is available are
shown in the table below:

Colour of wool   Requirement for a unit length of cloth of type   Wool available
                 A          B
Red              4 kg       4 kg                                  1400 kg
Green            6 kg       3 kg                                  1800 kg
Yellow           2 kg       6 kg                                  1800 kg

The manufacturer can make a profit of $12 on a unit length of cloth A and $8 on
a unit length of cloth B. What product mix will make the largest possible profit?

Model formulation
Let x be the number of unit lengths of cloth A produced
and y be the number of unit lengths of cloth B produced.
(x and y are called decision variables)
Our objective is to maximize total profit P = 12x + 8y
subject to the following constraints:

4x + 4y ≤ 1400 (red wool)
6x + 3y ≤ 1800 (green wool)          (*)
2x + 6y ≤ 1800 (yellow wool)
x ≥ 0, y ≥ 0 (non-negativity)

The objective function together with the constraints (*) forms the model (**).

A point (x, y) satisfying (*) is called a feasible solution. A feasible solution which also
attains the maximum value of P is called an optimal solution. All feasible
points form the feasible region. (**) is the standard form of a maximization
problem.

Graphical method
Because there are only 2 decision variables, x and y, the graphical method can be
used. Each constraint is drawn as a line, with an arrow indicating the side of
the line that satisfies it.

The red wool constraint 4x + 4y ≤ 1400
x   0     350
y   350   0

The green wool constraint 6x + 3y ≤ 1800
x   0     300
y   600   0

The yellow wool constraint 2x + 6y ≤ 1800
x   0     900
y   300   0

Non-negativity constraints x ≥ 0 and y ≥ 0 must be included.

[Graph: the three wool constraints plotted on x (0 to 900) and y (0 to 700) axes, showing the feasible region and the optimal point B (250, 100).]
An optimal point, when one exists, can always be found at one of the corners of the feasible
region. To determine the required corner, the objective function P = 12x + 8y is
plotted on the graph, choosing a value of P that makes the line easy to plot
and places it inside the feasible region.

Choose P = 12 × 8 × k,
i.e. 12 × 8 × k = 12x + 8y
Take k = 10, so that 12x + 8y = 960
x   0     80
y   120   0

Shift this line parallel to itself as far up as possible (for a maximization
problem) without leaving the feasible region, until it touches the outermost
corner. In this case that corner is point B, which is the optimal point. The coordinates
of B are obtained by solving

6x + 3y = 1800 --------(1)
4x + 4y = 1400 --------(2)

(1) × 2   12x + 6y = 3600 --------(3)
(2) × 3   12x + 12y = 4200 --------(4)

(4) – (3)   6y = 600
∴ y = 100
Substituting y = 100 into (1), 6x + 3(100) = 1800
∴ x = 250
The coordinates of B are (250, 100).

The optimum product mix is to produce 250 unit lengths of cloth A and 100 unit
lengths of cloth B. The maximum profit = max. P = 12(250) + 8(100)
= 3800 ($)
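The graphical result can be cross-checked numerically. The sketch below is only an illustration, not part of the original notes: it assumes that checking every corner of the feasible region is acceptable for a problem this small, and the names CONSTRAINTS, intersection, is_feasible and profit are made up for the example. It intersects every pair of constraint lines, keeps the feasible intersections (the corners) and picks the corner with the largest profit.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y <= c.
# The last two rows encode x >= 0 and y >= 0 as -x <= 0 and -y <= 0.
CONSTRAINTS = [
    (4, 4, 1400),   # red wool
    (6, 3, 1800),   # green wool
    (2, 6, 1800),   # yellow wool
    (-1, 0, 0),     # x >= 0
    (0, -1, 0),     # y >= 0
]

def intersection(c1, c2):
    """Point where two constraint lines meet, or None if they are parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    # Cramer's rule, kept exact with fractions.
    x = Fraction(r1 * b2 - r2 * b1, det)
    y = Fraction(a1 * r2 - a2 * r1, det)
    return x, y

def is_feasible(point):
    x, y = point
    return all(a * x + b * y <= c for a, b, c in CONSTRAINTS)

def profit(point):
    x, y = point
    return 12 * x + 8 * y

# Corners of the feasible region = feasible intersections of two constraint lines.
corners = set()
for c1, c2 in combinations(CONSTRAINTS, 2):
    p = intersection(c1, c2)
    if p is not None and is_feasible(p):
        corners.add(p)

best = max(corners, key=profit)
print(sorted(corners))
print(best, profit(best))
```

Exact fractions are used so that the corner coordinates are not blurred by rounding. Enumerating corners grows quickly with the number of constraints, which is one reason the simplex method is preferred for larger problems.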

Slack resources (unused resources)
Consider the red wool constraint 4x + 4y ≤ 1400
LHS = 4x + 4y = amount of red wool used
= 4(250) + 4(100) = 1400
RHS = max. availability of red wool = 1400
→ slack red wool = 1400 − 1400 = 0 (fully utilized)

Consider the green wool constraint 6x + 3y ≤ 1800
LHS = 6(250) + 3(100) = 1800 = RHS
→ slack green wool = 0 (fully utilized)

Consider the yellow wool constraint 2x + 6y ≤ 1800
LHS = 2(250) + 6(100) = 1100
→ slack yellow wool = 1800 − 1100 = 700 (unused resources)

As the red wool and green wool are fully utilized, their constraints are called binding
constraints. The yellow wool constraint is a non-binding constraint.

Note: (1) A binding constraint has 0 slack.
(2) The constraints which are solved to find the optimal point are the
binding constraints.

Shadow price
The change in the value of the objective function resulting from a 1 unit increase
in the RHS of a constraint is called the shadow price of that constraint.

(I) Consider the red wool constraint 4x + 4y ≤ 1400
If an additional kg of red wool is available then the new red wool constraint
becomes 4x + 4y ≤ 1401.

x   0                 1401/4 = 350.25
y   1401/4 = 350.25   0

The green and yellow wool constraints are still the same. The objective
function P = 12x + 8y remains the same.

[Graph: x (0 to 900) and y (0 to 700) axes for plotting the constraints with the new red wool constraint 4x + 4y ≤ 1401.]

The new optimal point is obtained by solving

4x + 4y = 1401 ---------(5)
6x + 3y = 1800 ---------(6)

(5) × 3   12x + 12y = 4203 -------(7)
(6) × 2   12x + 6y = 3600 -------(8)

(7) − (8)   6y = 603
∴ y = 100 1/2

Substituting y = 100 1/2 into (5), we get 4x + 4(100 1/2) = 1401
∴ x = 249 3/4
Max. P = 12x + 8y = 12(249 3/4) + 8(100 1/2) = 3801 ($)
An additional kg of red wool results in an increase of profit = 3801 – 3800
= 1 ($)
= shadow price of 1 kg of red wool
= max. amount worth paying to get an additional kg of red wool.

(II) Consider the green wool constraint 6x + 3y ≤ 1800
If an additional kg of green wool is available then the new green wool
constraint becomes 6x + 3y ≤ 1801.
The other constraints and the objective function remain the same.
In this case, the optimal point is obtained by solving
4x + 4y = 1400 --------(9)
6x + 3y = 1801 --------(10)

(9) × 3    12x + 12y = 4200 -----(11)
(10) × 2   12x + 6y = 3602 -----(12)

(11) − (12)   6y = 598
∴ y = 99 2/3
Substituting y = 99 2/3 into (9), we get 4x + 4(99 2/3) = 1400
∴ x = 250 1/3
Max. P = 12(250 1/3) + 8(99 2/3) = 3801 1/3 ($)
Shadow price of 1 kg of green wool = 3801 1/3 – 3800 = 1 1/3 ($)

(III) Consider the yellow wool constraint 2x + 6y ≤ 1800
If an additional kg of yellow wool is available then the new yellow wool
constraint becomes 2x + 6y ≤ 1801. The new optimal point remains the
same as in the original L.P., i.e. x = 250, y = 100, max. P = 3800 ($).
Therefore, shadow price of 1 kg of yellow wool = 0.

Note: The yellow wool constraint is non-binding, with 700 kg of slack (unused)
yellow wool, thus it is not worthwhile to get an additional amount of yellow wool.
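The three shadow prices can be reproduced by re-solving the problem with each RHS raised by 1 kg in turn and taking the difference in maximum profit, exactly as in the hand calculation. The following is a minimal sketch under the assumption that corner enumeration is good enough for a two-variable problem; COEFFS, NAMES, max_profit and shadow are illustrative names, not part of the notes.

```python
from fractions import Fraction
from itertools import combinations

COEFFS = [(4, 4), (6, 3), (2, 6)]      # kg of red, green, yellow wool per unit length (A, B)
NAMES = ["red", "green", "yellow"]

def max_profit(rhs):
    """Maximum of 12x + 8y subject to a*x + b*y <= rhs[i] for each wool, x, y >= 0."""
    rows = [(a, b, c) for (a, b), c in zip(COEFFS, rhs)]
    rows += [(-1, 0, 0), (0, -1, 0)]   # x >= 0 and y >= 0
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue                   # parallel lines never meet
        x = Fraction(r1 * b2 - r2 * b1, det)
        y = Fraction(a1 * r2 - a2 * r1, det)
        if all(a * x + b * y <= c for a, b, c in rows):
            value = 12 * x + 8 * y
            if best is None or value > best:
                best = value
    return best

base_rhs = [1400, 1800, 1800]
base = max_profit(base_rhs)

# Shadow price = gain in maximum profit from one extra kg of that wool.
shadow = {}
for i, name in enumerate(NAMES):
    rhs = list(base_rhs)
    rhs[i] += 1
    shadow[name] = max_profit(rhs) - base

print(base, shadow)
```

The values agree with the hand calculations: 1 for red wool, 4/3 for green wool and 0 for yellow wool.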

Example (Minimization problem)

A manufacturer has decided to market a new fertiliser. The fertiliser is to be a
mixture of 2 ingredients. The properties of the 2 ingredients are:

Ingredient   Quantities per kg                            Cost
             Bone meal   Nitrogen   Lime   Phosphate
A            0.2         0.3        0.4    0.1            $1.2
B            0.4         0.1        0.45   0.05           $0.8

It has been decided that the new fertiliser will have the following properties:
(a) It will be sold in bags weighing at least 100 kg
(b) It must contain at least 15 kg of nitrogen
(c) It must contain at least 8 kg of phosphate
(d) It must contain at least 25 kg of bone meal
What is the cheapest mixture he can make?

Model Formulation
Let x be the number of kg of ingredient A used in the mixture
Let y be the number of kg of ingredient B used in the mixture
Our objective is to minimize total cost C = 1.2x + 0.8y
subject to the following constraints:
x + y ≥ 100 (weight)
x   0     100
y   100   0

0.3x + 0.1y ≥ 15 (nitrogen)
x   0     50
y   150   0

0.1x + 0.05y ≥ 8 (phosphate)
x   0     80
y   160   0

0.2x + 0.4y ≥ 25 (bone meal)
x   0      125
y   62.5   0

x ≥ 0, y ≥ 0 (non-negativity)
No limitation for lime.
The model above, labelled (*), is the standard form of a minimization problem.

The shaded region _________ is the feasible region. It is unbounded.

[Graph: the four constraints plotted on x (0 to 160) and y (0 to 180) axes, with the feasible region shaded.]

Note: the nitrogen constraint 0.3x + 0.1y ≥ 15 is a redundant constraint, i.e. it
can be omitted without affecting the feasible region.

For the objective function C = 1.2x + 0.8y
Choose C = 1.2 × 0.8 × k,
i.e. 1.2 × 0.8 × k = 1.2x + 0.8y
Take an appropriate k value so that the objective function plotted is inside the
feasible region. In this case, take k = 150, so that 1.2x + 0.8y = 144
x   0     120
y   180   0

For a minimization problem, the objective function line is shifted parallel to itself as
far down as possible without leaving the feasible region, until it touches the
lowest corner. In this case, the optimal point is B, which can be obtained by
solving
0.1x + 0.05y = 8 --------(1) → phosphate
x + y = 100 ------(2) → weight

(1) × 10   x + 0.5y = 80 ------(3)
(2) − (3)   0.5y = 20
y = 40
Substituting y = 40 into (2), x = 100 − 40 = 60
Minimum cost = min. C = 1.2(60) + 0.8(40) = 104 ($)

Surplus resources (excess resources)

Consider the weight constraint x + y ≥ 100
LHS = weight allocated = x + y = 60 + 40 = 100
RHS = min. weight requirement = 100
→ surplus weight = 100 − 100 = 0

Consider the nitrogen constraint 0.3x + 0.1y ≥ 15
0.3(60) + 0.1(40) = 22 ≥ 15
→ surplus nitrogen = 22 − 15 = 7 (kg)

Consider the phosphate constraint 0.1x + 0.05y ≥ 8
0.1(60) + 0.05(40) = 8 = RHS
→ surplus phosphate = 0

Consider the bone meal constraint 0.2x + 0.4y ≥ 25
0.2(60) + 0.4(40) = 28 ≥ 25
→ surplus bone meal = 28 − 25 = 3 (kg)

The binding constraints are the weight constraint and the phosphate constraint.
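The optimal mixture and the surpluses can be checked with a few lines of code. This is an illustrative sketch rather than the method of the notes: it assumes that enumerating corners is valid here because both cost coefficients are positive, so the minimum over the unbounded region is still attained at a corner. REQUIREMENTS, cost, corners and surplus are names chosen for the example.

```python
from fractions import Fraction as F
from itertools import combinations

# (a, b, c) means a*x + b*y >= c; the last two rows are x >= 0 and y >= 0.
REQUIREMENTS = {
    "weight":    (F(1), F(1), F(100)),
    "nitrogen":  (F(3, 10), F(1, 10), F(15)),
    "phosphate": (F(1, 10), F(1, 20), F(8)),
    "bone meal": (F(1, 5), F(2, 5), F(25)),
    "x >= 0":    (F(1), F(0), F(0)),
    "y >= 0":    (F(0), F(1), F(0)),
}

def cost(x, y):
    """Total cost C = 1.2x + 0.8y, kept as exact fractions."""
    return F(6, 5) * x + F(4, 5) * y

rows = list(REQUIREMENTS.values())
corners = []
for (a1, b1, r1), (a2, b2, r2) in combinations(rows, 2):
    det = a1 * b2 - a2 * b1
    if det == 0:
        continue                                   # parallel lines
    x = (r1 * b2 - r2 * b1) / det
    y = (a1 * r2 - a2 * r1) / det
    if all(a * x + b * y >= c for a, b, c in rows):
        corners.append((x, y))                     # feasible corner

best = min(corners, key=lambda p: cost(*p))

# Surplus = amount supplied minus minimum requirement at the optimal point.
surplus = {name: a * best[0] + b * best[1] - c
           for name, (a, b, c) in REQUIREMENTS.items()}

print(best, cost(*best))
print(surplus)
```

A surplus of 0 marks a binding constraint, so the output should single out the weight and phosphate requirements.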

Shadow cost
Consider the weight constraint x + y ≥ 100
If the weight requirement is raised by 1 kg then the new weight
constraint becomes x + y ≥ 101
The optimal point is obtained by solving x + y = 101
0.1x + 0.05y = 8
x = 59, y = 42 and min. C = 1.2(59) + 0.8(42) = 104.4

Increase in cost for an additional kg of weight = 104.4 – 104 = 0.4 ($)
= shadow cost of 1 kg of weight

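Consider the phosphate constraint 0.1x + 0.05y ≥ 8
A full 1 kg increase in the phosphate requirement would move the intersection
of the weight and phosphate lines to (80, 20), which violates the bone meal
constraint because 0.2(80) + 0.4(20) = 24 < 25. The shadow cost is therefore
worked out from a smaller increase of 0.5 kg, for which the new
phosphate constraint becomes 0.1x + 0.05y ≥ 8.5
The optimal point is obtained by solving x + y = 100
0.1x + 0.05y = 8.5
x = 70, y = 30 and min. C = 1.2(70) + 0.8(30) = 108
(bone meal check: 0.2(70) + 0.4(30) = 26 ≥ 25)

Increase in cost for an additional 0.5 kg of phosphate = 108 – 104 = 4 ($)
∴ shadow cost of 1 kg of phosphate = 4/0.5 = 8 ($)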

Similarly, shadow cost of 1 kg of nitrogen = 0
and shadow cost of 1 kg of bone meal = 0
(both constraints are non-binding).

Simplex Method
The simplex method is an iterative process. We start with a feasible solution which is then
improved step by step until an optimal solution is obtained.
The simplex method for solving a standard maximization problem is illustrated by the
following example on the manufacturing of cloth.

Example
Maximize total profit P = 12x + 8y
subject to 4x + 4y ≤ 1400 (red wool)
6x + 3y ≤ 1800 (green wool)
2x + 6y ≤ 1800 (yellow wool)
x ≥ 0, y ≥ 0 (non-negativity)

Solution:
Step 1 Rewrite the objective function as
-12x - 8y + P = 0

Step 2 Convert the inequalities into equalities for the constraints by introducing
non-negative slack variables.
4x + 4y + r = 1400
6x + 3y + s = 1800
2x + 6y + t = 1800
where r, s and t represent the unused (slack) resources in each constraint.

Step 3 Set up the initial simplex tableau which requires
(a) a column for each variable, including slack variables and the objective function,
and a ‘quantity’ column.
(b) a row for each constraint and a row for the objective function. Row titles
are r, s, t and P.

Title   x     y     r   s   t   P   Quantity
r       4     4     1   0   0   0   1400
s       6     3     0   1   0   0   1800
t       2     6     0   0   1   0   1800
P       -12   -8    0   0   0   1   0

(c) the coefficients of the variables are entered into the tableau in the
order in which they appear in the equations of the model.

(d) the initial simplex tableau represents the first feasible solution to the
problem.
r = 1400, s = 1800, t = 1800
x = 0, y = 0, P = 0

Note: the columns of the row-title variables always form an identity matrix, so
the solution can be read directly from the quantity column.

Step 4 Iteration
(a) Identify the pivotal column
Consider the objective function
-12x - 8y + P = 0
When x increases by 1 unit, P increases by 12
When y increases by 1 unit, P increases by 8
Thus, column x is chosen since it increases P faster; it is called the
‘pivotal column’.
Note: the column with the most negative entry in the P-row is
the pivotal column.

(b) Identify the pivotal row (carried out after the pivotal column has been
chosen, in this case column x)
Consider the red wool constraint 4x + 4y + r = 1400
Set y = r = 0, x = 1400/4 = 350. This solution cannot be used as the
green wool constraint is violated.

Consider the green wool constraint 6x + 3y + s = 1800
Set y = s = 0, x = 1800/6 = 300. This solution can be used as the red
and yellow wool constraints are not violated.

Consider the yellow wool constraint 2x + 6y + t = 1800
Set y = t = 0, x = 1800/2 = 900. This solution cannot be used as the red
and green wool constraints are violated.

Thus, the s-row is chosen since the calculated x-value is feasible. The s-row
is called the ‘pivotal row’.

Note: Calculate the ratio of the quantity element in each row to the
element in the pivotal column. The row in which this ratio is the least
non-negative number is the pivotal row.

(c) Circle the pivotal element
The pivotal element is the number at the intersection of the pivotal row
and pivotal column.

(d) Form a new tableau
The title of the old pivotal column (x column) is entered as the title of
the new row (previous s row). The titles of all other rows stay the same
as previously.

(e) Divide all the elements in the pivotal row by the pivotal element and enter
them into the new tableau.

(f) Enter 0 for all the elements in the pivotal column except the pivotal
element.

(g) To adjust the remaining elements, the replacement algorithm is used.

New element = old element − (m × n) / pivotal element

where m = element at the intersection of the pivotal column and the row
in which the old element lies
and n = element at the intersection of the pivotal row and the column
in which the old element lies.

Remarks:
(i) When choosing the pivotal column, if 2 columns are tied for the most
negative P-row entry, we may choose either column.
(ii) Choose the row which gives the least non-negative ratio.
If there is a tie for the lowest ratio, we may choose either row.

(h) When there are no more negative entries in the P-row, stop. The last
tableau gives the optimal solution.
x = 250 → 250 unit lengths of cloth A
y = 100 → 100 unit lengths of cloth B
r = 0 → slack red wool = 0 i.e. red wool is fully used
s = 0 → slack green wool = 0 i.e. green wool is fully used
t = 700 → slack yellow wool = 700 i.e. 700 kg of unused yellow
wool.
Maximum P = 3800 ($)

In the P-row,
the entry under the r column gives the shadow price of 1 kg of red wool = 1
the entry under the s column gives the shadow price of 1 kg of green wool = 4/3
the entry under the t column gives the shadow price of 1 kg of yellow wool = 0

Title   x     y     r      s      t   P   Quantity   Ratio
r       4     4     1      0      0   0   1400       1400/4 = 350
s       6     3     0      1      0   0   1800       1800/6 = 300
t       2     6     0      0      1   0   1800       1800/2 = 900
P       -12   -8    0      0      0   1   0

r       0     2     1      -2/3   0   0   200        200/2 = 100
x       1     1/2   0      1/6    0   0   300        300/(1/2) = 600
t       0     5     0      -1/3   1   0   1200       1200/5 = 240
P       0     -2    0      2      0   1   3600

y       0     1     1/2    -1/3   0   0   100
x       1     0     -1/4   1/3    0   0   250
t       0     0     -5/2   4/3    1   0   700
P       0     0     1      4/3    0   1   3800
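The iteration steps (a) to (h) can be written out as a short program. The sketch below is one possible rendering of the tableau procedure, not a general-purpose solver: it assumes a standard maximization problem whose initial tableau is already feasible, and the names simplex_max, NAMES, initial, solution and shadow_prices are illustrative. Fractions are used so that entries such as 4/3 appear exactly as in the hand-worked tableau.

```python
from fractions import Fraction as F

def simplex_max(tableau, basis, n_vars):
    """Tableau simplex for a standard maximization problem.

    Each row ends with the quantity; the last row is the P-row.
    basis[i] is the column index of the variable titling row i.
    """
    t = [[F(v) for v in row] for row in tableau]
    basis = list(basis)
    while True:
        p_row = t[-1]
        # (a) pivotal column: most negative P-row entry among the variables
        col = min(range(n_vars), key=lambda j: p_row[j])
        if p_row[col] >= 0:
            return t, basis                      # (h) no negative entry left
        # (b) pivotal row: least ratio, using positive column entries only
        ratios = [(row[-1] / row[col], i)
                  for i, row in enumerate(t[:-1]) if row[col] > 0]
        if not ratios:
            raise ValueError("problem is unbounded")
        _, r = min(ratios)
        pivot = t[r][col]                        # (c) pivotal element
        t[r] = [v / pivot for v in t[r]]         # (e) divide the pivotal row
        for i in range(len(t)):                  # (f), (g) replacement rule
            if i != r:
                m = t[i][col]
                t[i] = [old - m * n for old, n in zip(t[i], t[r])]
        basis[r] = col                           # (d) retitle the row

NAMES = ["x", "y", "r", "s", "t"]
initial = [
    [4, 4, 1, 0, 0, 0, 1400],     # r: red wool
    [6, 3, 0, 1, 0, 0, 1800],     # s: green wool
    [2, 6, 0, 0, 1, 0, 1800],     # t: yellow wool
    [-12, -8, 0, 0, 0, 1, 0],     # P-row
]
final, basis = simplex_max(initial, basis=[2, 3, 4], n_vars=len(NAMES))

solution = {NAMES[j]: row[-1] for j, row in zip(basis, final)}
max_profit = final[-1][-1]
shadow_prices = {"red": final[-1][2], "green": final[-1][3], "yellow": final[-1][4]}
print(solution, max_profit, shadow_prices)
```

Because the pivotal row is divided by the pivotal element first, subtracting m times the new pivotal row is the same as the rule new element = old element − (m × n) / pivotal element.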

Example
A company can produce 3 products A, B and C. The products yield a profit of
$8, $5 and $10 per unit respectively. The products use a machine which has 400 hours
of capacity in the next period. Each unit of the products uses 2, 3 and 1 hour
respectively of the machine capacity. There are only 150 units of a component
used by A and C. 200 kg of material Y are available; A uses 2 kg per unit and C
uses 4 kg per unit. The marketing department states that no more than 50 units
of B can be sold.
The company wishes to maximize total contribution.
(a) Formulate the linear program.
(b) Solve the linear program.
(c) Interpret the final solution by giving the optimal product mix, slack
resources and shadow prices.

Solution:
(a) Let x1, x2 and x3 be the no. of units of product A, B and C produced
respectively.
Our objective is to maximize contribution P = 8x1 + 5x2 + 10x3
subject to
2x1 + 3x2 + x3 ≤ 400 (machine hour)
x1 + x3 ≤ 150 (component)
2x1 + 4x3 ≤ 200 (material Y)
x2 ≤ 50 (sales of B)
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0 (non-negativity)

(b) Rewrite the objective function as -8x1 – 5x2 – 10x3 + P = 0
Add non-negative slack variables S1, S2, S3 and S4 to the constraints,
2x1 + 3x2 + x3 + S1 = 400
x1 + x3 + S2 = 150
2x1 + 4x3 + S3 = 200
x2 + S4 = 50

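Title   x1    x2   x3    S1   S2   S3     S4   P   Qty    Ratio
S1      2     3    1     1    0    0      0    0   400    400/1 = 400
S2      1     0    1     0    1    0      0    0   150    150/1 = 150
S3      2     0    4     0    0    1      0    0   200    200/4 = 50
S4      0     1    0     0    0    0      1    0   50
P       -8    -5   -10   0    0    0      0    1   0

S1      3/2   3    0     1    0    -1/4   0    0   350    350/3
S2      1/2   0    0     0    1    -1/4   0    0   100
x3      1/2   0    1     0    0    1/4    0    0   50
S4      0     1    0     0    0    0      1    0   50     50/1 = 50
P       -3    -5   0     0    0    5/2    0    1   500

S1      3/2   0    0     1    0    -1/4   -3   0   200    400/3
S2      1/2   0    0     0    1    -1/4   0    0   100    200
x3      1/2   0    1     0    0    1/4    0    0   50     100
x2      0     1    0     0    0    0      1    0   50
P       -3    0    0     0    0    5/2    5    1   750

S1      0     0    -3    1    0    -1     -3   0   50
S2      0     0    -1    0    1    -1/2   0    0   50
x1      1     0    2     0    0    1/2    0    0   100
x2      0     1    0     0    0    0      1    0   50
P       0     0    6     0    0    4      5    1   1050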

(c) Interpretation
Product mix: x1 = 100, x2 = 50, x3 = 0
max. P = 1050 ($)

Shadow price of 1 hr. of machine capacity = 0
Shadow price of 1 unit of component = 0
Shadow price of 1 kg of material Y = 4 ($)
Shadow price of 1 unit of sales of B = 5 ($)

Slack resources: S1 = 50 hours of unused machine capacity
S2 = 50 units of unused component
S3 = 0 → material Y is fully utilized
S4 = 0 → production of B is just enough
to meet its sales

Products not being produced


A product which is not produced will have a positive figure under its title
column in the P-row. This indicates the total profit that would be lost if
one unit of that product is produced.
In this example, x3 = 0
Under x3 – column in the P-row, we have a positive figure 6. This means
that if one unit of product C is produced, then, after adjusting the product
mix to provide the capacity, the total profit would decrease by $6.

Sensitivity Analysis
The study of the effect of discrete changes in the problem’s parameters on the current
optimal solution is referred to as sensitivity analysis.

Example (Manufacturing of cloth)
Given that profit on a unit length of cloth A = $12 and profit on a unit length of cloth B = $8.
The problem is to maximize P = 12x + 8y
subject to the following constraints:

(red wool) 4x + 4y ≤ 1400
(green wool) 6x + 3y ≤ 1800
(yellow wool) 2x + 6y ≤ 1800
(non-negativity) x ≥ 0, y ≥ 0

[Graph: feasible region on x (0 to 900) and y (0 to 700) axes, bounded by 4x + 4y ≤ 1400, 6x + 3y ≤ 1800 and 2x + 6y ≤ 1800. Labelled points: (0, 350), (75, 275), (180, 240), (350, 0), (300, 0) and the optimal point B (250, 100).]
Sensitivity analysis using graphical approach
(I) Sensitivity analysis on the coefficients of the objective function
By how much can the profit from one unit length of cloth A or cloth B change
before it becomes profitable to change the optimal mix?

Note: A change in the coefficient of a variable in the objective function changes
the slope (gradient) of the objective function line.

The slope of the green wool binding constraint = −6/3 = −2
The slope of the red wool binding constraint = −4/4 = −1

Now, let the profit of 1 unit length of cloth A be a, then the new objective function
becomes P’ = ax + 8y
The slope of the new objective function = −a/8

As long as the slope of the new objective function lies between the slopes of the
two binding constraints, the optimal mix is still at point B,

i.e. −2 ≤ −a/8 ≤ −1
−16 ≤ −a ≤ −8
8 ≤ a ≤ 16

Similarly, let the profit of 1 unit length of cloth B be b, then the new objective
function becomes P” = 12x + by
The slope of the new objective function = −12/b

The optimal mix is still at B if −2 ≤ −12/b ≤ −1
−1/2 ≥ −b/12 ≥ −1
−6 ≥ −b ≥ −12
6 ≤ b ≤ 12
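The slope argument translates directly into a small calculation. The sketch below assumes, as in the example, that both objective coefficients stay positive; GREEN, RED, ratio, range_for_a and range_for_b are illustrative names only.

```python
from fractions import Fraction as F

# Binding constraints at B: (coefficient of x, coefficient of y)
GREEN = (6, 3)
RED = (4, 4)

def ratio(coeffs):
    """x-coefficient divided by y-coefficient, i.e. minus the slope of the line."""
    return F(coeffs[0], coeffs[1])

# The objective line must be no flatter than red and no steeper than green.
low, high = sorted([ratio(RED), ratio(GREEN)])      # 1 and 2

def range_for_a(b=8):
    """Profit a of cloth A keeping B optimal: low <= a/b <= high, with b > 0."""
    return b * low, b * high

def range_for_b(a=12):
    """Profit b of cloth B keeping B optimal: low <= a/b <= high, with a > 0."""
    return a / high, a / low

print(range_for_a())   # range for a with b fixed at 8
print(range_for_b())   # range for b with a fixed at 12
```

Each range is computed with the other coefficient held at its original value, which matches the one-at-a-time analysis in the text.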

(II) Sensitivity analysis on the RHS constants of the constraints
This form of sensitivity analysis is concerned with the range over which the
RHS constant of a constraint can vary such that the optimal solution
remains feasible (before the binding constraints become non-binding).

Note: A change in the RHS constant of a linear equation in 2 variables
changes the x-intercept and y-intercept but not the slope
of the line.

(a) By how much can the amount of red wool available be changed before
the green and red wool constraints become non-binding?

Let b1 denote the amount of red wool available. The red wool constraint
becomes 4x + 4y ≤ b1

If b1 > 1400 then the red wool constraint is shifted upwards parallel to the
old red wool constraint. The optimal point B will correspondingly shift
along the green wool constraint to a new point of intersection. At
point E(180, 240), the green and red wool constraints are still binding.
Above E, they are no longer both binding and the yellow and green wool
constraints become binding. Therefore, the maximum value that b1 can
take is 4(180) + 4(240) = 1680.

If b1 < 1400 then the red wool constraint shifts downwards. The optimal point
B moves down the green wool constraint until point A(300, 0). Beyond A,
the red and green wool constraints are no longer both binding. Therefore, the
minimum value that b1 can take is 4(300) + 4(0) = 1200.

Hence, if 1200 ≤ b1 ≤ 1680, then the green and red wool constraints
will still be binding i.e. the optimal solution remains feasible.

(b) By how much can the amount of green wool available be changed
before the green and red wool constraints become non-binding?

Let b2 denote the amount of green wool available. The green wool
constraint becomes 6x + 3y ≤ b2

If b2 > 1800 then the green and red wool constraints will be binding until
point (350, 0). Therefore, the maximum value that b2 can take is
6(350) + 3(0) = 2100.

If b2 < 1800 then the green and red wool constraints will be binding until
point (75, 275). Therefore, the minimum value that b2 can take is
6(75) + 3(275) = 1275.

Hence, as long as 1275 ≤ b2 ≤ 2100 then the green and red wool
constraints will still be binding.

(c) By how much can the amount of yellow wool available be changed
before the green and red wool constraints become non-binding?

Let b3 denote the amount of yellow wool available. The yellow wool
constraint becomes 2x + 6y ≤ b3

The yellow wool constraint is non-binding, so increasing b3 will not
affect the optimal point B.

If b3 > 1800, the yellow wool constraint will not touch the optimal point,
thus b3 has no upper limit (maximum value of b3 = ∞).

If b3 < 1800, the yellow wool constraint can be shifted downwards until
point B. Therefore, the minimum value of b3 = 2(250) + 6(100) = 1100.

Hence, as long as b3 ≥ 1100 then the green and red wool constraints
will still be binding.

Sensitivity analysis using Simplex method
Example (Manufacturing of clothes) :The simplex tableau
Title x y r s t P Quantity
r 4 4 1 0 0 0 1400
s 6 3 0 1 0 0 1800
t 2 6 0 0 1 0 1800
P -12 -8 0 0 0 1 0
r 0 2 1 -2/3 0 0 200
x 1 ½ 0 1/6 0 0 300
t 0 5 0 -1/3 1 0 1200
P 0 -2 0 2 0 1 3600
y 0 1 ½ -1/3 0 0 100
x 1 0 -1/4 1/3 0 0 250
t 0 0 -5/2 4/3 1 0 700
P 0 0 1 4/3 0 1 3800

(I) Sensitivity analysis on the coefficient of the objective function


Consider the profit for 1 unit length of cloth A
If δ is added to the profit of 1 unit length of cloth A, i.e. P = (12 + δ)x + 8y, the
only effect on the simplex tableaux is a change in the entries of the P-row of each
tableau.
Title   x          y        r        s         t   P   Quantity
r       4          4        1        0         0   0   1400
s       6          3        0        1         0   0   1800
t       2          6        0        0         1   0   1800
P       -(12+δ)    -8       0        0         0   1   0

r       0          2        1        -2/3      0   0   200
x       1          1/2      0        1/6       0   0   300
t       0          5        0        -1/3      1   0   1200
P       0          -2+δ/2   0        2+δ/6     0   1   3600+300δ

y       0          1        1/2      -1/3      0   0   100
x       1          0        -1/4     1/3       0   0   250
t       0          0        -5/2     4/3       1   0   700
P       0          0        1-δ/4    4/3+δ/3   0   1   3800+250δ

The same optimal solution will remain if
1 − δ/4 ≥ 0 and 4/3 + δ/3 ≥ 0
1 ≥ δ/4 and δ/3 ≥ −4/3

∴ −4 ≤ δ ≤ 4
Range of the coefficient:
Add 12 throughout: 12 − 4 ≤ (12 + δ) ≤ 12 + 4
8 ≤ (12 + δ) ≤ 16

Consider the profit for 1 unit length of cloth B
If δ’ is added to the profit of 1 unit length of cloth B, i.e. P = 12x + (8 + δ’)y, rework
the P-row of each tableau.
Title   x     y         r        s          t   P   Quantity
r       4     4         1        0          0   0   1400
s       6     3         0        1          0   0   1800
t       2     6         0        0          1   0   1800
P       -12   -(8+δ’)   0        0          0   1   0

r       0     2         1        -2/3       0   0   200
x       1     1/2       0        1/6        0   0   300
t       0     5         0        -1/3       1   0   1200
P       0     -2-δ’     0        2          0   1   3600

y       0     1         1/2      -1/3       0   0   100
x       1     0         -1/4     1/3        0   0   250
t       0     0         -5/2     4/3        1   0   700
P       0     0         1+δ’/2   4/3-δ’/3   0   1   3800+100δ’

The same optimal solution will remain if
1 + δ’/2 ≥ 0 and 4/3 − δ’/3 ≥ 0
δ’/2 ≥ −1 and 4/3 ≥ δ’/3
−2 ≤ δ’ ≤ 4
Add 8 throughout: 6 ≤ (8 + δ’) ≤ 12
(II) Sensitivity analysis on the RHS constants
If the amount of red wool available becomes 1400 + b, the only effect on the
tableaux is a change in the entries of the quantity column of each tableau.

Title x y r s t P Quantity
r 4 4 1 0 0 0 1400+b
s 6 3 0 1 0 0 1800
t 2 6 0 0 1 0 1800
P -12 -8 0 0 0 1 0
r 0 2 1 -2/3 0 0 (1400+b)-4(300)=200+b
x 1 ½ 0 1/6 0 0 300
t 0 5 0 -1/3 1 0 1200
P 0 -2 0 2 0 1 3600
y 0 1 ½ -1/3 0 0 100+b/2
x 1 0 -1/4 1/3 0 0 250-b/4
t 0 0 -5/2 4/3 1 0 700-5b/2
P 0 0 1 4/3 0 1 3800+b

y, x and t will remain as basic variables if
100 + b/2 ≥ 0; b ≥ −200
250 − b/4 ≥ 0; b ≤ 1000
700 − 5b/2 ≥ 0; b ≤ 280

−200 ≤ b ≤ 280

Add 1400 throughout: 1200 ≤ 1400 + b ≤ 1680
If the amount of green wool available becomes 1800 + b’, rework the
quantity column of each tableau.

Title x y r s t P Quantity
r 4 4 1 0 0 0 1400
s 6 3 0 1 0 0 1800+b’
t 2 6 0 0 1 0 1800
P -12 -8 0 0 0 1 0
r 0 2 1 -2/3 0 0 200- 2b’/3
x 1 ½ 0 1/6 0 0 300+b’/6
t 0 5 0 -1/3 1 0 1200-b’/3
P 0 -2 0 2 0 1 3600+2b’
y 0 1 ½ -1/3 0 0 100-b’/3
x 1 0 -1/4 1/3 0 0 250+b’/3
t 0 0 -5/2 4/3 1 0 700+4b’/3
P 0 0 1 4/3 0 1 3800+4b’/3

y, x and t will remain as basic variables if
100 − b’/3 ≥ 0; b’ ≤ 300
250 + b’/3 ≥ 0; b’ ≥ −750
700 + 4b’/3 ≥ 0; b’ ≥ −525

−525 ≤ b’ ≤ 300

Add 1800 throughout:
1275 ≤ (1800 + b’) ≤ 2100
If the amount of yellow wool available becomes 1800 + b”, the only effect on
the tableaux is a change in the entries of the quantity column of each tableau.

Title x y r s t P Quantity
r 4 4 1 0 0 0 1400
s 6 3 0 1 0 0 1800
t 2 6 0 0 1 0 1800+b”
P -12 -8 0 0 0 1 0
r 0 2 1 -2/3 0 0 200
x 1 ½ 0 1/6 0 0 300
t 0 5 0 -1/3 1 0 1200+b”
P 0 -2 0 2 0 1 3600
y 0 1 ½ -1/3 0 0 100
x 1 0 -1/4 1/3 0 0 250
t 0 0 -5/2 4/3 1 0 700+b”
P 0 0 1 4/3 0 1 3800

y, x and t will remain as basic variables if
700 + b” ≥ 0;
b” ≥ −700

Add 1800 throughout: (1800 + b”) ≥ 1100
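All three RHS ranges follow the same pattern: each final quantity plus (change in RHS) × (entry in that constraint's slack column) must stay non-negative. A minimal sketch of that calculation is given below, with the numbers copied from the final tableau; QUANTITY, COLUMNS, rhs_range and ranges are illustrative names, and None is used here to stand for "no limit".

```python
from fractions import Fraction as F

# Final tableau: quantity column and the r, s, t columns for rows y, x, t.
QUANTITY = [F(100), F(250), F(700)]
COLUMNS = {
    # wool: (slack column entries in rows y, x, t; current RHS)
    "red":    ([F(1, 2), F(-1, 4), F(-5, 2)], 1400),
    "green":  ([F(-1, 3), F(1, 3), F(4, 3)], 1800),
    "yellow": ([F(0), F(0), F(1)], 1800),
}

def rhs_range(column, current):
    """Range of the RHS for which every quantity q + c*delta stays >= 0."""
    lower, upper = None, None              # None means unbounded
    for q, c in zip(QUANTITY, column):
        if c > 0:                          # q + c*delta >= 0  ->  delta >= -q/c
            bound = -q / c
            lower = bound if lower is None else max(lower, bound)
        elif c < 0:                        # q + c*delta >= 0  ->  delta <= -q/c
            bound = -q / c
            upper = bound if upper is None else min(upper, bound)
    return (None if lower is None else current + lower,
            None if upper is None else current + upper)

ranges = {name: rhs_range(col, rhs) for name, (col, rhs) in COLUMNS.items()}
print(ranges)
```

The ranges agree with those found by the graphical approach: 1200 to 1680 for red wool, 1275 to 2100 for green wool, and 1100 upwards for yellow wool.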

Inventory Planning and control
Stock comprises a very large part of a business’s working capital and therefore it is
very important to control it effectively.

Functions of inventory (or keeping stocks)
(a) It acts as a buffer in times when there is an unusually high rate of consumption.
(b) It enables the business to take advantage of quantity discount by buying in
large quantity.
(c) The business can take advantage of seasonal and other price fluctuations.
(d) It can be used as a reserve on those occasions when the time between
placing and receiving of order (called the lead time) is longer than average.
(e) Any delay in production caused by lack of parts e.g. raw material is kept to a
minimum.
Inventory costs
Inventory costs can be classified into 4 groups:
(a) Holding costs (or carrying costs), which may comprise
(i) cost of capital (money) tied up
(ii) warehousing and handling costs
(iii) deterioration
(iv) obsolescence (going out of date)
(v) insurance
(vi) pilferage (stealing in small quantities)
(b) Procuring costs, which depend on how the stock is obtained.
(i) ordering costs for goods purchased from outside suppliers, e.g. clerical
costs, telephone charges etc.
(ii) production set-up costs for goods manufactured internally, e.g. costs of
lost production and any other variable costs associated with production
planning.
(c) Shortage cost or stock-out cost, which may include
(i) the loss of a customer/sale
(ii) the extra cost of having to buy an emergency supply of stock at a
higher price
(iii) the cost of lost production and sales where the stock-out brings the
production process to a stop.
(d) Item cost, i.e. the supplier’s price or the direct cost per unit of production.
This cost must always be considered when
(i) the supplier offers a quantity discount
(ii) there is a saving in the direct cost of production from a longer ‘batch run’.

Total relevant inventory costs (or total operating costs, TOC) are costs relevant to the
derivation of economic order quantity (EOQ) or economic batch quantity (EBQ).

TOC = holding cost + ordering cost/ set up cost

Economic order quantity (EOQ) is the quantity ordered which minimizes total
operating costs of holding and ordering stocks.

Economic batch quantity (EBQ) is the quantity manufactured which minimizes total
operating costs of holding and setting up of production.

Inventory control

The objective is to maintain stock levels so that the total relevant inventory costs ( or
total operating costs TOC) of the company are at a minimum.

Two basic inventory decisions are to be made:
(i) the quantity to order/manufacture at a time
(ii) when to order/manufacture this quantity

In approaching these 2 decisions, we have to balance 2 opposing approaches:
(a) ordering/manufacturing large quantities to minimize ordering/set-up costs
(b) ordering/manufacturing small quantities to minimize holding costs
Deterministic models
A deterministic model is one in which all the parameters are known with certainty, i.e. demand rate and lead
time are known.

Assumptions
(a) the demand rate is constant
(b) the unit cost is constant
(c) lead time is zero or constant
(d) the cost of holding stock is proportional to the quantity of stock held
(e) no shortage

(I) Purchase model (buying the quantity from an outside supplier):
derivation of the EOQ formula
Notation:
Let S = cost of one purchase order
R = annual demand
C = unit cost
I = holding or carrying cost expressed in % per annum of the
value of average stock held
Q = quantity ordered
TOC = total operating costs (or total relevant inventory costs)

[Graph: stock level against time, falling from Q to 0 in each order cycle. The demand rate is constant over time, there is no delay in lead time and no shortage.]

Average stock = Q/2
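Annual ordering costs:
\[ C_O = \frac{\text{annual demand}}{\text{order quantity}} \times \text{order cost} = \frac{RS}{Q} \]
Annual holding costs:
\[ C_H = \text{average stock} \times \text{holding cost} = \frac{Q}{2} \times CI \]
Total inventory costs:
\[ TOC = C_O + C_H = \frac{RS}{Q} + \frac{QCI}{2} \]
Differentiate TOC with respect to Q:
\[ \frac{dTOC}{dQ} = -\frac{RS}{Q^2} + \frac{CI}{2} \]
TOC has a stationary point (maximum or minimum) when dTOC/dQ = 0:
\[ \frac{CI}{2} = \frac{RS}{Q^2} \quad \therefore \quad Q^2 = \frac{2RS}{CI} \]
Since quantity is non-negative,
\[ Q = \sqrt{\frac{2RS}{CI}} \]
The second derivative is
\[ \frac{d^2 TOC}{dQ^2} = \frac{2RS}{Q^3} > 0, \]
hence TOC is a minimum at this Q.
\[ EOQ = Q = \sqrt{\frac{2RS}{CI}} \]
TOC is minimum when Q = EOQ:
\[ TOC = \frac{RS}{\sqrt{2RS/(CI)}} + \sqrt{\frac{2RS}{CI}} \cdot \frac{CI}{2} \]
\[ \therefore TOC_{minimum} = \sqrt{2RSCI} \]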
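Graphical approach to derive EOQ

[Graph: annual ordering cost, annual holding cost and TOC plotted against order quantity Q; the minimum TOC of \sqrt{2RSCI} occurs at Q = \sqrt{2RS/(CI)}.]

Minimum TOC occurs when annual ordering cost equals annual holding cost.
\[ \frac{RS}{Q} = \frac{QCI}{2} \quad \rightarrow \quad Q = \sqrt{\frac{2RS}{CI}} = EOQ \]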
E.g. The annual demand forecast for a particular product sold by a retail store is
12,000 units, the cost is $60 per unit. The cost of ordering and receiving
delivery is $30 on each occasion. Stock holding costs are 30% per annum of
stock value. No shortage is allowed.
(a) What is the optimal order size and how many orders should be placed in
a year?
(b) What are the ordering and holding costs and hence what is the total
relevant inventory cost per annum?
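
The example above can be checked numerically. The following is a minimal sketch, assuming the EOQ formulas derived earlier; the function names `eoq` and `costs` are illustrative only and are not part of the notes.

```python
import math

def eoq(R, S, C, I):
    """Economic order quantity: sqrt(2RS / (CI))."""
    return math.sqrt(2 * R * S / (C * I))

def costs(Q, R, S, C, I):
    """Return (annual ordering cost, annual holding cost, total operating cost)."""
    ordering = R * S / Q        # number of orders per year times cost per order
    holding = Q / 2 * C * I     # average stock times holding cost per unit per year
    return ordering, holding, ordering + holding

# Values taken from the example: R = 12,000 units, S = $30, C = $60, I = 30%
R, S, C, I = 12_000, 30, 60, 0.30
Q = eoq(R, S, C, I)
orders_per_year = R / Q
ordering, holding, total = costs(Q, R, S, C, I)
print(f"EOQ = {Q:.0f} units, orders per year = {orders_per_year:.0f}")
print(f"ordering = ${ordering:.2f}, holding = ${holding:.2f}, TOC = ${total:.2f}")
```

At the EOQ the ordering and holding costs come out equal, which agrees with the graphical argument.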

E.g. Metrobus Company is a city-owned transit company which operates a fleet of
400 buses. The fleet includes buses used for public transit as well as school
buses. Metrobus Company is interested in establishing an inventory cost. All
buses use the same type of tyre and the annual requirements are estimated at
5000 tyres. The ordering cost per order is $125 and the cost of carrying a tyre in
inventory for one year is estimated at $20. Assume that lead time is zero
and that there are 300 working days in a year.

(i) Determine the optimal order quantity, minimum total annual relevant
inventory cost and the time between orders.

(ii) If Metrobus Company must order tyres by the dozen, what would be the
percentage change in the total annual relevant inventory cost as
compared to the answers in (i)?
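
One way to approach this example is sketched below. It assumes the $20 carrying cost per tyre per year plays the role of CI, and it handles the order-by-the-dozen restriction by comparing the two multiples of 12 on either side of the EOQ, which is sufficient because TOC is a convex function of Q. The names `toc`, `H` and `Q12` are illustrative.

```python
import math

def toc(Q, R, S, H):
    """Total annual relevant cost for order size Q (H = holding cost per unit per year)."""
    return R * S / Q + Q * H / 2

R, S, H, working_days = 5000, 125, 20, 300

# Part (i): unconstrained optimum
Q = math.sqrt(2 * R * S / H)
min_toc = toc(Q, R, S, H)
days_between_orders = Q / R * working_days

# Part (ii): order size must be a multiple of 12
lower = 12 * math.floor(Q / 12)
upper = 12 * math.ceil(Q / 12)
candidates = [q for q in (lower, upper) if q > 0]
Q12 = min(candidates, key=lambda q: toc(q, R, S, H))
pct_change = (toc(Q12, R, S, H) - min_toc) / min_toc * 100

print(Q, min_toc, days_between_orders)
print(Q12, round(toc(Q12, R, S, H), 2), f"{pct_change:.4f}%")
```

The percentage change is very small, illustrating that TOC is fairly flat near the EOQ.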

(II) Manufacturing model to derive EBQ formula

In the purchase model, all items purchased were treated as being received into
inventory at one time. However, when a company manufactures the items, there
is a continuous flow of stock into the inventory as items are completed. The
inventory of finished items does not build up immediately to its maximum point
but it builds up gradually since items are being produced faster than they are
being sold.
Notation
Let Q = no. of units produced per production run
R = annual demand
S = set up cost per production run
V = usage/ sales rate in units per day
P = production rate in units per day
C = unit cost
I = inventory holding or carrying cost expressed
as a % of value of average stock held.
D = no. of days in actual production.

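[Figure: stock level against time for the manufacturing model. Stock builds up at rate (P − V) during production to a maximum of D(P − V), then falls at rate V.]

\[ \text{Average stock} = \frac{D(P - V)}{2} \]
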
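When production is running, P units are produced per day and V units are
demanded per day, so there is a net increase of (P − V) units per day.
Stock reaches its peak at the end of the production period (i.e. after D days).
Maximum stock = D(P − V)
Minimum stock = 0

Since Q units are produced in D days at P units per day, D = Q/P, so
\[ \text{Average stock} = \frac{D(P - V)}{2} = \frac{Q}{2P}(P - V) = \frac{Q}{2}\left(1 - \frac{V}{P}\right) \]

Annual holding cost:
\[ C_H = \frac{Q}{2}\left(1 - \frac{V}{P}\right) \times CI \]

Annual set up cost:
\[ C_S = \frac{\text{annual demand}}{\text{quantity per production run}} \times \text{set up cost per production run} = \frac{RS}{Q} \]

Total operating cost:
\[ TOC = C_S + C_H = \frac{RS}{Q} + \frac{QCI}{2}\left(1 - \frac{V}{P}\right) \]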

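Differentiate TOC with respect to Q:

\[ \frac{dTOC}{dQ} = \frac{CI}{2}\left(1 - \frac{V}{P}\right) - \frac{RS}{Q^2} \]

TOC has a stationary point (maximum or minimum) when \( \frac{dTOC}{dQ} = 0 \):

\[ \frac{CI}{2}\left(1 - \frac{V}{P}\right) = \frac{RS}{Q^2} \quad \therefore \quad Q^2 = \frac{2RS}{CI\left(1 - \frac{V}{P}\right)} \]

Since quantity is non-negative,
\[ Q = \sqrt{\frac{2RS}{CI\left(1 - \frac{V}{P}\right)}} \]
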
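Second derivative: \( \frac{d^2 TOC}{dQ^2} = \frac{2RS}{Q^3} > 0 \), hence TOC is a minimum at this value of Q.
\[ EBQ = Q = \sqrt{\frac{2RS}{CI\left(1 - \frac{V}{P}\right)}} \]

TOC is minimum when Q = EBQ:
\[ TOC = \frac{RS}{\sqrt{\frac{2RS}{CI\left(1 - \frac{V}{P}\right)}}} + \sqrt{\frac{2RS}{CI\left(1 - \frac{V}{P}\right)}} \cdot \frac{CI\left(1 - \frac{V}{P}\right)}{2} \]
\[ \therefore TOC_{minimum} = \sqrt{2RSCI\left(1 - \frac{V}{P}\right)} \]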

E.g. ABC Company manufactures pencil boxes. The estimated annual demand is
9000 boxes. The set up cost of each production run is $50 and the current
rate of production is 1000 boxes per month. The cost of each box is $4 and the
cost of holding one box in stock for one year is $0.40.

(a) What is the optimal production batch?

(b) What are the set up cost and holding cost and hence what is the total
relevant inventory cost per annum?
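
A minimal sketch of this example follows. It assumes the $0.40 holding cost per box per year takes the place of CI, and that V and P are both expressed per year (V = 9000, P = 12 × 1000) so that the ratio V/P is unit-free. The function name `ebq` and the symbol `H` are illustrative.

```python
import math

def ebq(R, S, H, V, P):
    """Economic batch quantity: sqrt(2RS / (H(1 - V/P))), with H = CI."""
    return math.sqrt(2 * R * S / (H * (1 - V / P)))

R, S, H = 9000, 50, 0.40
P = 1000 * 12   # production rate converted to boxes per year
V = R           # demand rate in boxes per year

Q = ebq(R, S, H, V, P)
setup_cost = R * S / Q
holding_cost = Q / 2 * (1 - V / P) * H
total = setup_cost + holding_cost
print(f"EBQ = {Q:.0f} boxes")
print(f"set up = ${setup_cost:.2f}, holding = ${holding_cost:.2f}, TOC = ${total:.2f}")
```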

Quantity Discount (Assume constant demand & lead time = 0)
Advantages of buying in large quantities
(i) lower unit cost
(ii) lower ordering cost
(iii) fewer stock-outs
(iv) lower transportation cost
Disadvantages of buying in large quantities
(i) higher inventory carrying or holding costs
(ii) more capital required
(iii) risk of obsolete or older stock
(iv) greater risk of deterioration and depreciation of the stock.

E.g. A merchant has an annual demand for a product of 600 items. He buys from a
supplier at a cost of $6 per item and the cost of ordering is $10 per order. The
inventory holding costs are 20% p.a. of stock value. The supplier offers a 5%
discount on orders of between 200 and 999 items, and a 10% discount on
orders of 1,000 or more. Can the merchant reduce his costs by taking
advantage of either of these discounts?
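
A common way to tackle a quantity discount problem is sketched below: for each price band, compute the EOQ at that band's unit cost, move it to the nearest quantity inside the band if it falls outside, and compare total annual costs including the purchase cost. This is an illustrative sketch under those assumptions; `annual_cost`, `best_order` and the band layout are hypothetical names, not part of the notes.

```python
import math

def annual_cost(Q, R, S, C, I):
    """Annual purchase cost + ordering cost + holding cost."""
    return R * C + R * S / Q + Q / 2 * C * I

def best_order(R, S, I, bands):
    """bands: list of (min_qty, max_qty or None, unit_cost). Returns (Q, cost, unit_cost)."""
    best = None
    for lo, hi, C in bands:
        Q = math.sqrt(2 * R * S / (C * I))
        Q = max(Q, lo)              # the band's price needs at least lo items
        if hi is not None:
            Q = min(Q, hi)          # and at most hi items
        cost = annual_cost(Q, R, S, C, I)
        if best is None or cost < best[1]:
            best = (Q, cost, C)
    return best

# Unit costs: $6.00 list price, $5.70 with 5% discount, $5.40 with 10% discount
bands = [(1, 199, 6.00), (200, 999, 5.70), (1000, None, 5.40)]
Q, cost, C = best_order(R=600, S=10, I=0.20, bands=bands)
no_discount = annual_cost(100, 600, 10, 6.00, 0.20)
print(Q, round(cost, 2), C, round(no_discount, 2))
```

Clamping to the band limits is valid here because, for a fixed unit cost, the total cost is convex in Q.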

[Figure: a curve showing annual total cost plotted against order size.]

Stochastic model
A stochastic model is one in which the rate of demand or lead time is not known with
certainty. In this case, the demand or lead time follows a known probability distribution
(probably constructed from a historical analysis of demand or lead time in the past).
Since demand or lead time is not constant, it is necessary to keep safety stock and set
a reorder point to minimize the risk of stock out.

The reorder point is the stock level at which a purchase order
should be placed to replenish the inventory of an item.

Safety stock is the extra inventory held as a buffer or protection against the possibility
of a stock out.

Reorder point = (average daily usage × lead time in days) + safety stock
\[ R_e = VL + B \]
where V = average daily usage, L = lead time in days and B = safety stock.

Note:-
For deterministic model, demand is known with certainty and so safety stock is not
required.
[Figure: stock level against time, showing the reorder level (reorder point) and the lead time.]

E.g. Find the reorder level for a company which has constant demand 100
units/day, ordering cost $10/order, lead time = 3 days, holding cost/ unit/ year
= $2 and 1 year has 250 working days.
Draw a diagram to show the stock level.
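
The numbers in this example can be worked through as follows. This is a sketch only: it assumes a deterministic model (so safety stock B = 0) and treats the $2 holding cost per unit per year as CI. The function name `reorder_point` is illustrative.

```python
import math

def reorder_point(daily_usage, lead_time_days, safety_stock=0):
    """Re = V * L + B."""
    return daily_usage * lead_time_days + safety_stock

V, L = 100, 3            # units per day, days
working_days = 250
R = V * working_days     # annual demand
S, H = 10, 2             # cost per order, holding cost per unit per year

Re = reorder_point(V, L)
Q = math.sqrt(2 * R * S / H)   # order quantity, i.e. peak stock level in the diagram
cycle_days = Q / V             # working days between successive orders
print(Re, Q, cycle_days)
```

In the stock-level diagram, each cycle starts at Q units, falls linearly at V units per day, and an order is placed when the level reaches Re.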

Statistics - Introduction
Definition
1. Statistics is commonly regarded as a collection of numerical
facts which are expressed in terms of summarizing
statements and which have been collected either from several
observations or from other numerical data.

2. Statistics may also be regarded as a body of methods and
techniques which has been developed for handling numerical
data. This definition stresses the view that statistics is a tool
concerned with the collection, organization, analysis and
interpretation of numerical facts and observations, and the
drawing of conclusions in situations where uncertainties and
variations are present.

3. A statistic (singular) denotes a particular quantity that has been
calculated from sample data.

Types of variables
Quantitative variables
- These are values/data which can be quantified. These can be
discrete or continuous.
E.g. no. of people, height of student, weight of student
Qualitative variables
- These are variables which cannot be quantified e.g. beauty,
intelligence, aggressiveness etc. but they can be classified or
ranked.
E.g. Faculty ranking: Lecturer, Senior Lecturer, Assoc. Prof.,
Prof.
Discrete variable
- a variable which can take only a finite or countable number of values, usually
arising through the process of counting and usually integer-valued.
E.g. no. of people
Continuous variable
- a variable which can take any value, including fractional values, within
a specified interval, usually generated by the
process of measurement.
E.g. height of student

Sampling
The term population refers to the set of all the items under
consideration (not necessarily human) in a particular enquiry.
A sample is a group of items drawn from the population on which
observations are made. A sample is a part of the population. We
hope to draw some conclusions about the population by studying
the sample.

The purpose of sampling is to get as much information as possible
about the population by observing a small proportion of that
population, i.e. by observing a sample.

Sampling is necessary because:
1. Testing every item is tedious (labor-intensive) and extremely costly
in time and money.
2. The testing process can be destructive, e.g. testing the breaking
strength of rubber bands.
3. The whole population may not be known to us, e.g. when surveying
the opinion of TV viewers, it is not known exactly who watched a
particular program.

Useful sample
Conclusions can be drawn about the population if the sample is
1. of the proper size; the larger the sample, the more reliable the
results.
2. random, to avoid bias in the results.

A sampling frame is a list of all members of the whole population from
which the sample is selected. For a sample to be truly
representative, it is important that the sampling frame should be
complete, up to date and adequate for the purpose.

Methods of presentation of data


1. Tabular presentation - in the form of tables
2. Charts and diagrams : e.g. bar charts, histograms, line charts,
pie charts, ogive etc.

Frequency distribution
A frequency distribution is a grouping of data into classes, showing
the number or frequency of data in each class. A frequency
distribution can be presented in tabular form called frequency
distribution table.
Basic rules of construction
1. Pick out highest and lowest values.
2. Find range (=highest value - lowest value)
3. Divide into class intervals, normally 5 to 15 classes,
preferably of equal width (though this is not always necessary). Classes
must be chosen so that all the data can be included and each
item of data can go into only one class. Use relatively few classes
so that the data can be grouped easily to show the pattern,
but not so few that too much detail is lost.
4. For each figure in the raw data, insert a tally mark against the
appropriate class, e.g. llll llll ll .
5. Total the tally marks.
6. Tabulate the frequency distribution.
Some terminology used in the construction of frequency distribution
1. The largest and smallest values that can go into any given
class are referred to as class limits.
E.g. 5 - 6 has lower class limit 5 and upper class limit 6
10 - < 20 has lower class limit 10 and upper class limit 20
A class which has either no upper class limit or no lower class
limit is called an open-ended class.
E.g. < 30 is an open-ended class with no lower class limit
≥ 50 is an open-ended class with no upper class limit
In further calculation, assume an open-ended class to be of the
same size as the immediately neighboring class.

2. The dividing lines between successive classes are the class
boundaries.

Class intervals   Class boundaries   Class size        Class mark
3 − 4             2.5 − 4.5          4.5 − 2.5 = 2     (4 + 3)/2 = 3.5
5 − 6             4.5 − 6.5          2                 5.5
7 − 8             6.5 − 8.5          2                 7.5

Class intervals   Class boundaries   Class size        Class mark
3 ≤ x < 8         3 − 8              8 − 3 = 5         (8 + 3)/2 = 5.5
8 ≤ x < 13        8 − 13             5                 10.5
13 ≤ x < 18       13 − 18            5                 15.5

3. Class size
= 𝑢𝑝𝑝𝑒𝑟 𝑐𝑙𝑎𝑠𝑠 𝑏𝑜𝑢𝑛𝑑𝑎𝑟𝑦 − 𝑙𝑜𝑤𝑒𝑟 𝑐𝑙𝑎𝑠𝑠 𝑏𝑜𝑢𝑛𝑑𝑎𝑟𝑦
Preferably ( but not necessarily ) equal.

4. Class mark or class mid-point
= ½[lower class limit + upper class limit]
or
= ½[lower class boundary + upper class boundary]
Graphical presentation of frequency distribution

1. Histogram
A histogram has values of the variable on the horizontal
scale and frequencies on the vertical scale. For each class,
a rectangle is constructed with base equal to the class size
and height determined from the class frequency. The areas
of the rectangles must be proportional to the frequencies. If
all classes have equal size, then the heights of the
rectangles will be proportional to the class frequencies. If
class sizes are unequal, then the height of each rectangle will be
proportional to the class frequency divided by the class size.
The class boundaries are usually marked on the horizontal
scale.

E.g. A random sample of 50 houses is selected and the number


of occupants in each house is noted as follows:

3 2 3 6 3 6 2 4 7 4
6 4 1 5 8 3 4 10 1 5
4 7 11 4 15 3 2 6 3 2
13 1 5 8 4 1 3 2 10 8
3 9 6 3 1 14 4 5 3 1

(i) Compile a frequency distribution of the number of occupants


using 1 - 2 as the first class;
(ii) Construct a histogram for the frequency distribution in part (i)
(iii) Using the frequency distribution in part (i), group the third &
fourth classes and also the last 3 classes, then construct a
histogram.

Solution:
(i)
No. of occupants   Frequency   Class boundaries   Class size
1 − 2              11          0.5 − 2.5          2.5 − 0.5 = 2
3 − 4              18          2.5 − 4.5          4.5 − 2.5 = 2
5 − 6              9           4.5 − 6.5          6.5 − 4.5 = 2
7 − 8              5           6.5 − 8.5          2
9 − 10             3           8.5 − 10.5         2
11 − 12            1           10.5 − 12.5        2
13 − 14            2           12.5 − 14.5        2
15 − 16            1           14.5 − 16.5        2
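
The tallying in part (i) can be reproduced with a short script. This is a sketch for integer-valued data with equal class sizes; the function name `frequency_table` and its parameters are illustrative, and the class boundaries are taken as half a unit either side of the class limits, as in the table above.

```python
from collections import Counter

occupants = [
    3, 2, 3, 6, 3, 6, 2, 4, 7, 4,
    6, 4, 1, 5, 8, 3, 4, 10, 1, 5,
    4, 7, 11, 4, 15, 3, 2, 6, 3, 2,
    13, 1, 5, 8, 4, 1, 3, 2, 10, 8,
    3, 9, 6, 3, 1, 14, 4, 5, 3, 1,
]

def frequency_table(data, first_lower, width):
    """Rows of (lower limit, upper limit, lower boundary, upper boundary, frequency)."""
    counts = Counter((x - first_lower) // width for x in data)
    rows = []
    for k in range(max(counts) + 1):
        lower = first_lower + k * width
        upper = lower + width - 1
        rows.append((lower, upper, lower - 0.5, upper + 0.5, counts.get(k, 0)))
    return rows

table = frequency_table(occupants, first_lower=1, width=2)
for lower, upper, lb, ub, f in table:
    print(f"{lower:>2} - {upper:<2}  {lb:>4} - {ub:<4}  {f}")
```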

(ii)
[Figure: histogram of the number of occupants of 50 houses (equal class size); frequency (0 to 20) on the vertical axis against number of occupants on the horizontal axis, with class boundaries marked.]
(iii)
No. of occupants   Frequency      Class boundaries   Class size   Adjusted freq.
1 − 2              11             0.5 − 2.5          2            (11 × 2)/2 = 11
3 − 4              18             2.5 − 4.5          2            (18 × 2)/2 = 18
5 − 8              9 + 5 = 14     4.5 − 8.5          4            (14 × 2)/4 = 7
9 − 10             3              8.5 − 10.5         2            (3 × 2)/2 = 3
11 − 16            1 + 2 + 1 = 4  10.5 − 16.5        6            (4 × 2)/6 = 1.33
2. Frequency Polygon
It is a line graph of class frequency plotted against class mark.
It can be obtained by connecting mid-points of the tops of the
rectangles in the histogram. It is customary to join the points
at each end of the diagram to the base line at the centres of
the adjoining class intervals (i.e. as if there were 2 extra classes with
0 frequency). The area under the frequency polygon is equal
to the area of the histogram.

3. Cumulative frequency polygon or frequency ogive
A graph showing the cumulative frequency less than any
upper class boundary plotted against the upper class
boundary is called a cumulative frequency polygon or ‘less
than’ ogive.

A graph showing the cumulative frequency more than any
lower class boundary plotted against the lower class boundary
is called a ‘more than’ ogive.

Note: The ‘less than’ ogive is implied whenever we refer to
cumulative frequency polygon or ogive.
“Less than” cumulative table:

No. of occupants   Frequency   Class boundaries   < upper boundary   Cum. freq.
                                                  < 0.5              0
1 − 2              11          0.5 − 2.5          < 2.5              11
3 − 4              18          2.5 − 4.5          < 4.5              11 + 18 = 29
5 − 6              9           4.5 − 6.5          < 6.5              29 + 9 = 38
7 − 8              5           6.5 − 8.5          < 8.5              43
9 − 10             3           8.5 − 10.5         < 10.5             46
11 − 12            1           10.5 − 12.5        < 12.5             47
13 − 14            2           12.5 − 14.5        < 14.5             49
15 − 16            1           14.5 − 16.5        < 16.5             50

[Figure: '<' ogive of the number of occupants of 50 houses; cumulative frequency (0 to 50) plotted against class boundaries (0.5 to 16.5).]

4. Frequency curve
As the class intervals are made smaller and smaller and the
number of observations (frequency) increases, the histogram
or frequency polygon closely approximates a smooth curve, which is
called the frequency curve. A smooth curve is drawn instead of
straight line segments. A smoothed ogive is obtained by
smoothing the ogive.

[Figure: some common frequency curves.]

STATISTICAL MEASURES

1. Introduction

Tabular and graphical presentations give little precise
quantitative information about a set of data. It is often desirable to
condense the data into two or three quantities or measures which
can be used to describe the distribution, i.e. to give an arithmetic
description of the data.

Statistical Measures Characterising the Distribution

(I) An ‘average’ or measure of location is a single value within the
range of data that is used to represent all the values in the
series. An average is sometimes called a measure of central
tendency.

(II) A measure of dispersion or measure of variation gives some
indication of the way in which the data vary from one another
and from some average.

(III) A measure of skewness shows the tendency of data to spread
out more on one side of the average than the other.

2. The Σ (sigma) notation for summation

If \( x_1, x_2, x_3, \ldots, x_n \) are measurements of the variable X, then

(i) \( \sum X = x_1 + x_2 + x_3 + \cdots + x_n \)

(ii) \( \sum X^2 = x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2 \)

(iii) \( (\sum X)^2 = (x_1 + x_2 + x_3 + \cdots + x_n)^2 \)

(iv) \( \sum (X + Y) = \sum X + \sum Y \), where Y is another variable

(v) \( \sum k = nk \), where k is a constant

(vi) \( \sum kX = k \sum X \)

(vii) \( \sum XY = x_1 y_1 + x_2 y_2 + x_3 y_3 + \cdots + x_n y_n \)

Here \( \sum X \) is shorthand for \( \sum_{i=1}^{n} x_i \).

3. Types of Measures of location

i. Arithmetic mean, \( \bar{X} \)
It is defined as the value each item in the distribution would
have if all the values were shared out equally among all the
items.

a. For ungrouped data \( x_1, x_2, x_3, \ldots, x_n \):
\[ \bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

E.g. Five operators in a small factory are receiving
$1.78, $1.80, $1.89, $1.83, $1.93 per hour.

The arithmetic mean,
X̄ = ($1.78 + $1.80 + $1.89 + $1.83 + $1.93) / 5
= $1.85 per hour

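b. For grouped data of a frequency distribution
The arithmetic mean,
\[ \bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum fx}{\sum f} \]
where each \( x_i \) is a class mark and \( f_i \) is the corresponding class frequency.

E.g. Calculate the mean score for the following frequency
distribution.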
Scores Number of candidates
10 - 19 1
20 - 29 6
30 - 39 9
40 - 49 31
50 - 59 42
60 - 69 32
70 - 79 17
80 - 89 10
90 - 99 2

Solution:
Scores     f     Class mark x   fx       x²        fx²
10 - 19    1     14.5           14.5     210.25    210.25
20 - 29    6     24.5           147      600.25    3601.5
30 - 39    9     34.5           310.5    1190.25   10712.25
40 - 49    31    44.5           1379.5   1980.25   61387.75
50 - 59    42    54.5           2289     2970.25   124750.5
60 - 69    32    64.5           2064     4160.25   133128
70 - 79    17    74.5           1266.5   5550.25   94354.25
80 - 89    10    84.5           845      7140.25   71402.5
90 - 99    2     94.5           189      8930.25   17860.5
Total      150                  8505               517407.5

Mean score
\[ \bar{X} = \frac{\sum fx}{\sum f} = \frac{8505}{150} = 56.7 \]
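
The grouped-mean calculation can be checked with a few lines of code. This is a minimal sketch using the class limits and frequencies from the table; the variable names are illustrative.

```python
# (lower class limit, upper class limit, frequency) for each class
classes = [
    (10, 19, 1), (20, 29, 6), (30, 39, 9), (40, 49, 31), (50, 59, 42),
    (60, 69, 32), (70, 79, 17), (80, 89, 10), (90, 99, 2),
]

# Class mark = half the sum of the class limits
marks = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

sum_f = sum(freqs)
sum_fx = sum(f * x for x, f in zip(marks, freqs))
sum_fx2 = sum(f * x * x for x, f in zip(marks, freqs))
mean = sum_fx / sum_f
print(sum_f, sum_fx, sum_fx2, mean)
```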


ii. Mode, \( \hat{X} \)
The mode of a set of data is the value which occurs with the
greatest frequency, i.e. the most common value in the
distribution. The mode may not exist and, even if it does, it
may not be unique.

[Figure: a unimodal distribution and a bimodal distribution.]

a. For ungrouped data

E.g. { 1, 2, 3, 4, 5, 8 } has no mode

{ 1, 2, 2, 3, 4, 5, 5, 8 } has 2 modes :2 & 5

{1, 2, 2, 3, 3, 3, 4, 5, 5, 8 } has 1 mode : 3

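b. For grouped data

From the histogram, the mode is located by the
intersection of the dotted lines as shown in the following
diagram, or by interpolation.

[Figure: histogram (frequency on the vertical axis) with dotted lines drawn across the top of the modal class; their intersection locates the mode.]

\[ \hat{X} = L_{\hat{X}} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) \times C_{\hat{X}} \]
where
\( L_{\hat{X}} \) = lower boundary of the modal class
\( \Delta_1 \) = excess of the modal frequency over the frequency of the preceding class
\( \Delta_2 \) = excess of the modal frequency over the frequency of the next class
\( C_{\hat{X}} \) = class size of the modal class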
E.g. Find the modal height for the distribution of height of 200
trees
Class (meters)   No. of trees (freq.)
60 - 62          10
63 - 65          36
66 - 68          84
69 - 71          54
72 - 74          16

\[ \text{Mode} = 65.5 + \left(\frac{84 - 36}{(84 - 36) + (84 - 54)}\right)(68.5 - 65.5) = 67.3 \text{ m} \]
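
The interpolation formula for the mode can be sketched as a small function. It assumes a single modal class, and treats a missing neighbouring class (when the modal class is first or last) as having frequency 0; both are assumptions of this sketch. The name `grouped_mode` is illustrative.

```python
def grouped_mode(boundaries, freqs):
    """Mode by interpolation; boundaries is a list of (lower, upper) class boundaries."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])   # index of the modal class
    prev_f = freqs[m - 1] if m > 0 else 0
    next_f = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - prev_f     # excess over the preceding class
    d2 = freqs[m] - next_f     # excess over the next class
    lower, upper = boundaries[m]
    return lower + d1 / (d1 + d2) * (upper - lower)

boundaries = [(59.5, 62.5), (62.5, 65.5), (65.5, 68.5), (68.5, 71.5), (71.5, 74.5)]
freqs = [10, 36, 84, 54, 16]
mode = grouped_mode(boundaries, freqs)
print(round(mode, 1))
```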

iii. Median, \( \tilde{X} \)

The median is defined as the value of the middle item when the
data are arranged in ascending or descending order of
magnitude. It divides a set of data such that at least half of the data
items are as large as or larger than it is, and at least half of the
data items are as small as or smaller than it is.

a. For ungrouped data \( x_1, x_2, x_3, \ldots, x_n \)

If n is odd, then \( \tilde{X} \) is the value of the \( \left(\frac{n+1}{2}\right) \)th item.

If n is even, then \( \tilde{X} \) is the mean of the values of the \( \left(\frac{n}{2}\right) \)th
item and the \( \left(\frac{n}{2} + 1\right) \)th item.

E.g. Find the median of the following data
50, 84, 91, 72, 68, 87, 78 (n = 7 is odd)

Arranging the data in ascending order of magnitude:
50, 68, 72, 78, 84, 87, 91.
\( \tilde{X} \) = 4th item = 78

E.g. Find the median of the following data
7, 10, 12, 15, 18, 20, 25, 38, 45, 64 (n = 10 is even)

The data are already in ascending order of magnitude.
\( \tilde{X} \) = mean of the 5th and 6th items = \( \frac{18 + 20}{2} = 19 \)

b. For grouped data

The steps in computation:

1. Calculate the cumulative frequencies.

2. Calculate the position of \( \tilde{X} \), which is \( \frac{\sum f}{2} \).

3. Locate the \( \tilde{X} \) class: starting with the first cumulative frequency, look for the
first cumulative frequency which is equal to or higher than \( \frac{\sum f}{2} \).
The class corresponding to this cumulative frequency is the \( \tilde{X} \) class.

4. Apply the formula
\[ \tilde{X} = L_{\tilde{X}} + \frac{\left[\frac{\sum f}{2} - (\sum f)_{\tilde{X}}\right]}{f_{\tilde{X}}} \times C_{\tilde{X}} \]

where \( L_{\tilde{X}} \) = lower boundary of the \( \tilde{X} \) class
\( (\sum f)_{\tilde{X}} \) = cumulative frequency preceding the \( \tilde{X} \) class
\( f_{\tilde{X}} \) = frequency of the median class
\( C_{\tilde{X}} \) = class size of the \( \tilde{X} \) class

E.g. Find the median height of the following frequency distribution

Class (meters) number of trees


60 - 62 10
63 - 65 36
66 - 68 84
69 - 71 54
72 - 74 16
200

Solution:
Class (meters)   Number of trees   Class boundaries   Cumulative frequency
60 - 62          10                59.5 - 62.5        10
63 - 65          36                62.5 - 65.5        46
66 - 68          84                65.5 - 68.5        130
69 - 71          54                68.5 - 71.5        184
72 - 74          16                71.5 - 74.5        200

Position of \( \tilde{X} = \frac{200}{2} = 100 \), so the \( \tilde{X} \) class is 66 - 68.
\( \tilde{X} \) class boundaries: L = 65.5, C = 68.5 − 65.5 = 3
\[ \text{median} = 65.5 + \left(\frac{100 - 46}{84}\right) \times 3 = 67.4 \text{ m} \]

The median height of 67.4 meters shows that 50% of the
trees have height ≤ 67.4 meters and the other 50% of the
trees have height ≥ 67.4 meters.

The fractiles
The median belongs to a general class of statistical
description called fractiles. A fractile is a value below which
lies a given percentage of the data ; for the median this
percentage is 50%. The quartiles divide the data into 4 equal
parts , the deciles divide the data into 10 equal parts, the
percentiles divide the data into 100 equal parts , etc.

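Formulae of fractile
(a) Quartiles: position of \( Q_i = \frac{n}{4} \times i \); i = 1, 2, 3
\[ Q_i = L_{Q_i} + \frac{\left[\frac{n}{4} \times i - (\sum f)_{Q_i}\right]}{f_{Q_i}} \times C_{Q_i} \]

(b) Deciles: position of \( D_i = \frac{n}{10} \times i \); i = 1, 2, …, 9
\[ D_i = L_{D_i} + \frac{\left[\frac{n}{10} \times i - (\sum f)_{D_i}\right]}{f_{D_i}} \times C_{D_i} \]

(c) Percentiles: position of \( P_i = \frac{n}{100} \times i \); i = 1, 2, …, 99
\[ P_i = L_{P_i} + \frac{\left[\frac{n}{100} \times i - (\sum f)_{P_i}\right]}{f_{P_i}} \times C_{P_i} \]

Here \( n = \sum f \), and L, \( (\sum f) \), f and C are the lower boundary, preceding cumulative
frequency, frequency and class size of the class containing the fractile, as for the median.
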
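Locating fractiles from the ‘<’ ogive

[Figure: ‘<’ ogive (cumulative frequency against class boundaries) with the cumulative frequencies n/4, n/2 and 3n/4 read across to the curve and down to \( Q_1 \), \( \tilde{X} \) and \( Q_3 \) on the horizontal axis.]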

E.g. Find \( Q_3 \), \( D_7 \) and \( P_{20} \) of the distribution of the heights of trees.

Class (meters)   No. of trees (freq.)   Cumulative frequency
60 - 62          10                     10
63 - 65          36                     46
66 - 68          84                     130
69 - 71          54                     184
72 - 74          16                     200
Total            200

[Figure: '<' ogive of height of trees; cumulative number of trees (0 to 200) plotted against class boundaries (59.5 to 74.5).]

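Position of \( Q_3 = \frac{3 \times 200}{4} = 150 \); from the ogive, \( Q_3 \) ≈ 70 meters.
75% of the trees have height < 70 meters.
Position of \( D_7 = \frac{7 \times 200}{10} = 140 \); from the ogive, \( D_7 \) ≈ 69 meters.
70% of the trees have height < 69 meters.
Position of \( P_{20} = \frac{20 \times 200}{100} = 40 \); from the ogive, \( P_{20} \) ≈ 65 meters.
20% of the trees have height < 65 meters.

Using the formula:
\[ P_{20} = 62.5 + \frac{(40 - 10)}{36} \times 3 = 65 \]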
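
The fractile formulae share one pattern, so a single function can cover quartiles, deciles, percentiles and the median (the 0.5 fractile). The sketch below is illustrative: `grouped_fractile` is a hypothetical name, and `fraction` stands for i/4, i/10 or i/100. Values from the formula may differ slightly from readings taken off the ogive, which are graphical estimates.

```python
def grouped_fractile(boundaries, freqs, fraction):
    """Interpolated fractile for grouped data, with 0 < fraction <= 1."""
    n = sum(freqs)
    position = fraction * n
    cum = 0                                   # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cum + f >= position:
            return lower + (position - cum) / f * (upper - lower)
        cum += f
    raise ValueError("fraction must lie in (0, 1]")

boundaries = [(59.5, 62.5), (62.5, 65.5), (65.5, 68.5), (68.5, 71.5), (71.5, 74.5)]
freqs = [10, 36, 84, 54, 16]

q3 = grouped_fractile(boundaries, freqs, 3 / 4)
d7 = grouped_fractile(boundaries, freqs, 7 / 10)
p20 = grouped_fractile(boundaries, freqs, 20 / 100)
median = grouped_fractile(boundaries, freqs, 1 / 2)
print(round(q3, 2), round(d7, 2), round(p20, 2), round(median, 2))
```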

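Relationship between mean, median and mode

For unimodal distributions which are moderately skewed, i.e. not
too far from symmetry,

Mean − mode ≈ 3 (mean − median)
\[ \bar{X} - \hat{X} \approx 3(\bar{X} - \tilde{X}) \]

Skewness: asymmetry (not symmetrical)

a. Symmetrical, not skewed
\[ \bar{X} = \hat{X} = \tilde{X} \]
[Figure: symmetrical frequency curve with mean, mode and median coinciding.]

b. Positively skewed
\[ \hat{X} < \tilde{X} < \bar{X} \]
[Figure: frequency curve with a longer tail to the right.]

c. Negatively skewed
\[ \bar{X} < \tilde{X} < \hat{X} \]
[Figure: frequency curve with a longer tail to the left.]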
4. Measure of dispersion (spread)

Measures of dispersion (or variability: how much the data differ
from each other) describe a group of values
in terms of the variability among the data items included within the
group, whereas measures of central tendency / location
identify the ‘typical’ value in a group of data.

[Figure: two frequency curves, one with small spread and one with big spread.]

The extent of variability or the ‘spread’ of the data may be
described by the range, the quartile deviation and the standard
deviation.

Small dispersion: values are concentrated around the ‘typical’ value.
Big dispersion: data differ more from each other.

(i) Range
a. For ungrouped data
Range = maximum value – minimum value
E.g. {2, 2, 3, 4, 5} ; range = 5 – 2 = 3

{2, 5, 7, 10, 100} ; range = 100 – 2 = 98

b. For grouped data
Range is taken to be the difference between the upper
boundary of the highest class and the lower boundary of the
lowest class.
E.g. for the classes 20 – 29, 30 – 39, 40 – 49, 50 – 59:
range = 59.5 – 19.5 = 40

Range is simple and easy to calculate but sensitive to
extreme values.

(ii) Quartile deviation, Q.D. or semi-interquartile range
It gives the average amount by which the two quartiles \( Q_1 \)
and \( Q_3 \) differ from the median.
\[ Q.D. = \frac{Q_3 - Q_1}{2} \]
a. For ungrouped data
E.g. Find the Q.D. of the following data:
2, 2, 2, 3, 4, 5, 5, 6, 6, 6

b. For grouped data
Use the formulae to calculate Q1 and Q3, or estimate Q1 and Q3
from the ‘<’ ogive, then calculate Q.D.

Advantages
1. It is not affected by extreme values because it is calculated
on the central values of the distribution.
2. Calculation is unaffected by open-ended classes.

Disadvantage
It is not fully representative of a set of measurements as it is
not based on all the information available.

(iii) Standard deviation
This is the most important measure of dispersion. It can be
used for further statistical analysis.
Standard deviation is the root-mean-square deviation
between the individual values and the mean in a distribution.

Consider a set of data: \( x_1, x_2, \ldots, x_n \)

Let the mean of the data be \( \bar{x} \).

Deviations of each value from the mean: \( x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x} \)

Squared deviations: \( (x_1 - \bar{x})^2, (x_2 - \bar{x})^2, \ldots, (x_n - \bar{x})^2 \)

Mean-square deviation: \( \frac{\sum (x - \bar{x})^2}{n} \)

Root-mean-square deviation: \( \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \)

Computation of the standard deviation:

a. Ungrouped data

Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]
Sample standard deviation:
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]

Alternatively:

Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} \]
Sample standard deviation:
\[ s = \sqrt{\left[\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2\right] \times \frac{n}{n - 1}} \]

• The standard deviation computed from population data is denoted by
the symbol σ (pronounced as sigma); the standard deviation computed
from sample data is denoted by s.

Variance

• Variance is the mean-square deviation between the individual values
and the mean in a distribution.

• Variance is the square of the standard deviation of a
distribution.
In general, it is difficult to interpret the value of the variance
because its units are squared. Hence, the standard deviation is
more frequently used.

E.g. Find the standard deviation and variance for the following data:
2, 12, 7, 5, 9
N = 5
\( \sum x = 2 + 12 + 7 + 5 + 9 = 35 \)
\( \sum x^2 = 2^2 + 12^2 + 7^2 + 5^2 + 9^2 = 303 \)
Population standard deviation,
\[ \sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} = \sqrt{\frac{303}{5} - \left(\frac{35}{5}\right)^2} = \sqrt{11.6} = 3.41 \]
Population variance, \( \sigma^2 = 3.41^2 = 11.6 \)

E.g. During a particular summer month, the number of central air-
conditioning units sold by a random sample of 5 salespersons from a
heating and air-conditioning firm were as follows:
8, 11, 5, 12, 8
Find the sample standard deviation and the sample variance.
n = 5
\( \sum x = 8 + 11 + 5 + 12 + 8 = 44 \)
\( \sum x^2 = 8^2 + 11^2 + 5^2 + 12^2 + 8^2 = 418 \)

Sample standard deviation,
\[ s = \sqrt{\left[\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2\right] \times \frac{n}{n - 1}} = \sqrt{\left[\frac{418}{5} - \left(\frac{44}{5}\right)^2\right] \times \frac{5}{4}} = 2.77 \text{ units} \]

Sample variance, \( s^2 = 2.77^2 = 7.7 \) units²
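
Both ungrouped examples can be verified in code. The sketch below implements the alternative (sum of squares) formulas directly and, as a cross-check, compares them with `pstdev` and `stdev` from Python's standard `statistics` module. The function names `population_sd` and `sample_sd` are illustrative.

```python
import math
import statistics

def population_sd(xs):
    """sqrt(sum(x^2)/N - (sum(x)/N)^2)"""
    n = len(xs)
    return math.sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)

def sample_sd(xs):
    """sqrt([sum(x^2)/n - (sum(x)/n)^2] * n/(n - 1))"""
    n = len(xs)
    mean_square_dev = sum(x * x for x in xs) / n - (sum(xs) / n) ** 2
    return math.sqrt(mean_square_dev * n / (n - 1))

population = [2, 12, 7, 5, 9]
sample = [8, 11, 5, 12, 8]

sigma = population_sd(population)
s = sample_sd(sample)
print(round(sigma, 2), round(sigma ** 2, 1))
print(round(s, 2), round(s ** 2, 1))
```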

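b. Grouped data

Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum f(x - \mu)^2}{\sum f}} \]
Sample standard deviation:
\[ s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}, \quad \text{where } n = \sum f \]

Alternatively:

Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2} \]
Sample standard deviation:
\[ s = \sqrt{\left[\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2\right] \times \frac{\sum f}{\sum f - 1}} \]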
E.g. Find the mean and standard deviation of the following frequency
distribution.
Class interval Frequency
0–6 2
6 – 12 4
12 – 18 10
18 – 24 12
24 – 30 8
30 – 36 4

Class interval Class mark, x f fx fx2


0–6 3 2 6 18
6 – 12 9 4 36 324
12 – 18 15 10 150 2250
18 – 24 21 12 252 5292
24 – 30 27 8 216 5832
30 – 36 33 4 132 4356
Total 40 792 18072

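Population mean, \( \mu = \frac{\sum fx}{\sum f} = \frac{792}{40} = 19.8 \)

Population standard deviation,
\[ \sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2} = \sqrt{\frac{18072}{40} - \left(\frac{792}{40}\right)^2} = \sqrt{59.76} = 7.73 \]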

E.g.The output distribution for a sample of 100 workers in BB Company


is shown below:
Output Number of
(units) workers
21 – 25 10
26 – 30 35
31 – 35 16
36 – 40 14
41 – 45 12
46 – 50 10
51 – 55 3
Calculate the mean and the standard deviation.

Output Class f fx fx2


(units) mark, x
21 – 25 23 10 230 5290
26 – 30 28 35 980 27440
31 – 35 33 16 528 17424
36 – 40 38 14 532 20216
41 – 45 43 12 516 22188
46 – 50 48 10 480 23040
51 – 55 53 3 159 8427
Total 100 3425 124025

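Sample mean, \( \bar{x} = \frac{\sum fx}{\sum f} = \frac{3425}{100} = 34.25 \) units

Sample standard deviation,
\[ s = \sqrt{\left[\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2\right] \times \frac{\sum f}{\sum f - 1}} = \sqrt{\left[\frac{124025}{100} - \left(\frac{3425}{100}\right)^2\right] \times \frac{100}{100 - 1}} = 8.24 \]
units.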
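
The two grouped-data examples above can be reproduced with one helper. This is a sketch only; `grouped_mean_sd` is an illustrative name, and the `sample` flag switches on the n/(n − 1) correction used for sample data.

```python
import math

def grouped_mean_sd(marks, freqs, sample=False):
    """Mean and standard deviation from class marks and frequencies."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(marks, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(marks, freqs))
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2
    if sample:
        variance *= n / (n - 1)
    return mean, math.sqrt(variance)

# Population example: class intervals 0-6, 6-12, ..., 30-36
mu, sigma = grouped_mean_sd([3, 9, 15, 21, 27, 33], [2, 4, 10, 12, 8, 4])

# Sample example: output of 100 workers in BB Company
x_bar, s = grouped_mean_sd(
    [23, 28, 33, 38, 43, 48, 53], [10, 35, 16, 14, 12, 10, 3], sample=True
)
print(round(mu, 2), round(sigma, 2), round(x_bar, 2), round(s, 2))
```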

Remarks:
(1) The larger the value of the standard deviation, the greater the
dispersion or spread of the data from the mean. If the standard
deviation is large, the mean is not really suitable as a
‘representative’ value. If the standard deviation is small then
mean is a more representative value as the data values are
concentrated around the mean.

(2) For a normal distribution, it is known that
(a) 68.27% of the values are located within one standard
deviation of the mean, \( (\bar{X} - s, \bar{X} + s) \).
(b) 95.45% of the values are located within two standard
deviations of the mean, \( (\bar{X} - 2s, \bar{X} + 2s) \).
(c) 99.73% of the values are located within three standard
deviations of the mean, \( (\bar{X} - 3s, \bar{X} + 3s) \).

Almost all the data lie in \( (\bar{X} - 3s, \bar{X} + 3s) \), so

Data range ≈ 6s

We may estimate s approximately by \( \frac{1}{6} \times \text{range} \).

[Figure: normal curve with 68.27% of the area between \( \bar{X} - s \) and \( \bar{X} + s \), and 95.45% between \( \bar{X} - 2s \) and \( \bar{X} + 2s \).]

5. Skewness
It is the degree of asymmetry or departure from symmetry of a
distribution. A measure of skewness indicates not only the amount
of ‘asymmetry’ but also its direction. A set of data is said to be
skewed in the direction of the extreme values, or speaking in
terms of frequency curves, in the direction of the ‘tail’.

5.1 Pearson’s coefficient of skewness, \( S_k \)

\[ S_k = \frac{3(\bar{X} - \tilde{X})}{s} \quad \text{or} \quad S_k = \frac{\bar{X} - \hat{X}}{s} \]
Remarks:
(1) Range for \( S_k \): −3 ≤ \( S_k \) ≤ +3
(2) For a symmetrical distribution, \( S_k = 0 \).
If \( S_k \) is positive then the distribution is skewed to the
right.
If \( S_k \) is negative then the distribution is skewed to the
left.
(3) \( \frac{\bar{X} - \hat{X}}{s} \) is not normally used because certain distributions
have no mode or more than 1 mode.

E.g. The mean and median of a distribution are found to be
30.9 and 28.8 respectively. The standard deviation is
13.23.
\[ S_k = \frac{3(\bar{X} - \tilde{X})}{s} = \frac{3(30.9 - 28.8)}{13.23} = 0.48 \]
This shows positive skewness (i.e. skewed to the right) and the degree of skewness is
moderate.
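
The same calculation as a one-line function, in case it is useful for checking other exercises. The name `pearson_skewness` is illustrative, and the sketch uses the median-based form of the coefficient.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's coefficient of skewness: 3(mean - median) / sd."""
    return 3 * (mean - median) / sd

sk = pearson_skewness(mean=30.9, median=28.8, sd=13.23)
direction = "right" if sk > 0 else "left" if sk < 0 else "none (symmetrical)"
print(round(sk, 2), direction)
```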

6. Rough guide for the choice and pairing of measures of central
tendency and dispersion.

(1) For an almost symmetrical distribution, use the mean and standard
deviation
- they take every item into account.
- they are convenient for further statistical analysis and
mathematical treatment.

(2) For a highly skewed distribution, use the median and quartile deviation
- they are less affected by extreme values.

(3) Between mode and median, the choice depends on the degree of
concentration of data around the mode.

(4) Overall, the choice depends on the purpose.
E.g. sizes of shoes to be manufactured: mode, \( \hat{X} \)
E.g. wage negotiation by employees: median, \( \tilde{X} \)

(5) In the presence of an open-ended class, the value of ( \( \tilde{X} \), Q.D.) is
definite, whereas ( \( \bar{X} \), S.D.) depends on assumptions about the class
marks of the open-ended classes.

CORRELATION AND LINEAR REGRESSION
Correlation
▪ measures the strength of the relationship between two
variables
▪ involves a bivariate data / distribution

Regression
▪ a study to identify the relationship between two or more
variables using a mathematical equation
▪ is normally used for estimation purposes

Independent Variable / Explanatory Variable (X)


▪ the variable that is used to explain the variation in the
dependent variable
▪ the variable that is used as a basis for prediction
Dependent Variable (Y)
▪ the variable to be predicted or explained

Example:
A study on relationship between the sales of ice cream and the
temperatures
▪ temperature is an independent variable since it can be
used to explain the sales of ice cream
▪ sales is a dependent variable since the sales depends on
temperature

Univariate distribution
▪ data of single characteristic is grouped together
▪ Example: height of student, price of item etc

Bivariate distribution
▪ data of two characteristics are grouped together
▪ Example: sales of ice cream and temperature, sales of goods
and advertising expenses.

Scatter Diagram
▪ a plot of paired observations ( X, Y )
▪ illustrates whether
   - any relationship between the dependent and independent
     variables exists
   - the relationship is positive or negative
   - the relationship is linear or non-linear
▪ A positive relationship exists when both variables increase (↑) or decrease (↓)
at the same time
▪ In a negative relationship, as one variable increases (↑), the other
variable decreases (↓), and vice versa

Example:
The data below relates the weekly maintenance cost ($) to the
age (in months) of ten machines of similar type in a
manufacturing company.
Machine 1 2 3 4 5 6 7 8 9 10
Age (X) 5 10 15 20 30 30 30 50 50 60
Cost (Y) 190 240 250 300 310 335 300 300 350 395
Construct a scatter diagram and comment on it.
Solution:
[Figure: scatter diagram of weekly maintenance cost ($, 150 to 400) against age of machine (months, 0 to 60).]

Comment:

Two types of correlation
1. Linear correlation
▪ correlation is said to be linear if the relationship can be
represented by a straight line
2. Non-linear correlation (or curvilinear correlation)
▪ correlation is said to be non-linear if the relationship can be
represented by a curve
Positive linear correlation
▪ An increase in the independent variable (X) will result in an
increase in the dependent variable (Y)

Negative linear correlation
▪ An increase in the independent variable (X) will result in a
decrease in the dependent variable (Y)

Correlation Coefficient ( r )
▪ measures the strength of the linear relationship between two
variables
▪ has a range of values from −1 to +1, i.e. −1 ≤ r ≤ +1
▪ if r = 0, then there is no linear relationship between the
two variables
▪ words of different strength are used to describe the
degree of correlation; rough guides are listed in the
following table for interpretation purposes

Degree of correlation   Positive linear correlation   Negative linear correlation
Perfect                 r = +1                        r = −1
Strong (very)           0.9 ≤ r < 1.0                 −1.0 < r ≤ −0.9
Strong (fairly)         0.8 ≤ r < 0.9                 −0.9 < r ≤ −0.8
Moderate                0.6 ≤ r < 0.8                 −0.8 < r ≤ −0.6
Weak (fairly)           0.3 ≤ r < 0.6                 −0.6 < r ≤ −0.3
Weak (very)             0.0 < r < 0.3                 −0.3 < r < 0.0
Absent                  r = 0

r = correlation coefficient
▪ The degree of strength of the relationship does not depend
on the sign of the coefficient of correlation.
▪ E.g. Coefficient of – 0.92 and + 0.92 have equal strength,
both indicate very strong correlation between the two
variables.

Examples of scatter diagram:

[Figure: five scatter diagrams of y against x, both axes running
from 0 to 10]
▪ Perfect positive linear correlation (r = +1)
▪ Perfect negative linear correlation (r = −1)
▪ Strong positive linear correlation (x and y strongly linearly
related)
▪ Weak negative linear correlation (x and y somewhat linearly
related)
▪ No linear correlation (r = 0) (x and y not linearly related)

Product Moment Correlation Coefficient, r

The product moment correlation coefficient provides a
measure of the strength of the linear relationship that exists
between two variables, X and Y.

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[\,n\sum X^2 - (\sum X)^2\,][\,n\sum Y^2 - (\sum Y)^2\,]}} $$

where n is the number of paired bivariate (X, Y) values.

Example:
Calculate the product moment correlation coefficient for the
following data. What does the value of the coefficient indicate?

X 5 6 7 9 8
Y 8 9 9 11 13

Solution:
X     Y     XY     X²    Y²
5     8     40     25    64
6     9     54     36    81
7     9     63     49    81
9     11    99     81    121
8     13    104    64    169
ΣX = 35   ΣY = 50   ΣXY = 360   ΣX² = 255   ΣY² = 516

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[\,n\sum X^2 - (\sum X)^2\,][\,n\sum Y^2 - (\sum Y)^2\,]}}
     = \frac{5(360) - (35)(50)}{\sqrt{[\,5(255) - (35)^2\,][\,5(516) - (50)^2\,]}}
     = \frac{50}{\sqrt{(50)(80)}} = 0.7906 $$
r = 0.7906 indicates that there is a moderately high positive
linear correlation between X and Y. As X increases, Y would
also increase.
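
The calculation above can be checked with a short script. This is only a sketch: the function name `pearson_r` and the variable names are illustrative and not part of the notes; the function simply evaluates the raw-sums formula for r given above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment correlation coefficient from the raw-sums formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Data from the worked example (five paired observations).
x = [5, 6, 7, 9, 8]
y = [8, 9, 9, 11, 13]
print(round(pearson_r(x, y), 4))  # 0.7906
```

Working from the column totals keeps the script aligned with the hand calculation, so each intermediate sum can be compared with the table.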

Example:
Referring to the machine data given in the scatter diagram
example, calculate the product moment correlation coefficient
between age and maintenance cost. Hence, find the coefficient
of determination and comment on the results.
Machine 1 2 3 4 5 6 7 8 9 10
Age (X) 5 10 15 20 30 30 30 50 50 60
Cost (Y) 190 240 250 300 310 335 300 300 350 395

Solution:
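X     Y      XY       X²      Y²
5     190    950      25      36100
10    240    2400     100     57600
15    250    3750     225     62500
20    300    6000     400     90000
30    310    9300     900     96100
30    335    10050    900     112225
30    300    9000     900     90000
50    300    15000    2500    90000
50    350    17500    2500    122500
60    395    23700    3600    156025
ΣX = 300   ΣY = 2970   ΣXY = 97650   ΣX² = 12050   ΣY² = 913050

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[\,n\sum X^2 - (\sum X)^2\,][\,n\sum Y^2 - (\sum Y)^2\,]}}
     = \frac{10(97650) - (300)(2970)}{\sqrt{[\,10(12050) - (300)^2\,][\,10(913050) - (2970)^2\,]}}
     = \frac{85500}{\sqrt{(30500)(309600)}} = 0.8799 $$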

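Comment:
r = 0.8799 indicates a fairly strong positive linear correlation
between age and maintenance cost.
Coefficient of determination, r² = (0.8799)² = 0.7742.
About 77.4% of the variation in Cost (Y) can be explained by
the variation in Age (X). The remaining 22.6% (100% − 77.4%)
is explained by other factors.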
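
As a cross-check on the coefficient of determination, the sketch below recomputes r and r² for the machine data. The helper name `r_and_r_squared` is an assumption made for illustration; the three quantities it forms are the numerator and the two bracketed terms of the formula for r.

```python
from math import sqrt

# Machine data from the example: age in months, weekly maintenance cost in $.
age = [5, 10, 15, 20, 30, 30, 30, 50, 50, 60]
cost = [190, 240, 250, 300, 310, 335, 300, 300, 350, 395]


def r_and_r_squared(xs, ys):
    """Return (r, r squared) for paired data using the raw-sums formula."""
    n = len(xs)
    s_xy = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    s_xx = n * sum(x * x for x in xs) - sum(xs) ** 2
    s_yy = n * sum(y * y for y in ys) - sum(ys) ** 2
    r = s_xy / sqrt(s_xx * s_yy)
    return r, r * r


r, r2 = r_and_r_squared(age, cost)
print(f"r = {r:.4f}, r^2 = {r2:.4f}")  # r = 0.8799, r^2 = 0.7742
```

Multiplying r² by 100 gives the percentage of the variation in cost that is explained by age.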

Correlation and Causation

✓ Causation ⇒ Correlation

E.g. The age of a machine causes its maintenance cost to
increase; therefore there is causation between the age and the
maintenance cost of the machine. Since there is causation,
there is also a correlation between the age and the
maintenance cost of the machine.

✓ Correlation ⇏ Causation

E.g. There might be a strong positive correlation between ice
cream sales and umbrella sales, but this does not mean that a
sales drive or promotion on ice cream would increase umbrella
sales. There is a common cause here, namely the changes in
season and weather.
Thus, care must be taken not to interpret a high correlation
between 2 variables as a cause and effect relationship
unless the relationship is meaningful.

Spearman's Rank Correlation Coefficient, rs

▪ measures the correlation based on the ranks of two sets of
data (X and Y)
▪ serves as an approximation to the product moment correlation
coefficient
▪ can be used even when the variables to be correlated are
not represented in numeric form (qualitative data)
▪ Example: 1. Discipline and exam marks.
           2. Job performance and qualification.
▪ Ranks are usually allocated in ascending order: rank 1
to the smallest item, rank 2 to the next larger and so on,
although it is perfectly feasible to allocate them in descending
order. Whichever method is selected must be used on
both variables.

The procedure for obtaining rs is as follows:
STEP 1 Rank the X values (to give R1 values)
STEP 2 Rank the Y values (to give R2 values)
STEP 3 For each pair of ranks, calculate d² = (R1 − R2)²
and then calculate Σd²
STEP 4 The value of Spearman's rank correlation
coefficient can be found using the following
formula:

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \quad \text{with } -1 \le r_s \le +1 $$

where rs = rank coefficient of correlation
d = difference between two corresponding ranks
(d = R1 − R2)
R1 = rank of X
R2 = rank of Y
n = number of pairs of observations

Example (Data had been ranked)


X and Y were judges at a beauty contest in which there were 10
competitors. Their rankings are shown below.
Competitor A B C D E F G H I J
X 4 9 2 5 3 10 6 7 8 1
Y 6 10 2 8 1 9 7 4 5 3
Calculate a coefficient of rank correlation between these two
sets of rankings and comment briefly on your result.

Solution:
Competitor     A    B    C    D    E    F    G    H    I    J
R1             4    9    2    5    3    10   6    7    8    1
R2             6    10   2    8    1    9    7    4    5    3
d = R1 − R2    −2   −1   0    −3   2    1    −1   3    3    −2
d²             4    1    0    9    4    1    1    9    9    4

n = 10, Σd² = 42

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(42)}{10(10^2 - 1)} = 0.7455 $$

Comment:
Spearman's coefficient of rank correlation for the data is 0.7455,
indicating that there is a moderate degree of association
between the rankings of X and Y, i.e. the opinions of the 2 judges
agree moderately well.
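
The four-step procedure can be sketched in code for data that are already ranked. The function name `spearman_from_ranks` is illustrative only; it assumes both lists hold ranks for the same items in the same order, with no tied ranks.

```python
def spearman_from_ranks(r1, r2):
    """Spearman's rank correlation from two lists of ranks (no ties assumed)."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))  # STEP 3
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))          # STEP 4


# Rankings given by judges X and Y to competitors A..J.
judge_x = [4, 9, 2, 5, 3, 10, 6, 7, 8, 1]
judge_y = [6, 10, 2, 8, 1, 9, 7, 4, 5, 3]
print(round(spearman_from_ranks(judge_x, judge_y), 4))  # 0.7455
```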

Example (Data had not been ranked)


The following data show the average rent and rates (RM per
square foot) for a selection of areas.
Rate (X)   1.68   1.46   1.57   13.37   3.18   1.95   1.07   1.71   1.22   6.46
Rent (Y)   3.81   4.19   4.87   22.85   6.47   6.48   2.66   6.49   5.33   15.23

Calculate Spearman's rank correlation coefficient to assess
the degree of correlation between rate and rent. Comment on
your finding.
Solution:

Rate (X)   Rank of X (R1)   Rent (Y)   Rank of Y (R2)   d = R1 − R2   d²
1.68       5                3.81       2                3             9
1.46       3                4.19       3                0             0
1.57       4                4.87       4                0             0
13.37      10               22.85      10               0             0
3.18       8                6.47       6                2             4
1.95       7                6.48       7                0             0
1.07       1                2.66       1                0             0
1.71       6                6.49       8                −2            4
1.22       2                5.33       5                −3            9
6.46       9                15.23      9                0             0
Σd² = 26

n = 10,
$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(26)}{10(10^2 - 1)} = 0.8424 $$

There exists a fairly strong positive rank correlation between
the rankings of rate and the rankings of rent. High rankings of
rate are normally paired with high rankings of rent and vice versa.

Note:
Sometimes two or more individuals or entries may be tied in
rank. In this case, each is given the average of the ranks they
would otherwise occupy, as shown by the following example.

Salesman   1    2     3    4    5     6    7    8
Sales      20   35    25   20   35    40   20   10
Ranking    3    6.5   5    3    6.5   8    3    1

Salesmen 1, 4 and 7 are tied for ranks 2, 3 and 4; the average of
2, 3 and 4 = (2 + 3 + 4)/3 = 3 is assigned to each of these 3
salesmen.
Salesmen 2 and 5 are tied for ranks 6 and 7; the average of 6 and 7
= 6.5 is assigned as the rank for each of these 2 salesmen.
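
Averaging tied ranks is easy to get wrong by hand, so a small helper can be used to check a ranking. This is a sketch under the ascending-order convention described earlier; the name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of items equal to the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


sales = [20, 35, 25, 20, 35, 40, 20, 10]
print(average_ranks(sales))  # [3.0, 6.5, 5.0, 3.0, 6.5, 8.0, 3.0, 1.0]
```

The ranks returned for two variables can then be differenced and squared to obtain Σd² for the Spearman formula.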

Example (Data had tied rank)


The following data relate to the number of vehicles owned per
100 population (X) and road deaths per 100,000 population (Y) for
12 countries. Calculate Spearman's rank correlation
coefficient and comment on the result.
X   30   31   32   30   46   30   19   35   40   46   57   30
Y   30   14   30   23   32   26   20   21   23   30   35   26

Solution:
X     R1      Y     R2     d = R1 − R2   d²
30    3.5     30    9      −5.5          30.25
31    6       14    1      5             25
32    7       30    9      −2            4
30    3.5     23    4.5    −1            1
46    10.5    32    11     −0.5          0.25
30    3.5     26    6.5    −3            9
19    1       20    2      −1            1
35    8       21    3      5             25
40    9       23    4.5    4.5           20.25
46    10.5    30    9      1.5           2.25
57    12      35    12     0             0
30    3.5     26    6.5    −3            9
Σd² = 127

n = 12,
$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(127)}{12(12^2 - 1)} = 0.5559 $$
The result shows that there is a fairly weak positive correlation
between the rankings of vehicles owned and the rankings of the
number of road deaths.

Comparison of product moment correlation and rank correlation:

Product moment coefficient, r


▪ The standard measure of correlation
▪ Data must be numeric

Spearman's rank coefficient, rs
▪ Only an approximation to the product moment coefficient
▪ Easier to use, with fewer calculations
▪ Can be used with non-numeric data
▪ Can be insensitive to small changes in actual values. This is
easily seen using the data values 12.3, 12.4 and 23, say,
where the allocated ranks would be 1, 2 and 3. No account
is taken of the small difference between the first two values
compared with the large difference between the second and
third values

LINEAR REGRESSION
▪ Regression is concerned with obtaining a mathematical
equation which describes the relationship between two
variables
▪ The equation can be used for comparison or estimation
purpose

Simple Linear Regression


▪ the simplest form of linear relationship between two
variables
▪ Y = a + bX
where Y = dependent variable
a = intercept of the line on the y-axis
b = regression coefficient (slope/gradient) of the line
X = independent variable

Note: 1. b indicates the change in Y for a unit change in X
2. b is positive ⇒ positive linear relationship between X
and Y
3. b is negative ⇒ negative linear relationship between X
and Y

Least squares method


▪ the standard technique for obtaining a linear regression line
such that the error sum of squares is the minimum, i.e. the
least squares regression line gives a minimum value for the
sum of the squares of the vertical deviations of every scatter
point from the regression line.

Least squares regression line


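▪ The least squares regression line of Y on X is Ŷ = a + bX
where

$$ b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} $$

$$ a = \bar{Y} - b\bar{X} \quad \text{or} \quad a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} $$

and n = total number of observations in a set of bivariate data (X, Y).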

Notes:
For any set of bivariate data, the least squares regression line of
Y on X
1. is used to estimate a value of Y given a value of X
2. passes through the mean point (X̄, Ȳ) of the data

Example
The following table shows the output at a factory and costs of
production over the past 5 months. Find the equation of the least
squares regression line.
Month 1 2 3 4 5
Output(000’s units) 20 16 24 22 18
Costs (RM’000) 82 70 90 85 73

Solution:
Let X = output in 000’s units; Y = total costs in RM’000.
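X     Y     XY      X²
20    82    1640    400
16    70    1120    256
24    90    2160    576
22    85    1870    484
18    73    1314    324
ΣX = 100   ΣY = 400   ΣXY = 8104   ΣX² = 2040

$$ b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}
     = \frac{5(8104) - (100)(400)}{5(2040) - (100)^2} = \frac{520}{200} = 2.6 $$

$$ a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \frac{400}{5} - 2.6\left(\frac{100}{5}\right) = 28 $$

The regression line is Ŷ = a + bX = 28 + 2.6X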
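
The least squares formulas for a and b translate directly into code. The sketch below uses an illustrative function name, `least_squares`, and applies it to the factory data of this example; it is a check on the hand calculation rather than a general regression routine.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line Y-hat = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


output = [20, 16, 24, 22, 18]   # X: output in 000's units
costs = [82, 70, 90, 85, 73]    # Y: total costs in RM'000
a, b = least_squares(output, costs)
print(round(a, 4), round(b, 4))  # 28.0 2.6
```

A prediction for a given X is then `a + b * X`, which should only be trusted within the observed range of X.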

Regression Analysis as a forecasting tool
• Two types of estimation using the regression equation
1. Extrapolation estimate
  - Extrapolation: find the value of Y outside the observed
    range of X
  - most commonly used for forecasting using a time series
  - may be less accurate and unreliable to a certain extent
2. Interpolation estimate
  - Interpolation: find the value of Y within the observed
    range of X
  - forecasting using interpolation is more accurate and
    more reliable than using extrapolation

Example:
The data below relates the weekly maintenance cost ($) to the
age (in months) of machines of similar type in a manufacturing
company.
Machine 1 2 3 4 5 6 7 8 9 10
Age (X) 5 10 15 20 30 30 30 50 50 60
Cost (Y) 190 240 250 300 310 335 300 300 350 395
(a) Find the least squares regression line of maintenance cost
on age.
(b) Using the regression line, predict the maintenance cost for
a machine of this type, which is 40 months old. Comment
on the accuracy of your estimate.
(c) Plot a scatter diagram and draw the regression line.
(d) Predict the maintenance cost for a 40-month old machine
graphically.

Solution:
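From the previous example, we have
ΣX = 300, ΣY = 2970, ΣXY = 97650, ΣX² = 12050, ΣY² = 913050, n = 10

(a)
$$ b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}
     = \frac{10(97650) - (300)(2970)}{10(12050) - (300)^2} = \frac{85500}{30500} = 2.8033 $$

$$ a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \frac{2970}{10} - 2.8033\left(\frac{300}{10}\right) = 212.901 $$

The least squares regression line of maintenance cost on age
is Ŷ = 212.901 + 2.8033X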

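(b) When X = 40, Ŷ = 212.901 + 2.8033(40) = 325.03, i.e. the
predicted weekly maintenance cost is about $325.
Comment: This estimate is obtained by interpolation since
X = 40 lies within the observed range of X, i.e. [5, 60]. Hence, it
is more accurate and reliable than an extrapolated estimate.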

(c) Scatter diagram and the regression line:


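[Figure: Scatter Diagram of Weekly Maintenance Cost and Age of
Machine, with the regression line drawn. Vertical axis:
maintenance cost ($), 150 to 400. Horizontal axis: age of
machine (months), 0 to 60.]

Plotting the regression line: substitute two convenient values
of X into Ŷ = 212.901 + 2.8033X and join the two points.

X    0         60
Ŷ    212.90    381.10

(d) From the regression line, the maintenance cost for a 40-month
old machine is approximately $325.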

Interpretation of 'a ' and 'b '

In the regression equation Y = a + bX,
✓ a is the estimated value of Y when X = 0, i.e. the Y-intercept
value
✓ b indicates the change in Y for a unit change in X
✓ b is positive ⇒ positive linear relationship between X and Y
✓ b is negative ⇒ negative linear relationship between X and Y
✓ b will always have the same sign as the coefficient of
correlation, r
Example:

If Y = a + bX = 3.33 + 0.47X, where Y = sales ($'000) and
X = advertising costs ($'00), interpret the values of 'a'
and 'b'.

Solution:
a = 3.33 is the value of Y when X = 0. Hence it is the value of
sales ($'000) when there is no expenditure on advertising.

b = 0.47 is the increase in sales ($'000) for each unit increase
in X ($'00).

Therefore, the estimated sales is $3,330 if there is no
expenditure on advertising, and for each $100 increase in
advertising expenditure, sales is estimated to increase by $470.

Example:

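If Y = a + bX = 28 + 2.6X, where Y = expenditure in $'000 and
X = output in 000's units, interpret the values of 'a' and 'b'.

Solution:
a = 28 is the value of Y when X = 0. Hence the estimated
expenditure is $28,000 when there is no output.

b = 2.6 is the increase in expenditure ($'000) for each unit
increase in X (000's units). Hence, for each additional 1,000
units of output, expenditure is estimated to increase by $2,600.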

THE ADVANTAGES AND DISADVANTAGES OF
REGRESSION ANALYSIS

Advantages:
(a) It can be used to estimate a line of best fit using all the data
available. It is likely to provide a more reliable estimate than
any other technique of producing a straight line of best fit
(for example, estimating by eye).

(b) The reliability of the estimate can be evaluated by
calculating the correlation coefficient r.

Disadvantages:
(a) It assumes a linear relationship between the two variables,
whereas a non-linear relationship may exist.

(b) When it is used for forecasting future values, it assumes
that what has happened in the past will provide a reliable
guide to the future, which may not always be true in real life
situations.

(c) The technique assumes that the value of Y depends solely
on the value of X. In reality, the value of Y might depend
on several other variables, not just on X.

Probability

1. Introduction

1.1 Experiment and outcomes


In statistics, it is customary to refer to any process of observation or
measurement as an experiment. The results one obtains from an
experiment, whether they are instrument readings, counts or other kinds of
measurements, are called the outcomes of the experiment.

1.2 Sample space

An experiment will, whenever it is performed, lead to an outcome. The
set of all possible outcomes is called the sample space, S.
E.g. The experiment of throwing a coin, S = { H, T }
The experiment of throwing a die, S = { 1, 2, 3, 4, 5, 6 }
The experiment of sitting for an examination, S = { pass, fail }

1.3 Event
An event A associated with the experiment is a subset of the sample
space, S .
E.g. Throwing a die once.
A = {5} is an event of S.
B = { even number } = { 2, 4, 6 } is also an event of S.

Event A is said to occur if the observed outcome is a member of A, and it
does not occur if the observed outcome is not a member of A.

(a) An event which consists of a single outcome of S is called an
elementary event.
E.g. Throwing a die once.
C = { 6 } and D = { 2 } are elementary events.
(b) The empty set ∅ is an event which does not contain any sample
point at all. It is sometimes referred to as an impossible event.
The sample space S is itself an event and it is called a certain
event.
(c) The complement of an event A of S is the set of sample points that
are in S but not in A and is denoted by Ā or A′.
E.g. Throwing a die once.
S = { 1, 2, 3, 4, 5, 6 }, A = { 1, 4, 6 }, then Ā = { 2, 3, 5 }

[Figure: Venn diagram of S with 1, 4, 6 inside the circle A and
2, 3, 5 outside it in Ā]

A Venn diagram is a pictorial representation showing the
relationship between events.

(d) If A and B are two events of the sample space S, then the event
which consists of all the sample points in A, B or both is called
A union B and it is denoted by A ∪ B or A + B.
E.g. Throwing a die once.
S = { 1, 2, 3, 4, 5, 6 }, A = { 1, 3, 5 } and B = { 2, 3, 6 }
A ∪ B = { 1, 2, 3, 5, 6 }

(e) If A and B are two events of the sample space S, then the event
which consists of all the sample points that are common to A and
B is called A intersect B and it is denoted by A ∩ B or AB.
E.g. Throwing a die once.
S = { 1, 2, 3, 4, 5, 6 }, A = { 1, 3, 5 } and B = { 2, 3, 6 }
A ∩ B = { 3 }
(f) A ∪ B is an event which occurs if and only if either A or B or both occur.
A ∩ B is an event which occurs if and only if both A and B occur together.
(g) Two events A and B are said to be mutually exclusive if the
occurrence of A excludes B and vice versa. That is, A and B
cannot occur together or if A occurs then B cannot occur and vice
versa.
E.g. Throwing a die once. S = { 1, 2, 3, 4, 5, 6 }
If A = { 1, 2, 3 }, B = { 4, 5, 6 } then A and B are mutually
exclusive events.
If C = { 1, 5, 6 }, D = { 2, 3, 5 } then C and D are not mutually
exclusive events.

E.g. In an examination.
A = you pass the exam.
B = you fail the exam.
A and B are mutually exclusive events.

2. Approaches to probability

Probability is a number that is assigned to an individual event as an
indication of the likelihood that that particular event will occur. The
probability of any given event A, denoted by P(A), lies between 0 and 1,
i.e. 0 ≤ P(A) ≤ 1.

2.1 Classical approach to probability


If an experiment has N equally likely (same chance of occurring) and
mutually exclusive outcomes, N_A of which are favourable to the
occurrence of an event A, then the probability of occurrence of A is

$$ P(A) = \frac{N_A}{N} $$

E.g. Throwing a die once.
S = { 1, 2, 3, 4, 5, 6 }, N(S) = 6
A = { 3, 4 }, N(A) = 2
$$ P(A) = \frac{N_A}{N} = \frac{2}{6} = \frac{1}{3} $$

E.g. A box contains 20 balls: 10 white, 7 black and 3 red. What is the
probability that a ball drawn at random is
(i) red? (ii) white? (iii) black?
Solution:
There are 20 possible outcomes, i.e. N(S) = 20
(i) Let A be the event of getting a red ball.
There are 3 cases favourable to the occurrence of A,
i.e. N(A) = 3
P(A) = 3/20
(ii) Let B be the event of getting a white ball.
P(B) = 10/20 = 1/2

(iii) Let C be the event of getting a black ball.
P(C) = 7/20

E.g. If a pair of dice is thrown, what is the probability that a total of 8
shows?
Let A be the event of getting a total of 8.
The first die gives 6 possible outcomes.
The second die gives 6 possible outcomes.
Therefore, there are 6 × 6 = 36 possible outcomes for the experiment of
throwing a pair of dice; N = 36
The outcomes favourable to A are (2,6), (3,5), (4,4), (5,3) and (6,2),
so N_A = 5 and P(A) = 5/36.
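
For small sample spaces the classical approach can be checked by listing every outcome. The sketch below enumerates the 36 ordered outcomes of two dice and counts those with a total of 8; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))
favourable = [pair for pair in outcomes if sum(pair) == 8]

p_total_8 = Fraction(len(favourable), len(outcomes))
print(len(outcomes))  # 36
print(favourable)     # [(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)]
print(p_total_8)      # 5/36
```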

2.2 Relative frequency approach to probability (empirical approach)
If an experiment is repeated N times under the same conditions, and N_A of
these trials result in the occurrence of event A, then the probability of
event A is

$$ P(A) = \lim_{N \to \infty} \frac{N_A}{N} $$

This approach is formulated on the assumption that the proportion N_A/N
will tend to be stable and approach a constant as the number of
trials N increases.
E.g. Number of heads resulting from throwing a coin.

No. of throws (N)   No. of heads (N_A)   N_A / N
10                  4                    0.4
100                 54                   0.54
1,000               520                  0.52
10,000              5,100                0.51

By increasing N, we should get closer and closer to a
number called the true probability of a head in a single throw of the
coin, which is 0.5.
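
The stabilising of the proportion N_A/N can be illustrated with a simulation. This is only a sketch: it assumes a fair coin modelled by Python's pseudo-random generator, the seed 2024 is arbitrary, and the function name is hypothetical. The printed proportions will differ from the table above, but they should settle near 0.5 as N grows.

```python
import random


def relative_frequency_of_heads(n_throws, rng):
    """Simulate n_throws of a fair coin and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_throws))
    return heads / n_throws


rng = random.Random(2024)  # fixed, arbitrary seed so the run is repeatable
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n, rng))
```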

2.3 Subjective approach to probability


By the subjective approach, the probability of an event is the degree of
belief by an individual that the event will occur based on all evidence
available to him. This probability is particularly appropriate when there is
only one opportunity for the event to occur and it will either occur or not
occur at that one time. If one believes that it is very likely that the event
would occur then one may assign a probability close to 1 to its
occurrence. If one believes that it is unlikely that the event would occur
then one can assign a probability equal or near to zero to its occurrence.
As the probability value is a personal judgement, the subjective
approach is also called the personalistic approach.

Note: A single probability means that only one event can take place. It is
called a marginal or unconditional probability.

3. Probability rules
3.1 Complementary event
If Ā is the complementary event of A then P(Ā) = 1 − P(A).

E.g. A = passing the exam.
P(A) = 0.9 then P(Ā) = 1 − 0.9 = 0.1

3.2 Addition rule for mutually exclusive events


If A and B are 2 mutually exclusive events, then the probability that
A or B occurs is
P(A ∪ B) = P(A) + P(B)

[Figure: Venn diagram showing the addition rule for mutually exclusive
events, P(A ∪ B) = P(A) + P(B)]

E.g. Calculate the probability that
(a) a single throw of a die will produce a '2' or a '5'.
(b) a single ball drawn from a box of 50 black, 20 white and 30
red balls will be either black or white.

Note: In general, if A1, A2, … , An are mutually exclusive events then
P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An)
3.3 Addition rule for non-mutually exclusive events
If A and B are 2 non-mutually exclusive events, then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

E.g. A sample of 1,000 people showed that 400 smoke cigarettes, 500
drink beer and 250 both smoke and drink.
Calculate the probability that a person smokes cigarettes or
drinks beer or both.
Let C be the event that a person smokes cigarettes.
Let D be the event that a person drinks beer.

Method 1: using a Venn diagram
[Figure: Venn diagram with 150 in C only, 250 in C ∩ D, 250 in D only
and 350 outside both circles]
P(C ∪ D) = (150 + 250 + 250)/1000 = 0.65

Method 2: using a table
         C      C̄      Total
D        250    250    500
D̄        150    350    500
Total    400    600    1000

P(C ∪ D) = P(C) + P(D) − P(C ∩ D)
         = 400/1000 + 500/1000 − 250/1000 = 0.65

Alternatively, we can set up the following table of probabilities

         P(C)    P(C̄)    Total
P(D)     0.25    0.25    0.50
P(D̄)     0.15    0.35    0.50
Total    0.40    0.60    1.0

P(C ∪ D) = P(C) + P(D) − P(C ∩ D) = 0.40 + 0.50 − 0.25 = 0.65
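
The addition rule for the smoking and drinking example can be verified with exact fractions. The variable names below are illustrative; the counts are those given in the example.

```python
from fractions import Fraction

total = 1000
smokers, drinkers, both = 400, 500, 250

p_c = Fraction(smokers, total)       # P(C)
p_d = Fraction(drinkers, total)      # P(D)
p_c_and_d = Fraction(both, total)    # P(C and D)

# Addition rule for non-mutually exclusive events.
p_c_or_d = p_c + p_d - p_c_and_d
print(p_c_or_d, float(p_c_or_d))  # 13/20 0.65
```

Subtracting P(C ∩ D) removes the 250 people who would otherwise be counted twice, once as smokers and once as drinkers.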

3.4 Multiplication rule for independent events


A and B are said to be independent if the occurrence of A does not
affect the occurrence of B and vice versa. Note that independent events
are not mutually exclusive, i.e. the 2 events can occur together.
E.g. A = passing your exam.
B = throwing a coin and getting 'H'.
A and B are independent.

If A and B are independent events then
P(A ∩ B) = P(A) × P(B)
which is known as the multiplication rule for independent events,
where P(A ∩ B) = probability that events A and B occur together or in
succession, which is a joint probability
P(A) = marginal probability of event A occurring
P(B) = marginal probability of event B occurring.

E.g. P(passing your exam) = P(A) = 0.99
P(throwing a coin and getting 'H') = P(B) = 0.5

P(A ∩ B) = P(A) × P(B) = 0.99 × 0.5 = 0.495

E.g. If two dice are thrown consecutively, calculate the probability of
getting a '4' on the first die and a '6' on the second.

Note: In general, if A1, A2, … , An are independent events then
P(A1 ∩ A2 ∩ … ∩ An) = P(A1) × P(A2) × … × P(An)

3.5 Multiplication rules for dependent events


If A and B are dependent events then the outcome of B depends upon
the outcome of A.

E.g. A = passing your exam.
B = getting a good job.
A and B are dependent events.

The conditional probability of the occurrence of B given that A has
occurred, denoted by P(B|A), is defined by

$$ P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad \text{if } P(A) > 0 $$

Note: (1) If P(A) = 0 then P(B|A) is not defined.
(2) If A and B are independent then

$$ P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A)\,P(B)}{P(A)} = P(B) $$
E.g. In the experiment of throwing 2 dice once. One of the dice turns out 2.
What is the probability that the sum of the 2 dice is less than 5?

If we have 2 dependent events then P(A ∩ B) = P(A) × P(B|A), which is
known as the multiplication rule for dependent events.

E.g. In a batch of 12 T.V. sets, 4 are not working. What is the
probability that
(i) choosing 2 consecutive sets at random, both will be defective?
(ii) choosing 3 consecutive sets at random, all will be defective?

In general, if A1, A2, … , An are dependent events then
P(A1 ∩ A2 ∩ … ∩ An) = P(A1) × P(A2|A1) × P(A3|A1 ∩ A2) × …
                      × P(An|A1 ∩ A2 ∩ … ∩ An−1)

3.6 Bayes’ Theorem


Suppose that A and B are mutually exclusive events that exhaust the
sample space of an experiment associated with them, i.e. A ∩ B = ∅ and
A ∪ B = S, and that P(A) and P(B) are known.
Let D be the actual outcome of the experiment, where
D = (A ∩ D) ∪ (B ∩ D).
If P(D|A) and P(D|B) are known, then Bayes' theorem enables us to
calculate
(a) P(D) = P[(A ∩ D) ∪ (B ∩ D)]
         = P(A ∩ D) + P(B ∩ D)
         = P(A) × P(D|A) + P(B) × P(D|B)
The above is known as the theorem of total probability.

(b)
$$ P(A \mid D) = \frac{P(A \cap D)}{P(D)} = \frac{P(A)\,P(D \mid A)}{P(A)\,P(D \mid A) + P(B)\,P(D \mid B)} $$
which is known as the posterior probability.

E.g. Two machines A and B produced respectively 40% and 60% of the
total output of a factory. The percentages of defective output of these
machines are 5% and 3% respectively.
(a) If an item is selected at random, find the probability that the item
is defective.
(b) If the item selected is found to be defective, what is the probability
that it is produced by machine B.
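
The two-machine example can be worked with a pair of small helpers that mirror parts (a) and (b) of the theorem. The function names `total_probability` and `posterior`, and the use of dictionaries keyed by machine, are assumptions made for this sketch.

```python
def total_probability(priors, likelihoods):
    """P(D) = sum over sources of P(source) * P(D | source)."""
    return sum(priors[k] * likelihoods[k] for k in priors)


def posterior(priors, likelihoods, source):
    """P(source | D) by Bayes' theorem."""
    return priors[source] * likelihoods[source] / total_probability(priors, likelihoods)


priors = {"A": 0.40, "B": 0.60}        # share of total output
defect_rates = {"A": 0.05, "B": 0.03}  # P(defective | machine)

p_defective = total_probability(priors, defect_rates)
p_b_given_defective = posterior(priors, defect_rates, "B")
print(round(p_defective, 4))          # 0.038
print(round(p_b_given_defective, 4))  # 0.4737
```

The same helpers work for more than two mutually exclusive and exhaustive sources, since the sum in `total_probability` runs over every key in `priors`.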

A probability tree can be used to represent the above probabilities.

[Figure: probability tree whose branches show, from left to right, the
prior probabilities, the conditional probabilities and the joint
probabilities]

E.g. Suppose a factory manager wishes to advertise his new brand of jam
and has 3 alternative advertising media: T.V., posters and shop
promotions. Usually, he advertises on T.V. 60% of the time, on posters 30% of
the time and through shop promotions 10% of the time. It has been calculated
that the probability of each advertising medium being successful is 0.1, 0.3
and 0.2 respectively.
(a) Find the probability that his advertisements will be successful.
(b) If his advertisements were successful, what is the probability that he
used poster advertisement?

E.g. A company has 1,000 replacement parts for a given assembly. 20% of
the parts are defective and the rest are good. 40% were bought from
external sources and the rest were made by the company itself, and of
those bought from external sources, 80% are good. If a part is randomly
selected from this stock, what is the probability that
(i) the part is company made and good?
(ii) the part is either company made or good?
(iii) the part is bought given that it is defective?
(iv) the part is bought and good?
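
One way to approach this example is to build the two-way table of counts first and then read the four probabilities from it. The sketch below does this with exact fractions; all variable names are illustrative, and "or" in part (ii) is taken as inclusive.

```python
from fractions import Fraction

total = 1000
bought = Fraction(40, 100) * total          # 400 bought externally
made = total - bought                       # 600 company made
good = Fraction(80, 100) * total            # 800 good parts in all
defective = total - good                    # 200 defective parts in all
bought_good = Fraction(80, 100) * bought    # 320 bought and good
bought_defective = bought - bought_good     # 80 bought and defective
made_good = good - bought_good              # 480 company made and good

p_made_and_good = made_good / total                       # (i)
p_made_or_good = (made + good - made_good) / total        # (ii) addition rule
p_bought_given_defective = bought_defective / defective   # (iii) conditional
p_bought_and_good = bought_good / total                   # (iv)

print(p_made_and_good, p_made_or_good, p_bought_given_defective, p_bought_and_good)
# 12/25 23/25 2/5 8/25
```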

Probability Distribution

1. Random variable and probability distribution

In most repetitive experiments, we are not interested in each
individual experiment's outcome, but in certain properties of the
whole set of experiments. Thus, if we toss coins, we may be interested
in the number of times 'H' appears; or if we throw two dice, we may
be interested in the total score of the dice.

A random variable is a variable each of whose values is a number
determined by the outcome of repetitive experiments and is usually
denoted by X.

E.g. Suppose we toss a coin three times and we are interested in the
number of 'H' obtained.

Let X = number of 'H' obtained when a coin is tossed 3 times.
X is a random variable and X can take the values 0, 1, 2 or 3.
S = {TTT, TTH, HTT, THT, HHT, HTH, THH, HHH}

Now
X = 0 means the occurrence of outcome TTT
X = 1 means the occurrence of outcome TTH, HTT, THT
X = 2 means the occurrence of outcome HHT, HTH, THH
X = 3 means the occurrence of outcome HHH

P(X = 0) = P(TTT) = 1/8
P(X = 1) = P(TTH, HTT, THT) = 3/8
P(X = 2) = P(HHT, HTH, THH) = 3/8
P(X = 3) = P(HHH) = 1/8

P(X = 0), P(X = 1), P(X = 2) and P(X = 3) are the
probabilities of the values of the random variable and they
should add up to 1.

Any description (e.g. a table, a graph or a formula) which gives each
value of a random variable with its corresponding probability is called
the probability distribution of that random variable.

E.g. X = number of 'H' obtained in tossing a coin 3 times

Tabular presentation of the probability distribution of X

X = x       0     1     2     3
P(X = x)    1/8   3/8   3/8   1/8

$$ \sum P(X = x) = 1 $$
$$ \text{mean} = \text{expected value} = \mu = E(X) = \sum x \cdot P(x) $$
$$ E(X^2) = \sum x^2 \cdot P(x) $$
$$ \text{variance} = \sigma^2 = E(X^2) - [E(X)]^2 $$
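
The distribution of X and its mean and variance can be built directly from the sample space. The sketch below assumes a fair coin, so each of the 8 outcomes has probability 1/8, and uses the identity variance = E(X²) − [E(X)]²; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for three tosses of a fair coin: 8 equally likely outcomes.
outcomes = list(product("HT", repeat=3))

# Probability distribution of X = number of heads.
pmf = {}
for outcome in outcomes:
    heads = outcome.count("H")
    pmf[heads] = pmf.get(heads, 0) + Fraction(1, len(outcomes))

mean = sum(x * p for x, p in pmf.items())      # E(X)
e_x2 = sum(x * x * p for x, p in pmf.items())  # E(X^2)
variance = e_x2 - mean ** 2                    # E(X^2) - [E(X)]^2

print([str(pmf[x]) for x in range(4)])  # ['1/8', '3/8', '3/8', '1/8']
print(mean, variance)                   # 3/2 3/4
```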

Graphical presentation of the probability distribution of X

[Figure: Probability Distribution of X. Bar chart of P(X = x) against
all possible outcomes of X (0, 1, 2, 3), with the vertical axis marked
from 0 to 1/2 in steps of 1/8.]
2. Discrete probability distribution
For discrete probability distribution, the random variable takes a discrete
set of values.

2.1 Binomial distribution


Let an experiment be such that it can give rise to only two events which
are independent and have well defined probabilities associated with
them. These two events may be called 'success' and 'failure'. Let each
repetition of the experiment be called a 'trial'.

E.g. Experiment of tossing a coin
a 'H' may be a 'success', then a 'T' is a 'failure'
a single flip of the coin is a 'trial'

Let the random variable X denote the number of 'successes' that can be
obtained from such a repetitive experiment of n independent trials.
Let p be the probability of getting a 'success' in each trial.

The probability distribution of X is given by

$$ P(X = r) = {}^{n}C_{r}\, p^{r} q^{\,n-r}, \quad \text{where } q = 1 - p, \quad \text{for } r = 0, 1, 2, \dots, n $$

X is called a binomial random variable with parameters n and p and it is
denoted by X ~ Bin(n, p).

E.g. Suppose that, of all students taking a certain examination, only 40%
pass. If a group of three candidates for this examination is selected at
random, what is the probability that
(i) all three will pass
(ii) exactly two will pass
(iii) at least two will pass.
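
The binomial formula can be evaluated with the standard library. The sketch below defines an illustrative helper, `binomial_pmf`, and applies it to the examination example with n = 3 and p = 0.4, treating the three candidates as independent trials.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


n, p = 3, 0.4
p_all_three = binomial_pmf(3, n, p)           # (i)
p_exactly_two = binomial_pmf(2, n, p)         # (ii)
p_at_least_two = p_exactly_two + p_all_three  # (iii) P(X = 2) + P(X = 3)

print(round(p_all_three, 4), round(p_exactly_two, 4), round(p_at_least_two, 4))
# 0.064 0.288 0.352
```

Cumulative questions such as "at most 2" are answered by summing `binomial_pmf` over the relevant values of r.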

E.g. Suppose that 20% of all items coming off a production line are
defective. If a random sample of 10 items is chosen and inspected, what is
the probability of
(i) no defective item in the sample
(ii) at most 2 defective items
(iii) more than 3 but at most 6 defective items.

Notes:
Conditions for the use of the binomial distribution (FITS)
(a) Each trial has only Two possible, mutually exclusive outcomes,
i.e. either a success or a failure.
(b) The trials are Independent, such that the probability of getting a
success in a single trial remains the Same for all trials.
(c) The number of trials is Fixed.
2.1.1 Mean and standard deviation of binomial distribution
Mean = μ = np
$$ \text{Standard deviation} = \sigma = \sqrt{np(1 - p)} $$

Note: the mean of a binomial distribution is also equal to its expected
value.

E.g. 5% of the units produced in a manufacturing process turn out to be
defective. A random sample of 10 units is chosen.
(i) What is the mean number of defective items in the sample, i.e. what
would be the expected number of defective items in a sample of 10?
(ii) What are the standard deviation and variance of the number of
defective items in the sample?

The standard deviation of a probability distribution is a measure of
dispersion. When σ is small, the probability that we will get a value close
to the mean is high, and when σ is large, we are more likely to get a value
far away from the mean.
2.2 The Poisson distribution
Let X be a discrete random variable.
X has a Poisson distribution with parameter λ > 0 for a given interval,
and it is denoted by X ~ Po(λ).

The probability distribution of X is given by

$$ P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} $$

where x = 0, 1, 2, ... ; e = 2.718282 and
λ = mean or expected value, i.e. the average rate of occurrence in the
given interval

2.2.1 The Poisson distribution as an approximation to binomial distribution


We may approximate the binomial probabilities with the Poisson
probabilities whenever n is large (i.e. n ≥ 50) and p is small (i.e. p < 0.1),
such that np = λ remains a constant.
If X ~ Bin(n, p) and n is large (i.e. n ≥ 50) and p is small (i.e. p < 0.1),
then X can be approximated by X ~ Po(λ = np).

E.g. Suppose a manufacturer produces items of which about 1 in 1000 are
defective. If a lot of 500 such items is obtained, find the probability that
the lot contains
(a) no defective items;
(b) 1 defective item;
(c) 2 or more defective items;
(d) 2 or more but fewer than 5 defective items.
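
The Poisson approximation for this example uses λ = np = 500 × 0.001 = 0.5. The sketch below evaluates the four probabilities with an illustrative helper, `poisson_pmf`; the results are approximations to the exact binomial values.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam ** x / factorial(x)


lam = 500 / 1000  # lambda = n * p with n = 500, p = 1/1000

p_none = poisson_pmf(0, lam)                                   # (a)
p_one = poisson_pmf(1, lam)                                    # (b)
p_two_or_more = 1 - p_none - p_one                             # (c) complement rule
p_two_to_four = sum(poisson_pmf(x, lam) for x in (2, 3, 4))    # (d) 2 <= X < 5

print(round(p_none, 4), round(p_one, 4), round(p_two_or_more, 4), round(p_two_to_four, 4))
# 0.6065 0.3033 0.0902 0.09
```

The same helper applies whenever a mean rate is given directly, for instance an average of 3 absentees per shift, by setting `lam` to that rate.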

2.2.2 Poisson distribution not as an approximation
It is important to realise that although the Poisson distribution is useful
as an approximation to the binomial distribution, it is also a distribution
in its own right. It is used in many situations where we can expect a
fixed average number of 'successes' per unit time (or per some other kind of
unit), e.g. average service rate = 10 customers per hour; average arrival
rate = 6 customers per hour; 1.6 accidents can be expected per day at a busy
road junction; 12 small pieces of meat can be expected in a frozen meat
pie, etc.

E.g. A machine shop employing a large number of men finds that, over a
period of time, the average absentee rate is 3 men per shift. Calculate
the probability that, on a given shift,
(i) exactly two men will be absent;
(ii) more than four men will be absent.

Let X be the number of absentees per shift, so that X ~ Po(λ_I = 3).

P(X = x) = e^(−λ) λ^x / x! ;  x = 0, 1, 2, …
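
The two probabilities asked for can be sketched as follows, using the formula above with λ = 3. "More than four" is computed as 1 − P(X ≤ 4); the helper name `poisson_pmf` is an assumption made for this illustration.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 3  # average number of absentees per shift

p_exactly_two = poisson_pmf(2, lam)                               # (i)
p_more_than_four = 1 - sum(poisson_pmf(x, lam) for x in range(5))  # (ii)

print(round(p_exactly_two, 4), round(p_more_than_four, 4))
```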

2.2.3 Mean and standard deviation of Poisson distribution


Mean = μ = m, where m = λ is the parameter of the Poisson distribution
Standard deviation = σ = √m
3. Continuous probability distribution
For continuous probability distribution, the random variable assumes a
continuous set of values.

3.1 Normal distribution


The normal distribution is an extremely useful distribution as it occurs
very commonly in nature. Phenomena such as the heights and weights
of individuals, I.Q. scores, errors in measuring the length of a metal rod,
etc. all have distributions that are approximately normal.

3.1.1 Properties of normal distribution or normal curve


(a) The normal distribution is a symmetrical, rather bell-shaped curve.
The curve has a single peak i.e. unimodal. The mean lies at the
center of the normal distribution. Mean, median and mode are
equal. The two tails of the normal curve extend indefinitely and
never touch the horizontal axis.

(b) A normal distribution is completely specified by two parameters:
the arithmetic mean (μ) and the standard deviation (σ).
If the random variable X has a normal distribution with mean μ
and standard deviation σ, then X is denoted by X ~ N(μ, σ²).

(c) For 2 normal distributions which have the same mean but
different standard deviations, the curve for the distribution which
has the larger standard deviation will be flatter and will not have
such a prominent peak as the curve for the distribution with the
smaller standard deviation.

(d) No matter what the shape of the normal distribution, one important
property of the normal distribution is the nature of the relationship
of the area under the curve to the standard deviation of the
distribution. In particular, 68.27% of the area under the curve is
contained within plus and minus one standard deviation of the
mean; 95.45% of the area under the curve is contained within plus
and minus 2 standard deviations of the mean; 99.73% of the area
under the curve is contained within plus and minus 3 standard
deviations of the mean.
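
The three percentages in (d) can be verified with a short sketch. It assumes Python 3.8 or later, where `statistics.NormalDist` is available in the standard library; `area_within` is an illustrative helper name.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {area_within(k):.2%}")
```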

3.1.2 Standard normal distribution
The normal distribution with mean μ = 0 and standard deviation σ = 1 is
known as the standard normal distribution, denoted by Z ~ N(0, 1²).

Any normally distributed value x, where X ~ N(μ, σ²), can be converted
into a standard normal value z by using the formula

z = (x − μ) / σ

which indicates the number of standard deviations x lies away from the mean.

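Probabilities are given by areas under the standard normal curve.

Note: The standard normal distribution is symmetrical about μ = 0.

[Figure: standard normal curve with the right-tail area P(Z > z) shaded;
horizontal axis marked at −z, 0 and z]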

(a) Area under the whole curve = 100% = 1

(b) P( Z > 0 ) = P( Z < 0 ) = 50% = 0.5

(c) P( Z > z ) = P( Z ≥ z )

(d) P( Z < -z ) = P( Z > z )

(e) P( Z > -z ) = 1 - P( Z > z )


E.g. Take z = 1, P( Z > 1 ) =

E.g. Take z = 2, P( Z < 2) =

E.g. Show that 95.45% of the area under the normal curve lies within plus
and minus 2 standard deviations of the mean.
Consider the standard normal curve Z ~ N(0, 1²)

E.g. Suppose that a random variable X is distributed normally with μ = 5


and σ = 2, find
(a) c such that P( X  c ) = 0.1;
(b) k such that P( |X − μ| < k ) = 0.8 .

E.g. Assume that the weight of adult males is normally distributed with a
mean of 69 kg. and a standard deviation of 3 kg.
(i) What is the conditional probability that an individual will be
heavier than 72 kg. if it is known that he is heavier than 70 kg. ?
(ii) Determine the maximum weight of 95% of adult males.

E.g. A manufacturer is filling cans with soup to a net weight of 16 gm. The
actual amount of soup which is put into the cans by the filling machine
is normally distributed about the set weight with a standard deviation of
½ gm. If the manufacturer requires not more than 1% of cans to contain
less than the advertised net weight of 16 gm. , at what weight should the
filling machine be set?
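
The two examples above can be sketched numerically as follows. The code assumes Python 3.8+ (`statistics.NormalDist`), reads "the maximum weight of 95% of adult males" as the 95th percentile, and takes "not more than 1%" as exactly 1% at the boundary; these readings are assumptions, not part of the original notes.

```python
from statistics import NormalDist

# Weight of adult males: X ~ N(69, 3^2)
weight = NormalDist(mu=69, sigma=3)
p_over_72 = 1 - weight.cdf(72)
p_over_70 = 1 - weight.cdf(70)
# P(X > 72 | X > 70) = P(X > 72) / P(X > 70), since X > 72 implies X > 70
p_conditional = p_over_72 / p_over_70
weight_95 = weight.inv_cdf(0.95)  # weight not exceeded by 95% of adult males

# Soup cans: choose the setting mu so that P(X < 16) = 0.01 with sigma = 0.5
z_01 = NormalDist().inv_cdf(0.01)   # about -2.326
machine_setting = 16 - z_01 * 0.5   # from (16 - mu) / 0.5 = z_01

print(round(p_conditional, 4), round(weight_95, 2), round(machine_setting, 3))
```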

3.2 Normal approximation of binomial distribution
If X ~ b(x; n, p), we can use the normal approximation with μ = np and
σ = √(np(1 − p)), if 0.1 ≤ p ≤ 0.9 and np ≥ 5.

In using the normal approximation to the binomial distribution, we are


approximating the distribution of a discrete random variable with the
distribution of a continuous random variable. The following corrections
for continuity are used to improve the above approximation.

(a) P( X = k ) ≈ P( k – 0.5 ≤ X ≤ k + 0.5 )
(b) P( a ≤ X ≤ b ) ≈ P( a – 0.5 ≤ X ≤ b + 0.5 )
(c) P( a < X < b ) ≈ P( a + 0.5 ≤ X ≤ b – 0.5 )

Note: This approximation is useful when a full calculation would require
a large number of binomial probabilities, which would otherwise be very
tedious to compute.

E.g. A well-balanced die is thrown 30 times. Use the normal approximation


to find (a) probability of at most six 5's ;
(b) probability of at least ten 5's ;
(c) probability of exactly five 5's.
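
A sketch of the die example using the continuity corrections listed above. Here n = 30 and p = 1/6, so μ = np = 5 and σ = √(np(1 − p)); Python 3.8+ is assumed for `statistics.NormalDist`, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p = 30, 1 / 6
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
approx = NormalDist(mu, sigma)  # normal approximation to b(x; 30, 1/6)

p_at_most_six = approx.cdf(6.5)                    # (a) P(X <= 6)  -> P(X <= 6.5)
p_at_least_ten = 1 - approx.cdf(9.5)               # (b) P(X >= 10) -> P(X >= 9.5)
p_exactly_five = approx.cdf(5.5) - approx.cdf(4.5) # (c) P(X = 5)   -> P(4.5 <= X <= 5.5)

print(round(p_at_most_six, 4), round(p_at_least_ten, 4), round(p_exactly_five, 4))
```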

E.g. Of the people who enter a large supermarket, it has been found that 70%
will make at least one purchase. For a sample of 50 individuals, use the
normal approximation to find the probability that
(a) at least 40 people make one or more purchases each;
(b) fewer than 30 people make at least one purchase.

3.3 Normal approximation of Poisson distribution


If X ~ P(x; m), when the mean m of a Poisson distribution is large, i.e.
m ≥ 10, then we can use the normal approximation with μ = m and
σ = √m, i.e. X ~ N(μ = m, σ² = m).
The correction for continuity used to improve this approximation is the
same as for the normal approximation of binomial probabilities.

E.g. The average number of calls for services received by a machine repair
department per 8-hour shift is 10. Determine the probability that more
than 15 calls will be received during a randomly selected 8-hour shift:
(a) Using Poisson distribution
(b) Using normal approximation to the Poisson distribution.
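
The two approaches in this example can be compared with a short sketch. It takes m = 10, evaluates the exact Poisson tail as 1 − P(X ≤ 15), and applies the continuity correction P(X > 15) ≈ P(X ≥ 15.5) for the normal approximation. Python 3.8+ is assumed; `poisson_pmf` is an illustrative helper.

```python
import math
from statistics import NormalDist

def poisson_pmf(x, m):
    """P(X = x) for a Poisson distribution with mean m."""
    return math.exp(-m) * m**x / math.factorial(x)

m = 10  # average number of calls per 8-hour shift

# (a) exact Poisson probability of more than 15 calls
p_poisson = 1 - sum(poisson_pmf(x, m) for x in range(16))

# (b) normal approximation N(m, m) with continuity correction
p_normal = 1 - NormalDist(mu=m, sigma=math.sqrt(m)).cdf(15.5)

print(round(p_poisson, 4), round(p_normal, 4))
```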

4. Mathematical expectation (or expected value)

Mathematical expectation represents an average. For example, saying that the
expected birth rate is 2.6 children per family means that the average number
of children is 2.6 per family: some families will have more and some will have
fewer, but the average is 2.6. The concept of mathematical expectation arises
in situations where there is risk and uncertainty.

Suppose x1, x2, …, xn are n possible outcomes of an experiment with p1, p2 ,…,
pn as their respective probabilities of occurrence.

Expected value for this experiment, E(x) = Σ xp

Variance for this experiment, V(x) = Σ (x − E(x))² p
                                   = Σ x²p – (Σ xp)²
                                   = E(x²) – [E(x)]²

Some rules for expected value and variance

(1) E(ax) = a E(x)
    E(x + a) = E(x) + a
    E(x1 + x2 + … + xn) = E(x1) + E(x2) + … + E(xn)

(2) V(ax) = a² V(x)
    V(x + a) = V(x)
    V(x1 + x2 + … + xn) = V(x1) + V(x2) + … + V(xn),
    provided x1, x2, …, xn are independent

E.g. 1
In a business venture, a man can make a profit of $2,000 with a probability of
0.65 or a loss of $4,000 with a probability of 0.35. Should he undertake the
venture? Calculate the variance of profit.

Solution:
x: profit
p: probability

Expected profit, E(x) = Σ xp = 2,000(0.65) + (−4,000)(0.35) = −$100 (loss)

Variance of profit, V(x) = Σ x²p – (Σ xp)²
                         = (2,000)²(0.65) + (−4,000)²(0.35) – (−100)²
                         = 8,190,000

E.g. 2
A man is considering the purchase of a lottery ticket. He can win a first prize of
$50,000, one of four second prizes of $10,000 each or one of ten third prizes of
$1,000 each. He estimates that 100,000 tickets will be sold. The winners will
be selected at random. How much should he be prepared to pay for the ticket?
Solution:
x: prize

p: probability

Expected prize = Σ xp
= 50,000(1/100,000) + 10,000(4/100,000) + 1,000(10/100,000)
= $1

Therefore, it is not worthwhile paying more than $1 per ticket.
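
Both worked examples (E.g. 1 and E.g. 2) follow the same pattern, E(x) = Σ xp and V(x) = Σ x²p − (Σ xp)², which can be sketched in a few lines. The function names and the (value, probability) list layout are choices made for this illustration.

```python
def expected_value(outcomes):
    """E(x) = sum of x * p over (x, p) pairs."""
    return sum(x * p for x, p in outcomes)

def variance(outcomes):
    """V(x) = E(x^2) - [E(x)]^2."""
    return sum(x * x * p for x, p in outcomes) - expected_value(outcomes) ** 2

# E.g. 1: business venture (profit, probability)
venture = [(2000, 0.65), (-4000, 0.35)]

# E.g. 2: lottery ticket (prize, probability); the remaining tickets win nothing
tickets = 100_000
lottery = [(50_000, 1 / tickets), (10_000, 4 / tickets),
           (1_000, 10 / tickets), (0, (tickets - 15) / tickets)]

print(expected_value(venture), variance(venture), expected_value(lottery))
```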

Sampling Distribution
Introduction
Sampling distribution is a study of the relationship between a population and
the samples drawn from the population.
There are 2 types of sample: (1) probability sample (or random sample)
(2) non-probability sample (or quota sample)
It is the probability samples that are of importance as they are the only type of
sample which allows us to make statistical inferences about the population
from which they were drawn.

Note:
** Parameter - a number that describes a population.
e.g. µ = population mean;
σ = population standard deviation;
p = population proportion are population parameters.

** Sample statistic - a number calculated from sample data.
e.g. x̄ = sample mean;
s = sample standard deviation;
p̂ = sample proportion are sample statistics.

**Formulae : Ungrouped data Grouped data

Population where N = = pop. size

Sample where n = = sample size

Sampling distribution
Consider all possible random samples of size n drawn from a population. For
each sample, a sample statistic such as x̄ or p̂ is computed, which will vary
from sample to sample. We then obtain the frequency distribution of the
sample statistic, which is called the distribution of the sample statistic or
the sampling distribution of the sample statistic.

E.g. Consider a population consisting of the digits 1, 2 and 3.
If we take samples of size 2 from this population with replacement, then
we would get 9 samples (number of possible arrangements = n^r = 3² = 9).

Sample number     Sample      Sample mean
      1           {1, 1}          1
      2           {1, 2}          1.5
      3           {1, 3}          2
      4           {2, 1}          1.5
      5           {2, 2}          2
      6           {2, 3}          2.5
      7           {3, 1}          2
      8           {3, 2}          2.5
      9           {3, 3}          3

If we compile a frequency distribution of the sample means with 1 and
under 1.5 as the first class, we obtain a distribution of sample means or
a sampling distribution of sample means.

Sample mean            Frequency
1 and under 1.5            1
1.5 and under 2.0          2
2.0 and under 2.5          3
2.5 and under 3.0          2
3.0 and under 3.5          1

A histogram can be plotted, and we can see that it is symmetric and
single-peaked, resembling the shape of a normal curve.
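
The enumeration above can be reproduced with a short sketch that lists all 9 samples, tallies the sample means, and compares the mean and variance of the sample means with μ and σ²/n. Variable names are illustrative.

```python
from collections import Counter
from itertools import product

population = [1, 2, 3]
n = 2  # sample size, sampling with replacement

samples = list(product(population, repeat=n))      # all 3^2 = 9 ordered samples
sample_means = [sum(s) / n for s in samples]
frequency = Counter(sample_means)                  # sampling distribution of the mean

pop_mean = sum(population) / len(population)
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

mean_of_means = sum(sample_means) / len(sample_means)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

for value in sorted(frequency):
    print(value, frequency[value])
print(mean_of_means, pop_mean, var_of_means, pop_var / n)
```

The mean of the sample means equals the population mean, and their variance equals σ²/n, which is the infinite-population (sampling with replacement) case.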
Sampling distribution of sample means
Properties
(a) For random samples of size n taken from a population having mean
μ and standard deviation σ, the sampling distribution of sample means
has mean μ_x̄ and standard deviation σ_x̄ given by

μ_x̄ = μ and σ_x̄ = σ/√n                          (for infinite population)

or
μ_x̄ = μ and σ_x̄ = (σ/√n) √((N − n)/(N − 1))      (for finite population)

where √((N − n)/(N − 1)) is called the finite population correction factor.

σ_x̄ is called the standard error of the mean.

Note: As the sample size n increases, σ_x̄ gets smaller.
Then the sample mean x̄ will tend to lie closer to the population mean μ.

(b) If the population is normally distributed with mean μ and standard
deviation σ, then the sampling distribution of sample means is also
normally distributed with mean μ_x̄ and standard deviation σ_x̄,
i.e. if X ~ N(μ, σ²) then X̄ ~ N(μ_x̄, σ_x̄²), and Z = (X̄ − μ_x̄)/σ_x̄ ~ N(0, 1²)

(c) If the size of the random sample n is large (n ≥ 30), then the sampling
distribution of sample means can be approximated using the normal
distribution. This result is known as the Central Limit Theorem,
i.e. if n ≥ 30, then X̄ ~ N(μ_x̄, σ_x̄²).

(d) To calculate σ_x̄, σ must be known. If σ is unknown then it is estimated
by s, and σ_x̄ is estimated by s_x̄ where

s_x̄ = s/√n or s_x̄ = (s/√n) √((N − n)/(N − 1))

E.g. 1
A random sample of 25 adults is drawn from a normal population of height for
which the mean is 172.5 cm. and standard deviation 6.25 cm.
(a) This sample of size 25 has mean value that belongs to a sampling
distribution of sample mean. Find the shape of this sampling distribution.
(b) Find the mean and standard error of this sampling distribution.
(c) What is the probability that the sample mean will be greater than 175
cm ?
(d) Find a symmetric interval about 172.5 cm which can be expected to
contain 99% of all sample means.

Let X: height, and X̄: mean height. Given n = 25, μ = 172.5, σ = 6.25

[Figure: normal curve centred at 172.5 with central area 0.99 between
a and b, and area 0.005 in each tail]
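
Parts (b) to (d) can be sketched numerically. The population is stated to be normal, so X̄ is normal with mean 172.5 and standard error σ/√n; Python 3.8+ is assumed for `statistics.NormalDist`, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 172.5, 6.25, 25
std_error = sigma / math.sqrt(n)        # (b) standard error of the mean
xbar = NormalDist(mu, std_error)        # sampling distribution of the sample mean

p_above_175 = 1 - xbar.cdf(175)         # (c) P(sample mean > 175)

# (d) symmetric interval (a, b) containing 99% of sample means: 0.005 in each tail
a = xbar.inv_cdf(0.005)
b = xbar.inv_cdf(0.995)

print(std_error, round(p_above_175, 4), round(a, 2), round(b, 2))
```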

E.g. 2
We are making 500 special components for a unique machine. Each
component consists of a carbon rod which has a mean length of 4.08 cm and
a standard deviation of 0.5 cm.
(a) Calculate the probability that, in a random sample of 100 of these
components, the mean length will be between 4.00 cm and 4.10 cm.
(b) What is the probability that the combined length of a random sample of 49
of these components is more than 205 cm?

Let X̄: mean length. Given N = 500,
X: length ~ N(μ = 4.08, σ² = 0.5²)
X̄ ~ N(μ_x̄ = 4.08, σ_x̄² = )
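
A possible numerical treatment of this example is sketched below. Because the samples are drawn from a finite population of N = 500, the sketch assumes the finite population correction factor applies in both parts; in (b) the combined length exceeding 205 cm is rewritten as the sample mean exceeding 205/49. Python 3.8+ is assumed, and `standard_error` is an illustrative helper.

```python
import math
from statistics import NormalDist

N, mu, sigma = 500, 4.08, 0.5

def standard_error(n):
    """sigma / sqrt(n) multiplied by the finite population correction factor."""
    return sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

# (a) P(4.00 < sample mean < 4.10) for n = 100
xbar_100 = NormalDist(mu, standard_error(100))
p_between = xbar_100.cdf(4.10) - xbar_100.cdf(4.00)

# (b) P(total length of 49 rods > 205) = P(sample mean > 205 / 49)
xbar_49 = NormalDist(mu, standard_error(49))
p_total_over_205 = 1 - xbar_49.cdf(205 / 49)

print(round(p_between, 4), round(p_total_over_205, 4))
```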

E.g. 3
A large number of samples of size n are taken from a normally distributed
population with mean 74 and standard deviation 6. Find the sample size n
(a) if the probability that the sample mean exceeds 75 is 0.282;
(b) if the probability that the sample mean is less than 70.4 is 0.00135.
X ~ N(μ = 74, σ² = 6²) => X̄ ~ N(μ_x̄ = 74, σ_x̄² = 6²/n)

Sampling distribution of sample proportions
Frequently in statistical work, it is desirable to know what proportion of items in
a population possess a certain characteristic, e.g. what proportion of
consumers prefer a certain product, or what proportion of the students pass a
certain examination? In these cases, we consider all possible samples of size
n drawn from a population having population proportion p. For each sample
the sample proportion p̂ is computed.

Note:
Population proportion, p
= (No. of items having the characteristic concerned in the population)
  / (Total no. of items in the population)

Sample proportion, p̂
= (No. of items having the characteristic concerned in the sample)
  / (Total no. of items in the sample)

We can then obtain the sampling distribution of sample proportions.


Properties
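(a) The sampling distribution of sample proportions has mean μ_p̂ and
standard deviation σ_p̂ given by
μ_p̂ = p and σ_p̂ = √( p(1 − p)/n )                      (for infinite population)

or σ_p̂ = √( p(1 − p)/n ) √((N − n)/(N − 1))           (for finite population)

σ_p̂ is called the standard error of the proportion.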

(b) If n ≥ 30, then the sampling distribution of sample proportions is
approximated by the normal distribution,

i.e. if n ≥ 30, then p̂ ~ N(μ_p̂, σ_p̂²) and Z = (p̂ − μ_p̂)/σ_p̂ ~ N(0, 1²)

The approximation improves as the sample size n increases.

(c) To calculate σ_p̂, p must be known. If p is unknown then it is estimated
by p̂, and σ_p̂ is estimated by s_p̂ where

s_p̂ = √( p̂(1 − p̂)/n ) or s_p̂ = √( p̂(1 − p̂)/n ) √((N − n)/(N − 1))

E.g.1
It has been found that 2% of the items produced by a machine are defective.
What is the probability that in a shipment of 400 such items,
(a) more than 3% are defective?
(b) between 1% and 3% are defective?
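
This example can be sketched with the normal approximation for the sample proportion, taking p = 0.02 and n = 400 so that σ_p̂ = √(p(1 − p)/n). No continuity correction or finite population correction is applied here, which is an assumption of this sketch. Python 3.8+ is assumed.

```python
import math
from statistics import NormalDist

p, n = 0.02, 400
std_error = math.sqrt(p * (1 - p) / n)   # standard error of the proportion
p_hat = NormalDist(p, std_error)         # approximate sampling distribution

p_more_than_3pct = 1 - p_hat.cdf(0.03)                 # (a)
p_between_1_and_3 = p_hat.cdf(0.03) - p_hat.cdf(0.01)  # (b)

print(round(std_error, 4), round(p_more_than_3pct, 4), round(p_between_1_and_3, 4))
```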

Statistical estimation and hypothesis testing
Introduction
There are 2 types of statistical inferences:-
(1) Statistical estimation (2) Hypothesis testing

Statistical estimation is concerned with estimating the population parameters


using sample statistics.

Hypothesis testing involves the setting up of a hypothesis (or theory) about the
population and then sampling in order to see if the hypothesis is supported or
rejected.

Statistical estimation

Because of time and cost factors, the population parameters (μ, σ, p) are
frequently estimated by using sample statistics (x̄, s, p̂).

Point estimate
An estimate of a population parameter given by a single value and
calculated from sample data is called a point estimate of the population
parameter.

(a) x̄ is a point estimate for μ.

(b) s is a point estimate for σ where

s = √[ (Σx² − (Σx)²/n) / (n − 1) ]            (for ungrouped data)

or s = √[ (Σfx² − (Σfx)²/Σf) / (Σf − 1) ]     (for grouped data)

Note: In a question on statistical inference, the standard deviation given is
taken to be s.

(c) Given 2 samples from the same population:

Sample 1 of size n1, sample mean x̄1 and sample standard deviation s1

Sample 2 of size n2, sample mean x̄2 and sample standard deviation s2

Then the point estimate for the population mean is

x̄ = (n1 x̄1 + n2 x̄2) / (n1 + n2)

and the point estimate for the population standard deviation is

Sp = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]

E.g. A sample of 5 measurements of the diameter of a sphere recorded by a
scientist is as follows: 6.36 mm, 6.32 mm, 6.37 mm, 6.33 mm, 6.37 mm

Determine a point estimate for the population parameters μ and σ.

Solution:   Diameter X        X²
               6.36         40.4496
               6.32         39.9424
               6.37         40.5769
               6.33         40.0689
               6.37         40.5769
             --------     ----------
              31.75        201.6147

(i) The point estimate for μ is

(ii) The point estimate for σ is
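
The two point estimates can be computed from the raw measurements as a check. This sketch assumes the sample standard deviation uses the n − 1 divisor, which is what `statistics.stdev` implements.

```python
from statistics import mean, stdev

diameters = [6.36, 6.32, 6.37, 6.33, 6.37]  # measurements in mm

x_bar = mean(diameters)   # point estimate for the population mean
s = stdev(diameters)      # point estimate for the population std dev (n - 1 divisor)

print(round(x_bar, 2), round(s, 4))
```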

Confidence interval estimates or confidence limits

An estimate of a population parameter given by 2 numbers between which the
parameter may lie, together with an assessment of the probability that it
does so, is called a confidence interval estimate of the population parameter.

X̄ ± z_α/2 · σ/√n

Confidence interval estimate of population mean, μ

Model I:

Assumption: The population standard deviation σ is known.

The population X ~ N(μ, σ²), or the distribution of X is unknown but
n ≥ 30; then the sampling distribution of sample means is normal,
X̄ ~ N(μ_x̄, σ_x̄²),

where μ_x̄ = μ and σ_x̄ = σ/√n or σ_x̄ = (σ/√n) √((N − n)/(N − 1))

Based on the characteristics of the normal curve, a symmetric interval about
μ which contains 99% or 95% of all sample means can be obtained, i.e. a
C.I. estimate of μ can be obtained.

The 99% C.I. estimate for μ is x̄ ± 2.5758 σ_x̄ where P(Z > 2.5758) = 0.005.

Similarly, the 95% C.I. for μ is x̄ ± 1.96 σ_x̄ where P(Z > 1.96) = 0.025.

In general, the 100(1 − α)% C.I. for μ is x̄ ± z_α/2 σ_x̄ where P(Z > z_α/2) = α/2.

E.g. A normal population has unknown mean μ and standard deviation 15. A
random sample of size 25 drawn from this population was found to have a
mean of 950. Construct

(a) a 90% C.I. for μ; (b) a 95% C.I. for μ; (c) compare your results.
Solution: Given: n = 25, x̄ = 950, σ = 15
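
The intervals asked for can be sketched under Model I (σ known). The helper `z_interval` is a name introduced for this illustration, and Python 3.8+ is assumed for `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval x_bar ± z_(alpha/2) * sigma / sqrt(n), sigma known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

ci_90 = z_interval(950, 15, 25, 0.90)
ci_95 = z_interval(950, 15, 25, 0.95)

print([round(v, 2) for v in ci_90], [round(v, 2) for v in ci_95])
```

The 95% interval is wider than the 90% interval: a higher confidence level is paid for with a less precise estimate.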

E.g. A manufacturer wishes to estimate the mean dimension of a certain
component. He would be satisfied if he obtains an estimate within 0.01 cm
of the true mean. The standard deviation of the dimension of the component is
0.2 cm. What must be the size of the sample that he should examine if he
wants to be 95% certain?

n = ( z_α/2 · σ / e )², where e is the maximum error allowed

Model II:

Assumption: The population standard deviation σ is unknown, so it is
estimated by s, and the sample size n is large (n ≥ 30).

σ_x̄ is estimated by s_x̄ = s/√n or s_x̄ = (s/√n) √((N − n)/(N − 1))

The 100(1 − α)% C.I. for μ is x̄ ± z_α/2 s_x̄.

The 99% C.I. for μ is x̄ ± 2.5758 s_x̄;

the 95% C.I. for μ is x̄ ± 1.96 s_x̄.

E.g. The management of a company making a certain type of car component


wishes to ascertain the average number of components per hour produced by
the workers. The company employs a very large number of workers and it is
decided to use a sample of the output of 400 workers. After checking the
output of this sample, it was found that the average output produced by each
worker every hour is 100 with a standard deviation of 20.
(a) Calculate a 95% confidence interval for the average output produced by each
worker per hour for the whole factory.
(b) How large a sample is needed if the management wishes to be 95% confident
that the sample mean will be within one unit of the true mean?

Model III: If σ is unknown then it is estimated by s, the sample size n is
small (n < 30) and the population is normal or approximately normal.

The 100(1 − α)% C.I. for μ is x̄ ± t_α/2, n−1 · s/√n

and t_α/2, n−1 is obtained from the t distribution table.

Student's t distribution, or simply the t distribution, is a family of
probability distributions distinguished by their individual degrees of
freedom (v). It is similar in form to the normal distribution and is used when
the population standard deviation is unknown and the sample size is small
(n < 30). Areas under the t distribution are tabulated.

E.g. A sample of 10 packets of sugar packed by a machine has the following
weights (kg):
1 , 1.05, 1.10, 0.95, 0.96, 1.10, 1.02, 0.97, 0.99, 1
(a) Calculate the sample mean and standard deviation.

(b) Obtain a 95% C.I. for μ.

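Confidence interval estimate of difference of two means

                    Population I    Population II
Pop. mean                μ1              μ2
Pop. std dev.            σ1              σ2
Sample size              n1              n2
Sample mean              x̄1              x̄2
Sample std dev.          s1              s2

For large sample sizes n1 and n2 (≥ 30),

X̄1 − X̄2 ~ N( μ_(x̄1−x̄2) , σ²_(x̄1−x̄2) )

where μ_(x̄1−x̄2) = μ1 − μ2

and σ_(x̄1−x̄2) = √( σ1²/n1 + σ2²/n2 )

If σ1 and σ2 are unknown, σ_(x̄1−x̄2) is estimated by
s_(x̄1−x̄2) = √( s1²/n1 + s2²/n2 )
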
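Model IV: For large sample sizes n1, n2 ≥ 30, σ1 and σ2 are unknown and
estimated by s1 and s2 respectively.

The 99% C.I. for μ1 − μ2 is (x̄1 − x̄2) ± 2.5758 s_(x̄1−x̄2)

The 95% C.I. for μ1 − μ2 is (x̄1 − x̄2) ± 1.96 s_(x̄1−x̄2)

In general, the 100(1 − α)% C.I. for μ1 − μ2 is (x̄1 − x̄2) ± z_α/2 s_(x̄1−x̄2)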

E.g. The management of a large company wishes to determine whether there


is any difference in performance between the day shift workers and the night
shift workers. A sample of 120 day shift workers and another 100 night shift
workers are selected. The results ( in number of parts produced per hour) are
given below:-
Day shift Night shift
Sample size 120 100
Sample mean 75.5 70.4
Sample std. dev. 4.13 4.27
Construct a 95% C. I. for the difference of the average output of the day shift
workers and that of night shift workers.
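
A sketch of this example under Model IV, with the standard error estimated by √(s1²/n1 + s2²/n2). Python 3.8+ is assumed, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

n1, mean1, s1 = 120, 75.5, 4.13   # day shift
n2, mean2, s2 = 100, 70.4, 4.27   # night shift

diff = mean1 - mean2
std_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
z = NormalDist().inv_cdf(0.975)   # z_(alpha/2) for a 95% interval, about 1.96

lower, upper = diff - z * std_error, diff + z * std_error
print(round(lower, 2), round(upper, 2))
```

The interval lies entirely above zero, which suggests a higher average output on the day shift at this confidence level.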

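Confidence interval estimate of population proportion, p
Model V

Assumption: For large sample size, n ≥ 30, the 100(1 − α)% C.I. estimate for p
is p̂ ± z_α/2 s_p̂

where s_p̂ = √( p̂(1 − p̂)/n ) or s_p̂ = √( p̂(1 − p̂)/n ) √((N − n)/(N − 1)).

In particular, the 95% C.I. estimate for p is p̂ ± 1.96 s_p̂ ;

and the 99% C.I. estimate for p is p̂ ± 2.5758 s_p̂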

E.g. A producer of steel pipes selected a simple random sample of 300 pipes
from the production process to estimate the proportion of defective pipes.
There were 15 defective pipes in the sample.
(a) What is the point estimate of the proportion of defective pipes in the
population?
(b) Construct a 95% confidence interval estimate of the proportion of the
defective pipes in the population.
(c) How large a sample would be needed if the probability is to be 0.95 that
the error of estimate will not exceed 0.02 unit?
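
The three parts of this example can be sketched as follows. For part (c) the sketch assumes the sample proportion p̂ = 0.05 is used as the planning value in n = z² p̂(1 − p̂)/e², rounded up to the next whole pipe; the notes do not state which planning value is intended. Python 3.8+ is assumed.

```python
import math
from statistics import NormalDist

n, defective = 300, 15
p_hat = defective / n                          # (a) point estimate of p

z = NormalDist().inv_cdf(0.975)                # z_(alpha/2) for 95% confidence
std_error = math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - z * std_error, p_hat + z * std_error   # (b)

e = 0.02                                       # maximum error of estimate
n_required = math.ceil(z**2 * p_hat * (1 - p_hat) / e**2)     # (c)

print(p_hat, round(lower, 4), round(upper, 4), n_required)
```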

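Confidence interval estimate of difference of two proportions

                          Population I    Population II
Population proportion          p1              p2
Sample size                    n1              n2
Sample proportion              p̂1              p̂2

Point estimate for difference of 2 pop. proportions (p1 − p2) is p̂1 − p̂2

For large sample sizes n1, n2 ≥ 30,

p̂1 − p̂2 ~ N( μ_(p̂1−p̂2) , σ²_(p̂1−p̂2) )

where μ_(p̂1−p̂2) = p1 − p2

and σ_(p̂1−p̂2) = √( p1(1 − p1)/n1 + p2(1 − p2)/n2 ), which is

estimated by s_(p̂1−p̂2) = √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 )

Model VI: For large sample sizes n1, n2 ≥ 30, p1 and p2 are estimated by
p̂1 and p̂2 respectively.

The 100(1 − α)% C.I. for p1 − p2 is (p̂1 − p̂2) ± z_α/2 s_(p̂1−p̂2)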

E.g. Superplasticized concrete is formed by adding chemicals to conventional


concrete to make it more fluid so that it can be placed more easily. Suppose
that a sample of 50 new construction projects in Area A yields 15 that are using
this type of concrete. A sample of 60 new projects in Area B also yields 15 using
superplasticized concrete. Construct a 99% C. I. for the difference in the
proportions of new construction projects in Areas A and B that are using
superplasticized concrete.
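
A sketch of this example under Model VI, with p̂1 = 15/50 and p̂2 = 15/60. Python 3.8+ is assumed, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

n1, x1 = 50, 15   # Area A: projects sampled, projects using the concrete
n2, x2 = 60, 15   # Area B

p1_hat, p2_hat = x1 / n1, x2 / n2
diff = p1_hat - p2_hat
std_error = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

z = NormalDist().inv_cdf(0.995)   # z_(alpha/2) for a 99% interval, about 2.5758
lower, upper = diff - z * std_error, diff + z * std_error

print(round(lower, 4), round(upper, 4))
```

The interval contains zero, so at the 99% level the data do not show a significant difference between the two areas.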

Statistical significance and confidence intervals
• If the two confidence intervals do not overlap, we can conclude
that there is a statistically significant difference in the two population
values at the given level of confidence; or alternatively
• If the confidence interval for the difference does not contain
zero, we can conclude that there is a statistically significant difference
in the two population values at the given level of confidence.

Hypothesis Testing

Statistical decisions
We study the sample data and then make decisions about the
population from which the sample is drawn. Such decisions are
called statistical decisions.

Statistical hypotheses
They are statements or assumptions, which may or may not be
true, concerning one or more populations. Based on sample
information, these hypotheses will be tested. Normally, the
hypothesis to be tested is formulated for the sole purpose of
being rejected or nullified. This hypothesis is called the null
hypothesis, denoted by H0. We have to formulate another
hypothesis which differs from the null hypothesis, and it is usually
called the alternative hypothesis, denoted by H1 or Ha.

Tests of hypotheses and significance


Usually the sample result will differ from that specified by H0.
Even if the null hypothesis H0 is true, such an observed difference
may be the result of pure chance. If the observed difference is
large, we say that the observed difference is significant, and the
decision is to reject H0. Procedures which enable us to decide
whether to reject or retain the null hypothesis, or to determine
whether an observed sample result differs significantly from the
expected result specified by H0, are called tests of hypotheses,
tests of significance or rules of decision.

Type I and Type II errors

When H0 is tested, we may commit 2 types of errors:

(1) Rejecting H0 when it is in fact true: type I error
P[committing a type I error] = P[reject H0 | H0 is true]
= level of significance = α

(2) Accepting H0 when it is in fact false: type II error
P[committing a type II error] = P[accept H0 | H0 is false] = β

                             Decision
Hypothesis      Accept H0                  Reject H0
H0 is true      Correct decision.          Type I error.
                Probability = 1 − α        Probability = α
                ('confidence level')       ('significance level')
H0 is false     Type II error.             Correct decision.
                Probability = β            Probability = 1 − β
                                           ('power')

Critical region and critical point


When H0 is true, the set that consists of all the possible
outcomes which lead to the acceptance of H0 is said to
constitute the acceptance region (AR) of H0, while the other set
which consists of all the sample outcomes which lead to the
rejection of H0 is said to constitute the rejection region or critical
region (CR) of H0. A critical value is a value used in the test
criterion to separate the critical region of H0 from the acceptance
region.

[Figures: critical regions for a two-tailed test, a left-tailed test and a
right-tailed test]

Procedures for testing statistical hypotheses
(1) State the assumptions or known facts about:-
(i) the population of interest
(ii) the nature of the samples and the sample sizes
(iii) state H0 and H1 .

(2) Select a test statistic whose sampling distribution is known


if H0 is true and other assumptions are satisfied.

(3) Choose the significance level, α, of the test and thus
determine an appropriate critical region of fixed size.
(Usually α = 0.05 or 0.01 is used.)

(4) Compute the realised value of the test statistic from the
sample results and other known quantities.

(5) Make a decision. If the test statistic falls in the CR, then we
reject H0; otherwise we accept it. Then draw a conclusion.

Tests concerning means

Test of hypotheses concerning the mean, μ, of a single population

H0 : μ ≥ μ0          or   H0 : μ ≤ μ0           or   H0 : μ = μ0
H1 : μ < μ0               H1 : μ > μ0                H1 : μ ≠ μ0
(left-tailed test)        (right-tailed test)        (2-tailed test)

where μ0 is a predetermined constant.

Test A (Z-test)
Model : Population is normal with known standard deviation σ.
OR
Population is not normal with known standard
deviation σ and sample size n ≥ 30.

The test statistic is Z = (X̄ − μ0) / σ_X̄ where σ_X̄ = σ/√n

(a) Left-tailed test

To test H0 : μ ≥ μ0 against H1 : μ < μ0

______________

At α significance level, the CR = {z | z < −z_α}

(b) Right-tailed test

To test H0 : μ ≤ μ0 against H1 : μ > μ0

______________

At α significance level, the CR = {z | z > z_α}

(c) Two-tailed test

To test H0 : μ = μ0 against H1 : μ ≠ μ0

______________

At α significance level, the CR = {z | |z| > z_α/2}

E.g. A standard intelligence examination has been given to the
students for several years and it is assumed that the scores
are normally distributed with an average of 80 and a
standard deviation of 7. A group of 25 students obtained a
mean grade of 77 in the examination this year. Are this
year's students inferior in intelligence to the past years'
students at the (i) 5% (ii) 1% level of significance?

E.g. A chemical company obtains an average of 1800 lbs of


finished product per batch processed with a standard
deviation of 100 lbs. By a new processing technique, it is
claimed that the yield can be increased. To test this
hypothesis, 49 sample batches are processed using the
new technique and it is found that the average yield is 1850
lbs. Can we conclude that the new technique improves the
yield at 1% significance level ?
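
The two Test A examples can be sketched as one-tailed Z-tests. The first is treated as left-tailed (H1 : μ < 80) and the second as right-tailed (H1 : μ > 1800); the helper `z_statistic` is introduced for this illustration, and Python 3.8+ is assumed.

```python
import math
from statistics import NormalDist

def z_statistic(x_bar, mu0, sigma, n):
    """Z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

def z_critical(alpha):
    """Upper-tail critical value z_alpha with P(Z > z_alpha) = alpha."""
    return NormalDist().inv_cdf(1 - alpha)

# Intelligence examination: left-tailed test, reject H0 if z < -z_alpha
z_exam = z_statistic(77, 80, 7, 25)
reject_exam_5pct = z_exam < -z_critical(0.05)
reject_exam_1pct = z_exam < -z_critical(0.01)

# Chemical yield: right-tailed test, reject H0 if z > z_alpha
z_yield = z_statistic(1850, 1800, 100, 49)
reject_yield_1pct = z_yield > z_critical(0.01)

print(round(z_exam, 4), reject_exam_5pct, reject_exam_1pct)
print(round(z_yield, 4), reject_yield_1pct)
```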

Test B (large sample Z-test)

Model: (i) Population is normal or not normal, with unknown
standard deviation estimated by the sample
standard deviation s;
(ii) The sample size is large, n ≥ 30.

(a), (b) and (c) of Test A still apply, with σ replaced by s.

E.g. A manufacturer of batteries believes that one particular


type of battery has a useful life of 1000 hours. A simple
random sample of 100 of the batteries is taken and the
mean life is found to be 950 hours with standard deviation
of 270 hours. Does this indicate that the mean life of this
type of batteries is not 1000 hours at 5% level of
significance ?

Test C (Small sample t-test)
Model: (i) Population is normally distributed with unknown
standard deviation estimated by the sample
standard deviation, s.
(ii) The sample size is small (n < 30).

The test statistic is t = (X̄ − μ0) / S_X̄ where S_X̄ = s/√n, which follows a
t-distribution with (n − 1) degrees of freedom.

(a) Left-tailed test

To test H0 : μ ≥ μ0 against H1 : μ < μ0

______________

At α significance level, the CR = {t | t < −t_α, n−1}

(b) Right-tailed test

To test H0 : μ ≤ μ0 against H1 : μ > μ0

______________

At α significance level, the CR = {t | t > t_α, n−1}

(c) Two-tailed test

To test H0 : μ = μ0 against H1 : μ ≠ μ0

______________

At α significance level, the CR = {t | |t| > t_α/2, n−1}

E.g. The expected lifetime of electric light bulbs produced by a
given process was 1500 hours. To test a new batch, a sample
of 10 was taken which showed a mean lifetime of 1400 hours
and a standard deviation of 90 hours. Is there any evidence of a
significant change in the lifetime of the electric light bulbs? Use a 5%
level of significance.

Tests of hypotheses concerning the difference of means of
two populations
H0 : μ1 − μ2 ≥ d0    or   H0 : μ1 − μ2 ≤ d0     or   H0 : μ1 − μ2 = d0
H1 : μ1 − μ2 < d0         H1 : μ1 − μ2 > d0          H1 : μ1 − μ2 ≠ d0
(Left-tailed test)        (Right-tailed test)        (2-tailed test)

where d0 is a predetermined constant.

Test D (Z-test)
Model: The two populations are normal with known standard
deviations σ1 and σ2.

The test statistic is Z = ((x̄1 − x̄2) − d0) / σ_(x̄1−x̄2)
where σ_(x̄1−x̄2) = √( σ1²/n1 + σ2²/n2 ).

(a) Left-tailed test

To test H0 : μ1 − μ2 ≥ d0 against H1 : μ1 − μ2 < d0

At α significance level, the CR = {z | z < −z_α}

(b) Right-tailed test

To test H0 : μ1 − μ2 ≤ d0 against H1 : μ1 − μ2 > d0

At α significance level, the CR = {z | z > z_α}

(c) Two-tailed test

To test H0 : μ1 − μ2 = d0 against H1 : μ1 − μ2 ≠ d0

At α significance level, the CR = {z | |z| > z_α/2}

Note: Test D is seldom used in practice; it serves as the basis for Test E.

Test E (Large sample Z-test)

Model: (i) σ1 and σ2 are unknown and estimated by s1 and s2.
(ii) The sample sizes n1, n2 are large (≥ 30).

The test statistic is Z = ((x̄1 − x̄2) − d0) / s_(x̄1−x̄2)
where s_(x̄1−x̄2) = √( s1²/n1 + s2²/n2 ).

(a), (b) and (c) of Test D still apply here.

E.g. To ascertain whether a new fertiliser is more efficient than
the old fertiliser in rice production, a piece of land was divided
into 100 squares of equal area, all of the same quality. The new
fertiliser was applied to 50 squares and the old fertiliser to the
other 50. The mean no. of kg of rice harvested per square of
land using the new fertiliser was 25.5 with a variance of 22. The
corresponding mean and variance for the squares using the old
fertiliser were 24.6 and 19 respectively. Is the new fertiliser more
efficient than the old one at the 1% sig. level?
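
A sketch of this example as a right-tailed Test E with d0 = 0 (H1 : μ_new − μ_old > 0). Note that the question gives variances, so s1² = 22 and s2² = 19 are used directly. Python 3.8+ is assumed, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

n_new, mean_new, var_new = 50, 25.5, 22   # new fertiliser
n_old, mean_old, var_old = 50, 24.6, 19   # old fertiliser
d0 = 0

std_error = math.sqrt(var_new / n_new + var_old / n_old)
z = ((mean_new - mean_old) - d0) / std_error

z_crit = NormalDist().inv_cdf(0.99)   # z_alpha for alpha = 0.01
reject_h0 = z > z_crit

print(round(z, 4), round(z_crit, 4), reject_h0)
```

The statistic does not fall in the critical region, so on these figures there is insufficient evidence at the 1% level that the new fertiliser is more efficient.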

Test F (small sample t-test)
Model: (i) The 2 populations are normal with unknown but common
standard deviation σ1 = σ2 = σ.
(ii) The 2 samples are independent random samples
with small sample sizes n1 and n2 (< 30).

The test statistic is t = ((x̄1 − x̄2) − d0) / s_(x̄1−x̄2)

where sp = pooled sample standard deviation
         = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]

and s_(x̄1−x̄2) = sp √( 1/n1 + 1/n2 ); t follows a t-distribution with
(n1 + n2 − 2) degrees of freedom.

(a) Left-tailed test

To test H0 : μ1 − μ2 ≥ d0 against H1 : μ1 − μ2 < d0

At α significance level, the CR = {t | t < −t_α, n1+n2−2}

(b) Right-tailed test

To test H0 : μ1 − μ2 ≤ d0 against H1 : μ1 − μ2 > d0

At α significance level, the CR = {t | t > t_α, n1+n2−2}

(c) Two-tailed test

To test H0 : μ1 − μ2 = d0 against H1 : μ1 − μ2 ≠ d0

At α significance level, the CR = {t | |t| > t_α/2, n1+n2−2}

E.g. Two salesmen A and B are working in a certain district.
From a sample survey conducted by the Head office, the
following results on sales were obtained:
Salesman A       Salesman B
n1 = 20          n2 = 18
x̄1 = 170         x̄2 = 205
s1 = 20          s2 = 25

State whether there is a significant difference in the mean sales
between the two salesmen at the 5% sig. level.
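
The test statistic for this example can be sketched as a two-tailed Test F with d0 = 0. The sketch only computes the pooled standard deviation, the t statistic and the degrees of freedom; the critical value t_α/2, n1+n2−2 still has to be read from the t distribution table, since the Python standard library has no t quantile function.

```python
import math

n1, mean1, s1 = 20, 170, 20   # salesman A
n2, mean2, s2 = 18, 205, 25   # salesman B
d0 = 0

# pooled sample standard deviation
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
std_error = sp * math.sqrt(1 / n1 + 1 / n2)

t = ((mean1 - mean2) - d0) / std_error
df = n1 + n2 - 2   # degrees of freedom

print(sp, round(t, 4), df)
```

H0 is rejected if |t| exceeds the tabulated value of t_0.025 with 36 degrees of freedom.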

Tests concerning proportions

Test G (Tests of hypotheses concerning a single population proportion, p)

H0 : p ≥ p0          or   H0 : p ≤ p0           or   H0 : p = p0
H1 : p < p0               H1 : p > p0                H1 : p ≠ p0
(Left-tailed test)        (Right-tailed test)        (2-tailed test)
where p0 is a predetermined constant.

Model: (i) Population is large, i.e. the no. of elements, N, in
the population is large;
(ii) Although the sample is large (n ≥ 30), it is small
relative to the size of the population, i.e. sampling
fraction = n/N < 0.05;
(iii) p is not near to 0 or 1.

The test statistic is $Z = \dfrac{\hat{p} - p_0}{\sigma_{\hat{p}}}$ where $\sigma_{\hat{p}} = \sqrt{\dfrac{p_0(1 - p_0)}{n}}$

(a) Left-tailed test

To test H0 : p ≥ p0 against H1 : p < p0

At α significance level, the CR = $\{z : z < -z_\alpha\}$.

(b) Right-tailed test

To test H0 : p ≤ p0 against H1 : p > p0

At α significance level, the CR = $\{z : z > z_\alpha\}$.

(c) Two-tailed test

To test H0 : p = p0 against H1 : p ≠ p0

At α significance level, the CR = $\{z : |z| > z_{\alpha/2}\}$.

E.g. A bus company trains drivers in groups of 25.
Normally, 3 out of each group fail to pass the final test.
A new method of instruction is being carried out
where a group of 100 was trained together. There
were 9 failures. Test to see if the new training method
is better using 5% significance level.
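
A minimal sketch of Test G applied to the bus company example follows. It assumes p0 = 3/25 is the usual failure proportion and that "better" means a lower failure proportion, i.e. a left-tailed test; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """Z statistic for a single proportion, using p0 in the standard error."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

p0 = 3 / 25                      # usual failure proportion (0.12)
z = one_proportion_z(9, 100, p0) # 9 failures out of 100 under the new method

# Left-tailed critical value -z_alpha at the 5% level
z_crit = -NormalDist().inv_cdf(1 - 0.05)
reject_h0 = z < z_crit

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Under these assumptions z does not fall in the critical region, so there would be insufficient evidence at the 5% level that the new method is better.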

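Test H (Test concerning differences between proportions)
Consider 2 independent random samples from 2 binomial
populations consisting of n1 and n2 trials, where the no. of
successes are x1 and x2 respectively.
Then $\hat{p}_1 = \dfrac{x_1}{n_1} \sim N\!\left(p_1, \dfrac{p_1(1 - p_1)}{n_1}\right)$ and
$\hat{p}_2 = \dfrac{x_2}{n_2} \sim N\!\left(p_2, \dfrac{p_2(1 - p_2)}{n_2}\right)$ approximately when n1, n2 are
large.

Consider the difference between the 2 proportions, p1 − p2,
whose estimator is $\hat{p}_1 - \hat{p}_2$.
Then we have $E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$ and
$V(\hat{p}_1 - \hat{p}_2) = V(\hat{p}_1) + V(\hat{p}_2) = \dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}$, where $q_i = 1 - p_i$.
The standard error of $\hat{p}_1 - \hat{p}_2$ is $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$

(i) If p1 = p2 = p is known, the standard error of $\hat{p}_1 - \hat{p}_2$ is
$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{pq}{n_1} + \dfrac{pq}{n_2}} = \sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

(ii) If p1 = p2 = p is unknown, the standard error of $\hat{p}_1 - \hat{p}_2$ is
estimated by $S_{\hat{p}_1 - \hat{p}_2} = \sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$ where $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ and $\hat{q} = 1 - \hat{p}$.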

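The null and alternative hypotheses:

H0 : p1 − p2 = d0     or   H0 : p1 − p2 ≥ d0     or   H0 : p1 − p2 ≤ d0
H1 : p1 − p2 ≠ d0          H1 : p1 − p2 < d0          H1 : p1 − p2 > d0

The test statistic is $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - d_0}{S_{\hat{p}_1 - \hat{p}_2}}$ where

$S_{\hat{p}_1 - \hat{p}_2} = \sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$ and $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$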

(a) Left-tailed test

H0 : p1 − p2 ≥ d0 against H1 : p1 − p2 < d0

At α sig. level, the C.R. = $\{z : z < -z_\alpha\}$

(b) Right-tailed test

H0 : p1 − p2 ≤ d0 against H1 : p1 − p2 > d0

At α sig. level, the C.R. = $\{z : z > z_\alpha\}$

(c) Two-tailed test

H0 : p1 − p2 = d0 against H1 : p1 − p2 ≠ d0

At α sig. level, the C.R. = $\{z : |z| > z_{\alpha/2}\}$

E.g. A political party X believes it has increased its percentage
of the vote by 5% points over the previous 12 months. A survey
of 500 electors 1 year ago showed that 100 voted for X. In a
recent survey, it received 96 votes from 400 electors. Would you
accept the view of a 5% points increase using 1% sig. level?
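
The statistic for the party X example can be sketched as follows, using the pooled standard error exactly as given in the notes. The set-up is an assumption: population 1 is the recent survey, population 2 the survey a year ago, d0 = 0.05, and the claim is tested left-tailed (H0: p1 − p2 ≥ 0.05 against H1: p1 − p2 < 0.05). Other formulations of the hypotheses are possible.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2, d0=0.0):
    """Z statistic for p1 - p2 using the pooled estimate of p, as in the notes."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return ((p1_hat - p2_hat) - d0) / standard_error

# Recent survey: 96 of 400; a year ago: 100 of 500; claimed increase d0 = 0.05
z = two_proportion_z(96, 400, 100, 500, d0=0.05)

# Assumed left-tailed test at the 1% level
z_crit = -NormalDist().inv_cdf(1 - 0.01)
reject_h0 = z < z_crit

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Under these assumptions z is well inside the acceptance region, so the claimed 5% point increase would not be rejected at the 1% level.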

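Test I (Paired comparison t-test)

We have n pairs of observations (x1, y1), (x2, y2), … , (xn, yn).
Assume that X and Y are normally distributed.
Let D = X − Y. Then

$\bar{D} = \dfrac{\sum D}{n} = \dfrac{\sum (X - Y)}{n} = \bar{X} - \bar{Y}$

and $S_D = \sqrt{\dfrac{\sum (D - \bar{D})^2}{n - 1}} = \sqrt{\dfrac{\sum D^2 - \dfrac{(\sum D)^2}{n}}{n - 1}}$

The distribution of $\bar{D}$ has mean $\mu_{\bar{D}} = \mu_X - \mu_Y$ and standard
deviation $\sigma_{\bar{D}} = \dfrac{\sigma_D}{\sqrt{n}}$, which is estimated by $S_{\bar{D}} = \dfrac{S_D}{\sqrt{n}}$.

The test statistic is $t = \dfrac{\bar{D} - (\mu_X - \mu_Y)}{S_{\bar{D}}} = \dfrac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{S_{\bar{D}}}$
which follows a Student's t-distribution with (n − 1) degrees of
freedom.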

(a) Left-tailed test H0 : $\mu_X - \mu_Y \ge 0$ against H1 : $\mu_X - \mu_Y < 0$

At α sig. level, the C.R. = $\{t : t < -t_{\alpha,\,n-1}\}$

(b) Right-tailed test H0 : $\mu_X - \mu_Y \le 0$ against H1 : $\mu_X - \mu_Y > 0$

At α sig. level, the C.R. = $\{t : t > t_{\alpha,\,n-1}\}$

(c) Two-tailed test H0 : $\mu_X - \mu_Y = 0$ against H1 : $\mu_X - \mu_Y \ne 0$

At α sig. level, the C.R. = $\{t : |t| > t_{\alpha/2,\,n-1}\}$

E.g. A new product was introduced into the market in January
1997. After a poor year for sales, the manufacturer initiated an
intensive advertising campaign during January 1998. The table
below records the sales, in thousand dollars, for a one-month
period before and a one-month period after the advertising
campaign, for each of eleven regions.
Region          A    B    C    D    E    F    G    H    I    J    K
Sales Before    2.4  2.6  3.9  2.0  3.2  2.2  3.3  2.1  3.1  2.2  2.8
Sales After     3.0  2.5  4.0  4.1  4.8  2.0  3.4  4.0  3.3  4.2  3.9
The sales may be assumed to follow a normal distribution.
Determine, at the 5% sig. level, whether an increase in sales has
occurred.

Solution:
Region            A    B    C    D    E    F    G    H    I    J    K
Sales Before (X)  2.4  2.6  3.9  2.0  3.2  2.2  3.3  2.1  3.1  2.2  2.8
Sales After (Y)   3.0  2.5  4.0  4.1  4.8  2.0  3.4  4.0  3.3  4.2  3.9
D = Y − X
ΣD =        ; ΣD² =
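
The differences and the paired t statistic for the sales example can be worked out as in the sketch below. It assumes D = Y − X (after minus before) and a right-tailed test; the critical value 1.812 is t(0.05, 10) read from a t-table, and the variable names are illustrative.

```python
from math import sqrt

before = [2.4, 2.6, 3.9, 2.0, 3.2, 2.2, 3.3, 2.1, 3.1, 2.2, 2.8]  # X
after  = [3.0, 2.5, 4.0, 4.1, 4.8, 2.0, 3.4, 4.0, 3.3, 4.2, 3.9]  # Y

# D = Y - X for each region
d = [y - x for x, y in zip(before, after)]
n = len(d)
sum_d = sum(d)
sum_d2 = sum(v * v for v in d)

d_bar = sum_d / n
s_d = sqrt((sum_d2 - sum_d**2 / n) / (n - 1))  # sample s.d. of the differences
t_stat = d_bar / (s_d / sqrt(n))               # tests H0: mean difference <= 0

# Assumption: right-tailed critical value t_{0.05, 10} taken from a t-table
T_CRIT = 1.812
reject_h0 = t_stat > T_CRIT

print(f"sum D = {sum_d:.1f}, sum D^2 = {sum_d2:.2f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Under these assumptions t exceeds the table value, which would indicate that sales increased after the advertising campaign.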

Chi-square tests (χ² tests) can be used in the following ways:
(a) Test of differences among k proportions
(b) Test of contingency tables
(c) Test of goodness of fit

(a) Test of differences among k proportions

In many applications, we must decide whether observed differences
among more than 2 sample proportions or percentages are significant
or whether they can be attributed to chance.

The null hypothesis against the alternative hypothesis:

H0 : p1 = p2 = … = pk = p0
H1 : at least one of the p's is not equal to p0.

Sample    Sample size    Successes    Failures

1         n1             x1           n1 − x1
2         n2             x2           n2 − x2
3         n3             x3           n3 − x3
.         .              .            .
.         .              .            .
.         .              .            .
k         nk             xk           nk − xk

We estimate p0 by $\hat{p} = \dfrac{x_1 + x_2 + \cdots + x_k}{n_1 + n_2 + \cdots + n_k}$ if H0 is true.

Let O be the observed frequency and E be the expected
frequency.
If H0 is true, then the test statistic is: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$

At α sig. level, the critical region CR = $\{\chi^2 : \chi^2 > \chi^2_{\alpha,\,k-1}\}$ with
k − 1 degrees of freedom.

E.g. Random samples of registered voters (100 with primary
education, 300 with secondary education and 200 with
tertiary education) are asked how they would vote on a
certain piece of legislation. The results of this sample survey
are as follows:

                          Primary      Secondary    Tertiary
                          education    education    education
For the legislation       33           147          114
Against the legislation   67           153          86
Total                     100          300          200

Test at 5% level of sig. whether the actual proportions of
favorable votes are the same for all three groups.

The proportions of favorable votes for the 3 groups are
respectively

$\hat{p}_1 = \dfrac{33}{100} = 0.33$; $\hat{p}_2 = \dfrac{147}{300} = 0.49$; $\hat{p}_3 = \dfrac{114}{200} = 0.57$

We are interested in testing
H0 : p1 = p2 = p3 against
H1 : at least 2 of the proportions are not the same.

If H0 is true, then we can combine the 3 samples and estimate
the common proportion of voters favoring the legislation as

$\hat{p} = \dfrac{33 + 147 + 114}{100 + 300 + 200} = \dfrac{294}{600} = 0.49$

                          Primary        Secondary      Tertiary
                          education      education      education
For the legislation       33             147            114
                          (100 × 0.49)   (300 × 0.49)   (200 × 0.49)
                          = (49)         = (147)        = (98)
Against the legislation   67             153            86
                          (51)           (153)          (102)

The expected frequencies are shown in parentheses below the
observed frequencies.

The test statistic is $\chi^2 = \sum \dfrac{(O - E)^2}{E} = 15.36$

At 5% sig. level and (3 − 1) = 2 degrees of freedom, $\chi^2_{0.05,\,2} = 5.991$.

Since $\chi^2 = 15.36 > \chi^2_{0.05,\,2} = 5.991$, H0 is rejected at the 5%
level of sig., which shows that attitudes towards the given piece of
legislation differ according to the extent of one's education.
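
The calculation above can be reproduced with a short sketch. The function name `chi_square_k_proportions` is illustrative; the critical value 5.991 is the table value quoted in the notes.

```python
def chi_square_k_proportions(successes, sizes):
    """Chi-square statistic for H0: p1 = ... = pk, pooling the samples to estimate p."""
    p_hat = sum(successes) / sum(sizes)
    stat = 0.0
    for x, n in zip(successes, sizes):
        # One "success" cell and one "failure" cell per sample
        for observed, expected in ((x, n * p_hat), (n - x, n * (1 - p_hat))):
            stat += (observed - expected) ** 2 / expected
    return p_hat, stat

# Votes for the legislation in the primary, secondary and tertiary samples
p_hat, chi2 = chi_square_k_proportions([33, 147, 114], [100, 300, 200])

CHI2_CRIT = 5.991  # table value for alpha = 0.05 with k - 1 = 2 d.f.
reject_h0 = chi2 > CHI2_CRIT

print(f"pooled p = {p_hat:.2f}, chi-square = {chi2:.2f}, reject H0: {reject_h0}")
```

Without intermediate rounding the statistic is about 15.37, which agrees with the 15.36 in the notes up to rounding of the individual terms.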

(b) Test of contingency tables

There are situations in which individuals are classified
according to 2 qualitative variables (e.g. hair color and eye
color) and the problems deal with the relationship between
those 2 variables of classification.

      A      A1     A2     A3     . . .   Ac     Row total
B
B1           O11    O12    O13    . . .   O1c    O1.
B2           O21    O22    O23    . . .   O2c    O2.
.            .      .      .      . . .   .      .
.            .      .      .      . . .   .      .
.            .      .      .      . . .   .      .
Br           Or1    Or2    Or3    . . .   Orc    Or.
Column total O.1    O.2    O.3    . . .   O.c    O..

The above table is called an r × c contingency table. The observed
frequency for the event Bi Aj will be denoted by Oij and the
expected frequency for the event is

$E_{ij} = \dfrac{O_{i.}\,O_{.j}}{O_{..}} = \dfrac{(\text{row total})(\text{column total})}{\text{total observations}}$

If H0 is true then the test statistic is $\chi^2 = \sum_i \sum_j \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$.

At α sig. level, the critical region CR = $\{\chi^2 : \chi^2 > \chi^2_{\alpha,\,v}\}$ with
v = (r − 1)(c − 1) degrees of freedom.

E.g. Consider the following table of frequencies of 6800 men
according to eye color and hair color.

                   Hair color
Eye color          Fair    Brown   Black   Red    Total
Blue               1768    807     189     47     2811
Grey or green      946     1387    746     53     3132
Brown              115     438     288     16     857
Total              2829    2632    1223    116    6800

Use the level of sig. α = 1% to test the null hypothesis that there is
no relationship between hair and eye color.
To test H0 : eye color and hair color are not associated
against H1 : eye color and hair color are associated.

                   Hair color
Eye color          Fair        Brown       Black      Red       Total
Blue               1768        807         189        47        2811
                   (1169.46)   (1088.02)   (505.57)   (47.95)
Grey or green      946         1387        746        53        3132
                   (1303.00)   (1212.27)   (563.30)   (53.43)
Brown              115         438         288        16        857
                   (356.54)    (331.71)    (154.13)   (14.62)
Total              2829        2632        1223       116       6800

The expected frequencies are shown in parentheses below the
observed frequencies.
E.g. for Blue & Fair, E = (2829 × 2811)/6800 = 1169.46

If H0 is true then the test statistic is

$\chi^2 = \sum_i \sum_j \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}} = 1073.52$

At 1% sig. level and v = (3 − 1)(4 − 1) = 6 degrees of freedom, the
critical value is $\chi^2_{0.01,\,6} = 16.812$.
Since the calculated test statistic χ² = 1073.52 > 16.812, H0 is
rejected, showing that hair color and eye color are significantly
associated at 1% sig. level.
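
A minimal sketch of the contingency-table calculation follows; the function name `contingency_chi_square` is illustrative, and the critical value 16.812 is the table value quoted above.

```python
def contingency_chi_square(observed):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            # Expected frequency = (row total)(column total) / total observations
            e_ij = row_totals[i] * col_totals[j] / grand_total
            stat += (o_ij - e_ij) ** 2 / e_ij
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Rows: blue, grey or green, brown eyes; columns: fair, brown, black, red hair
table = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]
chi2, df = contingency_chi_square(table)

CHI2_CRIT = 16.812  # table value for alpha = 0.01 with 6 d.f.
reject_h0 = chi2 > CHI2_CRIT

print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```

The unrounded result is close to the 1073.52 in the notes; small differences come from rounding the expected frequencies to two decimal places.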

(c) Test of goodness of fit

We shall study another application of the χ² criterion, in which we
compare an observed frequency distribution with a distribution we
might expect according to theory or assumption. The hypotheses may
be tested by comparing the observed frequencies Oi, i = 1, 2, …, k
with the corresponding theoretical frequencies Ei, i = 1, 2, …, k
in k classifications.
The test statistic is $\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$ if H0 is true.
At α sig. level, the critical region CR = $\{\chi^2 : \chi^2 > \chi^2_{\alpha,\,v}\}$ where
v = k − 1 − no. of parameters estimated by sample data in the
assumed distribution.

Example
A study is conducted of the number of calls received on the
switchboard of an insurance firm. A count is made of the number of
incoming calls per minute for a sample of 100 minutes. The results
of the study are shown below:

No. of calls     Observed
per min (X)      freq. (O)
0                40
1                35
2                14
3                8
4                2
5                1

Test at 5% level of sig. whether the distribution of calls arriving at
the switchboard is a Poisson distribution.

Expected no. of calls per minute $= \dfrac{\sum xO}{\sum O} = \dfrac{0(40) + 1(35) + 2(14) + 3(8) + 4(2) + 5(1)}{100} = 1$

Let X = Poisson random variable with mean λ = 1.
$P[X = x] = \dfrac{e^{-\lambda}\lambda^x}{x!} = \dfrac{e^{-1}}{x!}$

Expected frequency, E = P[X = x] × ΣO = P[X = x] × 100

No. of calls     P[X = x]    Expected     Observed     (O − E)²/E
per min (X)                  freq. (E)    freq. (O)
0                0.3679      36.79        40           0.28
1                0.3679      36.79        35           0.09
2                0.1839      18.39        14           1.05
3                0.0613      6.13         8            0.57
4                0.0153      1.53         2            0.14
5 and more       0.0037      0.37         1            1.07
Total            1           100          100          3.2

H0 : the population distribution of incoming calls is Poisson
distributed
H1 : the population distribution of incoming calls is not Poisson
distributed
The test statistic χ² = 3.2

At 5% sig. level, with v = k − 1 − no. of parameters = (6 − 1 − 1) = 4
degrees of freedom, the critical value is 9.488.

Since χ² = 3.2 < 9.488, H0 is not rejected, indicating that the
observed distribution is consistent with a population that is Poisson
distributed at 5% sig. level.

Rules for using Chi-square test
(1) The total no. of observations (total frequencies) should not be
too small, usually not less than 50.
(2) Each expected frequency should be at least 5. If E < 5, then it is
necessary to group several adjacent classes into one class.

E.g. In testing whether the distribution of calls arriving at the
switchboard is a Poisson distribution, we can combine the last 3
classes into one.

No. of calls     P[X = x]    Expected                      Observed          (O − E)²/E
per min (X)                  freq. (E)                     freq. (O)
0                0.3679      36.79                         40                0.28
1                0.3679      36.79                         35                0.09
2                0.1839      18.39                         14                1.05
3 and more       0.0613      6.13 + 1.53 + 0.37 = 8.03     8 + 2 + 1 = 11    1.10

The test statistic χ² = 0.28 + 0.09 + 1.05 + 1.10 = 2.52

The critical value at 5% significance level with 4 − 1 − 1 = 2 d.f. is
5.991. Since χ² = 2.52 is less than 5.991, H0 is not rejected; the
decision is the same as in the previous example.

(3) When using the χ²-test with 1 d.f., Yates' correction is used.
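
The Poisson goodness-of-fit calculations, with and without grouping of the last classes, can be sketched as below. The function name `poisson_gof` is illustrative; it treats the last class as open-ended ("x or more"), as the notes do, and estimates λ by the sample mean.

```python
from math import exp, factorial

def poisson_gof(observed, lam):
    """Chi-square statistic for observed[x] = frequency of X = x;
    the last class is treated as 'x or more'."""
    total = sum(observed)
    probs = [exp(-lam) * lam**x / factorial(x) for x in range(len(observed) - 1)]
    probs.append(1 - sum(probs))  # open-ended final class
    return sum((o - total * p) ** 2 / (total * p) for o, p in zip(observed, probs))

counts = [40, 35, 14, 8, 2, 1]  # calls per minute 0, 1, 2, 3, 4, 5 and more

# Estimate lambda by the sample mean number of calls per minute
lam = sum(x * o for x, o in enumerate(counts)) / sum(counts)

chi2_all = poisson_gof(counts, lam)                # 6 classes, 4 d.f.
grouped = counts[:3] + [sum(counts[3:])]           # combine the last 3 classes
chi2_grouped = poisson_gof(grouped, lam)           # 4 classes, 2 d.f.

print(f"lambda = {lam}, chi-square = {chi2_all:.2f}, grouped chi-square = {chi2_grouped:.2f}")
```

The unrounded statistics (about 3.23 and 2.52) differ slightly from the hand calculation because the notes round the expected frequencies first; both are below the table values 9.488 and 5.991, so the decision is unchanged.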

Analysis of Variance
Previously, we have tested hypotheses comparing two means. Now, we
expand further our idea of hypothesis tests. We describe a test that
simultaneously compares several means (three or more means).
The null hypothesis is H0 : $\mu_1 = \mu_2 = \cdots = \mu_c$ for c population means.
The alternative hypothesis is H1 : not all $\mu_i$'s are equal, i = 1, 2, …, c.

The technique for comparing several means simultaneously is called
the analysis of variance technique (ANOVA). The probability distribution
used in this chapter is the F-distribution.

The F-distribution
The F-distribution, similar to the t-distribution and the χ²-distribution,
is a family of probability distributions. Each F-distribution is identified
by two numbers of degrees of freedom, the degrees of freedom in the
numerator and the degrees of freedom in the denominator.

Properties of the F-distribution

1. F is nonnegative in value; it ranges from 0 to ∞. It is a continuous
distribution.
2. F is nonsymmetrical: it is skewed to the right.
3. There is a separate F-distribution for each pair of numbers of degrees
of freedom.

ANOVA assumptions
To use ANOVA, we assume the following:
1. The c populations are normally distributed.
2. The populations have equal variances (σ²).
3. The c samples are selected independently.
When these conditions are met, F is used as the test statistic. The
symbolic name for a critical value of F will be $F_{\alpha,\,dfn,\,dfd}$, where
α = the level of significance of the test, i.e. the area under the
distribution curve to the right of the critical value being sought.
dfn = the degrees of freedom in the numerator
dfd = the degrees of freedom in the denominator

Table 9 gives values of $F_{\alpha,\,dfn,\,dfd}$ for α = 0.05, 0.025, 0.01, 0.001 for
various combinations of the degrees of freedom. Hence, the critical
value with 6 and 10 d.f. for α = 0.05 (area to the right = 0.05) is
$F_{0.05,\,6,\,10} = 3.22$

One-Way Analysis of Variance: Completely Randomized Design

'One-way' refers to the fact that only one factor is being studied in the
experiment. The single factor is studied at c levels/treatments. We
assume that the experiment has been completely randomized. Here, we
discuss only the fixed-effects model. The term 'fixed-effects' refers to
the fact that the treatments or levels of the factor involved are
specifically selected by the experimenter because they are of particular
interest. They are not randomly selected from a larger group of
possible treatments or levels. Random selection of treatments or levels
leads to the 'random-effects' model, which will not be discussed here.

Random samples of ki observations are taken from each of the c
populations, where i = 1, 2, …, c. These c different populations are
classified on the basis of a single factor such as different treatments (or
levels). Today, the term treatment is used generally to refer to the
various classifications. For example, a factor such as baking
temperature may have several levels: 300°C, 350°C, 400°C, 450°C.
A factor such as fertilizer may have several treatments: Brand A,
Brand B, Brand C. It is assumed that the c populations are independent
and normally distributed with means $\mu_1, \mu_2, \ldots, \mu_c$ and common
variance σ².

We wish to derive an appropriate method for testing the hypothesis
H0 : $\mu_1 = \mu_2 = \cdots = \mu_c$ against H1 : at least two of the means are not
equal.

Data presentation
Let xij denote the jth observation from the ith treatment and arrange the
data as follows:

Treatment    1        2        …    i        …    c
             x11      x21           xi1           xc1
             x12      x22           xi2           xc2
             .        .             .             .
             .        .             .             .
             x1k1     x2k2          xiki          xckc
Total        T1       T2       …    Ti       …    Tc       T
Mean         x̄1       x̄2       …    x̄i       …    x̄c       x̄

Here, Ti is the total of all observations for treatment i (or column total),
x̄i is the mean of all observations for treatment i, and n is the total
sample size.

That is, $T_i = \sum_{j=1}^{k_i} x_{ij}$, $\bar{x}_i = \dfrac{\sum_{j=1}^{k_i} x_{ij}}{k_i} = \dfrac{T_i}{k_i}$, $n = k_1 + k_2 + \cdots + k_c = \sum_{i=1}^{c} k_i$

The symbol T represents the total of all observations in the experiment and
x̄ represents the overall mean of all observations:

$T = \sum_{i=1}^{c}\sum_{j=1}^{k_i} x_{ij} = \sum_{i=1}^{c} T_i$, $\bar{x} = \dfrac{\sum_{i=1}^{c}\sum_{j=1}^{k_i} x_{ij}}{n} = \dfrac{T}{n}$

Partitioning the Total Variation in an Experiment

The ANOVA procedure begins by considering the total variability of our
data, which is measured by a quantity called the total sum of squares:

$SS(Total) = \sum_i \sum_j (x_{ij} - \bar{x})^2 = \sum_i \sum_j x_{ij}^2 - \dfrac{\left(\sum_i \sum_j x_{ij}\right)^2}{n} = \sum_i \sum_j x_{ij}^2 - \dfrac{T^2}{n}$

This total SS is partitioned into two components. The first component,
called the sum of squares for treatments [SS(treatments)], measures
the variation among the c sample means:

$SS(treatments) = \sum_i k_i(\bar{x}_i - \bar{x})^2 = \left(\dfrac{T_1^2}{k_1} + \dfrac{T_2^2}{k_2} + \cdots + \dfrac{T_c^2}{k_c}\right) - \dfrac{T^2}{n}$

The second component, called the sum of squares for error [SS(error)],
is used to measure the variation within the c samples:

$SS(error) = \sum_i \sum_j (x_{ij} - \bar{x}_i)^2 = \sum_i \sum_j x_{ij}^2 - \left(\dfrac{T_1^2}{k_1} + \dfrac{T_2^2}{k_2} + \cdots + \dfrac{T_c^2}{k_c}\right)$

SS(Total) = SS(treatments) + SS(error)
=> SS(error) = SS(Total) − SS(treatments)

The degrees of freedom, df, associated with each of these 3 sources of
variation are determined as follows:
df (Total) = total number of observations − 1 = n − 1
df (treatments) = total number of treatments − 1 = c − 1
df (error) = sum of the degrees of freedom for the c treatments
= (k1 − 1) + (k2 − 1) + … + (kc − 1) = (k1 + k2 + … + kc) − c
= n − c

*Note that df(total) = df(treatments) + df(error)

The mean square for treatments and the mean square error are obtained
by dividing the sum-of-squares value by the corresponding number of
degrees of freedom.

$MS(treatments) = \dfrac{SS(treatments)}{c - 1}$, $MS(error) = \dfrac{SS(error)}{n - c}$

The total variation in the experiment is then displayed in an ANOVA table
as follows:

Source        df       SS               MS                       F
Treatments    c − 1    SS(treatments)   SS(treatments)/(c − 1)   MS(treatments)/MS(error)
Error         n − c    SS(error)        SS(error)/(n − c)
Total         n − 1    SS(total)

Testing the Equality of the Treatment Means
The mean squares in the ANOVA table are used to test the null hypothesis
H0 : $\mu_1 = \mu_2 = \cdots = \mu_c$ against
H1 : at least two of the means are not equal

The test statistic is $F^* = \dfrac{MS(treatments)}{MS(error)}$

In the formula for F*, a measure of the variation between the treatments, the
MS(treatments), is compared to a measure of variation within treatments,
the MS(error). If the MS(treatments) is significantly larger than the
MS(error), we will conclude that the treatment means are not all the same,
as expressed by H1. This would imply that the factor being tested does have
a significant effect on the response variable.
Thus, large values of F* lead to the rejection of H0.
If, however, the MS(treatments) is not significantly larger than the
MS(error), we will not be able to reject the null hypothesis that all means
are equal.

At α sig. level, the critical value is $F_{\alpha,\,c-1,\,n-c}$ and the critical region is
CR = $\{F^* : F^* > F_{\alpha,\,c-1,\,n-c}\}$.
If the computed value F* falls in the critical region (CR), H0 is rejected;
otherwise H0 is not rejected.

Example The temperature at which a factory is maintained is believed to
affect the rate of production of workers in the factory. The data in the
following table are the number of units produced per hour, x, for randomly
selected one-hour periods when the production process in the factory was
operating at each of three temperatures.
Do these data suggest that temperature has a significant effect on the
production rate at 5% sig. level?

        Temperature (treatments)
        68°F        72°F        76°F
        10          7           3
        12          6           3
        10          7           5
        9           8           4
                    7
        T1 = 41     T2 = 35     T3 = 15     T = 91

Step 1 Let $\mu_1$ = mean production rate at 68°F
$\mu_2$ = mean production rate at 72°F
$\mu_3$ = mean production rate at 76°F

H0 : $\mu_1 = \mu_2 = \mu_3$ against
H1 : not all mean production rates are equal

Step 2 We will use an ANOVA table to record the sums of squares (SS)
and organize the calculation.

$\sum\sum x_{ij}^2 = 10^2 + 12^2 + 10^2 + 9^2 + 7^2 + 6^2 + 7^2 + 8^2 + 7^2 + 3^2 + 3^2 + 5^2 + 4^2 = 731$

n = 13

$SS(total) = \sum\sum x_{ij}^2 - \dfrac{T^2}{n} = 731 - \dfrac{91^2}{13} = 731 - 637 = 94$

$SS(treatments) = \left(\dfrac{T_1^2}{k_1} + \dfrac{T_2^2}{k_2} + \dfrac{T_3^2}{k_3}\right) - \dfrac{T^2}{n} = \left(\dfrac{41^2}{4} + \dfrac{35^2}{5} + \dfrac{15^2}{4}\right) - \dfrac{91^2}{13} = 721.5 - 637 = 84.5$

SS(error) = SS(total) − SS(treatments) = 94 − 84.5 = 9.5

Source        df            SS      MS       F
Treatments    c − 1 = 2     84.5    42.25    44.47
Error         n − c = 10    9.5     0.95
Total         n − 1 = 12    94

The calculated value of the test statistic is $F^* = \dfrac{42.25}{0.95} = 44.47$

Step 3
At 5% sig. level, the critical value is $F_{5\%,\,2,\,10} = 4.10$.

Step 4
Since $F^* > F_{5\%,\,2,\,10}$, H0 is rejected at 5% sig. level. We conclude that the
three temperature treatments have significantly different effects on the
production rate.
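
The sums of squares in this example follow directly from the computational formulae. The sketch below, with an illustrative function name `one_way_anova`, reproduces them; the critical value 4.10 is the table value quoted in Step 3.

```python
def one_way_anova(groups):
    """Return SS(total), SS(treatments), SS(error) and F* for unequal group sizes."""
    n = sum(len(g) for g in groups)
    c = len(groups)
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total**2 / n                      # T^2 / n
    ss_total = sum(x * x for g in groups for x in g) - correction
    ss_treat = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_error = ss_total - ss_treat
    f_stat = (ss_treat / (c - 1)) / (ss_error / (n - c))
    return ss_total, ss_treat, ss_error, f_stat

# Units produced per hour at 68, 72 and 76 degrees Fahrenheit
production = [[10, 12, 10, 9], [7, 6, 7, 8, 7], [3, 3, 5, 4]]
ss_total, ss_treat, ss_error, f_stat = one_way_anova(production)

F_CRIT = 4.10  # table value F_{0.05, 2, 10}
reject_h0 = f_stat > F_CRIT

print(f"SS(total) = {ss_total}, SS(treatments) = {ss_treat}, "
      f"SS(error) = {ss_error}, F* = {f_stat:.2f}, reject H0: {reject_h0}")
```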

Estimating Differences in the Treatment Means
Suppose that, in carrying out the ANOVA procedure, we make the decision to
reject the null hypothesis. This allows us to conclude that not all the
treatment means are the same. The next question we may want to ask is: which
treatment means are different from the others? This section provides a
procedure for comparing a particular pair of means.
The t-distribution is used as a basis for such a comparison. Recall that one
of the assumptions of ANOVA is that the population variances are equal
for all treatments. This common population variance σ² is estimated by
the mean square error, the MS(error), in the ANOVA.

The 100(1 − α)% confidence interval for a single treatment mean is

$\bar{x}_i \pm t_{\alpha/2,\,n-c} \cdot \sqrt{\dfrac{MS(error)}{k_i}}$

The 100(1 − α)% confidence interval for the difference between two
treatment means is

$(\bar{x}_i - \bar{x}_j) \pm t_{\alpha/2,\,n-c} \cdot \sqrt{MS(error)\left(\dfrac{1}{k_i} + \dfrac{1}{k_j}\right)}$

Using the same data as the previous example,
(a) Find a 99% C.I. for the mean production rate of workers who work under
68°F.
(b) Find a 99% C.I. for the difference in the mean production rate of workers
who work under 68°F and 76°F. Could there be a significant difference in
the mean production rate of workers who work under 68°F and 76°F?
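
A minimal sketch of parts (a) and (b) follows, using MS(error) = 0.95 with n − c = 10 d.f. from the temperature example. The value 3.169 is t(0.005, 10) read from a t-table; the function names are illustrative.

```python
from math import sqrt

MS_ERROR = 0.95   # from the ANOVA table of the temperature example
T_CRIT = 3.169    # assumption: t_{0.005, 10} taken from a t-table (99% C.I.)

def mean_ci(total, k):
    """Confidence interval for a single treatment mean."""
    x_bar = total / k
    half_width = T_CRIT * sqrt(MS_ERROR / k)
    return x_bar - half_width, x_bar + half_width

def diff_ci(total_i, k_i, total_j, k_j):
    """Confidence interval for the difference between two treatment means."""
    diff = total_i / k_i - total_j / k_j
    half_width = T_CRIT * sqrt(MS_ERROR * (1 / k_i + 1 / k_j))
    return diff - half_width, diff + half_width

ci_a = mean_ci(41, 4)          # (a) 68 F: T1 = 41, k1 = 4
ci_b = diff_ci(41, 4, 15, 4)   # (b) 68 F versus 76 F: T3 = 15, k3 = 4

print(f"(a) {ci_a[0]:.2f} to {ci_a[1]:.2f}")
print(f"(b) {ci_b[0]:.2f} to {ci_b[1]:.2f}")
```

Since the interval in (b) lies entirely above zero, the sketch suggests a significant difference between the mean production rates at 68°F and 76°F.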

Randomized Complete Block Design
When we want to compare the means of c populations in the presence
of an extraneous variable, blocking is used. A block is a collection of
c experimental units that are as nearly alike (homogeneous) as possible
relative to the extraneous variable. Each treatment is randomly assigned
to 1 unit within each block. Since the effect of the extraneous variable
is controlled by matching like experimental units, any differences in
response are attributed to treatment effects.

Examples of block factors are time, people, machinery, batches of raw
material, etc.

The experimental design presented here is still a one-factor experiment
and is called the randomized complete block design with fixed effects.
The word 'blocks' refers to the fact that experimental units have been
matched relative to some extraneous variable; 'randomized' refers to
the fact that treatments are randomly assigned within blocks; and to
say that the design is complete implies that each treatment is used
exactly once within each block.

The term 'fixed effects' applies to both blocks and treatments. That is,
it is assumed that neither blocks nor treatments are randomly chosen.
Any inferences made apply only to the c treatments and b blocks
actually used.

In this case, it is classified as a two-way ANOVA and it can be
considered as a special case of a two-factor analysis of variance where
the block is considered as the second factor.

Let $\mu_{.j}$ be the population mean of the jth treatment.
H0 : $\mu_{.1} = \mu_{.2} = \cdots = \mu_{.c}$ against
H1 : not all $\mu_{.j}$'s are equal, j = 1, 2, …, c

Data presentation
Let xij denote the observation from the ith block and jth treatment.
There are n = cb measurements divided between c treatment levels and
b blocks.

                    Treatment
Block               1      2      3      …  j   …   c      Block total   Block mean
1                   x11    x12    x13               x1c    T1.           x̄1.
2                   x21    x22    x23               x2c    T2.           x̄2.
3                   x31    x32    x33               x3c    T3.           x̄3.
.
i                   xi1    xi2    xi3       xij     xic    Ti.           x̄i.
.
b                   xb1    xb2    xb3               xbc    Tb.           x̄b.

Treatment total:    T.1    T.2    T.3       T.j     T.c    T..
Treatment mean:     x̄.1    x̄.2    x̄.3       x̄.j     x̄.c    x̄..

The derivation of the formulae used is the same as for one-way
ANOVA. There are three sources of variation in this case: between-block
variation, between-treatment variation and random variation due
to experimental error.

Formulae for a randomized complete block design

Source        df                SS                                                                    MS                              F
Treatments    c − 1             $SS(treatments) = \dfrac{\sum_{j=1}^{c} T_{.j}^2}{b} - \dfrac{T_{..}^2}{n}$      SS(treatments)/(c − 1)          MS(treatments)/MS(error)
Blocks        b − 1             $SS(blocks) = \dfrac{\sum_{i=1}^{b} T_{i.}^2}{c} - \dfrac{T_{..}^2}{n}$          SS(blocks)/(b − 1)              MS(blocks)/MS(error)
Error         (c − 1)(b − 1)    SS(error) by subtraction                                              SS(error)/[(c − 1)(b − 1)]
Total         n − 1             $SS(total) = \sum\sum x_{ij}^2 - \dfrac{T_{..}^2}{n}$

SS(total) = SS(treatments) + SS(blocks) + SS(error)

Example
In an experiment to compare the percentage efficiency of different
chelating agents in extracting metal ions from aqueous solution, the
following results were obtained:

        Chelating agent
Day     A     B     C     D
1       84    80    83    79
2       79    77    80    79
3       83    78    80    78

(a) Test whether the different chelating agents have significantly
different efficiencies at 5% sig. level.
(b) Test whether the day-to-day variation has significantly affected the
efficiencies at 5% sig. level.

In this example, the factor is the chelating agent; there are 4 treatments
(A, B, C and D) and 3 blocks (days).
The day on which the experiment is performed introduces uncontrolled
variation caused both by changes in laboratory temperature, pressure etc.
and by slight differences in the concentration of the metal ion solution, i.e.
the day is an extraneous uncontrolled random factor. It is a randomized
complete block design.

        Chelating agent
Day     A            B            C            D            Total
1       84           80           83           79           T1. = 326
2       79           77           80           79           T2. = 315
3       83           78           80           78           T3. = 319
Total:  T.1 = 246    T.2 = 235    T.3 = 243    T.4 = 236    T.. = 960

$\sum\sum x_{ij}^2 = 84^2 + 80^2 + 83^2 + \cdots + 78^2 = 76854$

$SS(total) = \sum\sum x_{ij}^2 - \dfrac{T_{..}^2}{n} = 76854 - \dfrac{960^2}{12} = 76854 - 76800 = 54$

$SS(treatments) = \dfrac{\sum_{j=1}^{4} T_{.j}^2}{b} - \dfrac{T_{..}^2}{n} = \dfrac{246^2 + 235^2 + 243^2 + 236^2}{3} - 76800 = 28.67$

$SS(blocks) = \dfrac{\sum_{i=1}^{3} T_{i.}^2}{c} - \dfrac{T_{..}^2}{n} = \dfrac{326^2 + 315^2 + 319^2}{4} - 76800 = 15.5$

SS(error) = SS(total) − SS(treatments) − SS(blocks)
SS(error) = 54 − 28.67 − 15.5 = 9.83

ANOVA table
Source        df    SS       MS      F*
Treatments    3     28.67    9.56    5.83
Blocks        2     15.5     7.75    4.73
Error         6     9.83     1.64
Total         11    54

(a) H0 : $\mu_{.1} = \mu_{.2} = \mu_{.3} = \mu_{.4}$
H1 : not all treatment means are equal

The test statistic F* = 5.83

At 5% sig. level, the critical value is $F_{5\%,\,3,\,6} = 4.76$

Since F* falls in the CR, H0 is rejected, indicating that there is a significant
difference between treatments, i.e. a significant difference between the
efficiencies of the different chelating agents at the 5% sig. level.

(b) H0 : $\mu_{1.} = \mu_{2.} = \mu_{3.}$
H1 : not all block means are equal

The test statistic F* = 4.73

At 5% sig. level, the critical value is $F_{5\%,\,2,\,6} = 5.14$

Since F* falls in the acceptance region (AR), H0 is not rejected, indicating
that there is no significant difference between days, i.e. no significant
difference between the efficiencies on different days at the 5% sig. level.
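
The block-design sums of squares can be checked with the sketch below. The function name `rcbd_anova` is illustrative; rows of the input are blocks (days) and columns are treatments (chelating agents), matching the layout of the data table.

```python
def rcbd_anova(table):
    """ANOVA for a randomized complete block design.
    table[i][j] = observation in block i under treatment j."""
    b, c = len(table), len(table[0])
    n = b * c
    grand_total = sum(sum(row) for row in table)
    correction = grand_total**2 / n                               # T..^2 / n
    ss_total = sum(x * x for row in table for x in row) - correction
    ss_treat = sum(sum(col) ** 2 for col in zip(*table)) / b - correction
    ss_block = sum(sum(row) ** 2 for row in table) / c - correction
    ss_error = ss_total - ss_treat - ss_block
    ms_error = ss_error / ((c - 1) * (b - 1))
    f_treat = (ss_treat / (c - 1)) / ms_error
    f_block = (ss_block / (b - 1)) / ms_error
    return ss_total, ss_treat, ss_block, ss_error, f_treat, f_block

# Rows: days 1 to 3 (blocks); columns: chelating agents A, B, C, D (treatments)
efficiency = [
    [84, 80, 83, 79],
    [79, 77, 80, 79],
    [83, 78, 80, 78],
]
ss_total, ss_treat, ss_block, ss_error, f_treat, f_block = rcbd_anova(efficiency)

print(f"SS(total) = {ss_total:.2f}, SS(treatments) = {ss_treat:.2f}, "
      f"SS(blocks) = {ss_block:.2f}, SS(error) = {ss_error:.2f}")
print(f"F*(treatments) = {f_treat:.2f}, F*(blocks) = {f_block:.2f}")
```

The two F* values are then compared with the table values 4.76 and 5.14 as in parts (a) and (b).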

Remarks: The randomized complete block design is a one-factor model with
several treatments/levels. Our main concern is to test for any significant
difference between treatments, not between blocks. Sometimes,
we may also wish to test for a significant difference between block means.
If the test procedure shows that the block means do not differ greatly,
blocking may not be necessary in future experiments.

Example A company sells three shampoos for dry, normal and oily hair.
Sales, in millions of dollars, for the past five months are given in the
following table:

Month     Sales ($mns)
          Dry    Normal    Oily
June      7      9         12
July      11     12        14
Aug.      13     11        8
Sept.     8      9         7
Oct.      9      10        13

Using 5% sig. level, apply the ANOVA procedure to test whether:
(a) the mean sales for dry, normal and oily hair are the same.
(b) the mean sales are the same for each of the five months.

The response variable is the sales of shampoo; the factor is the type of
shampoo, at 3 levels/treatments.
The 5 months represent 5 blocks.
n = 3 × 5 = 15
$\sum\sum x_{ij}^2 = 7^2 + 9^2 + 12^2 + \cdots + 10^2 + 13^2 = 1633$

Month     Sales ($mns)
          Dry    Normal    Oily    Total
June      7      9         12      28
July      11     12        14      37
Aug.      13     11        8       32
Sept.     8      9         7       24
Oct.      9      10        13      32
Total     48     51        54      153

$SS(total) = \sum\sum x_{ij}^2 - \dfrac{T_{..}^2}{n} = 1633 - \dfrac{153^2}{15} = 72.4$

$SS(treatments) = \dfrac{\sum_{j=1}^{c} T_{.j}^2}{b} - \dfrac{T_{..}^2}{n} = \left(\dfrac{48^2}{5} + \dfrac{51^2}{5} + \dfrac{54^2}{5}\right) - \dfrac{153^2}{15} = 1564.2 - 1560.6 = 3.6$

$SS(blocks) = \dfrac{\sum_{i=1}^{b} T_{i.}^2}{c} - \dfrac{T_{..}^2}{n} = \left(\dfrac{28^2}{3} + \dfrac{37^2}{3} + \dfrac{32^2}{3} + \dfrac{24^2}{3} + \dfrac{32^2}{3}\right) - \dfrac{153^2}{15} = 1592.33 - 1560.6 = 31.73$

SS(error) = SS(total) − SS(treatments) − SS(blocks)
= 72.4 − 3.6 − 31.73 = 37.07

ANOVA table:

Source        df    SS       MS       F*
Treatments    2     3.6      1.8      0.39
Blocks        4     31.73    7.933    1.71
Error         8     37.07    4.634
Total         14    72.4

(a) H0 : $\mu_{.1} = \mu_{.2} = \mu_{.3}$
H1 : not all treatment means are equal

The test statistic $F^* = \dfrac{1.8}{4.634} = 0.39$

At 5% sig. level, the critical value is $F_{5\%,\,2,\,8} = 4.46$

Since F* falls in the AR, H0 is not rejected, indicating that there is no
significant difference between treatments, i.e. no significant difference
in the mean sales for dry, normal and oily hair shampoo at 5% sig. level.

(b) H0 : $\mu_{1.} = \mu_{2.} = \mu_{3.} = \mu_{4.} = \mu_{5.}$
H1 : not all block means are equal

The test statistic $F^* = \dfrac{7.933}{4.634} = 1.71$

At 5% sig. level, the critical value is $F_{5\%,\,4,\,8} = 3.84$

Since F* falls in the AR, H0 is not rejected, indicating that there is no
significant difference in the mean sales for each of the 5 months at 5%
sig. level.

Statistical Quality Control (SQC)
Introduction
In this chapter, we shall be concerned with industrial
manufacturing processes with repetitive operations, where the
quality of the product is of interest. Quality control is a tool
for cost reduction and quality improvement.

E.g. A machine is used to pack dry cement into sacks. The
weight is specified as 100 kg. Because of such factors as
humidity, flow of cement to and from the machine, quality of
cement, operator effects etc., there will be variation in weight
from sack to sack. Usually, the sack weights are specified as
100 ± 1 kg, i.e. design specification limits (or specification
tolerance or tolerance limits) are given.

There are two main aspects of SQC:
(a) Process control, using control charts, whereby the
proportion of unsatisfactory product is kept at a minimum.
(b) Acceptance sampling, whereby no lots or batches should
be accepted if they contain an excessive number of
unsatisfactory items.
The syllabus covers only (a) process control.

Types of measurements
(i) Variable, e.g. weight, length, height etc. Variable
measurements are usually assumed to be normally
distributed.
(ii) Attribute, e.g. good or defective, accepted or rejected, up or
down etc. A binomial random variable is appropriate.

Sources of variation in a manufacturing process

(a) Random (or chance) variation

Some sources of random variation in the example on the weight
of cement might be the humidity, the quality of cement and
the internal machine friction. Chance variations are small
variations in the product or production process due to the
inherent dissimilarity in productive inputs used in the
production process.

(b) Non-random or assignable variation

Referring again to the example on the weight of cement, some sources
of non-random variation are operator clumsiness, the sack
openings being too small and the machine being improperly
set. These factors can be detected and eliminated to ensure a
more uniform product. Assignable variation in the product or
production process signals that the process is out of control
and corrective efforts are required.

The purpose of process quality control is to detect and
identify non-random sources of variation, so that they can be
corrected.

Control charts for variables

The primary objective of process control is to keep the
process within the control limits, and the statistical tool used
for doing this is a control chart. A control chart consists of a
central line and lower and upper control limits (LCL & UCL). It
is a plot of a measure such as the mean, range, standard
deviation, proportion defective or number of defects per item
obtained for a process against time/sample number
(subgroup).

[Figure: layout of a control chart, showing the UCL, central line and LCL as horizontal lines plotted against time/sample number]

When dealing with a variable (continuous data), one is
generally interested in controlling the mean and variation of
the process; thus there are two main process control charts for
variables:
(a) Control chart for the sample mean, the X̄-chart, which is used to
monitor the behavior of the sample means.
(b) Control chart for sample variation (here, the range R-chart and
standard deviation S-chart will be discussed), which is used to
monitor the sample variation or dispersion.

Note: SQC uses the range as a measure of dispersion. Although it
is not a good estimator in general, it is adequate for small
samples and has found acceptance in quality control circles.

Quality control in the main is based upon sampling theory.

Control chart for sample means, X̄-chart
For samples of size n, the sample means are approximately
normally distributed with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$,
where μ and σ are the population mean and standard
deviation respectively.

The 2-sigma ($2\sigma$) limits or inner control limits are:
$$\mu \pm 2\sigma_{\bar{X}} = \mu \pm \frac{2\sigma}{\sqrt{n}}$$
95.45% of the sample means will lie inside the $2\sigma$ limits
(warning limits).
The 3-sigma ($3\sigma$) limits or outer control limits are:
$$\mu \pm 3\sigma_{\bar{X}} = \mu \pm \frac{3\sigma}{\sqrt{n}}$$
99.73% of the sample means will lie inside the $3\sigma$ limits
(action limits).
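
The two percentages quoted for the warning and action limits can be checked from the normal distribution. The sketch below is only an illustration: the function name `coverage` is an assumption made here, and it relies on the identity P(|Z| < k) = erf(k / sqrt(2)) for a standard normal variable Z.

```python
import math

def coverage(k: float) -> float:
    """Probability that a normally distributed sample mean lies
    within k standard errors of the process mean."""
    return math.erf(k / math.sqrt(2))

for k in (2, 3):
    # Prints the share of sample means expected inside the k-sigma limits
    print(f"{k}-sigma limits: {coverage(k):.2%} inside, {1 - coverage(k):.2%} outside")
```

The 3-sigma case also gives the 0.27% chance of a point falling outside the action limits while the process is in control.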

In practice, $\mu$ and $\sigma$ are unknown and have to be
estimated from preliminary samples/subgroups taken when
the process is thought to be in control. From k (at least 20 to
25) samples, each of size n (often 4, 5 or 6), the mean
of the k sample means, $\bar{\bar{X}}$, is used to estimate $\mu$. The mean
of the sample ranges, $\bar{R}$, multiplied by a tabulated factor $a_n$
(which depends upon n) is used to estimate $\sigma$.
Usually, only the $3\sigma$ limits (action limits) are calculated and
shown on the control chart, i.e. $\mu \pm 3\sigma/\sqrt{n}$, which are estimated by
$$\bar{\bar{X}} \pm \frac{3 a_n \bar{R}}{\sqrt{n}} = \bar{\bar{X}} \pm \left(\frac{3 a_n}{\sqrt{n}}\right)\bar{R} = \bar{\bar{X}} \pm A_2 \bar{R}$$
where $A_2 = 3 a_n / \sqrt{n}$ is a tabulated factor for different n, used in
constructing the $3\sigma$ control chart for averages, as given in the table.
For the control chart for means, we have the following formulae
(for n = 6, the table gives $A_2 = 0.483$):

Lower control limit, LCL = $\bar{\bar{X}} - A_2 \bar{R}$
Upper control limit, UCL = $\bar{\bar{X}} + A_2 \bar{R}$
Central line = $\bar{\bar{X}}$

Features indicating a process is out of control (or assignable
variations are present)
(a) A single point is outside the $3\sigma$ limits. The probability of a
point being outside the $3\sigma$ limits, if the process is in control, is
0.27%.
(b) Several points are near the control limits, especially
successive points. Some quality control experts draw $2\sigma$ and
$3\sigma$ limits on a control chart. Several successive points
beyond the $2\sigma$ limits indicate that the process should be
carefully watched.
(c) A run of several points is on one side of the central line. The
probability that a single point is (say) above the central line is
$\frac{1}{2}$. The probability that six successive points would lie above
the central line is only $(\frac{1}{2})^6 = 0.016$.
(d) A trend in the points exists.
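
Feature (c) lends itself to a small calculation. The following sketch is an illustration only: the function names and the sample values in `points` are hypothetical, not taken from the notes. It computes the probability of a run of a given length on one chosen side of the central line, and finds the longest such run in a series of plotted points.

```python
def run_probability(k: int) -> float:
    """Chance that k successive points all fall on one chosen side of the
    central line when the process is in control (probability 1/2 each)."""
    return 0.5 ** k

def longest_run_one_side(points, centre):
    """Length of the longest run of consecutive points lying strictly
    above, or strictly below, the central line."""
    longest = current = 0
    previous_side = 0
    for x in points:
        side = (x > centre) - (x < centre)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == previous_side:
            current += 1                    # the run continues
        else:
            current = 1 if side != 0 else 0  # a new run starts (or none)
        previous_side = side
        longest = max(longest, current)
    return longest

# Hypothetical plotted values around a central line of 0
points = [1, 2, 3, -1, 4, 5, 6, 7]
print(run_probability(6))               # 0.015625, i.e. about 0.016
print(longest_run_one_side(points, 0))  # 4
```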

Example
The weight of a component is specified as 5.5 ± 1.0 kg. Five
samples, each of size n = 6 items, are drawn at hourly intervals over 5
consecutive hours. The results are tabulated as follows:

Hour (am)  Sample no.  Weight (kg)                     Sum    Mean   Range
8:00       1           4.9  4.8  4.8  5.1  6.6  5.2    31.4   5.2    1.8
9:00       2           6.8  5.1  5.2  7.1  5.3  5.2    34.7   5.8    2.0
10:00      3           7.1  6.9  5.9  6.2  6.9  6.9    39.9   6.7    1.2
11:00      4           6.8  6.2  6.5  7.1  7.6  6.8    41.0   6.8    1.4
12:00      5           6.0  4.6  4.5  4.5  4.3  5.2    29.1   4.9    1.7
Average:                                                      5.9    1.62

(a) Find the percentage of components that failed to meet the specified
tolerances.
(b) Construct the $\bar{X}$-chart. Does the process appear to be in
control?

Solution
(a) A chart which is not a control chart may be of interest to
production supervision, but it does not give the definite basis
for action that the control chart supplies. It shows the individual
measurements plotted for each sample, together with the nominal
weight and the upper and lower tolerance limits.

From the above chart, 12 out of the 30 components failed
to meet the specified tolerances.
Percentage of components that failed to meet the specified tolerances = 12/30 = 40%
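
The count in part (a) can also be obtained directly from the tabulated weights. The sketch below is a minimal illustration; the variable names are chosen here, and it assumes that a weight lying exactly on a tolerance limit (4.5 kg or 6.5 kg) counts as meeting the specification.

```python
# Weights (kg) for the 5 hourly samples, copied from the example table
weights = [
    [4.9, 4.8, 4.8, 5.1, 6.6, 5.2],
    [6.8, 5.1, 5.2, 7.1, 5.3, 5.2],
    [7.1, 6.9, 5.9, 6.2, 6.9, 6.9],
    [6.8, 6.2, 6.5, 7.1, 7.6, 6.8],
    [6.0, 4.6, 4.5, 4.5, 4.3, 5.2],
]

# Tolerance limits from the specification 5.5 +/- 1.0 kg
lower, upper = 4.5, 6.5

all_items = [w for sample in weights for w in sample]
# Assumption: values exactly on a limit are treated as within tolerance
failed = [w for w in all_items if w < lower or w > upper]
percent_failed = 100 * len(failed) / len(all_items)

print(len(failed), "of", len(all_items), "components failed:", percent_failed, "%")
```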

(b) Mean of the sample means = $\bar{\bar{X}}$ = 5.9
Mean of the sample ranges = $\bar{R}$ = 1.62

The $3\sigma$ control limits for sample means are
LCL = 5.9 - 0.483(1.62) = 5.11754
UCL = 5.9 + 0.483(1.62) = 6.68246
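
As a rough check on this arithmetic, the following sketch recomputes the sample means, sample ranges and control limits for the means from the tabulated weights. The variable names are illustrative, and A2 = 0.483 is the tabulated factor for n = 6 used in the example. Because the code keeps unrounded means (the grand mean is about 5.87 rather than 5.9), its limits come out near 5.09 and 6.65 rather than 5.12 and 6.68.

```python
# Weights (kg) for the 5 hourly samples, copied from the example table
weights = [
    [4.9, 4.8, 4.8, 5.1, 6.6, 5.2],
    [6.8, 5.1, 5.2, 7.1, 5.3, 5.2],
    [7.1, 6.9, 5.9, 6.2, 6.9, 6.9],
    [6.8, 6.2, 6.5, 7.1, 7.6, 6.8],
    [6.0, 4.6, 4.5, 4.5, 4.3, 5.2],
]
A2 = 0.483  # tabulated factor for subgroups of size n = 6

means = [sum(s) / len(s) for s in weights]    # sample means
ranges = [max(s) - min(s) for s in weights]   # sample ranges

grand_mean = sum(means) / len(means)          # estimate of the process mean
mean_range = sum(ranges) / len(ranges)        # average range

lcl = grand_mean - A2 * mean_range
ucl = grand_mean + A2 * mean_range

# Sample numbers (1-based) whose mean lies outside the control limits
outside = [i + 1 for i, m in enumerate(means) if m < lcl or m > ucl]

print(f"central line = {grand_mean:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
print("samples outside the limits:", outside)
```

With the unrounded means, samples 4 and 5 fall outside the limits.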

The $\bar{X}$-chart, which shows sample means (averages) rather than
individual values on the vertical scale, is then drawn as
follows:

From the above chart, we can see that 2 of the 5 sample means
lie outside the $3\sigma$ control limits. The process is out of control
and it is the responsibility of the Q.C. inspector to inform the
production department so that any assignable causes can be
detected.

Remark:
Tolerance limits should not be indicated on the $\bar{X}$-chart. It is
the individual output that has to meet the tolerance limits, not
the average of a sample. Averages of samples often fall
within tolerance limits even though some of the individual
outputs in the sample are outside the limits.

Control chart for sample ranges, R-chart
In addition to monitoring changes in the mean, it is also useful
to closely scrutinize variation in the dispersion within a
process. Although standard deviation is a reliable measure of
dispersion, quality control techniques usually rely on the
range as an indication of the variability of the process. The range
is easier to compute and more readily understood by those
without a sufficient statistical background.

A lower control limit ($LCL_R$) and upper control limit ($UCL_R$)
for the ranges are calculated which, like those for the $\bar{X}$-chart,
are three standard errors below and above the mean range. In
principle, they are determined as follows:

$LCL_R = \bar{R} - 3 S_R$ and
$UCL_R = \bar{R} + 3 S_R$
where $S_R$ = standard deviation of the distribution of the
sample ranges. In practice, they are calculated as follows:

$LCL_R = D_3 \bar{R}$
$UCL_R = D_4 \bar{R}$
Central line = $\bar{R}$
where $D_3$ and $D_4$ are tabulated factors that depend on n
(for n = 6, $D_3 = 0$ and $D_4 = 2.004$).

Example (using the same data as the previous one)
We have calculated $\bar{R}$ = 1.62.
From the table, for n = 6, $D_3 = 0$ and $D_4 = 2.004$.

$LCL_R = D_3 \bar{R}$ = 0 (1.62) = 0
$UCL_R = D_4 \bar{R}$ = 2.004 (1.62) = 3.25
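
The same calculation can be sketched in a few lines of code. This is only an illustration with names chosen here; D3 = 0 and D4 = 2.004 are the tabulated factors for n = 6 quoted in the example, and the ranges are recomputed from the weights in the earlier table.

```python
# Weights (kg) for the 5 hourly samples, copied from the earlier example
weights = [
    [4.9, 4.8, 4.8, 5.1, 6.6, 5.2],
    [6.8, 5.1, 5.2, 7.1, 5.3, 5.2],
    [7.1, 6.9, 5.9, 6.2, 6.9, 6.9],
    [6.8, 6.2, 6.5, 7.1, 7.6, 6.8],
    [6.0, 4.6, 4.5, 4.5, 4.3, 5.2],
]
D3, D4 = 0.0, 2.004  # tabulated factors for subgroups of size n = 6

ranges = [max(s) - min(s) for s in weights]  # sample ranges
mean_range = sum(ranges) / len(ranges)       # central line of the R-chart

lcl_r = D3 * mean_range
ucl_r = D4 * mean_range

# Sample numbers (1-based) whose range lies outside the control limits
outside = [i + 1 for i, r in enumerate(ranges) if r < lcl_r or r > ucl_r]

print(f"central line = {mean_range:.2f}, LCL_R = {lcl_r:.2f}, UCL_R = {ucl_r:.2f}")
print("samples outside the limits:", outside)
```

No sample range falls outside these limits, so the list of flagged samples is empty.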

The R-chart is constructed as follows:-

Remarks:
The interpretation of the R-chart is similar to that of the $\bar{X}$-chart.
The process can be considered out of control with respect to
dispersion if
(a) a single point is outside the $3\sigma$ limits
(b) several points are near the control limits
(c) several points lie on one side of the central line
(d) there is a trend in the points
In the above example, none of the 4 features described above
is clearly present, thus the process variation is in control.

Control charts for attributes
The $\bar{X}$-chart and R-chart are designed to
monitor quantitative data in a process. In many cases it is
necessary or desirable to measure the quality of a process, or
the output of that process, based on the attribute of
acceptability. The term attribute, as used in quality control,
refers to those quality characteristics which are either good or
defective, accepted or rejected etc. They either conform to
specifications or they do not.

Of the common types of control charts that focus on acceptability,
the p-chart for the fraction defective in a
sample or subgroup is discussed here.

Control chart for fraction defective, p-chart

Let X = number of defectives in a sample of size n,
then X follows a binomial distribution with parameters n
and p, where p = probability of a given item being defective.

For the binomial distribution, mean E(X) = np and variance V(X) = np(1 - p).

Fraction defective (or proportion defective) = $X/n$,
with mean $E(X/n) = p$ and variance $V(X/n) = \frac{p(1-p)}{n}$.
Control charts can be set up using the binomial distribution but
the working is simplified by using the normal distribution to
approximate the binomial distribution.
The $3\sigma$ control limits of the fraction defective are:
$$LCL_p = p - 3\sqrt{\frac{p(1-p)}{n}} \quad \text{and} \quad UCL_p = p + 3\sqrt{\frac{p(1-p)}{n}}$$
and p is estimated by $\bar{p}$ = mean proportion of defectives for
all samples.

In constructing p-charts, we simply take note of the proportion
of defective items in a sample. This sample proportion, $\hat{p}$, is
$$\hat{p} = \frac{\text{number of defectives in a sample}}{\text{sample size}}$$

As with control charts for variables, several samples are taken,
yielding several values for $\hat{p}$. The mean proportion of
defectives for these several samples, $\bar{p}$, is then calculated as
$$\bar{p} = \frac{\text{total number of defectives in all samples}}{\text{total number of all items inspected}}$$

The $3\sigma$ control limits for the p-chart are
$$LCL_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$
$$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$
Central line = $\bar{p}$

Example
XYZ Factory makes electric guitars and other musical
instruments. A quality control procedure to detect the proportion
of defectives in their AAA model guitar entailed the selection
of k = 15 different samples of size n = 40. The number of
defective guitars in each sample is shown in the following
table:

Sample   Number of            Sample proportion
         defective guitars    of defectives
1        10                   0.25
2        12                   0.30
3        9                    0.23
4        15                   0.38
5        27                   0.68
6        8                    0.20
7        11                   0.28
8        11                   0.28
9        13                   0.33
10       15                   0.38
11       17                   0.43
12       3                    0.08
13       25                   0.63
14       18                   0.45
15       17                   0.43
Total    211

(a) Design a fraction defective chart. Indicate the upper and
lower control limits and central line on the chart.
(b) Interpret the chart.

Solution:
(a) A total of k × n = 15(40) = 600 guitars are inspected.

$$\bar{p} = \frac{211}{600} = 0.35$$
$$LCL_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.35 - 3\sqrt{\frac{0.35(1-0.35)}{40}} = 0.12$$
$$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.35 + 3\sqrt{\frac{0.35(1-0.35)}{40}} = 0.58$$
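
The p-chart limits for the guitar data can be reproduced with a short script. The sketch below is illustrative and its variable names are assumptions made here. It keeps the unrounded mean proportion 211/600 (about 0.3517) instead of 0.35, so the lower limit comes out near 0.125 rather than 0.12; the upper limit still rounds to 0.58.

```python
import math

# Number of defective guitars in each of the 15 samples (from the table)
defectives = [10, 12, 9, 15, 27, 8, 11, 11, 13, 15, 17, 3, 25, 18, 17]
n = 40  # sample size

# Mean proportion defective over all items inspected
p_bar = sum(defectives) / (len(defectives) * n)

# Standard error of a sample proportion under the normal approximation
sigma_p = math.sqrt(p_bar * (1 - p_bar) / n)

lcl_p = p_bar - 3 * sigma_p
ucl_p = p_bar + 3 * sigma_p

# Sample numbers (1-based) whose proportion lies outside the limits
outside = [i + 1 for i, d in enumerate(defectives)
           if d / n < lcl_p or d / n > ucl_p]

print(f"central line = {p_bar:.3f}, LCL_p = {lcl_p:.3f}, UCL_p = {ucl_p:.3f}")
print("samples outside the limits:", outside)
```

Samples 5, 12 and 13 are the ones flagged as lying outside the limits.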

(b) Interpretation of the p-chart is the same as for the
variable charts. The same four indications that the process is
out of control apply here.
Note that a point above the $UCL_p$ indicates a deterioration in
quality whereas a point below the $LCL_p$ indicates an improvement in
quality. If the sample size n changes, the control limits must
be recomputed since $UCL_p$ and $LCL_p$ depend on n.
In this example, the p-chart shows that 3 out of 15 samples
are outside the control limits (i.e. sample nos. 5, 12 and 13 are
out of control). The search for the assignable causes revealed
that:
(i) Sample no. 5 was taken during a time when certain key
personnel were on vacation, and less skilled employees were
forced to fill in.
(ii) Sample no. 12 has an unusually low proportion of
defectives, which is due to a one-time use of superior raw
materials when the regular supplier was unable to provide
the usual material.
(iii) Sample no. 13 was taken when new construction at the
plant temporarily interrupted electric power, thus
preventing the use of computerized production methods.

The assignable cause has been identified for each abnormality.
If desired, action can be taken to prevent their recurrence.

Further analysis can be done by eliminating all sample values
which fall outside the control limits. Final control limits
can then be calculated on this reduced sample and used to
monitor production in the future.

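Example
Eliminating sample nos. 5, 12 and 13 yields

$$\bar{p} = \frac{156}{480} = 0.33$$
$$LCL_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.33 - 3\sqrt{\frac{0.33(1-0.33)}{40}} = 0.11$$
$$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.33 + 3\sqrt{\frac{0.33(1-0.33)}{40}} = 0.55$$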

The final control limits can also be shown on the p-chart.
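
The revision step (drop the out-of-control samples, then recompute) can be sketched as follows. The helper `p_chart_limits` is a hypothetical name introduced here for illustration, not part of the notes. The code works with the unrounded proportion 156/480 = 0.325, so its limits are about 0.10 and 0.55, slightly different from the values obtained with the rounded 0.33.

```python
import math

def p_chart_limits(defectives, n):
    """Return (p_bar, LCL_p, UCL_p) for equal-sized samples of size n."""
    p_bar = sum(defectives) / (len(defectives) * n)
    half_width = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar, p_bar - half_width, p_bar + half_width

# Number of defective guitars in each of the 15 samples (from the table)
defectives = [10, 12, 9, 15, 27, 8, 11, 11, 13, 15, 17, 3, 25, 18, 17]
n = 40

# Trial limits from all 15 samples
_, lcl_trial, ucl_trial = p_chart_limits(defectives, n)

# Keep only the samples whose proportion lies within the trial limits
kept = [d for d in defectives if lcl_trial <= d / n <= ucl_trial]

# Final limits from the reduced set of samples
p_bar, lcl_final, ucl_final = p_chart_limits(kept, n)

print(f"{len(kept)} samples kept, {sum(kept)} defectives in {len(kept) * n} items")
print(f"central line = {p_bar:.3f}, LCL_p = {lcl_final:.3f}, UCL_p = {ucl_final:.3f}")
```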
