Chapter 5 Up To 9 Revised

Uploaded by

galeca4913
Copyright
© © All Rights Reserved
STATISTICS FOR INDUSTRIAL CHEMIST

UNIT FIVE
ELEMENTARY PROBABILITY
Objectives:
Having studied this unit, you should be able to
✓ understand the elements of probability
✓ calculate some probabilities of events associated with random experiments
✓ apply the concept of probability in some biological phenomena
5.1 Introduction

Without some formalism of probability theory, the student cannot appreciate the true
interpretation from data analysis through modern statistical methods. It is quite natural
to study probability prior to studying statistical inference. Elements of probability allow
us to quantify the strength or “confidence” in our conclusions. In this sense, concepts in
probability form a major component that supplements statistical methods and helps us
to gauge the strength of the statistical inference. The discipline of probability, then,
provides the transition between descriptive statistics and inferential methods. Elements
of probability allow the conclusion to be put into the language that the science or
engineering practitioners require. An example follows that will enable the reader to
understand the notion of a P-value, which often provides the “bottom line” in the
interpretation of results from the use of statistical methods.

5.2 Definition of some probability terms


Definition 5.1: Random experiment is an experiment in which the outcome cannot
be determined or predicted exactly in advance, i.e. it is the process of observing or
measuring the outcome of a chance event.
Some of the characteristics of a random experiment are
✓ All the possible outcomes of the experiment can be specified in advance.
✓ The experiment can be repeated indefinitely.
✓ There is a sort of regularity in the outcomes observed in large repetitions of the
experiment.
Examples of random experiments include throwing a fair coin and observing the
outcome, throwing a fair die and observing the number on the top face, and taking a student
at random from a science class and noting the sex of the student.
All of these examples satisfy the above characteristics of a random experiment.


Definition 5.2:
Sample point (outcome): The individual result of a random experiment.
Sample space: The set containing all possible sample points (outcomes) of the
random experiment. The sample space is often called the universe and is denoted by S.
Event: The collection of outcomes or simply a subset of the sample space. We denote
events with capital letters, A, B, C, etc.

Example 5.1: If an experiment consists of flipping a coin once, then
S = {H, T}, where H means that the outcome of the toss is a head and T that it is a tail.
A = {H} represents the event of a head occurring.
Example 5.2: If an experiment consists of rolling a die once and observing the number
on top, then the sample space is S = {1, 2, 3, 4, 5, 6}, where the outcome i means that
i appeared on the die, i = 1, 2, 3, 4, 5, 6. {1}, {2}, {3}, {4}, {5} and {6} are elementary events,
i.e. events consisting of a single outcome. Let A represent the event that an odd number
will occur; then A is simply the set containing 1, 3 and 5, i.e. A = {1, 3, 5}.

Review of set theory

Concepts of set theory are important in understanding probability. Suppose A, B and C are
events associated with a sample space S, and ω represents an elementary event
(outcome) in S. The following are some useful definitions and results in set theory.

Definitions 5.3:
1. Union: The union of A and B, A ∪ B, is the event containing all sample points in either
A or B or both. Sometimes we write A or B for the union.
2. Intersection: The intersection of A and B, A ∩ B, is the event containing all sample points
that are in both A and B. Sometimes we write AB or A and B for the intersection.
3. Subset: If every ω ∈ A also satisfies ω ∈ B, then A is a subset of B, written A ⊆ B.
4. Empty set: If a set A contains no points, it is called the null set, or empty set, and is
denoted by ∅.
5. Complement: The complement of a set A, denoted by A^c, is the set of all ω ∈ S such that
ω ∉ A.
6. Mutually Exclusive Events: Two events are said to be mutually exclusive (or disjoint) if
their intersection is empty (i.e. A ∩ B = ∅). Events A1, A2, … are defined to be mutually
exclusive if Ai ∩ Aj = ∅ for every i ≠ j.

Theorem 5.1:Important elementary set theory results


i) A ∪ B = B ∪ A and A ∩ B = B ∩ A
ii) A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C
iii) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
iv) (A^c)^c = A
v) A ∩ S = A; A ∪ S = S; A ∩ ∅ = ∅; and A ∪ A = A
vi) (A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c
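
The identities of Theorem 5.1 can be spot-checked on a small sample space. The following is a minimal Python sketch, assuming the die sample space of Example 5.2; the events B and C and the helper `complement` are illustrative choices, not part of the text. A check on one example does not prove the theorem; it only illustrates it.

```python
# Sample space for one roll of a die; A comes from Example 5.2,
# B and C are illustrative events chosen for this sketch.
S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}   # odd number
B = {2, 4, 6}   # even number
C = {1, 2}      # number less than 3


def complement(event, space=S):
    """Return the outcomes of the sample space that are not in the event."""
    return space - event


# Distributive laws (Theorem 5.1 iii): | is union, & is intersection.
distributive_1 = (A & (B | C)) == ((A & B) | (A & C))
distributive_2 = (A | (B & C)) == ((A | B) & (A | C))

# De Morgan's laws (Theorem 5.1 vi).
de_morgan_1 = complement(A | B) == (complement(A) & complement(B))
de_morgan_2 = complement(A & B) == (complement(A) | complement(B))

# A and B are mutually exclusive because their intersection is empty.
mutually_exclusive = (A & B) == set()
```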

5.3 Counting rules


Combinatorics refers to the methods used to count things. If a sample space contains a
finite set of outcomes, determining the probability of an event is often a counting
problem. But often the numbers are too large to count one by one.
For example, if you put a grain of rice on the first square of a chessboard, then
two grains on the second square, four on the third square, and continue doubling until
all 64 squares are filled, how many grains of rice would you have in all? The number is
so large that it is difficult to handle without a systematic enumeration technique.
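
For the chessboard question, square k (counting from 0) holds 2^k grains, so the total is 1 + 2 + 4 + … + 2^63 = 2^64 − 1. A short Python sketch of that sum (the variable names are illustrative):

```python
# Square k (k = 0, ..., 63) holds 2**k grains of rice.
grains_per_square = [2 ** k for k in range(64)]

# Total over the whole board; this equals the geometric sum 2**64 - 1.
total_grains = sum(grains_per_square)
```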

In short, to assign probabilities for an event, we might need to enumerate the possible
outcomes of a random experiment and need to know the number of possible outcomes
favoring the event. The following principles will help us in determining the number of
possible outcomes favoring a given event.

Theorem 5.2:Addition principle


If a task can be accomplished by k distinct procedures, where the i-th procedure has ni
alternatives, then the total number of ways of accomplishing the task equals
n1 + n2 + … + nk.

Example 5.3: Suppose one wants to purchase a certain commodity and that this
commodity is on sale in 5 government owned shops, 6 public shops and 10 private shops.
How many alternatives are there for the person to purchase this commodity?
Solution: Total number of ways =5+6+10=21 ways

Theorem 5.3: Multiplication principle


If a choice consists of k steps of which the first can be made in n1 ways, for each of
these the second can be made in n2 ways, …, and for each of these the k-th can be made
in nk ways, then the whole choice can be made in n1 × n2 × … × nk ways.

Example 5.4: If we can go from Addis Ababa to Rome in 2 ways and from Rome to
Washington D.C. in 3 ways then the number of ways in which we can go from Addis
Ababa to Rome to Washington D.C. is 2x3 ways or 6 ways. We may illustrate the
situation by using a tree diagram below:

[Figure: tree diagram with root A (Addis Ababa), two branches to R (Rome), and from each R three branches to W (Washington D.C.)]

Example 5.5: If a test consists of 10 multiple choice questions, with each permitting 4
possible answers, how many ways are there in which a student gives his/her answers?
Solution: There are 10 steps required to complete the test.
First step: To give answer to question number one. He/she has 4 alternatives.
Second step: To give answer to question number two, he/she has 4 alternatives……
Last step: To give answer to last question, he/she has 4 alternatives.
Therefore, he/she has 4 × 4 × … × 4 = 4^10 = 1,048,576 ways of completing the exam.
Note that there is only one way in which he/she can give correct answers to all
questions and that there are 3^10 = 59,049 ways in which all the answers will be incorrect.
Example 5.6: A manufactured item must pass through three control stations. At each
station the item is inspected for a particular characteristic and marked accordingly. At
the first station, three ratings are possible, while at each of the last two stations four ratings are
possible. Hence, by the multiplication principle, there are 3 × 4 × 4 = 48 ways in which the
item may be marked.
Example 5.7: Suppose that a car plate has three letters followed by three digits. How
many possible car plates are there if each plate begins with an H or an F?
Solution: There are 2 × 26 × 26 × 10 × 10 × 10 = 1,352,000 different plates.
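
The counts in Examples 5.4 to 5.7 can be reproduced in Python. This is a minimal sketch: the route labels are hypothetical placeholders (only the numbers of alternatives come from the examples), and the small cases are enumerated explicitly with `itertools.product` while the large ones are computed as products.

```python
from itertools import product

# Example 5.4: 2 ways Addis Ababa -> Rome, 3 ways Rome -> Washington D.C.
# The labels are hypothetical; only the counts come from the example.
first_leg = ["r1", "r2"]
second_leg = ["w1", "w2", "w3"]
journeys = list(product(first_leg, second_leg))

# Example 5.6: 3 ratings at the first station, 4 at each of the last two.
markings = list(product(range(3), range(4), range(4)))

# Example 5.5: 10 questions with 4 possible answers each.
answer_sheets = 4 ** 10
all_wrong = 3 ** 10  # 3 incorrect choices per question

# Example 5.7: first letter H or F, two more letters, then three digits.
plates = 2 * 26 * 26 * 10 * 10 * 10
```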

Definition 5.4: If n is a positive integer, we define n! = n(n−1)(n−2)…1 and call it
n-factorial. By convention, 0! = 1.

Permutations
Suppose that we have n different objects. In how many ways, say nPn, may these objects
be arranged (permuted)? For example, if we have objects a, b and c, we can consider the
following arrangements: abc, acb, bac, bca, cab, and cba. Thus the answer is 6. The
following theorem gives the general result on the number of such arrangements.

Theorem 5.4: Permutation


i) The number of permutations of n different objects is given by nPn = n!.
ii) The number of permutations of n different objects taken r at a time, without repetition
and with order being important, is
$$ {}_nP_r = \frac{n!}{(n-r)!} $$

Example 5.8: Suppose that we have four letters a, b, c, d.
i) What is the number of possible arrangements of these letters taken all at a time?
ii) What is the number of possible arrangements of these letters if we use only three
of the letters at a time?
Solution:
i) Using (i) of Theorem 5.4, we have 4! ways of arranging the 4 letters, i.e. we have
24 possible arrangements.
ii) Using (ii) of Theorem 5.4, we have 4P3 = 4!/(4−3)! ways of arranging 3 letters taken from the
four letters, i.e. we have 24 possible arrangements.
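
As a check on Example 5.8, the arrangements can be listed explicitly and compared with the formulas of Theorem 5.4. A minimal sketch using Python's standard library (`math.perm` requires Python 3.8 or later):

```python
from itertools import permutations
from math import factorial, perm

letters = ["a", "b", "c", "d"]

# (i) All arrangements of the four letters: should number 4!.
all_arrangements = list(permutations(letters))

# (ii) Ordered arrangements of three of the four letters: should number 4P3.
three_at_a_time = list(permutations(letters, 3))

n_all = factorial(4)      # n!
n_three = perm(4, 3)      # n! / (n - r)!
```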
Example 5.9: In a class with 8 boys and 8 girls
i) In how many ways can the children line up if they alternate girl-boy-girl-boy-... ?
ii) In how many ways can the children line up so that no two of the same sex are next
to each other?
Solution:
i) The 8 girls can line up in 8! ways, and likewise the 8 boys can line up in 8! ways.
For any single arrangement of the girls, all arrangements of the boys are
possible; thus by the multiplication principle we have 8! × 8! ways to arrange the
children in girl-boy lines.
ii) Now we must also include the boy-girl lines. So we have 2 × 8! × 8! ways of arranging.
Example 5.10: If I have 5 different books on my shelf, in how many ways can I arrange
these books? Solution: We can arrange the books in 5! different ways or 5x4x3x2x1 ways
or 120 ways.
Remarks
i) The number of permutations of n distinct objects arranged in a circle is (n−1)!.
This is because we consider two permutations the same if one is a rotation of the other.
For n objects arranged around a circle, there are n rotations that give the same
permutation. Dividing n! by n gives (n−1)!. For example, two circular arrangements of
a, b, c, d, e that differ only by a rotation are considered the same.

ii) Permutations when not all objects are different


Given n objects of which n1 are of one kind, n2 are of another kind, …, and nk are of yet another kind, then
the total number of distinct permutations that can be made from these objects is
$$\frac{n!}{n_1!\,n_2!\cdots n_k!}$$
Example 5.11
i) How many "words" (text strings or distinct arrangements) can be made from
the letters b,k,o,o?
ii) How many permutations are there for the letters in the word banana?
Solution:
i) If we label the two o’s as o1 and o2, and think of them as distinct, then the
number of permutations is 4!. For each permutation there will be a matching
permutation that switches the o’s; for example, for o1o2bk there is the matching
permutation o2o1bk. If we divide the number of these
permutations by two, we obtain the number of permutations of the 4
letters where we do not distinguish between the two o’s. Therefore, there are
4!/2 = 12 distinct text strings.
ii) If we think of all 6 letters as distinct, then we would have 6! permutations. As
in the preceding example, for the two n’s we would need to divide 6! by 2! = 2. For
the 3 a’s, we would have 3! = 6 counts for a single permutation. For instance,
all of the following would be a single word if the a’s were not distinct: a1a2a3bnn,
a1a3a2bnn, a2a1a3bnn, a2a3a1bnn, a3a1a2bnn, and a3a2a1bnn. Hence the number
of distinct permutations of the word banana is
$$\frac{6!}{2!\,3!} = 60.$$
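
The two answers of Example 5.11 can be confirmed both by the formula n!/(n1! n2! … nk!) and by brute-force enumeration. The sketch below is illustrative; the function names `distinct_arrangements` and `brute_force` are not from the text.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def distinct_arrangements(word):
    """Count distinct arrangements using n! / (n1! n2! ... nk!)."""
    count = factorial(len(word))
    for multiplicity in Counter(word).values():
        count //= factorial(multiplicity)
    return count


def brute_force(word):
    """Count distinct arrangements by listing them all (small words only)."""
    return len(set(permutations(word)))


book_count = distinct_arrangements("bkoo")
banana_count = distinct_arrangements("banana")
```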
Combinations
Consider n different objects. This time we are concerned with counting the number of
ways we may choose r out of these n objects without regard to order. For example, we
have the objects a, b, c and d, and r=2; we wish to count ab, ac, ad, bc, bd, and cd. In
other words, we do not count ab and ba since the same objects are involved and only the
order differs.

There are many problems in which we are interested in determining the number of ways
in which r objects can be selected from n distinct objects without regard to the order in
which they are selected. Such selections are called combinations or r-sets. It may help to
think of combinations as committees. The key here is without regard for order.

To obtain the general result we recall the formula derived above: the number of ways of
choosing r objects out of n and permuting the chosen r equals n!/(n−r)!. Let C be the
number of ways of choosing r out of n, disregarding order. C is the number required.
Note that once the r items have been chosen, there are r! ways of permuting them.
Hence, applying the multiplication principle again, together with the above result, we
obtain C × r! = n!/(n−r)!. Therefore,
$$C = \frac{n!}{r!\,(n-r)!}.$$
This number arises in many contexts in
mathematics and hence a special symbol is used for it. We shall write
$$\binom{n}{r} = {}_nC_r = \frac{n!}{r!\,(n-r)!}.$$

Theorem 5.5: Combination


The number of ways of choosing r out of n different objects, disregarding order, is
given by
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$$

Example 5.12: How many different committees of 3 can be formed from Hawa, Segenet,
Nigisty and Lensa?
Solution: The question can be restated in terms of subsets of a set of 4 objects: how
many subsets of 3 elements are there? In terms of combinations the question becomes:
what is the number of combinations of 4 distinct objects taken 3 at a time? The list of
committees is {H,S,N}, {H,S,L}, {H,N,L}, {S,N,L}. Therefore, we have 4C3 = 4 possible
committees.
Example 5.13:
(i) A committee of 3 is to be formed from a group of 20 people. How many different
committees are possible?
(ii) From a group of 5 men and 7 women, how many different committees consisting of 2
men and 3 women can be formed?
Solution: (i) There are
$$\binom{20}{3} = \frac{20!}{3!\,17!} = 1140$$
possible committees.
(ii) There are
$$\binom{5}{2}\binom{7}{3} = \frac{5!}{2!\,3!}\cdot\frac{7!}{3!\,4!} = 10 \times 35 = 350$$
possible committees.
Remarks:
i) $$\binom{n}{r} = \binom{n}{n-r}$$

ii) A set with n elements has 2^n subsets.
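
Examples 5.12 and 5.13 and the two remarks can be checked with Python's `math.comb` and `itertools.combinations`. This is a minimal sketch; the choice of n = 10 for the symmetry check and n = 4 for the subset count is arbitrary.

```python
from itertools import combinations
from math import comb

# Example 5.12: committees of 3 chosen from 4 people.
people = ["Hawa", "Segenet", "Nigisty", "Lensa"]
committees = list(combinations(people, 3))

# Example 5.13 (i): committees of 3 from 20 people.
committees_of_3_from_20 = comb(20, 3)

# Example 5.13 (ii): 2 of 5 men and 3 of 7 women (multiplication principle).
two_men_three_women = comb(5, 2) * comb(7, 3)

# Remark i): C(n, r) == C(n, n - r), checked here for n = 10.
symmetric = all(comb(10, r) == comb(10, 10 - r) for r in range(11))

# Remark ii): summing C(n, r) over r counts all subsets, here for n = 4.
subsets_of_4 = sum(comb(4, r) for r in range(5))
```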

5.4 Probability of an event


Definition 5.5: The Axioms of Probability
Probabilities are real numbers assigned to events (or subsets) of a sample space. We can
think of the assignment of probabilities to events, or probability measure, as a function
between the collection of subsets of the sample space and the real numbers. Mathematically,
a probability measure P for a random experiment is a real-valued function defined on the
collection of events that satisfies the following axioms:
Axiom 1: The probability of an event is a nonnegative real number; that is, P(A) ≥ 0 for any
subset A of S.
Axiom 2: P(S) = 1
Axiom 3: If A1, A2, A3, ... is a finite or infinite sequence of mutually exclusive
events of S, then
$$P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots = \sum_i P(A_i)$$

It is rather surprising that with only these three axioms, we can construct the "entire"
theory of probability! The next theorems and definitions help in assigning probabilities
of events.

Theorem 5.6: If A is an event in a discrete sample space S, then P(A) equals the sum of the
probabilities of the individual outcomes comprising A.

Theorem 5.7: Suppose that we have a random experiment with sample space S and
probability function P, and that A and B are events. Then we have the following results:
i) P(∅) = 0
ii) P(A^c) = 1 − P(A)
iii) P(B ∩ A^c) = P(B) − P(A ∩ B)
iv) If A ⊆ B, then P(A) ≤ P(B).

Definition 5.6: The classical definition of probability


If an experiment can result in any one of N equally likely and mutually exclusive outcomes,
and if n of these outcomes constitute the event A, then the probability of event A is
$$P(A) = \frac{n}{N}.$$
Example 5.14: Consider the experiment of tossing a fair die. A fair die means that all six
numbers are equally likely to appear. Calculate the probabilities of the following events:
a) A=One will occur ={1}
b) B=Even number will occur ={2, 4, 6}
c) C=Odd number will occur ={1, 3, 5}
d) D=A number less than 3 will occur ={1,2}
Solution:
a) Since the die is fair,
P({1}) = P({2}) = P({3}) = P({4}) = P({5}) = P({6}) = 1/6,
so P(A) = P({1}) = 1/6.
b) P(B) = 3/6 = 0.5; c) P(C) = 3/6 = 0.5; d) P(D) = 2/6 = 1/3.
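
The classical definition translates directly into code: count the outcomes in the event and divide by the size of the sample space. The sketch below applies it to Example 5.14; the function name `classical_probability` is illustrative, and the approach is valid only when all outcomes are equally likely.

```python
from fractions import Fraction

# Sample space for one toss of a fair die (all outcomes equally likely).
S = {1, 2, 3, 4, 5, 6}


def classical_probability(event, space=S):
    """P(A) = n / N for equally likely outcomes."""
    return Fraction(len(event & space), len(space))


p_a = classical_probability({1})          # one occurs
p_b = classical_probability({2, 4, 6})    # even number occurs
p_c = classical_probability({1, 3, 5})    # odd number occurs
p_d = classical_probability({1, 2})       # number less than 3 occurs
```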
Example 5.15: Suppose that we toss two coins, and assume that the four
outcomes in the sample space S = {(H, H), (H, T), (T, H), (T, T)} are equally likely, so that
each has probability ¼. Let A = {(H, H), (H, T)} and B = {(H, H), (T, H)}; that is, A is the
event that the first coin falls heads, and B is the event that the second coin falls heads.
Calculate the probabilities of A, B, A^c, B^c, and S^c. The event that none of the
outcomes will occur is the same as S^c.
Solution:
P(A) = 2/4 = 0.5
P(B) = 2/4 = 0.5
P(A^c) = 1 − P(A) = 1 − 0.5 = 0.5
P(B^c) = 1 − P(B) = 1 − 0.5 = 0.5
P(S^c) = 1 − P(S) = 1 − 1 = 0 = P(∅)
Example 5.16: From a group of 5 men and 7 women, it is required to form a committee
of 5 persons. If the selection is made randomly, then
i) what is the probability that 2 men and 3 women will be in the committee?
ii) what is the probability that all members of the committee will be men?
iii) what is the probability that at least three members will be women?
Solution: The total number of possible committees is
$$\binom{12}{5} = \frac{12!}{5!\,7!} = 792,$$
i.e. the number of possible outcomes in the sample space is 792.
i) Let A be the event that the committee will consist of 2 men and 3 women. We
need to know the number of possible outcomes favoring this event. The
number of ways we can select 2 men from 5 men is
$$\binom{5}{2} = \frac{5!}{2!\,3!} = 10$$
and the number of ways of selecting 3 women out of 7 women is
$$\binom{7}{3} = \frac{7!}{3!\,4!} = 35.$$
Using the multiplication principle, the number of outcomes favoring event A is
10 × 35 = 350.
Hence, using the classical definition of probability,
$$P(A) = \frac{\binom{5}{2}\binom{7}{3}}{\binom{12}{5}} = \frac{350}{792} \approx 0.44$$
ii) Let B be the event that all members of the committee will be men. Hence
$$P(B) = \frac{\binom{5}{5}\binom{7}{0}}{\binom{12}{5}} = \frac{1}{792}$$
iii) Let C be the event that at least three of the committee members will be women.
Basically, three different compositions of committee members can be formed
in terms of sex: 3 women and 2 men, 4 women and 1 man, and 5 women.
Hence the number of possible outcomes favoring event C, using the principle
of combination together with the addition principle, is
$$\binom{5}{2}\binom{7}{3} + \binom{5}{1}\binom{7}{4} + \binom{5}{0}\binom{7}{5} = 350 + 175 + 21 = 546.$$
Therefore,
$$P(C) = \frac{\binom{5}{2}\binom{7}{3} + \binom{5}{1}\binom{7}{4} + \binom{5}{0}\binom{7}{5}}{\binom{12}{5}} = \frac{546}{792} \approx 0.69$$

Definition 5.7: Relative Frequency Definition of probability


If an experiment is repeated a large number, n, of times and the event A is observed
nA times, the probability of A is P(A) ≈ nA/n.

The above definition of probability is based on empirical data accumulated through time
or based on observations made from repeated experiments for a large number of times.
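
The relative frequency idea can be illustrated by simulation. The sketch below repeatedly rolls a simulated fair die and records how often an even number appears; the function name, the seed and the number of trials are arbitrary choices for illustration, and the result only approximates the classical value 0.5.

```python
import random


def relative_frequency(trials, seed=0):
    """Estimate P(even number) for a fair die as nA / n over simulated rolls."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) % 2 == 0)
    return hits / trials


estimate = relative_frequency(100_000)
```

Increasing the number of trials typically brings the estimate closer to 0.5, which is the regularity in long runs mentioned in Definition 5.1.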
5.5 Some probability rules
Theorem 5.8: If A and B are any two events, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Example 5.17: Consider the experiment of tossing a fair die. Let


A = Even number occurring = {2,4,6}
B = A number greater than 2 occurring ={3, 4, 5, 6}
C = Odd number occurring ={1, 3, 5}
i) What is the probability that A and B will occur?
ii) What is the probability that A or B will occur?
Solution: We use the concepts of set theory to help us solve probability questions
easily, and Venn diagrams are useful tools to depict the relations between events within
the sample space. The shaded region in Fig. 1 shows the event that both A and B will
occur.
i) A and B ≡ A ∩ B = {4, 6}
Thus P(A ∩ B) = 2/6 = 1/3.
ii) A or B ≡ A ∪ B = {2, 3, 4, 5, 6}
Hence,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 3/6 + 4/6 − 2/6 = 5/6.
Example 5.18: Sixty percent of the families in a certain community own their own car,
thirty percent own their own home, and twenty percent own both their own car and
their own home. If a family is randomly chosen,
a) what is the probability that this family does not own a car?
b) what is the probability that this family owns a car or a house?
c) what is the probability that this family owns a car or a house but not both?
d) what is the probability that this family owns only a house?
e) what is the probability that this family neither owns a car nor a house?
Solution: Let A be the event that the family owns a car and B the event that the family
owns a house. Given information: P(A) = 0.6, P(B) = 0.3, and P(A ∩ B) = 0.2.
a) Required: P(A^c) = ?
P(A^c) = 1 − P(A) = 1 − 0.6 = 0.4
b) Required: P(A ∪ B) = ?
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.6 + 0.3 − 0.2 = 0.7
c) Required: P((A ∩ B^c) ∪ (A^c ∩ B)) = ?
P((A ∩ B^c) ∪ (A^c ∩ B)) = P(A ∩ B^c) + P(A^c ∩ B) = [P(A) − P(A ∩ B)] + [P(B) − P(A ∩ B)]
= [0.6 − 0.2] + [0.3 − 0.2] = 0.5
d) Required: P(A^c ∩ B) = ?
P(A^c ∩ B) = P(B) − P(A ∩ B) = 0.3 − 0.2 = 0.1
e) Required: P(A^c ∩ B^c) = ?
P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − P(A ∪ B) = 1 − 0.7 = 0.3
We can represent various events by an informative diagram called a Venn diagram. If
properly and correctly drawn, a Venn diagram helps to calculate probabilities of events
easily. The figure below shows various events represented by shaded regions. Note that
the rectangle in each figure represents the sample space.
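
The five answers of Example 5.18 follow from the complement rule and Theorem 5.8. A minimal Python sketch of the arithmetic, using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

# Given information from Example 5.18.
p_car = Fraction(6, 10)     # P(A): family owns a car
p_house = Fraction(3, 10)   # P(B): family owns a house
p_both = Fraction(2, 10)    # P(A and B)

no_car = 1 - p_car                                   # a) complement rule
car_or_house = p_car + p_house - p_both              # b) Theorem 5.8
exactly_one = (p_car - p_both) + (p_house - p_both)  # c) one but not both
only_house = p_house - p_both                        # d) house but no car
neither = 1 - car_or_house                           # e) De Morgan + complement
```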

5.6 Conditional probability and independence


Conditional Probability
Conditional probability provides us with a way to reason about the outcome of an
experiment, based on partial information. Here are some examples of situations we may
have in our mind:
(a) What is the probability that a person will be HIV-Positive given he has tuberculosis?
(b) A spot shows up on a radar screen. How likely is it that it corresponds to an aircraft?

In more precise terms, given an experiment, a corresponding sample space, and a
probability law, suppose that we know that the outcome is within some given event B.
We wish to quantify the likelihood that the outcome also belongs to some other given
event A. We thus seek to construct a new probability law, which takes into account this
knowledge and which, for any event A, gives us the conditional probability of A given B,
denoted by P(A|B).

Definition 5.8: If P(B) > 0, the conditional probability of A given B, denoted by
P(A|B), is
$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$
Example 5.19: Suppose cards numbered one through ten are placed in a hat, mixed up,
and then one of the cards is drawn at random. If we are told that the number on the
drawn card is at least five, then what is the conditional probability that it is ten?
Solution: Let A denote the event that the number on the drawn card is ten, and B be the
event that it is at least five. The desired probability is P(A|B).
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(\{10\} \cap \{5,6,7,8,9,10\})}{P(\{5,6,7,8,9,10\})} = \frac{P(\{10\})}{P(\{5,6,7,8,9,10\})} = \frac{1/10}{6/10} = \frac{1}{6}$$
Example 5.20: A family has two children. What is the conditional probability that both
are boys given that at least one of them is a boy? Assume that the sample space S is
given by S = {(b, b), (b, g), (g, b), (g, g)}, and all outcomes are equally likely. (b, g)
means, for instance, that the older child is a boy and the younger child is a girl.
Solution: Letting A denote the event that both children are boys, and B the event that at
least one of them is a boy, the desired probability is given by
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{3/4} = \frac{1}{3}$$
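
Examples 5.19 and 5.20 can both be solved by listing the equally likely outcomes and applying Definition 5.8. The sketch below does this in Python; the helper names `prob` and `conditional` are illustrative, and the method assumes a finite sample space with equally likely outcomes and P(B) > 0.

```python
from fractions import Fraction
from itertools import product


def prob(event, space):
    """Classical probability of an event given as a true/false test on outcomes."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))


def conditional(event_a, event_b, space):
    """P(A|B) = P(A and B) / P(B); requires P(B) > 0."""
    p_b = prob(event_b, space)
    p_ab = prob(lambda o: event_a(o) and event_b(o), space)
    return p_ab / p_b


# Example 5.19: cards numbered 1 to 10, told the card is at least five.
cards = list(range(1, 11))
p_ten_given_at_least_five = conditional(lambda c: c == 10, lambda c: c >= 5, cards)

# Example 5.20: two children, told at least one is a boy.
children = list(product("bg", repeat=2))
p_both_boys = conditional(lambda o: o == ("b", "b"), lambda o: "b" in o, children)
```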
Law of Multiplication
The defining equation for conditional probability may also be written as:
P(A ∩ B) = P(B) P(A|B)
This formula is useful when the information given to us in a problem is P(B) and P(A|B)
and we are asked to find P(A ∩ B). An example illustrates the use of this formula. Suppose
that five good fuses and two defective ones have been mixed up. To find the defective
fuses, we test them one by one, at random and without replacement. What is the
probability that we are lucky and find both of the defective fuses in the first two tests?
The first fuse tested is defective with probability 2/7 and, given that, the second is
defective with probability 1/6, so the required probability is (2/7)(1/6) = 1/21.
Example 5.21: Suppose an urn contains seven black balls and five white balls. We draw
two balls from the urn without replacement. Assuming that each ball in the urn is
equally likely to be drawn, what is the probability that both drawn balls are black?
Solution: Let A and B denote, respectively, the events that the first and second balls
drawn are black. Now, given that the first ball selected is black, there are six remaining
black balls and five white balls, and so P(B|A) = 6/11. As P(A) is clearly 7/12, our
desired probability is
$$P(A \cap B) = P(A)\,P(B|A) = \frac{7}{12}\cdot\frac{6}{11} = \frac{7}{22}$$
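
The law of multiplication can be checked against a direct count. The sketch below computes the urn probability of Example 5.21 both ways, and also applies the same rule to the fuse question posed above; the variable names are illustrative, and the brute-force check simply lists all ordered pairs of distinct balls.

```python
from fractions import Fraction
from itertools import permutations

# Example 5.21 by the law of multiplication: P(A and B) = P(A) * P(B|A).
p_both_black = Fraction(7, 12) * Fraction(6, 11)

# Brute-force check: all ordered draws of two distinct balls from the urn.
balls = ["black"] * 7 + ["white"] * 5
draws = list(permutations(range(len(balls)), 2))
favourable = sum(1 for i, j in draws if balls[i] == "black" and balls[j] == "black")
p_check = Fraction(favourable, len(draws))

# Fuse question: 2 defective among 7, both found in the first two tests.
p_both_defective = Fraction(2, 7) * Fraction(1, 6)
```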

Independence
We have introduced the conditional probability P(A|B) to capture the partial
information that event B provides about event A. An interesting and important special
case arises when the occurrence of B provides no information and does not alter the
probability that A has occurred, i.e., P(A|B) = P(A). When this equality holds, we
say that A is independent of B. Note that by the definition P(A|B) = P(A ∩ B)/P(B), this is
equivalent to P(A ∩ B) = P(A)P(B).

Definition 5.9: Independence


Two events A and B are said to be independent if P(A ∩ B) = P(A)P(B). If, in addition,
P(B) > 0, independence is equivalent to the condition P(A|B) = P(A).
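
Definition 5.9 can be tested on the two-coin sample space of Example 5.15. In the sketch below, the events "first coin heads" and "second coin heads" turn out to be independent, while "first coin heads" and "at least one head" (an extra event added here for contrast) are not. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Two tosses of a fair coin: four equally likely outcomes.
S = list(product("HT", repeat=2))


def prob(event):
    """Classical probability of an event given as a true/false test."""
    return Fraction(sum(1 for o in S if event(o)), len(S))


def independent(event_a, event_b):
    """Check P(A and B) == P(A) * P(B)."""
    p_ab = prob(lambda o: event_a(o) and event_b(o))
    return p_ab == prob(event_a) * prob(event_b)


def first_heads(o):
    return o[0] == "H"


def second_heads(o):
    return o[1] == "H"


def at_least_one_head(o):
    return "H" in o


coins_independent = independent(first_heads, second_heads)
dependent_pair = independent(first_heads, at_least_one_head)
```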


UNIT SIX
PROBABILITY DISTRIBUTIONS
Objectives:
Having studied this unit, you should be able to
✓ compute probabilities of events using the concept of probability distributions.
✓ compute expected values and variances of random variables.
✓ apply the concepts of probability distributions to real-life problems.
Introduction
In many applications, the outcomes of probabilistic experiments are numbers or have
some numbers associated with them, which we can use to obtain important information,
beyond what we have seen so far. We can, for instance, describe in various ways how
large or small these numbers are likely to be and compute likely averages and measures
of spread. For example, in 3 tosses of a coin, the number of heads obtained can range
from 0 to 3, and there is one of these numbers associated with each possible outcome.
Informally, the quantity “number of heads” is called a random variable, and the numbers
0 to 3 its possible values. The value of a random variable is determined by the outcome
of the experiment. Thus, we may assign probabilities to the possible values of the
random variable.

6.1 Definition of random variables and probability distributions


Given an experiment and the corresponding set of possible outcomes (the sample space),
a random variable associates a particular number with each outcome. Mathematically, a
random variable is a real-valued function of the experimental outcome. The following
are some examples of random variables:
(a) In an experiment involving a sequence of 5 tosses of a coin, the number of heads in
the sequence is a random variable.
(b) In an experiment involving two rolls of a die, the following are examples of random
variables: (1) The sum of the two rolls, (2) The number of sixes in the two rolls.
(c) In an experiment involving the transmission of a message, the time needed to
transmit the message, the number of symbols received in error, and the delay with
which the message is received are all random variables.
Notation: We will use capital letters to denote random variables, and lower case
characters to denote real numbers such as the numerical values of a random variable.
Types of random variables: Generally, two types of random variables exist: discrete
and continuous. A random variable is called discrete if its range (the set of values that it
can take) is finite or at most countably infinite. Examples include the number of children in a
family, the number of car accidents within a given period of time in a certain locality, and the
number of bacteria in a cubic mm of agar. If a random variable can assume any
numerical value in an interval or collection of intervals, then it is called a continuous
random variable. Examples include the body weight of a newborn baby, the lifetime of a human
being, and the height of a person.
The most important way to characterize a random variable is through the probabilities
of the values that it can take. For a discrete random variable X, these are captured by the
probability mass function (p.m.f. for short) of X, denoted PX(x). For a continuous
random variable X, this is done by the probability density function (p.d.f.), denoted fX(x).

Definition 6.1: Probability mass function


If x is any possible value of X, the probability mass of x, denoted PX(x), is the
probability of the event {X = x} consisting of all outcomes that give rise to a value
of X equal to x. A probability mass function must satisfy the following conditions:
i. PX(x) ≥ 0 for any value x of X.
ii. $$\sum_x P_X(x) = 1,$$ where the summation is over all values x of X.

Example 6.1: Consider an experiment of tossing two fair coins. Letting X denote the
number of heads appearing on the top face, then X is a random variable taking on one of
the values 0, 1, 2 . The random variable X assigns a 0 value for the outcome (T,T), 1 for
outcomes (T ,H) and (H, T ), and 2 for the outcome (H,H). Thus, we can calculate the
probability that X can take specific value/s as follows:
P(X = 0) = P({(T , T )}) = ¼
P(X = 1) = P({(T ,H),(H, T )}) = 2/4,
P(X = 2) = P({(H,H)}) = ¼
The table below shows the probability mass function of X.
x        0     1     2
PX(x)    ¼     2/4   ¼
We can verify that PX(x) is a probability mass function:
PX(x) ≥ 0 for x = 0, 1, 2 and
P(X = 0) + P(X = 1) + P(X = 2) = ¼ + 2/4 + ¼ = 1
Suppose we are interested to calculate the probability that X≥1. The values of X which
are greater than or equal to 1 are 1 and 2. Thus, the probability that X is greater than or
equal to 1, denoted P(X≥1), is found as P(X≥1) = P(X = 1) + P(X = 2)=3/4.
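
The probability mass function of Example 6.1 can be built by listing the four equally likely outcomes and counting heads in each. A minimal Python sketch (variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Four equally likely outcomes of tossing two fair coins.
outcomes = list(product("HT", repeat=2))

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(outcome.count("H") for outcome in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

# Conditions of Definition 6.1 and the probability P(X >= 1).
non_negative = all(p >= 0 for p in pmf.values())
total = sum(pmf.values())
p_at_least_one = sum(p for x, p in pmf.items() if x >= 1)
```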

Definition 6.2: Continuous random variable


A random variable X is called continuous if there exists a function fX(x), called the
probability density function of X, which satisfies
a. fX(x) ≥ 0 for all x.
b. $$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$

We can use the probability density function to calculate probabilities of events expressed
in terms of the random variable X. For instance, if we are interested in the probability
that X lies between two points, say a and b, we can find it by integrating fX(x) over
the interval [a, b], i.e.
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$

Figure: P(a≤ X ≤ b) is the shaded region


Remarks:
i) The area bounded above by the graph of a probability density function and below by the
horizontal axis is 1.
ii) The probability that a continuous random variable X will assume a specific value is
zero, i.e.
$$P(X = c) = \int_c^c f_X(x)\,dx = 0,$$
where c is a constant.
iii) The probability that a continuous random variable X will assume a value in a
closed interval is the same as the probability that it will assume a value in the corresponding open
or half-open interval, i.e. P(a≤X≤b) = P(a<X<b) = P(a≤X<b) = P(a<X≤b), P(X≤c)
= P(X<c), and P(X≥c) = P(X>c), where a, b, and c are constants.
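
To make the remarks concrete, the sketch below integrates a density numerically. The density f(x) = 2x on [0, 1] is a hypothetical example chosen for illustration (it does not appear in the text), and the midpoint rule is just one simple way to approximate an integral.

```python
def f(x):
    """Hypothetical density: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


def integrate(func, a, b, steps=10_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(func(a + (k + 0.5) * width) for k in range(steps)) * width


total_area = integrate(f, 0, 1)            # remark i): should be close to 1
p_interval = integrate(f, 0.25, 0.5)       # P(0.25 <= X <= 0.5)
p_single_point = integrate(f, 0.5, 0.5)    # remark ii): zero-width interval
```

For this particular density the exact value of P(0.25 ≤ X ≤ 0.5) is 0.5² − 0.25² = 0.1875, which the numerical result approximates.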

6.2 Introduction to expectation: mean and variance


We can associate with each random variable certain “averages” of interest, such as mean
and variance which give useful summary of a probability distribution.
Mean
Definition 6.3: The expected value (mean) of a random variable X, denoted by E(X)
or μ, is given by
i) $E(X) = \sum_x x\,P_X(x)$ if X is a discrete r.v.
ii) $E(X) = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$ if X is a continuous r.v.

It is useful to view the mean of X as a “representative” value of X, which lies somewhere


in the middle of its range. We can make this statement more precise, by viewing the
mean as the center of gravity of the distribution.

Variance
Definition 6.4: The variance of a random variable X, denoted by V(X) or σ², is defined
as V(X) = E[(X − μ)²] = E(X²) − μ².
i) if X is discrete, $V(X) = \left[\sum_x x^2 P_X(x)\right] - \mu^2$
ii) if X is continuous, $V(X) = \left[\int_{-\infty}^{\infty} x^2 f_X(x)\,dx\right] - \mu^2$

The variance provides a measure of dispersion of X around its mean. Another measure of
dispersion is the standard deviation of X, which is defined as the square root of the
variance and is denoted by σ.
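Example 6.2: Calculate the mean and variance of the random variable X (the number of
heads in two tosses of a coin) from the preceding example.
$E(X) = \sum_x x\,P_X(x) = 0 \times \frac{1}{4} + 1 \times \frac{1}{2} + 2 \times \frac{1}{4} = 1$
$E(X^2) = \sum_x x^2 P_X(x) = 0^2 \times \frac{1}{4} + 1^2 \times \frac{1}{2} + 2^2 \times \frac{1}{4} = 1.5$
$V(X) = E(X^2) - \mu^2 = 1.5 - 1^2 = 0.5$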
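
The same arithmetic can be checked with a short script. This is only a sketch that applies the discrete forms of Definitions 6.3 and 6.4 to the coin-toss probability mass function; the dictionary `pmf` is an illustrative way of storing the table.

```python
from fractions import Fraction

# Probability mass function of X = number of heads in two tosses of a fair coin.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# E(X) = sum of x * P(X = x)
mean = sum(x * p for x, p in pmf.items())

# E(X^2) = sum of x^2 * P(X = x)
second_moment = sum(x**2 * p for x, p in pmf.items())

# V(X) = E(X^2) - mu^2
variance = second_moment - mean**2

print(mean, second_moment, variance)
```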

6.3 Common discrete probability distributions – binomial and Poisson


The Binomial distribution
Many real problems (experiments) have two possible outcomes, for instance, a person
may be HIV-positive or HIV-negative, a seed may germinate or not, the sex of a
newborn baby may be a girl or a boy, etc. Technically, the two outcomes are called Success
and Failure. Experiments or trials whose outcomes can be classified as either a
“success” or as a “failure” are called Bernoulli trials.
Suppose that n independent trials, each of which results in a “success” with probability p
and in a “failure” with probability 1 − p, are to be performed. If X represents the number
of successes that occur in the n trials, then X is said to have a binomial distribution with
parameters n and p. The probability mass function of a binomial distribution with
parameters n and p is given by
$P_X(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n$
The mean and variance of the binomial distribution are np and np(1 − p), respectively.
Note that the binomial distributions are used to model situations where there are just
two possible outcomes, success and failure. The following conditions also have to be
satisfied.
i) There must be a fixed number of trials called n
ii) The probability of success (called p) must be the same for each trial.
iii) The trials must be independent

Example 6.3: A fair coin is flipped 4 times. Let X be the number of heads appearing out
of the four trials. Calculate the following probabilities:
i) 2 heads will appear
ii) No head will appear
iii) At least two heads will appear
iv) Less than two heads will appear
v) At most 2 heads will appear
Solution: We can consider the outcomes of the trials to be independent of each
other. In addition, the probability that a head will appear is the same in each trial. Thus,
X has a binomial distribution with number of trials 4 and probability of success (the
occurrence of a head in a trial) ½. The probability mass function of X is given by
$P_X(x) = \binom{n}{x} 0.5^x (1-0.5)^{n-x} = \binom{n}{x} 0.5^n, \quad x = 0, 1, 2, 3, 4$, where n = 4 and p = 1/2.
i) $P(X = 2) = \binom{4}{2} 0.5^2 (1-0.5)^{4-2} = 0.3750$
ii) $P(X = 0) = \binom{4}{0} 0.5^0 (1-0.5)^{4-0} = 0.0625$
iii) $P(X \ge 2) = P(X = 2) + P(X = 3) + P(X = 4) = 0.3750 + 0.2500 + 0.0625 = 0.6875$
iv) $P(X < 2) = P(X = 0) + P(X = 1) = 0.0625 + 0.2500 = 0.3125$
v) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.0625 + 0.2500 + 0.3750 = 0.6875$
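
The five probabilities of Example 6.3 can be checked numerically. The sketch below implements the binomial probability mass function directly with `math.comb`; the helper name `binomial_pmf` is an illustrative choice, not part of any library.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.5  # four flips of a fair coin

# probs[x] holds P(X = x) for x = 0, 1, ..., 4
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

p_exactly_two = probs[2]          # i)   P(X = 2)
p_none = probs[0]                 # ii)  P(X = 0)
p_at_least_two = sum(probs[2:])   # iii) P(X >= 2)
p_less_than_two = sum(probs[:2])  # iv)  P(X < 2)
p_at_most_two = sum(probs[:3])    # v)   P(X <= 2)

print(p_exactly_two, p_none, p_at_least_two, p_less_than_two, p_at_most_two)
```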
Example 6.4: Suppose that a particular trait of a person (such as eye color or
left-handedness) is classified on the basis of one pair of genes, and suppose that d represents
a dominant gene and r a recessive gene. Thus a person with dd genes is pure dominance,
one with rr is pure recessive, and one with rd is hybrid. The pure dominance and the
hybrid are alike in appearance. Children receive one gene from each parent. If, with
respect to a particular trait, two hybrid parents have a total of four children, what is the
probability that exactly three of the four children have the outward appearance of the
dominant gene?
Solution: If we assume that each child is equally likely to inherit either of the two genes
from each parent, the probabilities that the child of two hybrid parents will have dd, rr,
or rd pairs of genes are, respectively, ¼, ¼, ½. Hence, because an offspring will have the
outward appearance of the dominant gene if its gene pair is either dd or rd, it follows
that the number of such children, say X, is binomially distributed with parameters n = 4
and p = ¾. Thus the desired probability is
$P(X = 3) = \binom{4}{3} 0.75^3 (1-0.75)^{4-3} = 0.421875.$
Example 6.5: Suppose it is known that the probability of recovery for a certain disease
is 0.4. If random sample of 10 people who are stricken with the disease are selected,
what is the probability that:
(a) exactly 5 of them will recover?
(b) at most 9 of them will recover?
Solution: Let X be the number of persons who will recover from the disease. We can assume
that the selection process will not affect the probability of success (0.4) for each trial by
assuming a large diseased population size. Hence, X will have a binomial distribution
with number of trials equal to 10 and probability of success equal to 0.4.
$P(X = k) = \binom{10}{k} 0.4^k\, 0.6^{10-k}, \quad k = 0, 1, 2, \ldots, 10$
(a) $P(X = 5) = \binom{10}{5} 0.4^5\, 0.6^{10-5} = 0.200658$
(b) $P(X \le 9) = 1 - P(X = 10) = 1 - \binom{10}{10} 0.4^{10}\, 0.6^{10-10} = 1 - 0.000105 = 0.9999$
The Poisson Random Variable
A random variable X, taking on one of the values 0, 1, 2, . . . , is said to have a Poisson
distribution if its probability mass function is given by
$P_X(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, 3, \ldots$ and $\lambda > 0$.
λ is the parameter of this distribution. The mean and variance of the Poisson
distribution are equal and their values are equal to λ. Note that the Poisson distribution is
used to model situations where the random variable X is the number of occurrences of a
particular event over a given period of time (or space). Together with this, the following
conditions must also be fulfilled: events are independent of each other, events occur
singly, and events occur at a constant rate (in other words, for a given time interval the
mean number of occurrences is proportional to the length of the interval).
The Poisson distribution is used as a distribution of rare events such as telephone calls
made to a switchboard in a given minute, number of misprints per page in a book, road
accidents on a particular motorway in one day, etc. The processes that give rise to such
events are called Poisson processes.
Example 6.6: Suppose that the number of typographical errors on a single page of this
lecture note has a Poisson distribution with parameter λ = 1. If we randomly select a
page in this lecture note, calculate the probability that
a) no error will occur.
b) exactly three errors will occur.
c) less than 2 errors will occur.
d) there is at least one error.
Solution: Let X = number of errors per page.
$P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad \lambda = 1, \; k = 0, 1, 2, \ldots$
a) $P(X = 0) = \frac{e^{-1} 1^0}{0!} = \frac{1}{e} = 0.367879$
b) $P(X = 3) = \frac{e^{-1} 1^3}{3!} = 0.061313$
c) $P(X < 2) = P(X = 0) + P(X = 1) = 0.73576$
d) $P(X \ge 1) = 1 - P(X = 0) = 1 - 0.367879 = 0.632121$
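
As a numerical check of Example 6.6, the Poisson probability mass function can be coded directly from its formula. This is a sketch only; `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 1  # mean number of typographical errors per page

p_no_error = poisson_pmf(0, lam)                         # a) P(X = 0)
p_three_errors = poisson_pmf(3, lam)                     # b) P(X = 3)
p_less_than_two = poisson_pmf(0, lam) + poisson_pmf(1, lam)  # c) P(X < 2)
p_at_least_one = 1 - poisson_pmf(0, lam)                 # d) P(X >= 1)

print(round(p_no_error, 6), round(p_three_errors, 6))
print(round(p_less_than_two, 6), round(p_at_least_one, 6))
```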

Example 6.7:If the number of accidents occurring on a highway each day is a Poisson
random variable with parameter λ = 3, what is the probability that no accidents will
occur on a randomly selected day in the future?
Solution: Let X = number of accidents per day.
$P(X = k) = \frac{e^{-3} 3^k}{k!}, \quad k = 0, 1, 2, \ldots$
The required probability is $P(X = 0) = \frac{e^{-3} 3^0}{0!} = e^{-3} \approx 0.05$
Note: The Poisson random variable has a wide range of applications in a diverse number
of areas. An important property of the Poisson random variable is that it may be used to
approximate a binomial random variable when the binomial parameter n is large and p
is small. The probability that X will be k can be approximated by substituting np for λ in
the Poisson distribution, i.e. $P(X = k) \approx \frac{e^{-\lambda}\lambda^k}{k!}, \quad \lambda = np$.
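
To see how close the approximation can be, the sketch below compares the exact binomial probability with its Poisson approximation. The values n = 1000 and p = 0.003 are hypothetical, chosen only so that n is large and p is small; they do not come from the text.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability P(X = k) with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)

# Hypothetical illustration: large n, small p.
n, p = 1000, 0.003
lam = n * p  # the Poisson parameter is taken as np

exact = binomial_pmf(2, n, p)
approx = poisson_pmf(2, lam)
difference = abs(exact - approx)

print(round(exact, 5), round(approx, 5), round(difference, 5))
```

With these hypothetical values the two probabilities differ by less than 0.001; the gap generally widens as p grows or n shrinks.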
6.4 Common continuous probability distributions
Normal distribution
The normal distribution plays an important role in statistical inference because many
real-life distributions are approximately normal; many other distributions can be almost
normalized by appropriate data transformations (e.g., taking the log); and, as the sample
size increases, the distribution of the means of samples drawn from a population of almost
any distribution (with finite mean and variance) approaches the normal distribution.

A continuous random variable X is said to follow a normal distribution if and only if its
probability density function (p.d.f.) is
$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
where x ∈ (−∞, ∞), μ ∈ (−∞, ∞) and σ ∈ (0, ∞). There are infinitely many normal distributions since different
values of μ and σ define different normal distributions. For instance, when μ = 0 and σ = 1,
the above density takes the form
$f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}$
This particular distribution is called the standard normal distribution and is sometimes known as the
Z-distribution. The random variable corresponding to this distribution is usually denoted
by Z. If X has a normal distribution with mean μ and variance σ², we denote it as
$X \sim N(\mu, \sigma^2)$.
Properties of normal distribution
i) The normal distribution curve is bell-shaped, symmetrical about μ and
mesokurtic. The p.d.f. attains its maximum value at x = μ.
ii) Since the line x = μ divides the area under the normal curve into two equal parts, μ is
the mean, the median and the mode of the distribution.
iii) The mean and variance of the normal distribution are μ and σ², respectively.
iv) The total area under the curve and bounded from below by the horizontal axis is 1,
i.e. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$

Figure: The shaded area under the normal curve is one


Since a normal distribution is a continuous probability distribution, the probability that
X lies between a and b is the area bounded under the curve, from left to right by the
vertical lines x = a and x = b and below by the horizontal axis.

Figure: P(a<X<b) equals the shaded region


However, evaluating $P(a \le X \le b) = \int_a^b f_X(x)\,dx$ directly is very complicated. To simplify this
task, we use the standard normal table, which gives areas bounded by two
points. Areas under the standard normal distribution curve are tabulated in various
ways. The most common tables give areas bounded between Z = 0 and a positive value of
Z. In addition to the standard normal table, the properties of the normal distribution and
the following theorem are useful to make probability calculations very easy for any
normal distribution.

Theorem 6.1: Standardization of a normal random variable


If X has a normal distribution with mean μ and standard deviation σ, then
i) $Z = \frac{X-\mu}{\sigma}$ will have a standard normal distribution.
ii) $P(a \le X \le b) = P\left(\frac{a-\mu}{\sigma} \le \frac{X-\mu}{\sigma} \le \frac{b-\mu}{\sigma}\right) = P\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right)$

Example 6.8: Let Z be the standard normal random variable. Calculate the following
probabilities using the standard normal distribution table: a) P(0<Z<1.2) b)
P(0<Z<1.43) c) P(Z≤0) d) P(-1.2<Z<0) e) P(Z≤-1.43)
f)P(-1.43≤Z<1.2) g) P(Z≥1.52) h)P(Z≥-1.52)
Solution:
a) The probability that Z lies between 0 and 1.2 can be found directly from the
standard normal table as follows: look for the value 1.2 in the z column (first
column) and then move horizontally until you reach the column headed 0.00 in the first
row. The point of intersection made by the horizontal and vertical movements will
give the desired area (probability). Hence P(0<Z<1.2) = 0.3849. Refer to the table
below as a guide to find this probability.

Figure: P(0<Z<1.2) is the shaded area


b) In a similar way P(0<Z<1.43)= 0.4236.
c) We know that the normal distribution is symmetric about its mean. Hence the area
to the left of 0 and the area to the right of 0 are 0.5 each. Therefore
P(Z≤0) = P(Z≥0) = 0.5


Figure: The area to the left and the right of 0 for z-distribution
d) P(-1.2<Z<0)=P(0<Z<1.2)= 0.3849 due to symmetry
e) P(Z<-1.43) = 1 - P(Z ≥ -1.43) (using the probability of the complement event)
= 1 - [P(-1.43<Z<0) + P(Z≥0)] (since a region can be broken down into non-overlapping regions)
= 1 - [P(0<Z<1.43) + P(Z≥0)] (by symmetry)
= 1 - [0.4236 + 0.5]
= 1 - 0.9236 = 0.0764

Figure: P(Z<-1.43) is the shaded region


f) P(-1.43≤Z<1.2) = P(-1.43≤Z<0) + P(0≤Z<1.2)=P(0<Z≤1.43) + 0.3849= 0.4236 +
0.3849 =0.8085

Figure: P(-1.43≤Z<1.2) is the shaded region


g) P(Z≥1.52) = 0.5 – P(0≤ Z<1.52)=0.5 – 0.4357=0.0643

Figure: P(Z≥1.52) is the shaded region


h) P(Z≥-1.52) = P(-1.52≤Z<0) + P(Z ≥0 )= P(0 < Z≤1.52) + 0.5
=0.4357 +0.5=0.9357
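
The table look-ups in Example 6.8 can be cross-checked with software. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper `area_from_zero` is an illustrative name that mimics a table giving areas between 0 and z. Because the printed table is rounded to four decimals, software values may differ from the hand calculations in the last digit.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_from_zero(z):
    """Table-style area P(0 < Z < z) for z >= 0."""
    return Z.cdf(z) - 0.5

results = {
    "a": area_from_zero(1.2),          # P(0 < Z < 1.2)
    "b": area_from_zero(1.43),         # P(0 < Z < 1.43)
    "c": Z.cdf(0),                     # P(Z <= 0)
    "d": Z.cdf(0) - Z.cdf(-1.2),       # P(-1.2 < Z < 0)
    "e": Z.cdf(-1.43),                 # P(Z <= -1.43)
    "f": Z.cdf(1.2) - Z.cdf(-1.43),    # P(-1.43 <= Z < 1.2)
    "g": 1 - Z.cdf(1.52),              # P(Z >= 1.52)
    "h": 1 - Z.cdf(-1.52),             # P(Z >= -1.52)
}

for part, prob in results.items():
    print(part, round(prob, 4))
```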
Example 7.11: Find the following values of z* of a standard normal random variable
based on the given probability values:
a) P(Z > z*) =0.1446
b) P(Z>z*) = 0.8554
Solution: We need to find specific values of Z given some probability values.
a) Since P(Z > z*) = 0.1446 is less than P(Z > 0) = 0.5, z* lies to the right of zero.

P(Z > z*) = 0.1446 implies that P(0<Z≤z*) = 0.5 - 0.1446 = 0.3554.

Hence we can look for the value of z* satisfying the above condition from the standard
normal table. Thus z* = 1.06.
b) Since P(Z > z*) = 0.8554 is greater than P(Z > 0) = 0.5, z* lies to the left of zero,
i.e. z* is a negative number.

P(Z>z*) = 0.8554 = P(z*≤ Z <0) + P( Z ≥ 0) = P(0 ≤ Z ≤ - z*) + 0.5

This implies P(0 ≤ Z ≤ -z*) = 0.8554 – 0.5 = 0.3554. Hence the value of –z* from the table
satisfying the above condition is 1.06. Therefore z* = -1.06.
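
Finding z* from a probability is the inverse of a table look-up. As a sketch, the `inv_cdf` method of `statistics.NormalDist` returns the point with a given area to its left, so the area to the right must first be converted; `upper_tail_point` is an illustrative helper name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def upper_tail_point(alpha):
    """Return z* such that P(Z > z*) = alpha."""
    # inv_cdf works with the area to the LEFT of the point, which is 1 - alpha.
    return Z.inv_cdf(1 - alpha)

z_a = upper_tail_point(0.1446)  # part a)
z_b = upper_tail_point(0.8554)  # part b)

print(round(z_a, 2), round(z_b, 2))
```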
Example 6.9: If the total cholesterol values for a certain target population are
approximately normally distributed with a mean of 200 (mg/100 ml) and a standard
deviation of 20 (mg/100 ml), calculate the probability that a person picked at random
from this population will have a cholesterol value
a) greater than 240 (mg/100 ml)
b) between 180 and 220(mg/100 ml)
c) less than 200 (mg/100 ml)
Solution: Let X be the cholesterol value in mg/100 ml; then X ~ N(200, 400).
a) $P(X > 240) = P\left(\frac{X-\mu}{\sigma} > \frac{240-\mu}{\sigma}\right) = P\left(Z > \frac{240-200}{20}\right) = P(Z > 2) = 0.5 - P(0 < Z < 2) = 0.5 - 0.4772 = 0.0228$
b) $P(180 < X < 220) = P\left(\frac{180-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{220-\mu}{\sigma}\right) = P\left(\frac{180-200}{20} < Z < \frac{220-200}{20}\right) = P(-1 < Z < 1) = 2P(0 < Z < 1) = 2 \times 0.3413 = 0.6826$
c) $P(X < 200) = P\left(Z < \frac{200-200}{20}\right) = P(Z < 0) = 0.5$
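
Theorem 6.1 says that probabilities for X can be computed from Z after standardizing. The sketch below checks this for Example 6.9 by computing the answers through the standardized values and, for part b), also directly from the N(200, 20²) distribution. The names `standardize`, `X` and `Z` are illustrative.

```python
from statistics import NormalDist

mu, sigma = 200, 20          # mean and standard deviation of cholesterol values
X = NormalDist(mu, sigma)    # distribution of X
Z = NormalDist()             # standard normal distribution

def standardize(x):
    """Convert a value of X to the corresponding z value."""
    return (x - mu) / sigma

# Probabilities computed through Z, as in the hand calculation.
p_above_240 = 1 - Z.cdf(standardize(240))
p_between = Z.cdf(standardize(220)) - Z.cdf(standardize(180))
p_below_200 = Z.cdf(standardize(200))

# The same probability computed directly from the distribution of X.
p_between_direct = X.cdf(220) - X.cdf(180)

print(round(p_above_240, 4), round(p_between, 4), round(p_below_200, 4))
```

Small differences from the hand-calculated values in the last decimal place are due to rounding in the printed table.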
Example 6.10: Assume that the test scores for a large class are normally distributed
with a mean of 74 and a standard deviation of 10.
(a) Suppose that you receive a score of 88. What percent of the class received scores
higher than yours?
(b) Suppose that the teacher wants to limit the number of A grades in the class to no
more than 20%. What would be the lowest score for an A?
Solution: Let X be the score of a randomly picked student; then X ~ N(74, 100).
a) $P(X > 88) = P\left(\frac{X-74}{10} > \frac{88-74}{10}\right) = P(Z > 1.4) = 0.5 - P(0 < Z < 1.4) = 0.5 - 0.4192 = 0.0808$
Hence 8.08 percent of the students scored higher than you did.
b) Let x_A be the lowest mark needed to get letter grade A. We are given that
$P(X \ge x_A) = 0.2 = P\left(\frac{X-74}{10} \ge \frac{x_A-74}{10}\right) = P(Z \ge z_A)$
$\Rightarrow P(0 < Z < z_A) = 0.5 - 0.2 = 0.3 \Rightarrow z_A = 0.85 = \frac{x_A - 74}{10}$
(taking the table value z = 0.85, whose area 0.3023 keeps the share of A grades just under 20%).
Hence, the lowest mark to get letter grade A is x_A = 74 + 10(0.85) = 82.5.
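
Part (b) works backwards from a probability to a score. The sketch below, using `statistics.NormalDist`, computes the cutoff without table rounding and then checks the share of students at or above the table-based cutoff of 82.5. The variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=74, sigma=10)  # distribution of test scores

# (a) share of the class scoring above 88
share_above_88 = 1 - scores.cdf(88)

# (b) score with exactly 20% of the class above it (no table rounding)
cutoff_exact = scores.inv_cdf(1 - 0.20)

# share of the class at or above the table-based cutoff of 82.5
share_above_table_cutoff = 1 - scores.cdf(82.5)

print(round(share_above_88, 4))
print(round(cutoff_exact, 2), round(share_above_table_cutoff, 4))
```

The unrounded cutoff is slightly below 82.5, so using 82.5 gives a little less than 20% of the class an A, which is consistent with the "no more than 20%" requirement.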
The chi-square and t distributions
The chi-square and t distributions are important continuous distributions which are
useful in statistical inference. In this section we will see a brief introduction of these
distributions. In later chapters, we are going to see in detail on how to use these
distributions in estimation and hypotheses testing.
Chi-square distribution
A random variable X is said to have a chi-square distribution with n degrees of freedom
(denoted by $\chi^2_n$) if its probability density function is given by
$f_X(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{\frac{n}{2}-1} e^{-\frac{x}{2}}, \quad x > 0.$
The chi-square distribution has one parameter called the degrees of freedom, n.
Depending on the values of n, we can have many different chi-square distributions. The
mean and the variance of chi-square distribution are n, and 2n, respectively.

Figure: The chi-square distribution

Because of its importance, the chi-square distribution is tabulated for various values of
the parameter n (refer to the table). Thus we may find in the table the value, denoted by $\chi^2_\alpha(n)$,
satisfying $P(X \ge \chi^2_\alpha(n)) = \alpha$, 0 < α < 1. The example below shows how to read
chi-square distribution values.
Example 6.11: To read the chi-square value with 2 degrees of freedom where the area to
the right of this value is 0.005, look for the degrees of freedom, 2, in the first column (df
column) and then move horizontally until you reach the column headed by α = 0.005 in the first row.
The point of intersection made by the horizontal and vertical movement will give the
desired chi-square value, 10.597. This value satisfies the following:
$P(X \ge 10.597) = 0.005$. In a similar way, the chi-square value with 100 degrees of
freedom where the area to the right of this value is 0.975 is 74.222.
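
The first table value in Example 6.11 can be verified without a table. For n = 2 the chi-square density reduces to (1/2)e^(−x/2), so the area to the right of x is e^(−x/2) and the tabulated value is −2 ln α. The sketch below uses this special case only; it does not cover other degrees of freedom (such as the 100 degrees of freedom case), for which a statistical library or the table is needed. The function names are illustrative.

```python
from math import exp, log

def chi2_df2_upper_tail(x):
    """P(X >= x) for a chi-square variable with 2 degrees of freedom."""
    return exp(-x / 2)

def chi2_df2_critical_value(alpha):
    """Value with area alpha to its right, for 2 degrees of freedom."""
    return -2 * log(alpha)

critical_value = chi2_df2_critical_value(0.005)
tail_area = chi2_df2_upper_tail(10.597)

print(round(critical_value, 3), round(tail_area, 4))
```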

The t distribution
The t distribution is an important distribution useful in inference concerning population
mean/means. This distribution has one parameter called the degrees of freedom.
Depending on the values of the degrees of freedom, we may have different t
distributions. The degrees of freedom is usually denoted by n. In inference on the
population mean, the degrees of freedom is related to sample size. As the sample size
or degrees of freedom increases, the t distribution approaches the standard normal
distribution.
The t- distribution shares some characteristics of the normal distribution and differs
from it in others. The t distribution is similar to the standard normal distribution in the
following ways.
i) it is bell-shaped
ii) it is symmetrical about the mean
iii) the mean, median, and mode are equal to 0 and are located at the center of the
distribution.
iv) The curve never touches the x-axis
The t distribution differs from the standard normal distribution in the following ways.
i) the variance is greater than 1.
ii) The t distribution is actually a family of curves based on the concept of degrees of
freedom.

Figure: The t distribution


Due to its importance in inference, the t distribution is tabulated for some values of
n (refer to the table). Thus we may find in the table the value, denoted by $t_\alpha(n)$, satisfying
$P(t(n) \ge t_\alpha(n)) = \alpha$, 0 < α < 1, where t(n) represents the t random variable with n degrees of
freedom. The following example will help you to read t distribution values.
Example 6.12: To find the t value with 3 degrees of freedom where the area to the right
of this value is 0.05, look for the degrees of freedom, 3, in the first column (df column) and
then move horizontally until you reach the column headed by α = 0.05 in the first row. The point of
intersection made by the horizontal and vertical movement will give the desired t value,
2.353. This value satisfies the following: $P(t \ge 2.353) = 0.05$


UNIT SEVEN
SAMPLING AND SAMPLING DISTRIBUTION
OF SAMPLE MEAN

Objectives:
After a successful completion of this unit, students will be able to:
✓ Differentiate the two major sampling techniques: probabilistic and non-
probabilistic
✓ Apply simple random sampling technique to select sample
✓ Define sampling distribution of the sample mean

Introduction to sampling and sampling distribution


In our daily life we are often forced to make decisions based on small-scale studies. For instance,
a laboratory technician takes small droplets of blood to examine the presence of disease; we
examine a few fruits before we purchase them; zoologists use the concept of sampling to estimate
the population of rodents, etc. This process of inspection is very widely used
on various occasions. Inspecting every unit, however, is difficult to implement on a large scale. On the
basis of a small study, we make inferences about the entire population.

7.1 Methods of sampling


Definition of some basic terms
Sampling: is the technique of selecting a representative sample from the whole.
Population: is the totality of elements or units under study.
Sample: is a part of the population.
Sampling Frame: A complete list of all the units of the population is called the sampling
frame. A unit of population is a relative term. If all the workers in a factory make a
population, then a worker is a unit of the population. If all the factories in a country are
being studied for some purpose, then a factory is a unit of the population of factories.
The frame provides a base for the selection of a sample.
Major reasons to use sampling
1. Saves Time and Cost: As the size of the sample is small as compared to the
population, the time and cost involved on sample study are much less than the
complete counts. Hence a sample study requires less time and cost.
2. To prevent destruction: The destructive nature of some experiments (or inspections)
does not allow complete enumeration to be carried out, for instance, checking the quality of
beers, studying the efficacy of new drugs, testing the life length of a bulb, etc.
3. Sample survey provides higher level of accuracy: This accuracy can be achieved
through more selective recruiting of interviewers and supervisors, more extensive
training programs, a closer supervision of the personnel involved and a more
efficient monitoring of the field work.
Types of sampling
Generally, two types of sampling methods exist: probability and non-probability
sampling.
Probability Sampling
The term probability sampling (or random sampling) is used when the selection of the
sample is purely based on chance. There is no subjective bias in the selection of units.
Every unit of the population has a known nonzero probability of being in the sample. The
following are some of the random sampling methods: simple random sampling,
stratified random sampling, cluster sampling and systematic random sampling.

Simple random sampling


Simple random sampling is a method of selecting a sample from a population in such a
way that every unit of the population is given an equal chance of being selected. In
practice, you can draw a simple random sample of elements using either the 'lottery
method' or 'tables of random numbers'.

For example, you may use the lottery method to draw a random sample by using a set of
'N' tickets, with numbers ' 1 to N' if there are 'N' units in the population. After shuffling
the tickets thoroughly, the sample of a required size, say n, is selected by picking the
required n number of tickets.
The best method of drawing a simple random sample is to use a table of random
numbers. After assigning consecutive numbers to the units of the population, the researcher
starts at any point on the table of random numbers and reads the consecutive numbers
in any direction: horizontally, vertically or diagonally. If a number read out
corresponds to the one written on a unit card, then that unit is chosen for the sample.

Suppose that a sample of 6 study centers is to be selected at random from a serially
numbered population of 60 study centers. The following table is a portion of a random
numbers table used to select a sample.

Row \ Column      1      2      3      4      5    …..      N
 1             2315   7548   5901   8372   5993    …..   6744
 2             0554   5550   4310   5374   3508    …..   1343
 3             1487   1603   5032   4043   6223    …..   0834
 4             3897   6749   5094   0517   5853    …..   1695
 5             9731   2617   1899   7553   0870    …..   0510
 6             1174   2693   8144   3393   0862    …..   6850
 7             4336   1288   5911   0164   5623    …..   4036
 8             9380   6204   7833   2680   4491    …..   2571
 9             4954   0131   8108   4298   4187    …..   9527
10             3676   8726   3337   9482   1569    …..   3880
11              …..    …..    …..    …..    …..    …..    …..
12              …..    …..    …..    …..    …..    …..    …..
13              …..    …..    …..    …..    …..    …..    …..
14              …..    …..    …..    …..    …..    …..    …..
15              …..    …..    …..    …..    …..    …..    …..
 N             3914   5218   3587   4855   4888    …..   8042
If you start in the first row and first column and read down the column, using the first
two digits of each entry, centers numbered 23, 05, 14, … will be
selected. However, numbers above the population size (60) will not be included
in the sample. In addition, if any number is repeated in the table, it may be substituted
by the next number from the same column. Besides, you can start at any point in the
table. If you choose column 4 and row 1, the number to start with is 83. In this way you
read numbers down this column, starting with 83, until 6 valid numbers are obtained.
The numbers read, then, are as follows:
83 (discarded), 53, 40, 05, 75 (discarded), 33, 01, 26
Hence, the study centers numbered 53, 40, 05, 33, 01 and 26 will be in the sample.
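
The table-reading procedure can be written out step by step. The sketch below is illustrative only: it copies the first ten entries of column 4 from the table above, keeps the first two digits of each entry, and discards numbers outside 1 to 60. As a simplifying assumption, repeated numbers are simply skipped. The last lines show a software alternative, `random.sample`, with an arbitrary seed chosen only to make the run repeatable.

```python
import random

POPULATION_SIZE = 60  # study centers are numbered 1 to 60
SAMPLE_SIZE = 6

# First ten entries of column 4 of the random numbers table, read downwards.
column_4 = ["8372", "5374", "4043", "0517", "7553",
            "3393", "0164", "2680", "4298", "9482"]

selected = []
for entry in column_4:
    number = int(entry[:2])  # use the first two digits of each entry
    # Keep the number only if it labels a center and has not been chosen yet.
    if 1 <= number <= POPULATION_SIZE and number not in selected:
        selected.append(number)
    if len(selected) == SAMPLE_SIZE:
        break

print(selected)

# Software alternative to a printed table: draw 6 distinct centers at random.
rng = random.Random(0)  # arbitrary seed, for repeatability only
software_sample = rng.sample(range(1, POPULATION_SIZE + 1), SAMPLE_SIZE)
print(sorted(software_sample))
```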
Simple random sampling generally gives good results. However, from a practical point of
view, a list of all the units of a population is often not possible to obtain. Even if it is possible,
it may involve a very high cost which a researcher or an organization may not be able to
afford. In addition, it may result in an unrepresentative sample by chance.
Stratified sampling
Stratified random sampling takes into account the stratification of the main population
into a number of sub-populations, each of which is homogeneous with respect to one or
more characteristic(s). Having ensured this stratification, it provides for selecting
randomly the required number of units from each sub-population. The selection of a
sample from each subpopulation may be done using simple random sampling. It is useful
in providing more accurate results than simple random sampling.
Systematic sampling
In this method, sample units are selected at equal intervals from the listing of the elements.
This method provides a sample as good as a simple random sample, and the sample is comparatively
easier to draw. For instance, to study the average monthly expenditure of
households in a city, you may select every fourth household from the
household listing, starting from a randomly chosen household.
Cluster sampling

Cluster sampling is used when sampling frame is difficult to construct or using other
sampling techniques (simple random sampling) is not feasible or costly. For instance,
when the geographic distribution of units is scattered it is difficult to apply simple
random sampling. It involves division of the population of elementary units into groups
or clusters that serve as primary sampling units. A selection of the clusters is then made
to form the sample. The precision of estimates made based on samples taken using this
method is relatively low.
Non-probability sampling techniques
In non-probability sampling, the sample is not based on chance. It is rather determined
by personal judgment. This method is cost effective; however, we cannot make objective
statistical inferences. Depending on the technique used, non-probability samples are
classified into quota, judgment or purposive and convenience samples.

Sampling and non-sampling errors


Sampling error is the difference between the value of a sample statistic and the value of
the corresponding population parameter. On the other hand, non-sampling error is an
error that occurs in the collection, recording and tabulation of data. Sampling error can
be minimized by using appropriate sampling methods and/or increasing the sample size.
The non-sampling error is likely to increase with increase in sample size.

7.2 Sampling distribution of the sample mean $\bar{x}$


The value of the sample mean for any sample will depend on the elements included in
that sample. Consequently, the sample mean is a random variable. Therefore, like any other
random variable, the sample mean possesses a probability distribution, which is more
commonly called the sampling distribution of the sample mean. In general, the probability
distribution of a sample statistic is called its sampling distribution. Sampling
distributions are important in statistical inference. The important characteristics of the
sampling distribution of the sample mean are its mean, variance and the form of the
distribution.
Example 7.1: Suppose we have a hypothetical population of size 3, consisting of three
children namely: A is 3 years old, B is 6 years old and C is 9 years old. Construct
sampling distribution of the sample mean of size 2 using sampling without replacement
and with replacement.
Solution: The mean and variance of the population are 6 and 6, respectively.
1. If sampling is without replacement we will have 3C2 = 3 possible samples: (A, B),
(A, C) and (B, C) and their corresponding sample means are (3+6)/2 = 4.5, 6 and
7.5, respectively. Hence the probability distribution (sampling distribution) of the
sample mean is:
$\bar{x}$                     4.5    6     7.5
$P(\bar{X} = \bar{x})$        1/3    1/3   1/3
$E(\bar{X}) = \sum \bar{x}\,P(\bar{x}) = 4.5(1/3) + 6(1/3) + 7.5(1/3) = 6$
$V(\bar{X}) = \sum \bar{x}^2 P(\bar{x}) - \mu_{\bar{X}}^2 = (6.75 + 12 + 18.75) - 36 = 1.5$
2. If sampling is with replacement we will have $N^n = 3^2 = 9$ possible samples: (A, A),
(A, B), (A, C), (B, A), (B, B), (B, C), (C, A), (C, B) and (C, C). Hence the probability
distribution (sampling distribution) of the sample mean is:
$\bar{x}$                     3     4.5   6     7.5   9
$P(\bar{X} = \bar{x})$        1/9   2/9   3/9   2/9   1/9
$E(\bar{X}) = \sum \bar{x}\,P(\bar{x}) = 3(1/9) + 4.5(2/9) + 6(3/9) + 7.5(2/9) + 9(1/9) = 6$
$V(\bar{X}) = \sum \bar{x}^2 P(\bar{x}) - \mu_{\bar{X}}^2 = (1 + 4.5 + 12 + 12.5 + 9) - 36 = 3$

Note:
✓ The mean of the sampling distribution of the sample mean is the same as the
population mean irrespective of the sampling procedure.
✓ The variance of the sampling distribution of the sample mean is:
$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$, if sampling is with replacement
$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$, if sampling is without replacement
✓ The problem with using the sample mean to make inferences about the population
mean is that the sample mean will probably differ from the population mean.
This error is measured by the standard deviation of the sampling distribution of the sample
mean (the square root of the variance above), which is known as the standard error. The standard error is the average
amount of sampling error found because of taking a sample rather than the whole
population. As sample size increases, the standard error decreases.
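
The notes above can be checked against Example 7.1 by listing every possible sample. The sketch below enumerates the samples of size 2 from the ages 3, 6 and 9, both without and with replacement, and compares the variance of the sample means with the two formulas. Since every sample is equally likely, the mean and variance of the list of sample means equal those of the sampling distribution. The helper `mean_and_variance` is an illustrative name.

```python
from fractions import Fraction
from itertools import combinations, product

ages = [3, 6, 9]        # population: children A, B and C
N, n = len(ages), 2     # population size and sample size

def mean_and_variance(values):
    """Mean and variance of equally likely values (divisor = number of values)."""
    values = [Fraction(v) for v in values]
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

pop_mean, pop_var = mean_and_variance(ages)

# Sample means for every possible sample of size n.
means_without = [Fraction(sum(s), n) for s in combinations(ages, n)]
means_with = [Fraction(sum(s), n) for s in product(ages, repeat=n)]

mean_wo, var_wo = mean_and_variance(means_without)
mean_w, var_w = mean_and_variance(means_with)

print(pop_mean, pop_var)
print(mean_wo, var_wo, pop_var / n * Fraction(N - n, N - 1))
print(mean_w, var_w, pop_var / n)
```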
7.3 Central Limit Theorem
If X1, X2, …, Xn is a random sample from a population with mean μ and variance σ², then
as n goes to infinity the distribution of the sample mean, $\bar{X}$, approaches a normal
distribution with mean μ and variance σ²/n. That is, as n gets large, $\bar{X}$ is approximately N(μ, σ²/n) and
its standardized form $Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ is approximately N(0, 1).
Note: The central limit theorem is useful for approximating the distribution of the
sample mean based on a large sample size and when the population distribution is non
normal; however, if the population is normal, then the sampling distribution of the
sample mean will be normal regardless of the sample size.
Example 7.2: The uric acid values in normal adult males are normally distributed with
mean 5.7 mg and standard deviation 1 mg. Find the probability that
a) a sample of size 4 will yield a mean less than 5
b) a sample of size 9 will yield a mean greater than 6
Solution: Let X be the amount of uric acid in normal adult males, with mean 5.7 and
variance 1.
a) If a sample of size 4 is taken, then $\bar{X} \sim N(5.7, 0.25)$ since the population is
normally distributed.
$P(\bar{X} < 5) = P\left(Z < \frac{5-5.7}{0.5}\right) = P(Z < -1.4) = 0.5 - P(0 < Z < 1.4) = 0.0808$
b) If a sample of size 9 is taken, then $\bar{X} \sim N(5.7, 1/9)$ since the population is
normally distributed.
$P(\bar{X} > 6) = P\left(Z > \frac{6-5.7}{1/3}\right) = P(Z > 0.9) = 0.5 - P(0 < Z < 0.9) = 0.1841$
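
Example 7.2 can be checked in a few lines. The sketch below builds the sampling distribution of the sample mean as a normal distribution with standard deviation σ/√n, using `statistics.NormalDist`; the helper `sample_mean_dist` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 5.7, 1.0  # population mean and standard deviation (mg)

def sample_mean_dist(n):
    """Sampling distribution of the mean of n observations from N(mu, sigma^2)."""
    return NormalDist(mu, sigma / sqrt(n))

# a) sample of size 4: probability that the sample mean is less than 5
p_a = sample_mean_dist(4).cdf(5)

# b) sample of size 9: probability that the sample mean is greater than 6
p_b = 1 - sample_mean_dist(9).cdf(6)

print(round(p_a, 4), round(p_b, 4))
```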


UNIT EIGHT
SIMPLE LINEAR REGRESSION AND
CORRELATION

Objectives:
Having studied this unit, you should be able to:
✓ Formulate a simple linear regression model.
✓ express quantitatively the magnitude and direction of the association between
two variables

Introduction
The statistical methods discussed so far are used to analyze the data involving only one
variable. Often an analysis of data concerning two or more variables is needed to look
for any statistical relationship or association between them. Thus, regression and
correlation analysis are helpful in ascertaining the probable form of the relationship
between variables and the strength of the relationship.

8.1 Simple linear regression analysis


Regression analysis is the statistical method that helps to formulate a functional
relationship between two or more variables. It can be used for assessment of
association, estimation and prediction. For instance one might be interested to
formulate a statistical model to relate the height of fathers and their sons, blood
pressure and age, fertilizer amount and yield, etc.
A simple model relating a dependent (response) variable Y to a single predictor
variable X assumes a linear relationship between them.
The first step in regression analysis involving two variables is to construct a scatter plot
(diagram) of the observed data. A scatter diagram is a plot of all ordered pairs $(X_i, Y_i)$ on
the coordinate plane, which is helpful for determining an apparent relationship between
the two variables.
The simple linear regression of Y on X can be expressed with respect to the population
parameters $\alpha$ and $\beta$ as
$$Y = \alpha + \beta X + \varepsilon$$
where $\alpha$ = y-intercept that represents the mean value of the dependent variable Y when
the independent variable X is zero; $\beta$ = slope of the regression line that represents the
change in the mean of Y for a unit change in the value of X; $\varepsilon$ = error term.

The population parameters $\alpha$ and $\beta$ can be estimated from sample data using the least
squares technique. The estimators of $\alpha$ and $\beta$ are usually denoted by a and b,
respectively. The resulting regression line is
$$\hat{Y} = a + bX$$
and the equation is known as the fitted regression line. The estimated values of Y are
denoted by $\hat{Y}$. The observed values of Y are denoted by y. The difference between the
observed and the estimated values, $Y - \hat{Y}$, is known as the error or residual, and is denoted
by $\hat{\varepsilon}$. The residual can be positive, negative or zero.
A best fitting line is the one for which the sum of squares of the residuals, $\sum \hat{\varepsilon}^2$, has the
minimum value. This is called the method of least squares. According to this method,
one would select a and b such that $\sum \hat{\varepsilon}^2 = \sum (Y - \hat{Y})^2$ is minimum. The solution of this
minimization problem using partial differentiation is as follows:
$$b = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} \quad \text{and} \quad a = \bar{Y} - b\bar{X}$$
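The intermediate steps are not shown in the text. One standard way to reach these formulas, given here as a sketch in which $S(a, b)$ is a name introduced only for this derivation, is to set both partial derivatives of the sum of squares to zero and solve the resulting normal equations:

```latex
% S(a, b) is the sum of squared residuals to be minimised
\begin{align*}
S(a, b) &= \sum (Y - a - bX)^2 \\
\frac{\partial S}{\partial a} = -2\sum (Y - a - bX) = 0
  &\;\Rightarrow\; \sum Y = na + b\sum X
  \;\Rightarrow\; a = \bar{Y} - b\bar{X} \\
\frac{\partial S}{\partial b} = -2\sum X (Y - a - bX) = 0
  &\;\Rightarrow\; \sum XY = a\sum X + b\sum X^2 \\
\text{substituting } a = \bar{Y} - b\bar{X}:\quad
\sum XY - \frac{\sum X \sum Y}{n} &= b\left(\sum X^2 - \frac{(\sum X)^2}{n}\right) \\
\Rightarrow\; b &= \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}
\end{align*}
```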

Example 8.1: A researcher wants to find out if there is any relationship between the height
of a son and that of his father. He took a random sample of 6 fathers and their sons. The heights,
in inches, are given in the table below:
Height of father (X)   63   65   64   65   67   68
Height of son (Y)      66   68   65   67   69   70
i) Draw the scatter diagram and comment on the type of relationship.
ii) Fit the regression line of Y on X.
iii) Predict the height of the son if his father's height is 66 inches.
Solution:
i) From the scatter plot one can see that the points lie roughly on a straight line.
ii) $n = 6$, $\sum X = 392$, $\sum Y = 405$, $\sum X^2 = 25628$, $\sum XY = 26476$, $\sum Y^2 = 27355$
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{6(26476) - (392)(405)}{6(25628) - (392)^2} = 0.923$$
$$a = \bar{Y} - b\bar{X} = \frac{405}{6} - 0.923\left(\frac{392}{6}\right) = 7.2$$
Then the fitted (regression) line of Y on X is given by:
$$\hat{Y} = a + bX = 7.2 + 0.923X$$
✓ The slope of the line, i.e. b = 0.923, tells us that a unit (one inch) increase in
the height of the father is associated with an average increase of 0.923 inch in the
height of the son.
✓ The y-intercept of the line, i.e. a = 7.2, is the estimated mean value of Y when the
value of X is zero (do you think that the intercept is meaningful?)
iii) $\hat{Y} = 7.2 + 0.923(66) = 68.118$; thus the predicted height of the son is about 68.1 inches.
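The computations of Example 8.1 can be reproduced in a few lines. The sketch below applies the least squares formulas for a and b directly; the function name `fit_line` is chosen only for this illustration.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) of the line y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    a = sy / n - b * sx / n                        # intercept: mean(y) - b * mean(x)
    return a, b

fathers = [63, 65, 64, 65, 67, 68]
sons = [66, 68, 65, 67, 69, 70]

a, b = fit_line(fathers, sons)
predicted = a + b * 66  # predicted son's height for a 66-inch father
print(round(a, 3), round(b, 3), round(predicted, 3))
```

Because the code keeps the unrounded coefficients, the prediction comes out near 68.115 rather than the 68.118 obtained from the rounded values 7.2 and 0.923; the difference is only rounding.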

8.2 The covariance and the correlation coefficient


The correlation coefficient measures the degree of linear relationship between two variables.
The population correlation coefficient is represented by $\rho$ and its estimator is r. For a set
of n pairs of sample values X and Y, Pearson's correlation coefficient is calculated as the
ratio of the covariance of the variables X and Y to the product of the standard deviations
of X and Y. Symbolically,
$$r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}} = \frac{\dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}}{\sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}\,\sqrt{\dfrac{\sum (Y - \bar{Y})^2}{n - 1}}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$
Alternatively, Pearson's correlation coefficient r can be obtained as:
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}$$
Properties of Pearson's correlation coefficient r:
o It is appropriate to calculate r when both variables X and Y are measured on an
interval or ratio scale.
o The value of r is independent of the units in which X and Y are measured, i.e., it is
a pure number.
o The value of r ranges from -1 to +1.
o r = +1 indicates a perfect linear relationship between X and Y with positive slope.
o r = -1 indicates a perfect linear relationship between X and Y with negative slope.
o r = 0 indicates no linear relationship between the two variables X and Y.
o A value of r close to +1 indicates a strong positive linear relationship between the
two variables.
o A value of r close to -1 indicates a strong negative linear relationship between the
two variables.
o A value of r close to 0 indicates a weak linear relationship between the two variables.
Examples of correlation coefficients:

Example 8.2: In some locations, there is strong association between concentrations of
two different pollutants. An article reports the accompanying data on ozone
concentration x (ppm) and secondary carbon concentration y ($\mu g/m^3$):
X   0.066   0.088   0.120   0.050   0.162   0.186   0.057   0.100
Y     4.6    11.6     9.5     6.3    13.8    15.4     2.5    11.8

X   0.112   0.055   0.154   0.074   0.111   0.140   0.071   0.110
Y     8.0     7.0    20.6    16.6     9.2    17.9     2.8    13.0

a. Calculate the correlation coefficient and comment on the strength and direction
of the relationship between the two variables.
Solution: The summary quantities are
$n = 16$, $\sum x_i = 1.656$, $\sum y_i = 170.6$, $\sum x_i y_i = 20.0397$, $\sum x_i^2 = 0.196912$, $\sum y_i^2 = 2253.56$
Pearson's correlation coefficient is
$$\begin{aligned}
r &= \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} \\
  &= \frac{16(20.0397) - (1.656)(170.6)}{\sqrt{16(0.196912) - (1.656)^2}\,\sqrt{16(2253.56) - (170.6)^2}} \\
  &= \frac{320.6352 - 282.5136}{\sqrt{0.408256}\,\sqrt{6952.6}} = \frac{38.1216}{(0.639)(83.38)} \\
  &= 0.716
\end{aligned}$$
The value of 0.716 indicates a moderately strong positive linear relationship
between ozone concentration and secondary carbon concentration.
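As a check on the arithmetic, the coefficient can be computed directly from the sixteen data pairs. This is a minimal sketch of the computational formula for r; the function name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient from the raw sums (computational formula)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

# Data of Example 8.2: ozone (ppm) and secondary carbon concentration
ozone = [0.066, 0.088, 0.120, 0.050, 0.162, 0.186, 0.057, 0.100,
         0.112, 0.055, 0.154, 0.074, 0.111, 0.140, 0.071, 0.110]
carbon = [4.6, 11.6, 9.5, 6.3, 13.8, 15.4, 2.5, 11.8,
          8.0, 7.0, 20.6, 16.6, 9.2, 17.9, 2.8, 13.0]

r = pearson_r(ozone, carbon)
print(round(r, 3))
```

The result rounds to 0.716, in agreement with the hand calculation.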


UNIT NINE
ESTIMATION AND HYPOTHESIS TESTING
Objectives:
Having studied this unit, you should be able to
✓ construct and interpret confidence interval estimates
✓ formulate hypothesis about a population mean
✓ determine an appropriate sample size for estimation
Introduction
We now assume that we have collected, organized and summarized a random sample of
data and are trying to use that sample to estimate a population parameter. Statistical
inference is a procedure whereby inferences about a population are made on the basis of
the results obtained from a sample. Statistical inference can be divided into two main
areas: estimation and hypothesis testing. Estimation is concerned with estimating the
values of specific population parameters; hypothesis testing is concerned with testing
whether the value of a population parameter is equal to some specific value.
9.1 Point and interval estimation of the mean
Point estimate: In point estimation, a single sample statistic (such as $\bar{x}$, $s$ or $\hat{p}$) is
calculated from the sample to provide an estimate of the true value of the corresponding
population parameter (such as $\mu$, $\sigma$ or $p$). Such a single statistic is termed a point
estimator, and the specific value of the statistic is termed a point estimate. For
example, the sample mean $\bar{X}$ is an estimator for the population mean and $\bar{X} = 10$ is an
estimate, which is one of the possible values of $\bar{X}$.
Interval estimate: In most practical problems, a point estimate does not provide
information about how close the estimate is to the population parameter unless it is
accompanied by a statement of the possible sampling error, based on the sampling
distribution of the statistic. Hence, an interval estimate of a population parameter is a
confidence interval with a statement of confidence that the interval contains the
parameter value.
An interval estimate of the population parameter $\theta$ consists of two bounds within which
the parameter will be contained:
$$L \le \theta \le U$$
where L is the lower bound and U is the upper bound.
Case 1: When the population is normal.
✓ If the variance $\sigma^2$ is known, the sampling distribution of the sample mean $\bar{X}$ is
normal with mean $\mu$ and variance $\sigma^2/n$, i.e., $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ and
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$.
✓ If the variance $\sigma^2$ is unknown, $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ will have a t-distribution with
n - 1 degrees of freedom. Moreover, as the sample size increases, the t-distribution is
approximately the same as the standard normal.
Consider the case where $\sigma^2$ is known; we can derive a $(1 - \alpha)100\%$ confidence interval for the
population mean $\mu$.
Let $Z_{\alpha/2}$ be a point on the standard normal curve that cuts off an area of $\alpha/2$ to the right, i.e.
$P(Z > Z_{\alpha/2}) = \alpha/2$. By the symmetric property of the normal distribution,
$P(Z < -Z_{\alpha/2}) = \alpha/2$ (see the diagram below).
From the standard normal distribution, we know that
$$P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$$
[Figure: standard normal curve centred at Z = 0, with central area $1 - \alpha$ between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ and an area of $\alpha/2$ in each tail]

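To obtain the limits of the interval estimate, we use the standardized form of $\bar{X}$ in the
above probability statement, i.e., letting $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$,
$P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$ becomes
$$\begin{aligned}
&\Rightarrow P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha \\
&\Rightarrow P\left(-Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \\
&\Rightarrow P\left(-\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \\
&\Rightarrow P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha
\end{aligned}$$
We can assert with probability $1 - \alpha$ that the interval
$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
contains the population mean we are estimating.

Thus, a $(1 - \alpha)100\%$ confidence interval for the population mean $\mu$ is given by
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
The end points of the interval, $\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and $\bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, are called confidence limits
and the probability $1 - \alpha$ is called the degree of confidence.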
In a similar way, a $(1 - \alpha)100\%$ confidence interval for the population mean $\mu$ with
unknown variance $\sigma^2$ is given by
$$\left(\bar{X} - t_{\alpha/2}(n - 1)\frac{S}{\sqrt{n}},\; \bar{X} + t_{\alpha/2}(n - 1)\frac{S}{\sqrt{n}}\right)$$
where $t_{\alpha/2}(n - 1)$ is the critical value of the t-test statistic providing an area of $\alpha/2$ in the right tail of
the t-distribution with $n - 1$ degrees of freedom, and
$$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}.$$
Case 2: When the population is non normal.
We use the central limit theorem to approximate the distribution of the sample mean
based on a large sample ($n \ge 30$). Large sample size is a necessary condition to use the
normal distribution. And hence, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$. If $\sigma$ is unknown we can replace it
by its sample estimate S. The resulting $(1 - \alpha)100\%$ confidence interval of $\mu$ becomes
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right), \text{ when } \sigma \text{ is known}$$
$$\left(\bar{X} - Z_{\alpha/2}\frac{S}{\sqrt{n}},\; \bar{X} + Z_{\alpha/2}\frac{S}{\sqrt{n}}\right), \text{ when } \sigma \text{ is unknown}$$

Example 9.1: A drug company is testing a new drug which is supposed to reduce blood
pressure. From the six people who are used as subjects, it is found that the average drop
in blood pressure is 2.28 millimeter of mercury (mmHg) with a standard deviation of
0.95 mmHg. What is the 95% confidence interval for the mean change in blood
pressure? (Assume that the population is normal).
Solution: Given: $\bar{X} = 2.28$, $S = 0.95$, $n = 6$
$(1 - \alpha)100\% = 95\% \Rightarrow 1 - \alpha = 0.95 \Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$
✓ $\bar{X} = 2.28$ is a point estimate for the population mean drop in blood pressure $\mu$.
A 95% confidence interval of the population mean for unknown $\sigma^2$ and small sample size is:
$$\left(\bar{X} - t_{\alpha/2}(n - 1)\frac{S}{\sqrt{n}},\; \bar{X} + t_{\alpha/2}(n - 1)\frac{S}{\sqrt{n}}\right).$$
And from the t distribution table, $t_{\alpha/2}(n - 1) = t_{0.025}(5) = 2.571$
$$\Rightarrow \left(2.28 - (2.571)\frac{0.95}{\sqrt{6}},\; 2.28 + (2.571)\frac{0.95}{\sqrt{6}}\right)$$
$\Rightarrow (2.28 - 0.997,\; 2.28 + 0.997)$
$\Rightarrow (1.28,\; 3.28)$
We are 95% confident that the mean drop in blood pressure lies between 1.28 mmHg
and 3.28 mmHg for the sampled population.
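The interval of Example 9.1 can be reproduced as follows. This is a sketch only: the critical value 2.571 is typed in from the t table, as in the solution, because Python's standard library does not provide a t quantile function, and the helper name `t_interval` is made up for the illustration.

```python
from math import sqrt

def t_interval(mean, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown: mean +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin

# t_crit = 2.571 is t_{0.025}(5), read from a t table (assumed input, not computed here)
lower, upper = t_interval(mean=2.28, s=0.95, n=6, t_crit=2.571)
print(round(lower, 3), round(upper, 3))
```

The limits come out at roughly 1.283 and 3.277, that is, a margin of error of about 0.997 mmHg on either side of the sample mean.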
Example 9.2: Punctuality of patients in keeping appointment is of interest to a research
team. In a study of patients flow through the office of general practitioners, it was found
that a sample of 35 patients were 17.2 minutes late for appointments, on the average.
Previous research had shown the standard deviation to be about 8 minutes. The
population distribution was felt to be not normal. What is the 90 percent confidence
interval for the true mean amount of time late for appointment?
Solution: Given: $\bar{X} = 17.2$, $\sigma = 8$, $n = 35$
$(1 - \alpha)100\% = 90\% \Rightarrow 1 - \alpha = 0.90 \Rightarrow \alpha = 0.1 \Rightarrow \alpha/2 = 0.05$
Since the sample size is fairly large (n > 30), and since the population standard
deviation is known, according to the central limit theorem, the sampling distribution of
the sample mean is approximately normal. Thus, a confidence interval of the population
mean is given by:
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
And from the standard normal distribution table, $Z_{\alpha/2} = Z_{0.05} = 1.65$
$$\Rightarrow \left(17.2 - (1.65)\frac{8}{\sqrt{35}},\; 17.2 + (1.65)\frac{8}{\sqrt{35}}\right)$$
$\Rightarrow (17.2 - 2.2,\; 17.2 + 2.2)$
$\Rightarrow (15.0,\; 19.4)$
Therefore, the 90% confidence interval for the true mean amount of time late for
appointment is between 15.0 and 19.4 minutes.
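For the large-sample case the critical value itself can be computed. The sketch below uses `NormalDist().inv_cdf` from the standard library, which gives about 1.645 instead of the rounded table value 1.65; the function name `z_interval` is an assumption of this illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence):
    """Large-sample confidence interval for the mean with known sigma."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    margin = z_crit * sigma / sqrt(n)
    return mean - margin, mean + margin

lower, upper = z_interval(mean=17.2, sigma=8, n=35, confidence=0.90)
print(round(lower, 1), round(upper, 1))
```

To one decimal place the limits are the same as in the worked solution, 15.0 and 19.4 minutes.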

9.2 Hypothesis Testing about the Mean


In many circumstances we merely wish to know whether a certain proposition is true or
false. The process of hypothesis testing provides a framework for making decisions on
an objective basis, by weighing the relative merits of different hypotheses, rather than
on a subjective basis by simply looking at the numbers. Different people can form
different opinions by looking at data, but a hypothesis test provides a standardized
decision-making process that will be consistent for all people.
Statistical hypothesis: a claim (belief or assumption) about the value of an unknown
population parameter.
Examples of hypotheses:
✓ There is an association between lung cancer and the number of cigarettes an individual
smokes.
✓ The proportion of female students in Hawassa University is 0.35.
✓ In sub-Saharan Africa, 40% of individuals are living below the poverty line.
Hypothesis testing: the procedure that enables decision-makers to draw inferences
about population characteristics by analyzing the difference between the value of a sample
statistic and the corresponding hypothesized parameter value.
General procedure for hypothesis testing
To test the validity of a claim or assumption about a population parameter, a sample
is drawn from the population and analyzed. The results of the analysis are used to decide
whether the claim is valid or not.
Step 1: State the null hypothesis ($H_0$) and alternative hypothesis ($H_1$)
Null hypothesis ($H_0$): refers to a hypothesized numerical value of the population
parameter which is initially assumed to be true. The null hypothesis is always expressed
in the form of an equation making a claim regarding the specific value of the population
parameter. That is, for example
$$H_0: \mu = \mu_0$$
where $\mu_0$ is the hypothesized value of the population mean.
Alternative hypothesis ($H_1$): is the logical opposite of the null hypothesis. The
alternative hypothesis states that the specific population parameter value is not equal to the
value stated in the null hypothesis. For example,
$H_1: \mu \ne \mu_0$ (Two-sided test)
$H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$ (One-sided test)
Step 2: State the level of significance $\alpha$ (alpha) for the test
The level of significance is the probability of wrongly rejecting the null hypothesis $H_0$ when
it is actually true. It is specified by the statistician or the researcher before the sample is
drawn. The most commonly used values of $\alpha$ are 0.10, 0.05 or 0.01.
Step 3: Calculate the appropriate test statistic
Test statistic is a value computed from a sample that is used to determine whether the
null hypothesis has to be rejected or not. The choice of suitable test statistic depends on
the sampling distribution of the sample statistic. Accordingly, we have the following
cases:
Case 1: When the population is normal.
✓ If the variance $\sigma^2$ is known, the sampling distribution of the sample mean $\bar{X}$ is
normal with mean $\mu$ and variance $\sigma^2/n$, i.e., $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, and the test statistic is
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \text{ under } H_0.$$
✓ If the variance $\sigma^2$ is unknown, the test statistic is
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t(n - 1) \text{ under } H_0.$$

Case 2: When the population is non normal.
We use the central limit theorem to approximate the distribution of the sample mean
based on a large sample ($n \ge 30$). Large sample size is a necessary condition to use the
normal distribution. And hence the test statistic is
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1).$$
If $\sigma$ is unknown we can replace it by its sample estimate S.
Step 4: Establish a decision rule (critical or rejection region)
The cut-off point to reject or not reject H 0 depends on the level of significance  , the
type of test statistic chosen and the form of the alternative hypothesis. If the value of the
test statistic falls in the rejection region, the null hypothesis is rejected, otherwise we do
not reject H 0 (see fig 1 below). The value of the sample statistic that separates the
regions of acceptance and rejection is called critical value. For a specified  , we read the
critical values from the Z or t tables, depending on the test statistic chosen.

Figure: Area of acceptance and rejection of $H_0$ (two-tailed test): rejection regions of area $\alpha/2$ in
each tail beyond the critical values $-Z_{\alpha/2}$ and $Z_{\alpha/2}$, and an acceptance region of area $1 - \alpha$
centred at $\mu = \mu_0$.


Based on the form of the alternative hypothesis and the test statistic we can make the
following decisions:

i. For $H_1: \mu \ne \mu_0$ (two-tailed test), reject $H_0$ if $|Z| > Z_{\alpha/2}$.
[Figure: rejection regions of area $\alpha/2$ below $-Z_{\alpha/2}$ and above $Z_{\alpha/2}$; acceptance region of area $1 - \alpha$ around Z = 0]

ii. For $H_1: \mu > \mu_0$ (right-tailed test), reject $H_0$ if $Z > Z_\alpha$.
[Figure: rejection region of area $\alpha$ above $Z_\alpha$; acceptance region of area $1 - \alpha$ to its left]

iii. For $H_1: \mu < \mu_0$ (left-tailed test), reject $H_0$ if $Z < -Z_\alpha$.
[Figure: rejection region of area $\alpha$ below $-Z_\alpha$; acceptance region of area $1 - \alpha$ to its right]

We can summarize the decision rules as follows:

Decision                                  Alternative hypothesis
                                          $H_1: \mu \ne \mu_0$          $H_1: \mu > \mu_0$        $H_1: \mu < \mu_0$
Reject $H_0: \mu = \mu_0$ if (Z test)     $|Z| > Z_{\alpha/2}$          $Z > Z_\alpha$            $Z < -Z_\alpha$
Reject $H_0: \mu = \mu_0$ if (t test)     $|t| > t_{\alpha/2}(n - 1)$   $t > t_\alpha(n - 1)$     $t < -t_\alpha(n - 1)$
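The Z-test decision rules can be written as a small function. The following is a sketch, not a prescribed procedure: the function name, the labels "two-sided", "greater" and "less", and the z values in the usage lines are all made up for illustration.

```python
from statistics import NormalDist

def reject_h0(z, alpha, alternative):
    """Apply the Z-test decision rule for the given form of the alternative hypothesis.

    alternative: "two-sided", "greater" (right-tailed) or "less" (left-tailed).
    """
    std_normal = NormalDist()
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        return z < -std_normal.inv_cdf(1 - alpha)
    raise ValueError("unknown alternative: " + alternative)

# Hypothetical test statistics at alpha = 0.05
print(reject_h0(2.10, 0.05, "two-sided"))  # |2.10| exceeds 1.96
print(reject_h0(1.50, 0.05, "greater"))    # 1.50 does not exceed 1.645
print(reject_h0(-1.80, 0.05, "less"))      # -1.80 is below -1.645
```

The same structure applies to the t test, with the critical values read from a t table with n - 1 degrees of freedom.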

Step 5: Interpret the result.
Errors in Hypothesis Testing
Ideally the hypothesis testing procedure should lead to the rejection of the null hypothesis
$H_0$ when it is false and non-rejection of $H_0$ when it is true. However, the correct decision
is not always possible. Since the decision to reject or not reject a hypothesis is based
on sample data, there is a possibility of making an incorrect decision, or error.
Hence, a decision-maker may commit one of two types of errors while testing a null
hypothesis. These errors are summarized as follows:

Decision               Null hypothesis ($H_0$)
                       True                        False
Reject $H_0$           Type I error ($\alpha$)     Correct decision
Do not reject $H_0$    Correct decision            Type II error ($\beta$)

Type I error is committed if we reject the null hypothesis when it is true. The probability
of committing a type I error, denoted by $\alpha$, is called the level of significance. The
probability level of this error is decided by the decision-maker before the hypothesis test
is performed. Type II error is committed if we do not reject the null hypothesis when it
is false. The probability of committing a type II error is denoted by $\beta$ (Greek letter
beta). For a fixed sample size, reducing the probability of a type I error increases the
probability of a type II error, and vice versa (the relationship is inverse, although not
strictly proportional). Hence we cannot reduce both errors simultaneously without
changing the sample size. As the sample size increases, $\beta$ decreases for a given $\alpha$, so
both error probabilities can be made smaller.
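The trade-off between the two error probabilities can be made concrete for a two-sided Z test. The sketch below computes $\beta$ for an assumed true mean; the numbers ($\mu_0 = 50$, true mean 51, $\sigma = 2$) and the function name `type_ii_error` are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """P(do not reject H0: mu = mu0) in a two-sided Z test when the true mean is mu_true."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) / (sigma / sqrt(n))  # mean of the test statistic under mu_true
    # The statistic is N(shift, 1), so beta is the probability that it falls in (-z_crit, z_crit)
    return z.cdf(z_crit - shift) - z.cdf(-z_crit - shift)

for n in (10, 40):
    for alpha in (0.05, 0.01):
        print(n, alpha, round(type_ii_error(50, 51, 2, n, alpha), 3))
```

In the printed values, $\beta$ is larger for $\alpha = 0.01$ than for $\alpha = 0.05$ at the same n, and smaller for n = 40 than for n = 10 at the same $\alpha$.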
Example 9.3: The life expectancy of people in the year 1999 in a country is expected to
be 50 years. A survey was conducted in eleven regions of the country and the data
obtained, in years, are given below:
Life expectancy (years): 54.2, 50.4, 44.2, 49.7, 55.4, 57.0, 58.2, 56.6, 61.9, 57.5, and
53.4.
Do the data confirm the expected view? (Assuming normal population) Use 5% level of
significance.
Solution: Let $\mu$ be the life expectancy of people in the year 1999 in a country.
1. $H_0: \mu = 50$ (The life expectancy of people in the year 1999 in a country is 50
years)
$H_1: \mu \ne 50$ (The life expectancy of people in the year 1999 in a country is
different from 50 years)
2. Level of significance, α = 0.05.
3. Since $\sigma$ is unknown and the population is normal, the t-test statistic is
appropriate.
Given: n = 11; $\mu_0 = 50$ and we need to compute $\bar{X}$ and $S$.
$$\bar{X} = \frac{\sum_{i=1}^{11} x_i}{n} = \frac{54.2 + 50.4 + \cdots + 57.5 + 53.4}{11} = \frac{598.5}{11} = 54.41$$
$$\sum_{i=1}^{11} x_i^2 = 54.2^2 + 50.4^2 + \cdots + 57.5^2 + 53.4^2 = 32799.91$$
$$S^2 = \frac{1}{n - 1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right] = \frac{1}{10}\left[32799.91 - \frac{(598.5)^2}{11}\right] = \frac{1}{10}(236.07) = 23.607$$
$$\Rightarrow S = \sqrt{23.607} = 4.859$$
Then, the t-test statistic is calculated as:
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{54.41 - 50}{4.859/\sqrt{11}} = \frac{4.41}{1.465} = 3.01$$
4. For α = 0.05 and two-tailed test, the critical (table) value is:
$$t_{\alpha/2}(n - 1) = t_{0.05/2}(11 - 1) = t_{0.025}(10) = 2.228$$
[Figure: t-distribution with shaded rejection regions of area 0.025 in each tail, below -2.228 and above 2.228]
Since $|t| = 3.01 > t_{\alpha/2}(n - 1) = 2.228$, reject the null hypothesis $H_0$. That is, the
calculated t value lies in the rejection region (the shaded region).


5. Conclusion: The data do not confirm the expected view. That is, the life
expectancy is different from 50 years at 5% level of significance.
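The test of Example 9.3 can be retraced from the raw data. This is a sketch under two stated assumptions: the sixth observation is taken as 57.0, the value that reproduces the total of 598.5 and the sum of squares 32799.91 used in the solution, and the critical value 2.228 is typed in from the t table.

```python
from math import sqrt
from statistics import mean, stdev

# Eleven regional values; the sixth is taken as 57.0 so that the total
# matches the 598.5 used in the worked solution (assumption).
life = [54.2, 50.4, 44.2, 49.7, 55.4, 57.0, 58.2, 56.6, 61.9, 57.5, 53.4]
mu0 = 50
t_crit = 2.228  # t_{0.025}(10), read from a t table as in the solution

n = len(life)
x_bar = mean(life)
s = stdev(life)  # sample standard deviation (divisor n - 1)
t = (x_bar - mu0) / (s / sqrt(n))

print(round(x_bar, 2), round(s, 3), round(t, 2))
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```

The computed mean, standard deviation and t statistic agree with the hand calculation (about 54.41, 4.859 and 3.01), leading to the same decision.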
Example 9.4: Suppose that we want to test the hypothesis with a significance level of
.05 that the climate has changed since industrialization. Suppose that the mean
temperature throughout history is 50 degrees. During the last 40 years, the mean
temperature has been 51 degrees and the population standard deviation is 2 degrees.
What can we conclude?
Solution:
Let $\mu$ be the mean temperature.
1. $H_0: \mu = 50$ (There is no change in temperature since industrialization)
$H_1: \mu \ne 50$ (There is change in temperature since industrialization)
2. Level of significance, α = 0.05.
3. Since n = 40 is large, the Z-test statistic is appropriate.
Given: n = 40; $\sigma = 2$; $\bar{X} = 51$; $\mu_0 = 50$
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{51 - 50}{2/\sqrt{40}} = \frac{1}{0.316} = 3.16$$
4. For α = 0.05 and two-tailed test, the critical (table) value is:
$$Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025} = 1.96$$
[Figure: standard normal curve with shaded rejection regions of area 0.025 in each tail, below -1.96 and above 1.96]
Since $|Z| = 3.16 > Z_{\alpha/2} = Z_{0.025} = 1.96$, reject the null hypothesis $H_0$. That is, the
calculated Z value lies in the rejection region (the shaded region).

5. Conclusion: There has been a change in temperature since industrialization, at
5% level of significance.
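A short script confirms the numbers of Example 9.4. It is a sketch only; the two-sided P-value in the last computation is an extra that the worked solution does not calculate.

```python
from math import sqrt
from statistics import NormalDist

n, sigma, x_bar, mu0, alpha = 40, 2, 51, 50, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))         # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided P-value (not part of the solution)

print(round(z, 2), round(z_crit, 2), abs(z) > z_crit)
print(p_value < alpha)
```

Both criteria point the same way: the statistic exceeds the critical value and the P-value is below α, so $H_0$ is rejected.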
Example 9.5: A study was conducted to describe the menopausal status, menopausal
symptoms, energy expenditure and aerobic fitness of healthy midlife women and to
determine the relationships among these factors. Among the variables measured was
maximum oxygen uptake (Vo2max). The mean Vo2max score for a sample of 242 women
was 33.3 with a standard deviation of 12.14. On the basis of these data, can we conclude
that the mean score for a population of such women is greater than 30? Use 5% level of
significance.
Solution:
Let $\mu$ be the mean Vo2max score for a population of healthy midlife women.
1. $H_0: \mu = 30$ (The mean score for a population of healthy midlife women is 30)
$H_1: \mu > 30$ (The mean score for a population of healthy midlife women is
greater than 30).
2. Level of significance, α = 0.05.
3. Since n = 242 is large, the Z-test statistic is appropriate.
Given: n = 242; S = 12.14; $\bar{X} = 33.3$; $\mu_0 = 30$
$$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{33.3 - 30}{12.14/\sqrt{242}} = \frac{3.3}{0.7804} = 4.23$$
4. For α = 0.05 and right-tailed test, the critical (table) value is:
$$Z_\alpha = Z_{0.05} = 1.65$$
[Figure: standard normal curve with a shaded rejection region of area 0.05 in the right tail, above 1.65]
Since $Z = 4.23 > Z_\alpha = 1.65$, reject the null hypothesis $H_0$. That is, the calculated
Z value lies in the rejection region (the shaded region).
5. Conclusion: The mean Vo2max score for the sampled population of healthy
midlife women is greater than 30 at 5% level of significance.
9.3 Test of Association (Independence)
We often encounter nominal scale data. The $\chi^2$ test of association is useful for
determining whether any relationship or association exists between two
nominal variables. For instance, we might be interested in the relationship between HIV
status and sex, lung cancer and smoking habit, political affiliation and sex, etc.
When observations are classified according to two variables or attributes and arranged
in a table, the display is called a contingency table as shown below:
[Table: r × c contingency table of observed frequencies $O_{ij}$ for variables A and B, with row totals, column totals and grand total n]
The test of association or independence uses the contingency table format. Here the
variables A and B have been classified into mutually exclusive categories. The value $O_{ij}$
in row i and column j of the table shows the observed frequency falling in the joint
category i and j. The row and column totals are the sums of their corresponding
frequencies. The sum of the row totals, or of the column totals, gives the grand total n, which
represents the sample size. The procedure to test the association between two categorical
variables is summarized as follows:
Step 1: State the null and alternative hypotheses
$H_0$: There is no association or relationship between the two variables, that is,
the two variables are independent.
$H_1$: There is an association or relationship between the two variables, that is, the two
variables are dependent.
Step 2: State the level of significance, $\alpha$.
Step 3: Calculate the expected frequencies, $E_{ij}$, corresponding to the observed frequency
in row i and column j. The expected frequencies in each cell are calculated as:
$$E_{ij} = \frac{\text{Row } i \text{ total} \times \text{Column } j \text{ total}}{\text{Sample size}} = \frac{R_i \times C_j}{n}$$
Step 4: Compute the value of the test statistic:
$$\chi^2_{Cal} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ is the observed frequency of row i and column j and $E_{ij}$ is the expected
frequency of row i and column j.
Step 5: Find the critical (table) value of $\chi^2_\alpha(df)$ (from Appendix..). The value of $\chi^2_\alpha$
corresponds to an area of $\alpha$ in the right tail of the distribution,
where df = (Number of rows – 1)(Number of columns – 1) = (r – 1)(c – 1)
Step 6: Compare the calculated and table values of $\chi^2$. Decide whether the variables
are independent or not, using the following decision rule:
Reject $H_0$ if $\chi^2_{Cal}$ is greater than $\chi^2_\alpha(df)$. Otherwise do not reject $H_0$.
Example 9.6: The following data on the colour of eye and hair for 6800 individuals
were obtained from a source:
Eye                  Hair colour
colour     Fair      Brown     Black     Red      Total
Blue       1768      808       190       47       2813
Green      946       1387      746       43       3122
Brown      115       444       288       18       865
Total      2829      2639      1224      108      6800
Test the hypothesis that hair colour and eye colour are independently distributed (there
is no association between colour of eye and colour of hair) at the level of $\alpha = 0.01$.
Solution:
1. $H_0$: There is no association between hair colour and eye colour.
$H_1$: There is association between hair colour and eye colour.
2. $\alpha = 0.01$.
3. Calculate the expected frequencies, $E_{ij} = \frac{R_i \times C_j}{n}$:
$$E_{11} = \frac{2813 \times 2829}{6800} = 1170.29, \;\ldots,\; E_{14} = \frac{2813 \times 108}{6800} = 44.68$$
$$E_{31} = \frac{865 \times 2829}{6800} = 359.87, \;\ldots,\; E_{34} = \frac{865 \times 108}{6800} = 13.74$$
Therefore, the contingency table for expected frequencies is as follows:
Eye                  Hair colour
colour     Fair      Brown     Black     Red      Total
Blue       1170.29   1091.69   506.34    44.68    2813
Green      1298.84   1211.61   561.96    49.58    3122
Brown      359.87    335.70    155.70    13.74    865
Total      2829      2639      1224      108      6800
4. Calculate the test statistic:
$$\chi^2_{Cal} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
$$\begin{aligned}
\chi^2_{Cal} &= \frac{(1768 - 1170.29)^2}{1170.29} + \cdots + \frac{(47 - 44.68)^2}{44.68} + \frac{(946 - 1298.84)^2}{1298.84} + \cdots \\
&\quad + \frac{(43 - 49.58)^2}{49.58} + \frac{(115 - 359.87)^2}{359.87} + \cdots + \frac{(18 - 13.74)^2}{13.74} \\
&= 1074.43
\end{aligned}$$
5. Critical value $\chi^2_\alpha(df)$:
df = (r – 1)(c – 1) = (3 – 1)(4 – 1) = (2)(3) = 6
$\chi^2_\alpha(df) = \chi^2_{0.01}(6) = 16.812$
6. Since $\chi^2_{Cal} = 1074.43 > \chi^2_\alpha(df) = 16.812$, reject $H_0$.
7. Conclusion: There is association between hair colour and eye colour. That is, hair
colour and eye colour are dependent.
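The expected frequencies and the test statistic of Example 9.6 can be computed for the whole table at once. The following is a minimal sketch in plain Python; the function name `chi_square_statistic` is hypothetical, and the critical value 16.812 is entered from the table as in the solution.

```python
def chi_square_statistic(observed):
    """Return (statistic, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count in each cell: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

# Observed counts of Example 9.6 (three rows, four columns, totals omitted)
observed = [
    [1768, 808, 190, 47],
    [946, 1387, 746, 43],
    [115, 444, 288, 18],
]
stat, df, expected = chi_square_statistic(observed)
chi2_crit = 16.812  # table value for alpha = 0.01 and 6 degrees of freedom

print(round(stat, 2), df, stat > chi2_crit)
```

The statistic agrees with the hand calculation (about 1074.43 on 6 degrees of freedom), far beyond the critical value.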
