UNIVERSITY OF CALICUT

SCHOOL OF DISTANCE EDUCATION

MA ECONOMICS
(2020 Admission Onwards)

QUANTITATIVE METHODS FOR


ECONOMIC ANALYSIS-II
Self-Learning Material

II SEMESTER

CORE COURSE - EC02C08

Calicut University P.O., Malappuram,
Kerala, India - 673 635
PREPARED BY:
Dr.DINESH.M.P,
ASSISTANT PROFESSOR,
DEPARTMENT OF ECONOMICS,
GOVERNMENT COLLEGE, MOKERI, KOZHIKODE

SCRUTINIZED BY :
Dr. VIMALA.M,
ASSISTANT PROFESSOR,
HEAD OF THE DEPT. (ECONOMICS)
VIMALA COLLEGE, THRISSUR
Module I

Probability and Probability Distributions

Probability

When we face an uncertain situation, the number we use to describe that situation is known as a probability. For example, if we toss a fair coin there is an equal chance of getting a head or a tail as the outcome. Such a numerical description of an uncertain situation is what we call probability.

Probability is a number associated with an event, intended to represent its likelihood, chance of occurring, degree of uncertainty and so on. Probability theory has its origin in games of chance.

The growth of the concept of probability from its origin to its modern development can be studied by analyzing its various definitions. There are mainly three definitions of probability:

1. Classical definition of probability (Mathematical)

2. Frequency definition of probability (Statistical)
3. Axiomatic definition of probability (Modern)

Classical definition of probability

Some important concepts

1. Random experiment

It is a physical phenomenon at the completion of which we observe certain results. There are some experiments, called deterministic experiments, whose outcomes can be predicted. But in some cases we can never predict the outcome before the experiment is performed. An experiment, whether natural, conceptual, physical or hypothetical, is called a random experiment if the exact outcome of the trials of the experiment is unpredictable. In other words, by a random experiment we mean:

1. It should be repeatable under uniform conditions.

2. It should have several possible outcomes.

3. One should not be able to predict the outcome of a particular trial.


Examples: tossing a coin, rolling a die, the lifetime of a machine, the length of a table, the weight of a newborn baby, the weather condition of a certain region, etc.

2. Trial and Event

A trial is an attempt to produce an outcome of a random experiment. For example, if we toss a coin or throw a die, we are performing trials. The outcomes of an experiment are termed events or cases. For example, getting a head or a tail in tossing a coin is an event. Usually events are denoted by capital letters like A, B, C, etc.

3. Equally likely events

Events or cases are said to be equally likely if we have no reason to expect one rather than the other.

For example, when we toss an unbiased coin there is an equal chance of getting a head or a tail.

4. Exhaustive events

The set of all possible outcomes in a trial constitutes the set of exhaustive cases. For example, if we toss a coin there are two exhaustive cases: head and tail.

5. Mutually Exclusive events

Events are said to be mutually exclusive if the happening of any one of them excludes the happening of the others in a trial.

6. Favorable cases

The cases which entail the occurrence of an event are said to be favorable to that event. For example, while tossing a die, the occurrence of 1, 3 or 5 is favorable to the event "getting an odd number".

Classical definition (Mathematical or a priori)

If a trial results in 'n' mutually exclusive, equally likely and exhaustive cases and 'm' of them are favorable to the happening of an event A, the probability of A, denoted P(A), is defined as

P(A) = m/n = (number of favorable cases)/(total number of cases)

Obviously 0 ≤ P(A) ≤ 1.

When P(A) = 0, A is called an impossible event.

When P(A) = 1, A is called a sure event.

When 0 < P(A) < 1, A is called a random event.

II. Frequency definition of probability (Statistical)

Let the trials be repeated a large number of times under essentially homogeneous conditions. The limit of the ratio of the number of times an event A happens, 'm', to the total number of trials, 'n', as the number of trials tends to infinity, is called the probability of the event A. It is, however, assumed that the limit is unique as well as finite.

P(A) = lim_{n→∞} m/n
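The frequency definition can be illustrated with a quick simulation. The sketch below (assuming only Python's standard library; the number of tosses is an arbitrary illustrative choice) tosses a fair coin repeatedly and prints the relative frequency m/n, which settles near 0.5 as n grows.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

heads = 0
for n in range(1, 100_001):
    if random.random() < 0.5:      # simulate one toss of a fair coin
        heads += 1
    if n in (10, 100, 1_000, 10_000, 100_000):
        # relative frequency m/n should approach P(head) = 0.5 as n grows
        print(f"n = {n:>6},  m/n = {heads / n:.4f}")
```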
III. Axiomatic definition of probability (Modern)

Sample space

The set of all possible outcomes of a random experiment is called a sample space. It is usually denoted by S. Every element of the sample space is called a sample point or elementary outcome.

Example

Sample space for tossing a coin: S = {H, T}

When two coins are tossed together: S = {HH, HT, TH, TT}

Events

The outcomes in an experiment are termed as events or cases. For example, getting a
head or a tail in tossing a coin is an event. Usually events are denoted by capital letters like A, B,
C, etc...

Definition

Let S be the sample space of a random experiment, an event being any possible outcome of the experiment. Let A be an event defined over S. Then we can associate a real number P(A) with every A of S, known as the probability of occurrence of A, if the following axioms are satisfied.

1. 0 ≤ 𝑃(𝐴) ≤ 1
2. P(S) = 1

3. If A and B are two mutually exclusive events 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵)

Permutation and combinations

If an event A can happen in n1 ways and another event B can happen in n2 ways, then the number of ways in which both A and B can happen in a specified order is n1 × n2.

If there are three routes from X to Y and two routes from Y to Z, then destination Z can be reached from X in 3 × 2 = 6 ways.

Permutation

Permutation refers to the arrangements which can be made by taking some (r) or all of 'n' things at a time, with attention given to the order of arrangement of the selected objects.

Mathematicians use a neat notation for the permutation (arrangement) of 'n' objects taking 'r' objects at a time, writing it as nPr. Here the letter P stands for permutation.

Suppose we want to arrange 3 students A, B and C by choosing two of them at a time. The arrangement can be done in 6 ways:

AB, BA, AC, CA, BC and CB

In general, suppose there are 'n' objects to permute in a row taking all at a time. This can be done in nPn = n(n − 1)(n − 2) … 3 · 2 · 1 ways; for n = 3 this is 3 × 2 × 1 = 6.

The permutation of n things taken r at a time (𝑟 < 𝑛) is given by

𝑛𝑝𝑟 = 𝑛(𝑛 − 1) … (𝑛 − 𝑟 + 1)

Example

4P4 = 4 × 3 × 2 × 1 = 24

Factorial notations

We have a compact notation for the full expression given by the product n(n − 1)(n − 2) … 3 · 2 · 1. This is written as n! and read as 'n factorial'.

So nPn = n! = n(n − 1)(n − 2) … 3 · 2 · 1

6P6 = 6! = 6 · 5 · 4 · 3 · 2 · 1 = 720; by definition 0! = 1.

We have

nPr = n(n − 1) … (n − r + 1) = [n(n − 1) … (n − r + 1)(n − r)(n − r − 1) … 3 · 2 · 1] / [(n − r)(n − r − 1) … 3 · 2 · 1], so nPr = n!/(n − r)!

Results

1. The number of permutations of n objects when r objects are taken at a time, with repetition allowed, is n^r.
2. The number of permutations of n objects when all n are taken at a time, with repetition allowed, is n^n.
3. The number of permutations of n objects of which n1 are of one kind, n2 are of another kind, n3 of another kind, etc., taking all n together, is

n! / (n1! n2! n3! … nk!),  where n1 + n2 + ⋯ + nk = n
Combination
A combination is a grouping or selection or collection of all or part of a given number of things without reference to their order of arrangement.
If three students A, B and C are given, then AB, AC and BC are the only combinations of the three things A, B, C taken two at a time, and their number is denoted as 3C2. The other permutations BA, CA and CB are not new combinations; they are obtained by permuting each combination among itself.

Combination of n different things taken r at a time

The number of combinations of n different things taken r at a time is denoted by nCr (also written as the binomial coefficient). It is given by

nCr = nPr / r! = n(n − 1) … (n − r + 1) / (1 · 2 · 3 … r)

or

nCr = n! / (r!(n − r)!)

Example: 7C2 = 7!/(2!(7 − 2)!) = (7 · 6 · 5 · 4 · 3 · 2 · 1)/(2! 5!) = 21
Important rules

1. nCn = n!/(n! 0!) = 1. This is the combination of n things taken all at a time.

2. nC0 = n!/(0! n!) = 1. This is the combination of n things taken none at a time.

3. nCr = nC(n−r)

4. nCr + nC(r−1) = (n+1)Cr
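These counting rules can be checked numerically. The following is a minimal sketch using Python's math module, whose perm and comb functions compute nPr and nCr.

```python
import math

# Permutations: arrangements of 3 students taken 2 at a time
print(math.perm(3, 2))       # 6  (AB, BA, AC, CA, BC, CB)

# nPr = n!/(n-r)!  and  nCr = n!/(r!(n-r)!)
print(math.perm(6, 6))       # 6! = 720
print(math.comb(7, 2))       # 21

# Rules: nCn = 1, nC0 = 1, nCr = nC(n-r), nCr + nC(r-1) = (n+1)Cr
n, r = 8, 3
print(math.comb(n, n), math.comb(n, 0))                               # 1 1
print(math.comb(n, r) == math.comb(n, n - r))                         # True
print(math.comb(n, r) + math.comb(n, r - 1) == math.comb(n + 1, r))   # True
```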

Example 1

A fair coin is tossed 3 times. Find the probability that the tosses result in

i) at least two heads  ii) at least one head

Solution

The sample space is S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}

There are 8 mutually exclusive, equally likely outcomes.

A: at least two heads in the three tosses

B: at least one head

P(A) = 4/8 = 1/2

P(B) = 7/8
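The same probabilities can be verified by listing the sample space directly; the sketch below (standard library only) uses the event definitions of the example.

```python
from itertools import product

sample_space = list(product("HT", repeat=3))        # 8 equally likely outcomes
A = [s for s in sample_space if s.count("H") >= 2]  # at least two heads
B = [s for s in sample_space if "H" in s]           # at least one head

print(len(A) / len(sample_space))   # 0.5
print(len(B) / len(sample_space))   # 0.875 = 7/8
```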

Example 2

A box contains 8 red, 9 blue and 3 white balls. Three balls are selected at random. Determine the probability that

a) 2 are red and one is white

b) at least one is white

c) one of each color is drawn

Solution

a) P(2 red and 1 white) = (8C2 × 3C1)/20C3 = 84/1140 = 7/95

b) P(at least one is white) = 1 − P(none is white)

= 1 − 17C3/20C3 = 1 − 34/57 = 23/57

c) P(one of each color) = (8C1 × 9C1 × 3C1)/20C3 = 216/1140 = 18/95

Theorems in probability

The following are some consequences of the axioms of probability; they have general applications and so they are called theorems.

Theorem 1

The probability of an impossible event is zero: P(∅) = 0.

Let ∅ be the impossible event.

We have S ∪ ∅ = S, so P(S ∪ ∅) = P(S)

P(S) + P(∅) = P(S), since S and ∅ are mutually exclusive

1 + P(∅) = 1, so P(∅) = 0

Theorem 2

P(A^c) = 1 − P(A)

If A and A^c are complementary events we have

A ∪ A^c = S

so P(A ∪ A^c) = P(S)

P(A) + P(A^c) = 1, since A and A^c are mutually exclusive

P(A^c) = 1 − P(A)

P(non-occurrence of an event) = 1 − P(occurrence of that event)

Theorem 3

Addition theorem for two events

If A and B are any two events, the probability of the occurrence of either A or B or both is given by

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)


Conditional probability

We have defined the probability of occurrence of an event A relative to the outcome set S. In other words, whenever an event is considered, an outcome set S of which A is a subset is already specified. Hence it is more appropriate to call the probability of A the probability of A given S, which may be written as P(A|S).

Now we consider a different kind of conditional statement, namely the probability of an event A given that another event B has occurred, where B need not be the sure event or the outcome set S. The method of defining the conditional probability P(A|B), the probability of A given that the event B has already occurred, can be seen from the following illustration.

Suppose a population of N people contains N1 graduates and N2 males. Let A be the event "a person chosen at random is a graduate" and B the event "a person chosen at random is male". Then

P(A) = N1/N,  P(B) = N2/N

Instead of the entire population we may investigate the male subpopulation and require the probability that "a male chosen at random is a graduate". Let the number of males who are graduates be N3; this can be represented by the event A ∩ B, and we have P(A ∩ B) = N3/N.

The probability that a male chosen at random is a graduate is N3/N2; this is the conditional probability denoted by P(A|B):

P(A|B) = N3/N2 = (N3/N)/(N2/N) = P(A ∩ B)/P(B)
𝑁

Thus, the conditional probability of A given B is

P(A|B) = P(A ∩ B)/P(B),  P(B) > 0

Definition

Let A and B be any two events. The probability of the event A given that the event B has already occurred, or the conditional probability of A given B, denoted by P(A|B), is defined as

P(A|B) = P(A ∩ B)/P(B),  P(B) > 0

Similarly, the conditional probability of B given A is defined as

P(B|A) = P(A ∩ B)/P(A),  P(A) > 0

Independent events

Two events are said to be independent if the occurrence of one event does not affect the occurrence or non-occurrence of the other event.

Dependent events

Two events are said to be dependent if the occurrence of one event affects the occurrence or non-occurrence of the other event.

Independence of Two events A and B

Two events A and B are said to be independent if

𝑖)𝑃(𝐴|𝐵) = 𝑃(𝐴) 𝑜𝑟

𝑖𝑖) 𝑃(𝐵|𝐴) = 𝑃 (𝐵) 𝑜𝑟

𝑖𝑖𝑖)𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) 𝑃(𝐵)

Bayes' Theorem

Let S be a sample space which is partitioned into n mutually exclusive events B1, B2, …, Bn such that P(Bi) > 0, i = 1, 2, …, n. Let A be any event of S for which P(A) ≠ 0. Then the probability of the event Bi (i = 1, 2, …, n) given the event A is

P(Bi|A) = P(Bi)P(A|Bi) / Σ_{j=1}^{n} P(Bj)P(A|Bj),  i = 1, 2, …, n

Bayes' theorem gives the relationship between P(Bi|A) and P(A|Bi), and thus it involves a type of inverse reasoning. Bayes' theorem plays an important role in business applications. The theorem is due to Thomas Bayes.

Example

Three machines A, B and C produce 60, 30 and 10 percent respectively of the total production of a factory. It is estimated that A produces 2% defectives, B produces 3% defectives and C produces 4% defectives. An item chosen at random from the total production is found to be defective. What is the probability that it has come from machine A? Or from machine B? Or from machine C?
Solution

Let us define the events

B1 – machine A produced the item

B2 – machine B produced the item

B3 – machine C produced the item

A – the item drawn from the production is defective.

So P(B1) = .60, P(B2) = .30, P(B3) = .10

P(A|B1) = .02, P(A|B2) = .03, P(A|B3) = .04

P(B1|A) = P(B1)P(A|B1) / [P(B1)P(A|B1) + P(B2)P(A|B2) + P(B3)P(A|B3)]

= (.60 × .02) / (.60 × .02 + .30 × .03 + .10 × .04)

= .012 / (.012 + .009 + .004) = .012/.025 = 12/25

P(B2|A) = P(B2)P(A|B2) / [P(B1)P(A|B1) + P(B2)P(A|B2) + P(B3)P(A|B3)]

= (.30 × .03) / (.60 × .02 + .30 × .03 + .10 × .04)

= .009 / (.012 + .009 + .004) = .009/.025 = 9/25

P(B3|A) = 1 − [P(B1|A) + P(B2|A)]

= 1 − (12/25 + 9/25) = 4/25

Random variable
A random variable (r.v.) is a real-valued function defined over the sample space. Its domain of definition is the sample space S and its range is the real line extending from −∞ to +∞. A r.v. is denoted by X and the values taken by X are denoted by x1, x2, …, xn.

Random variables are of two types: (i) discrete and (ii) continuous. A random variable X is said to be discrete if its range includes only a finite or countable number of values. The possible values of a discrete random variable can be labeled as x1, x2, x3, …. Examples: the number of deaths in a day due to Covid-19, the number of printing mistakes in a well-designed book, etc.

A random variable which is not discrete is said to be continuous. That means it can assume an infinite number of values from a specified interval of the form [a, b]. Examples: (i) the lifetime of a tube, (ii) the height of a randomly selected student, etc.

Note that r.v.s are denoted by the capital letters X, Y, Z, etc., and the corresponding small letters are used to denote the values of a r.v.

By the distribution of a random variable X we mean the assignment of probabilities to all events defined in terms of the random variable.

Probability distribution

• Discrete
The probability distribution, or simply distribution, of a discrete r.v. is the list of the distinct values xi of X together with their associated probabilities f(xi) = P(X = xi).

Thus let X be a discrete random variable assuming the values x1, x2, …, xn from the real line, with corresponding probabilities f(x1), f(x2), …, f(xn). Then P(X = xi) = f(xi) is called the probability mass function or probability function of X, provided it satisfies the conditions

(i) f(xi) ≥ 0 for all i

(ii) Σ f(xi) = 1

ii) Continuous

If X is a continuous r.v. and if P(x ≤ X ≤ x + dx) = f(x)dx, then f(x) is called the probability density function (p.d.f.) of the continuous r.v. X, provided it satisfies the conditions

(i) f(x) ≥ 0 for all x

(ii) ∫_{−∞}^{+∞} f(x)dx = 1

Once the p.d.f. f(x) of a continuous r.v. is specified, the problem of calculating the probability of an interval becomes that of computing the area under the curve over the strip supported by the interval.

Distribution function

The distribution function is an important concept associated with a random variable. The distribution function is also called the cumulative probability distribution function.

Definition

For any random variable X, the function of the real variable x defined as

F(x) = P(X ≤ x)

is called the cumulative probability distribution function, or simply the distribution function (c.d.f.), of X. We can note that the probability distribution of a random variable X is determined by its distribution function.

If X is a discrete r.v. with p.m.f. p(x), then the cumulative distribution function is defined as

F(x) = P(X ≤ x) = Σ_{t ≤ x} p(t)

If X is a continuous r.v. with p.d.f. f(x), then the distribution function is defined as

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t)dt

Properties of distribution function

If F(x) is the distribution function of a r.v. X, then it has the following properties.

1. F(x) is defined for all real values of x.

2. F(−∞) = 0, F(+∞) = 1
3. 0 ≤ F(x) ≤ 1.
4. F(a) ≤ F(b) if a < b; that means F(x) is non-decreasing.
5. For X discrete, F(b) − F(a) = P(a < X ≤ b).
6. For a discrete r.v. the graph of F(x) is a step function.
7. F(x) is a right-continuous function of x.
8. F(x) has a continuous graph if X is continuous. If F(x) possesses a derivative, then dF(x)/dx = f(x).
9. The discontinuities of F(x) are at most countable.
Expected value of a random variable (Mathematical Expectation)

Let X be a discrete r.v. assuming the values x1, x2, …, xn with corresponding probabilities p1, p2, …, pn. The expected value of X, or mathematical expectation of X, is defined as

E(X) = x1p1 + x2p2 + ⋯ + xnpn = Σ xi pi

We can define the mathematical expectation of a r.v. X with probability mass (or density) function f(x) as

E(X) = Σ x f(x) if X is discrete

= ∫ x f(x)dx if X is continuous

Properties

1. E(c) = c, c being a constant

2. E(cX) = c E(X)
3. E(aX + b) = aE(X) + b
4. E(X + Y) = E(X) + E(Y)
5. E(XY) = E(X)E(Y) if X and Y are independent

Expectation of a function of a r.v

Let X be a discrete r.v. with p.m.f. f(x), and let g(x) be any function of X.

Then the expected value of g(X) is defined as E[g(X)] = Σ g(x) f(x).

For example, if g(x) = x², then E(X²) = Σ x² f(x).

Summary measures

1. Arithmetic mean
Mean = E(X)
2. Variance and standard deviation
V(X) = Variance = E[X − E(X)]²,  SD = √V(X)

To calculate the variance, use the following form obtained on simplification:

V(X) = E[X − E(X)]² = E(X²) − {E(X)}²

where E(X) = Σ xi pi and E(X²) = Σ xi² pi

3. Skewness
β1 = μ3²/μ2³,  γ1 = √β1
4. Kurtosis
β2 = μ4/μ2²,  γ2 = β2 − 3
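For a discrete random variable these summary measures reduce to weighted sums over the probability distribution. The sketch below uses a small hypothetical distribution (the values and probabilities are illustrative, not taken from the text).

```python
# Hypothetical discrete distribution: values x_i with probabilities p_i
x = [0, 1, 2, 3]
p = [0.1, 0.4, 0.3, 0.2]

mean = sum(xi * pi for xi, pi in zip(x, p))        # E(X)
ex2 = sum(xi**2 * pi for xi, pi in zip(x, p))      # E(X^2)
variance = ex2 - mean**2                           # V(X) = E(X^2) - [E(X)]^2
sd = variance ** 0.5

# central moments for skewness and kurtosis
mu2 = sum((xi - mean) ** 2 * pi for xi, pi in zip(x, p))
mu3 = sum((xi - mean) ** 3 * pi for xi, pi in zip(x, p))
mu4 = sum((xi - mean) ** 4 * pi for xi, pi in zip(x, p))
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2

print(mean, variance, sd)
print(beta1, beta2 - 3)    # beta1 and gamma2 = beta2 - 3
```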
Module II

Discrete and Continuous Probability Distribution

The frequency distributions which we have already studied refer to samples drawn from a given population. When the values of the variate in the population are distributed according to some law which can be expressed mathematically, such distributions are known as theoretical distributions. The binomial distribution, the Poisson distribution and the normal distribution are the important theoretical distributions.

Binomial Distribution
Binomial Distribution
The binomial distribution is useful when the events are dichotomous, that is, when there are two possible outcomes: success, denoted S, and failure, denoted F. For example, if a fair coin is tossed 100 times and the number of times a head appears is the event of interest, getting a head is considered the success and getting a tail the failure.

Derivation of Binomial distribution

An experiment is performed in which each outcome can be classified either as a success S or as a failure F. Suppose the probability of success is known and denoted by p. The experiment is repeated n times, and it is given that P(S) remains the same throughout and that the trials are independent. We are interested in finding the probability distribution of X, where X is defined as the number of successes in the n trials. If P(S) = p then P(F) = 1 − p = q, and the random variable X can take the values 0, 1, 2, 3, …, n.

Definition
A random variable X is defined to have a binomial distribution if the probability function of X is given by

f(x) = nCx p^x q^(n−x) for x = 0, 1, 2, 3, …, n,  p + q = 1

Mean and variance of Binomial distribution

𝑀𝑒𝑎𝑛 = 𝑛𝑝

𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 = 𝑛𝑝𝑞

𝑆𝐷 = √𝑛𝑝𝑞

Fitting of Binomial Distribution

It means finding the expected or theoretical binomial frequencies against the given observed frequencies. The theoretical frequencies are obtained by multiplying the binomial probabilities by the total frequency.
Example 1

The mean of a binomial distribution is 2.5 and its variance is 1.875. Obtain the binomial probability distribution.

Solution

Mean = np = 2.5

Variance = npq = 1.875

So variance = 2.5q

2.5q = 1.875

q = 1.875/2.5 = .75

So p = 1 − q = 1 − .75 = .25

Since np = 2.5

.25n = 2.5

n = 2.5/.25 = 10

The binomial probability distribution is

f(x) = 10Cx (.25)^x (.75)^(10−x),  x = 0, 1, 2, …, 10

Example 2

Eight coins are tossed 256 times and the number of heads observed at each throw is recorded. Find the expected frequencies. What are the theoretical values of the mean and variance?

Solution

The coins are unbiased, so p = 1/2, q = 1/2, n = 8 and N = 256.

Let the number of heads observed be x = 0, 1, …, 8.

Expected frequency: F(x) = nCx p^x q^(n−x) · N

f(0) = 8C0 (1/2)^8 · 256 = 1

f(1) = 8C1 (1/2)^8 · 256 = 8

f(2) = 8C2 (1/2)^8 · 256 = 28

f(3) = 8C3 (1/2)^8 · 256 = 56

f(4) = 8C4 (1/2)^8 · 256 = 70

f(5) = 8C5 (1/2)^8 · 256 = 56

f(6) = 8C6 (1/2)^8 · 256 = 28

f(7) = 8C7 (1/2)^8 · 256 = 8

f(8) = 8C8 (1/2)^8 · 256 = 1

Mean = np = 8 · (1/2) = 4

Variance = npq = 8 · (1/2) · (1/2) = 2

SD = √npq = √2
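The fitted frequencies of this example can be generated directly from the binomial formula, as in the sketch below (n = 8, p = 1/2 and N = 256 as above).

```python
from math import comb

n, p, N = 8, 0.5, 256
expected = [comb(n, x) * p**x * (1 - p)**(n - x) * N for x in range(n + 1)]
print([round(e) for e in expected])   # [1, 8, 28, 56, 70, 56, 28, 8, 1]

mean = n * p                # 4
variance = n * p * (1 - p)  # 2
print(mean, variance)
```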

Poisson distribution

The Poisson distribution is a discrete probability distribution. The distribution was developed by the famous French mathematician Simeon D. Poisson in 1837.

Definition

A discrete r.v. X is defined to have a Poisson distribution if the density of X is given by

f(x) = e^(−λ) λ^x / x!,  x = 0, 1, 2, …;  λ > 0

= 0 otherwise

where λ is called the parameter of the Poisson distribution.

In this case we write X → P(λ).

Some example situations for the Poisson distribution are:

1. The number of road traffic accidents per week in a given state

2. The number of printing mistakes per page of a book

3. The number of customers entering a bank in a 20-minute interval, etc.

Poisson distribution as a limiting form of the binomial

The Poisson distribution is obtained as an approximation to the binomial distribution under the conditions:

(i) n is very large (n → ∞)

(ii) p is very small (p → 0)

(iii) np = λ, a finite quantity

Summary measures of the Poisson distribution

Mean = λ

Variance = V(X) = λ

SD = √λ

Fitting of a Poisson distribution

By fitting a Poisson distribution we mean calculating the expected Poisson frequencies against the given observed frequencies. Theoretical frequencies are obtained by multiplying the probabilities by the total frequency.

F(x) = N·P(X = x) = (e^(−λ) λ^x / x!)·N,  x = 0, 1, 2, …;  λ > 0
Example 1

On the average a certain intersection results in 3 traffic accidents per month. What is the probability that in any given month at least two accidents will occur at this intersection?

Let X be the number of accidents occurring in a month.

Then X → P(3). We have to find P(X ≥ 2).

P(X ≥ 2) = 1 − [P(X = 0) + P(X = 1)] = 1 − [e^(−3)·3^0/0! + e^(−3)·3^1/1!]

= 1 − 4e^(−3) = 1 − 4 × 0.0498 = 1 − 0.199 = 0.801

Example 2

Fit a Poisson distribution to the following data.

X:  0    1    2    3    4
f:  123  59   14   3    1

Solution

The Poisson frequency function is f(x) = (e^(−λ) λ^x / x!)·N, x = 0, 1, 2, 3, 4

λ = (0 × 123 + 1 × 59 + 2 × 14 + 3 × 3 + 4 × 1)/(123 + 59 + 14 + 3 + 1) = 100/200 = 0.5

∴ f(0) = (e^(−0.5)(0.5)^0/0!) × 200 = 200 × e^(−0.5) = 200 × .6065 = 121.30 ≈ 121

f(1) = (e^(−0.5)(0.5)^1/1!) × 200 = 200 × e^(−0.5) × .5 = 60.65 ≈ 61

f(2) = (e^(−0.5)(0.5)^2/2!) × 200 = 15.16 ≈ 15

f(3) = (e^(−0.5)(0.5)^3/3!) × 200 = 2.53 ≈ 3

f(4) = (e^(−0.5)(0.5)^4/4!) × 200 = 0.32 ≈ 0
Thus the theoretical frequencies are

X:  0    1    2   3   4   Total
f:  121  61   15  3   0   200
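The fitted Poisson frequencies can be reproduced as follows (a sketch in which λ is estimated by the sample mean and N = 200, as in the example).

```python
from math import exp, factorial

x_values = [0, 1, 2, 3, 4]
observed = [123, 59, 14, 3, 1]

N = sum(observed)                                         # 200
lam = sum(x * f for x, f in zip(x_values, observed)) / N  # sample mean = 0.5

expected = [N * exp(-lam) * lam**x / factorial(x) for x in x_values]
print([round(e) for e in expected])   # approximately [121, 61, 15, 3, 0]
```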

Normal distribution

The most important continuous distribution used in statistics is the normal distribution. A great many of the techniques used in applied statistics are based upon the normal distribution; thus, the normal distribution is in many ways the cornerstone of modern statistical theory. The distribution was first discovered by Abraham De Moivre. Later two mathematicians, Pierre Laplace and Carl Gauss, developed the distribution independently. The normal distribution is also known as the Gaussian distribution.

Definition

A random variable X is defined to have a normal distribution if its density is given by

f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)),  −∞ < x < ∞

where the parameters μ and σ satisfy −∞ < μ < ∞ and σ > 0. Any distribution defined by the density function given above is called a normal distribution.

When X follows a normal distribution, we write it symbolically as

X → N(μ, σ) or X → N(μ, σ²)

Once μ and σ are known, the shape of the curve of the normal distribution is completely determined.

Properties of normal curve

1. The normal curve is symmetric about the ordinate at x = μ.

2. The mean, median and mode are identical.

3. The normal curve f(x) has its maximum at x = μ.

4. The normal curve extends from −∞ to +∞.

5. The curve touches the X axis only at ±∞; the X axis is an asymptote to the curve.

6. For the normal curve β1 = 0 and β2 = 3.

7. In a normal distribution QD ≈ (2/3) SD and MD ≈ (4/5) SD.

8. All odd-order central moments vanish.

9. The even central moments are given by μ2r = 1·3·5⋯(2r − 1)·σ^(2r), r = 0, 1, 2, …

10. The points of inflection of the curve are at X = μ ± σ.

11. The lower and upper quartiles are equidistant from the median.

12. The area under the normal curve is distributed as below

(i) 68.27% of the area lies between 𝜇 − 𝜎 𝑎𝑛𝑑 𝜇 + 𝜎

P (𝜇 − 𝜎 ≤ 𝑋 ≤ 𝜇 + 𝜎) = .6827

(ii) 95.45% of the area lies between 𝜇 − 2𝜎 𝑎𝑛𝑑 𝜇 + 2𝜎

𝑃(𝜇 − 2𝜎 ≤ 𝑋 ≤ 𝜇 + 2𝜎) = .9545

(iii) 99.73% of the area lies between 𝜇 − 3𝜎 𝑎𝑛𝑑 𝜇 + 3𝜎

𝑃(𝜇 − 3𝜎 ≤ 𝑋 ≤ 𝜇 + 3𝜎) = .9973

𝑀𝑒𝑎𝑛 = 𝐸(𝑋) = 𝜇

𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 𝑉(𝑋) = 𝜎 2

Standard normal distribution

When X → N(μ, σ), its pdf is given by

f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)),  −∞ < x < ∞

Define Z = (X − μ)/σ. By changing the variable, the probability density function of Z is given by

f(z) = f(x)|dx/dz| = (1/(σ√(2π))) e^(−z²/2) × σ = (1/√(2π)) e^(−z²/2),  −∞ < z < ∞

A normal variate is said to be a standard normal variate if its pdf is given by

f(z) = (1/√(2π)) e^(−z²/2),  −∞ < z < ∞

and the corresponding probability distribution is called the standard normal distribution. We can see that the mean of Z is 0 and its variance is 1; that is, the normal distribution with zero mean and unit variance is called the standard normal distribution, and it is denoted as

Z → N(0, 1)

Example 1

Let X be a normal random variable with mean 42 and standard deviation 4. Find the probability that the value taken by X is (i) less than 50 (ii) greater than 50 (iii) less than 40 (iv) between 37 and 41.

Solution

X is a normal variate with parameters μ = 42 and σ = 4.

Z = (X − μ)/σ = (X − 42)/4 → N(0, 1)

(i) P[X < 50] = P[Z < (50 − 42)/4] = P[Z < 2]

= P(−∞ < Z < 0) + P(0 < Z < 2)

= [area from −∞ to 0] + [area from 0 to 2]

= 0.5 + .4772 = 0.9772

(ii) P[X > 50] = P[Z > (50 − 42)/4] = P[Z > 2]

= P(0 < Z < ∞) − P(0 < Z < 2)

= [area from 0 to ∞] − [area from 0 to 2]

= 0.5 − .4772 = 0.0228

(iii) P[X < 40] = P[Z < (40 − 42)/4] = P[Z < −0.5]

= P(−∞ < Z < 0) − P(−0.5 < Z < 0)

= [area from −∞ to 0] − [area from 0 to 0.5]

= 0.5 − .1915 = .3085

(iv) P[37 < X < 41] = P[(37 − 42)/4 < Z < (41 − 42)/4] = P[−1.25 < Z < −0.25]

= [area from 0 to 1.25] − [area from 0 to 0.25]

= 0.3944 − .0987 = 0.2957
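The same areas can be found without tables by using the standard normal distribution function Φ(z) = ½[1 + erf(z/√2)]. The sketch below assumes only Python's math module; μ = 42 and σ = 4 as in the example.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 42, 4

def z(x):
    return (x - mu) / sigma

print(round(phi(z(50)), 4))               # P(X < 50)  ~ 0.9772
print(round(1 - phi(z(50)), 4))           # P(X > 50)  ~ 0.0228
print(round(phi(z(40)), 4))               # P(X < 40)  ~ 0.3085
print(round(phi(z(41)) - phi(z(37)), 4))  # P(37 < X < 41) ~ 0.2956 (tables give 0.2957)
```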

Module III: Theory of Estimation

The distribution of a random variable is called the distribution of the population. We are interested in studying the characteristics of the population based on a sample taken from it. The process of making inferences about the population from a sample is known as statistical inference or inferential statistics.

If X1, X2, …, Xn are independently and identically distributed r.v.s, we say that they constitute a random sample from the population given by their common distribution. According to this definition the random sample x1, x2, …, xn of size n is a collection of random variables (X1, X2, …, Xn) such that the Xi, being observations obtained from repetitions of the random experiment, can be treated as i.i.d. random variables.

Parameters and Statistics

Any measure calculated on the basis of population values is called a 'parameter', for example the population mean μ, the population standard deviation σ, the population variance σ², the population correlation coefficient ρ, etc. Statistical inferences are usually based on 'statistics', that is, on random variables X1, X2, …, Xn constituting a random sample. In other words, any measure computed on the basis of sample values is called a statistic, for example the sample mean x̄, the sample standard deviation s, the sample variance s², the sample correlation coefficient r, etc.

Sampling distribution

Population elements included in different samples from the same population may differ, so the value of a statistic varies from one sample to another. Thus, if a number of samples, each of size n, are taken from the same population and for each sample the value of the statistic is calculated, a series of values of the statistic will be obtained. If the number of samples is large, these may be arranged into a frequency table. The frequency distribution of the statistic that would be obtained if the number of samples, each of the same size, were very large is called the sampling distribution of the statistic. In other words, by a sampling distribution we mean the probability distribution of a statistic.

Standard error
The standard deviation of the sampling distribution of a statistic is called the standard error of the statistic. If t is a statistic with sampling distribution f(t), the standard error (SE) of t is given by SE of t = √V(t), where V(t) = E(t²) − {E(t)}².

Uses of Standard Error

The standard error plays an important role in large sample theory and forms the basis of the testing of hypotheses.

• Since the SE is inversely proportional to the sample size n, it is very helpful in determining the proper sample size to be taken to estimate the parameters.
• It is used for testing a given hypothesis.
• The SE gives an idea about the reliability of a sample. The reciprocal of the SE is a measure of the reliability of the sample.
• The SE is used to determine the confidence limits of population parameters like the mean, proportion, standard deviation, etc.

Sampling distributions for small samples

The probability distributions of statistics computed from small samples drawn from a normal population are discussed here. For small samples we get the exact probability distributions of these statistics, or exact sampling distributions.

By a small sample we mean a sample of size less than 30; a sample of size 30 or more is treated as a large sample.

Sampling distribution of the sample mean

Let x1, x2, …, xn be a random sample of size n drawn from a normal population; then x̄ → N(μ, σ/√n).

1. When x̄ is the mean of a sample of size n drawn from a population which is not normal, the sampling distribution of x̄ can be approximated by the normal N(μ, σ/√n) using the central limit theorem, provided n is sufficiently large.
2. In the case of a normal population, the distribution of x̄ is normal N(μ, σ/√n) for any sample size n.
3. The above results show that E(x̄) = μ, V(x̄) = σ²/n, ∴ SE of x̄ = σ/√n

f(x̄) = (√n/(σ√(2π))) e^(−n(x̄−μ)²/(2σ²)),  −∞ < x̄ < ∞
Chi-square distribution
Karl Pearson described the well-known probability distribution called the chi-square distribution; χ² is a random variable used as a test statistic. Let the random variables X1, X2, …, Xn be taken from a normal population with mean μ and variance σ², Xi → N(μ, σ²). We define the χ² statistic as the sum of squares of standard normal variates:

χ² = Σ_{i=1}^{n} ((Xi − μ)/σ)²
Definition
A continuous r.v. χ², assuming values from 0 to ∞, is said to follow a chi-square distribution with n degrees of freedom if its pdf is given by

f(χ²) = [(1/2)^(n/2) / Γ(n/2)] e^(−χ²/2) (χ²)^(n/2 − 1),  0 ≤ χ² < ∞

= 0 otherwise

where Γ(n) = (n − 1)! and n is the parameter of the distribution; we write χ² → χ²(n) d.f.

1. Mean = E(χ²) = n
2. Variance = V(χ²) = 2n
3. The χ² distribution varies for different degrees of freedom.

Degrees of freedom
There are n degrees of freedom in a sample of n observations, but one degree of freedom is used up in calculating x̄, leaving only n − 1 degrees of freedom for the residuals (x − x̄) used to calculate s².

Sampling distribution of the variance

The derivation of the exact sampling distribution of the sample variance s² needs a transformation of variables and the associated mathematics.

ns²/σ² → χ²(n − 1) d.f., ∴ E(ns²/σ²) = n − 1

This yields E(s²) = ((n − 1)/n)σ²; similarly V(s²) = 2(n − 1)σ⁴/n².

Student's 't' distribution

According to the central limit theorem, if simple random samples of size n are taken from a population whose mean and standard deviation are μ and σ, then the sample mean x̄ will be distributed normally with mean μ and standard deviation σ/√n.

In other words Z = (x̄ − μ)/(σ/√n) → N(0, 1)

When the population standard deviation σ is not known and s is the sample standard deviation, then the statistic

Z = (x̄ − μ)/(s/√n) → N(0, 1), provided n, the size of the sample, is sufficiently large (n ≥ 30). But in the case of a small sample (n < 30) the distribution of Z is not approximately normal, and we cannot obtain the value of z for a desired level of confidence from the table of areas under the standard normal curve. The distribution of the statistic in such cases is known as Student's 't' distribution.

Definition
A continuous random variable t, assuming values from −∞ to +∞, with the pdf given by

f(t) = [1/(√n B(1/2, n/2))] (1 + t²/n)^(−(n+1)/2),  −∞ < t < +∞

is said to follow a Student's t distribution with n degrees of freedom. The t distribution depends only on n, which is the parameter of the distribution.

Definition of the 't' statistic

If the random variables Z → N(0, 1) and Y → χ²(n), and if Z and Y are independent, then the statistic defined by

t = Z/√(Y/n) follows Student's 't' distribution with n d.f.

Snedecor's F distribution

Another important distribution is the F distribution, named in honor of Sir Ronald A. Fisher. The F distribution, which we shall later find to be of considerable practical interest, is the ratio of two independent chi-square random variables divided by their respective degrees of freedom.

If U and V are independently distributed chi-square variables with n1 and n2 degrees of freedom, then

F = (U/n1)/(V/n2) is a random variable following an F distribution with (n1, n2) degrees of freedom.

Definition
A continuous random variable F, assuming values from 0 to ∞ and having the pdf given by

f(F) = [n1^(n1/2) n2^(n2/2) / B(n1/2, n2/2)] × F^(n1/2 − 1) / (n1F + n2)^((n1+n2)/2),  0 ≤ F < ∞

is said to follow an F distribution with (n1, n2) degrees of freedom.

Central limit theorem

The theorem states that the sum of a very large number of random variables is approximately normally distributed, with mean equal to the sum of the means of the variables and variance equal to the sum of the variances of the variables, provided the random variables satisfy certain very general assumptions.

Estimation theory

The main objective of statistical analysis is to draw valid conclusions about the population on the basis of a sample drawn from it. Thus statistical inference is the process of making inferences about unknown aspects of the distribution of the population based on the sample taken from it. The unknown aspect may be the form of the distribution or the values of the parameters involved in it. Characteristics like the mean, median, mode, standard deviation, etc. of the population are referred to as parameters of the population.

Statistical inference is primarily inductive in nature because the unknown parameters of the population are estimated from the sample taken from that population. There are mainly two branches of inferential statistics:

1. Estimation of parameters
2. Testing of hypotheses
The theory of estimation was expounded by Prof. R.A. Fisher in his research papers around 1930. Estimation can be made in two ways:
1. Point estimation
2. Interval estimation

Point estimation

If, from the observations in a sample, a single value is calculated as an estimate of the unknown parameter, the procedure is referred to as point estimation. For example, if we use a value of x̄ to estimate the mean μ of a population we are using point estimation; correspondingly, we refer to the statistic x̄ as a point estimator. That is, the term estimator represents a rule or method of estimating the population parameter, and the estimate represents the value produced by the estimator.
Various statistical properties of estimators can thus be used to decide which estimator is most appropriate in a given situation, which will expose us to the smallest risk, which will give us the most information at the smallest cost, and so forth. There are four criteria for a good estimator:

1. Unbiasedness
2. Consistency
3. Efficiency
4. Sufficiency

1. Unbiasedness
An unbiased estimator is a statistic that has an expected value equal to the unknown true value of the population parameter being estimated. An estimator not having this property is said to be biased.
Let X be a random variable having the pdf f(x, θ), where θ may be unknown. Let X1, X2, …, Xn be a random sample taken from the population represented by X. Let tn = t(X1, X2, …, Xn) be an estimator of the parameter θ.
If E(tn) = θ for every n, then the estimator tn is called an unbiased estimator of θ.

2. Consistency
One of the basic properties of a good estimator is that it provides increasingly more precise information about the parameter θ as the sample size n increases.
That is, for every ε > 0, lim_{n→∞} P(|tn − θ| ≤ ε) = 1.
That means that as n tends to infinity, the probability that the absolute difference between the statistic and the population parameter is within ε tends to unity.
3. Efficiency
If there are several unbiased estimators of a given parameter, we usually take the one with the smallest variance. Let t1 and t2 be two unbiased estimators of a parameter θ. If V(t1) < V(t2), then t1 is more efficient than t2.
4. Sufficiency
An estimator is said to be sufficient if it provides all the information contained in the sample in respect of estimating the parameter.

Interval estimation

A point estimator is used to produce a single number, hopefully close to the unknown parameter, but there remains the problem of how far the estimate may differ from the parameter. We are therefore interested in finding, for any population parameter, an interval, called a 'confidence interval', within which the population parameter may be expected to lie with a certain degree of confidence.

The interval estimate [t1, t2] is a random interval; hence it will differ from sample to sample. Some of these intervals will contain the true value of the parameter and some of them will not. This leads to saying that we are 100(1 − α)% confident that our single interval contains the true parameter value. The interval [t1, t2] is called the confidence interval or fiducial interval, and 1 − α is called the confidence coefficient.

Confidence interval for the mean of a normal population N(μ, σ)

Case 1. When σ is known

Let x̄ be the mean of a random sample of size n drawn from a normal population N(μ, σ).

Then x̄ → N(μ, σ/√n), ∴ Z = (x̄ − μ)/(σ/√n) → N(0, 1)

From the area property of the standard normal distribution we get

P{|Z| ≤ Z_{α/2}} = 1 − α, i.e. P{−Z_{α/2} ≤ Z ≤ +Z_{α/2}} = 1 − α

i.e. P{−Z_{α/2} ≤ (x̄ − μ)/(σ/√n) ≤ +Z_{α/2}} = 1 − α

i.e. P{−Z_{α/2} σ/√n ≤ x̄ − μ ≤ +Z_{α/2} σ/√n} = 1 − α

i.e. P{−x̄ − Z_{α/2} σ/√n ≤ −μ ≤ −x̄ + Z_{α/2} σ/√n} = 1 − α

i.e. P{x̄ − Z_{α/2} σ/√n ≤ μ ≤ x̄ + Z_{α/2} σ/√n} = 1 − α

Hence the interval [x̄ − Z_{α/2} σ/√n, x̄ + Z_{α/2} σ/√n] is called the 100(1 − α)% confidence interval for the mean μ of a normal population. Here Z_{α/2} is obtained from the table of areas under the standard normal curve in such a way that the area under the curve to its right is equal to α/2.

Illustration 1

1. If α = .05, Z_{α/2} = 1.96, so the 95 percent confidence interval for μ is

[x̄ − 1.96 σ/√n, x̄ + 1.96 σ/√n]

2. If α = .02, Z_{α/2} = 2.326, so the 98 percent confidence interval for μ is

[x̄ − 2.326 σ/√n, x̄ + 2.326 σ/√n]

3. If α = .01, Z_{α/2} = 2.58, so the 99 percent confidence interval for μ is

[x̄ − 2.58 σ/√n, x̄ + 2.58 σ/√n]

Case 2. When σ is unknown

When n is large (n ≥ 30), the confidence interval for μ is obtained by replacing σ by s.

From the above discussion we observe that the confidence interval for the mean can be constructed using the expression {x̄ ± t·SE}, where t denotes the table value. So when σ is unknown and n is small, the 95% confidence interval for μ is {x̄ ± t·s/√(n − 1)}, where the table value t is obtained by referring to the t table for (n − 1) d.f. and α = .05.

Example 1

A random sample of 16 values from a normal population showed a mean of 41.5 inches and a sum of squared deviations from the mean equal to 135 inches². Obtain 95% and 99% confidence intervals for the population mean.

Here n = 16, x̄ = 41.5, Σ(x − x̄)² = ns² = 135, so s/√(n − 1) = √(135/(16 × 15)) = 3/4.

The 95 percent confidence interval for μ is {x̄ ± t_{α/2} · s/√(n − 1)}

From the table t(15, 0.05) = 2.131, so the required confidence interval is

{41.5 ± 2.131 × 3/4} = {39.90, 43.10}

For the 99% CI for μ, t(15, 0.01) = 2.947

∴ the 99% CI for μ is {41.5 ± 2.947 × 3/4} = {39.29, 43.71}
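The interval can be computed directly, as in the sketch below (the t table values 2.131 and 2.947 are those quoted above).

```python
from math import sqrt

n, xbar = 16, 41.5
sum_sq_dev = 135                 # sum of squared deviations = n*s^2
s = sqrt(sum_sq_dev / n)         # sample SD
se = s / sqrt(n - 1)             # s/sqrt(n-1) = 3/4

for t_table, level in [(2.131, "95%"), (2.947, "99%")]:
    lower, upper = xbar - t_table * se, xbar + t_table * se
    print(level, round(lower, 2), round(upper, 2))
```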


Module IV: Testing of Hypothesis

The theory of testing statistical hypotheses concerns a special class of decision problems. Many important decisions made in the face of uncertainty involve a choice between just two alternatives: the manager of a company may have to choose between newspaper and television for an advertisement campaign, or we may want to compare the marks of a class before and after online classes, etc. Statistical techniques can be used to evaluate or quantify the risks involved and, if possible, to provide criteria for minimizing the risk of making a wrong decision. This area of statistical inference is called testing of hypotheses.

Test of hypothesis

If, on the supposition that a particular hypothesis is true, we find that the results observed in a random sample differ markedly from those expected under the hypothesis on the basis of pure chance, using sampling theory, we say that the observed differences are significant and we would be inclined to reject the hypothesis.

Rules or procedures which enable us to decide whether to accept or reject a hypothesis, or to determine whether an observed sample differs significantly from the expected result, are called tests of hypotheses or tests of significance.

Null Hypothesis

The hypothesis to be tested is usually referred to as the 'null hypothesis' and is denoted by the symbol H0. The null hypothesis is a proposition of zero difference. Fisher emphasized that every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis. Thus a hypothesis which is set up with the possibility of its being rejected at some defined probability level is called a null hypothesis. For example, if we want to show that students of college A have a higher average IQ than students of college B, we might formulate the null hypothesis that there is no difference in the IQ of the students of the two colleges:

H0: μA = μB

Alternative Hypothesis

In the testing process H0 is either rejected or not rejected. If H0 is not rejected, it means that the data on which the test is based do not provide sufficient evidence to cause rejection. If H0 is rejected, it means that the data on hand are not compatible with the null hypothesis H0 but are compatible with some other hypothesis. This hypothesis is known as the alternative hypothesis, denoted by H1. The rejection or non-rejection of H0 is meaningful only when it is being tested against a rival H1.
For the example of the IQ of two colleges, with H0: μA = μB we may formulate the alternative hypothesis as H1: μA > μB or H1: μA < μB, but the choice has to be made before the test procedure.

Type I and Type II errors

Research requires the testing of hypotheses. In this process two types of wrong inference can be drawn. These are called Type I and Type II errors.

Rejecting a true null hypothesis H0 is called a Type I error, or error of the first kind.

Accepting a null hypothesis H0 when it is false is called a Type II error, or error of the second kind.

State of nature:   H0 true         H0 false
Action:
Reject H0          Type I error    No error
Accept H0          No error        Type II error

Any test of H0 will tell us either to accept H0 or to reject H0 based on the observed sample values; thus it is not possible to commit both errors simultaneously.

α = P(Type I error) = P(rejecting H0 given H0 is true)

= P(rejecting H0 | H0)

β = P(Type II error) = P(accepting H0 | H1)

Every test of H0 has a pair of values (α, β) associated with it. It would seem ideal if we could find a test that simultaneously minimizes both α and β, but this is not possible. Since each of α and β is a probability, we know that α ≥ 0 and β ≥ 0; that is, 0 is the minimum value of each. No matter what H0 and H1 state and what observed values occur in the sample, we could use the test "always accept H0". With this test we could never commit a Type I error, since we would never reject H0 no matter what the sample values were; thus for this test α = 0, which implies β = 1. The converse of this test, which would always reject H0, gives β = 0 and α = 1.

Test statistic

The testing of a statistical hypothesis requires explicit rules for deciding whether to accept or reject the null hypothesis in favor of the alternative hypothesis, consistent with the results obtained from the random sample taken from the population. As the sample itself is a set of observations, usually an appropriate function of the sample observations is chosen, and the decision either to accept or to reject the hypothesis is taken on the value of this function. This function is called the test statistic or test criterion, in order to distinguish it from an ordinary descriptive statistic or estimator such as x̄ or s².

Critical region

The basis of the testing of hypotheses is the partition of the sample space into two exclusive regions, namely the region of acceptance and the region of rejection. If the sample point falls in the rejection region, H0 is rejected. The region of rejection is called the critical region. Thus the critical region is the set of those values of the test statistic which lead to rejection of the null hypothesis. The critical region is denoted by ω.

Level of significance

The validity of H0 against H1 can be tested at a certain level of significance. The level of significance is defined as the probability of rejecting the null hypothesis H0 when it is true, or the probability of a Type I error. Actually this is the probability of the test statistic falling in the critical region when the hypothesis is true. So the level of significance is also called the size of the critical region, the size of the test, or the producer's risk. It is denoted by α, and α is usually expressed as a percentage such as 1%, 2%, 5% and so on.

i.e. α = P(rejecting H0 | H0) = P(x ∈ ω | H0)

For instance, if the hypothesis is accepted at the 5% level, the statistician, in the long run, will be making a wrong decision in 5 out of 100 cases. If the hypothesis is rejected at the same level, he runs the risk of rejecting a true hypothesis about 5% of the time.

Power of a test

The probability of rejecting the null hypothesis H0 when it is actually not true is called the power of the test and is given by 1 − β. The power of a test is also called the power of the critical region.

Power = P(rejecting H0 | H1 is true) = 1 − P(accepting H0 | H1)

= 1 − P(Type II error) = 1 − β

Critical value

The value of the test statistic which separates the critical region from the acceptance region is called the critical value. The critical value is usually referred to as Zα or tα, depending on the sampling distribution of the test statistic and the level of significance.
Steps involved in testing a statistical hypothesis

1. State the null hypothesis H0 and the alternative hypothesis H1.

2. Choose the level of significance α.
3. Determine the test statistic.
4. Determine the probability distribution of the test statistic.
5. Determine the best critical region.
6. Calculate the value of the test statistic.
7. Decision: if the calculated value of the test statistic falls in the critical region, reject the null hypothesis H0, otherwise accept it; i.e., if the calculated value exceeds the table value, reject H0, otherwise accept it.

Parametric tests of hypotheses

Population parameters like the mean, proportion and variance are of great interest in business and economic applications. If the pdf of the test statistic is unknown, the construction of the test becomes meaningless. Statistical tests may be grouped into two classes: (a) large sample tests and (b) small sample tests. For small sample tests the exact sampling distribution of the test statistic must be known. In large sample tests the normal distribution plays a key role, and the justification for this is found in the famous central limit theorem (CLT): when the sample size is large, most statistics are normally, or at least approximately normally, distributed. Let Y be a statistic satisfying the conditions of the CLT; then the statistic given by

Z = (Y − E(Y))/√V(Y) → N(0, 1) for large n

Here √V(Y) is called the standard error of Y, so Z = (Y − E(Y))/(SE of Y) → N(0, 1).

If Z is chosen as the test statistic, the critical region for a given significance level can be determined from the normal table. A test based on the normal distribution is called a normal test or Z test.

Example 1

A sample of 25 items is taken from a population with standard deviation 10, and the sample mean is found to be 65. Can it be regarded as a sample from a normal population with μ = 60?

Solution

Given n = 25, σ = 10, x̄ = 65, μ = 60.

We have to test H0: μ = 60 against H1: μ ≠ 60 at α = .05.

The best critical region is |Z| ≥ 1.96. Here the test statistic is

Z = (x̄ − μ)/(σ/√n) = (65 − 60)/(10/√25) = 25/10 = 2.5, ∴ |Z| = 2.5 > 1.96

Since Z lies in the critical region (the calculated value is more than the table value), H0 is rejected. That is, the sample cannot be regarded as drawn from a normal population with μ = 60.
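The Z statistic of this example can be verified as follows (a sketch; 1.96 is the two-tailed 5% critical value used above).

```python
from math import sqrt

n, sigma, xbar, mu0 = 25, 10, 65, 60
z = (xbar - mu0) / (sigma / sqrt(n))
print(z)                       # 2.5

critical = 1.96                # two-tailed critical value at the 5% level
print(abs(z) > critical)       # True -> reject H0: mu = 60
```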

Example 2

A sample of 900 members is found to have mean 3.4 cm and standard deviation 2.61 cm. Could it be reasonably regarded as a sample from a large population whose mean is 3.25 cm? Use a two-tailed test and α = .01.

Solution

Given n = 900, s = 2.61, x̄ = 3.4, μ = 3.25.

We have to test H0: μ = 3.25 against H1: μ ≠ 3.25 at α = .01.

Z = (x̄ − μ)/(s/√n) = (3.4 − 3.25)/(2.61/√900) = (.15 × 30)/2.61 = 1.724, ∴ |Z| = 1.724 < 2.58

Since the calculated value is less than the table value, the null hypothesis is accepted.

Test for comparing the equality of the means of two populations

Suppose we want to test the null hypothesis H0: μ1 = μ2 against the alternative hypothesis H1: μ1 > μ2, or H1: μ1 < μ2, or H1: μ1 ≠ μ2, from independent samples of sizes n1 and n2 drawn from two populations having means μ1 and μ2 and known variances σ1² and σ2², at significance level α.

The test statistic is Z = (x̄1 − x̄2)/√(σ1²/n1 + σ2²/n2). Calculate the value of Z using the sample information; if it lies in the critical region reject, otherwise accept, the null hypothesis.

Example

A random sample of 1000 workers from factory A shows a mean wage of Rs 47 per week with standard deviation Rs 23. A random sample of 1500 workers from factory B gives a mean wage of Rs 49 per week with standard deviation Rs 30. Is there any significant difference between these two mean wages?
Solution

Given n1 = 1000, n2 = 1500, x̄1 = 47, x̄2 = 49, s1 = 23, s2 = 30.

We have to test H0: μ1 = μ2 against H1: μ1 ≠ μ2 at α = .05.

Z = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2)

= (47 − 49)/√(23²/1000 + 30²/1500)

= −2/√(.529 + .600)

Z = −1.882, |Z| = 1.882 < 1.96

Since the calculated value is less than the table value, the null hypothesis is accepted: the two mean wages are equal.
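A quick numerical check of the two-sample Z statistic (a sketch; 1.96 is the 5% two-tailed critical value used above):

```python
from math import sqrt

n1, x1, s1 = 1000, 47, 23
n2, x2, s2 = 1500, 49, 30

z = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)
print(round(z, 3))             # about -1.882

print(abs(z) > 1.96)           # False -> do not reject H0: mu1 = mu2
```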

Small sample tests

The Student's t test

A test of hypothesis based on the Student's t distribution is called a t test.

1. Test for a population mean

The test of a population mean using the Student's t distribution applies when the parent population is normal, the sample is small (n < 30) and σ is unknown.

By testing the mean of a normal population, we are testing the significance of the difference between the sample mean and a hypothetical value μ0.

Here we test H0: μ = μ0 against the alternative H1: μ > μ0, H1: μ < μ0 or H1: μ ≠ μ0.

The test statistic is

t = (x̄ − μ)/(s/√(n − 1)),  where s² = (1/n)Σ(xi − x̄)²

For significance level α, the best critical regions are, respectively,
ω ≡ t < −tα, ω ≡ t > tα, and ω ≡ |t| ≥ t_{α/2},

where tα and t_{α/2} are obtained by referring to the t table with n − 1 d.f.

Example 1

A sample of 10 observations gives a mean of 38 and SD 4. Can we conclude that the population mean is 40?

Solution

Assume the population is normal; n = 10, x̄ = 38, s = 4 and μ0 = 40.

∴ t = (x̄ − μ)/(s/√(n − 1)) = (38 − 40)/(4/√(10 − 1)) = −6/4 = −1.5, ∴ |t| = 1.5 < 2.262

Since the calculated value is less than the table value (t with 9 d.f. at the 5% level), the null hypothesis is accepted. That is, the population mean can be taken to be 40.
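The t statistic of this example can be computed directly (a sketch; 2.262 is the two-tailed 5% table value for 9 degrees of freedom used above).

```python
from math import sqrt

n, xbar, s, mu0 = 10, 38, 4, 40

t = (xbar - mu0) / (s / sqrt(n - 1))
print(t)                       # -1.5

print(abs(t) > 2.262)          # False -> do not reject H0: mu = 40
```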

2. The t test for the equality of two sample means

By testing the equality of two sample means we mean testing the significance of the difference between two sample means; in other words, we are deciding whether the two samples are drawn from normal populations having the same mean.

Assumptions

1. The two populations from which the samples are drawn should be normal.

2. The sample observations should be independent and random.

3. The sample sizes should be small: n1 < 30, n2 < 30.

4. The two population variances σ1² and σ2² must be equal but unknown.

Here we test H0: μ1 = μ2 against one of the alternatives H1: μ1 > μ2, H1: μ1 < μ2 or H1: μ1 ≠ μ2.

The test statistic is

t = (x̄1 − x̄2) / √{[(n1s1² + n2s2²)/(n1 + n2 − 2)] (1/n1 + 1/n2)}
Example 1

The mean life of a sample of 10 electric bulbs was observed to be 1309 hours with SD 420 hours. A second sample of 16 bulbs has mean 1205 hours and SD 390 hours. Test the significance of the difference between the two sample means.

Assume that the samples are drawn from normal populations.

Given n1 = 10, n2 = 16, x̄1 = 1309, x̄2 = 1205, s1 = 420, s2 = 390.

t = (x̄1 − x̄2) / √{[(n1s1² + n2s2²)/(n1 + n2 − 2)] (1/n1 + 1/n2)}

= (1309 − 1205) / √{[(10 × 420² + 16 × 390²)/(10 + 16 − 2)] (1/10 + 1/16)}

= 0.62, ∴ |t| = 0.62 < 2.064

Since the calculated value is less than the table value (t with 24 d.f. at the 5% level), the null hypothesis is accepted, which means there is no significant difference between the two means.

Example 2

The following are samples from two independent normal populations. Test the hypothesis that they have the same mean, assuming the variances are equal and taking the level of significance as 5%.

Sample 1: 14, 18, 12, 9, 16, 24, 20, 21, 19, 17
Sample 2: 20, 24, 18, 16, 26, 25, 18

Solution

x̄1 = Σx1/n1 = 170/10 = 17,  x̄2 = Σx2/n2 = 147/7 = 21

For sample 1 the deviations (x1 − x̄1) are −3, 1, −5, −8, −1, 7, 3, 4, 2, 0, with squares 9, 1, 25, 64, 1, 49, 9, 16, 4, 0, so Σ(x1 − x̄1)² = n1s1² = 178.

For sample 2 the deviations (x2 − x̄2) are −1, 3, −3, −5, 5, 4, −3, with squares 1, 9, 9, 25, 25, 16, 9, so Σ(x2 − x̄2)² = n2s2² = 94.

Here we test H0: μ1 = μ2 against the two-sided alternative H1: μ1 ≠ μ2 at α = .05.

Test statistic:

t = (x̄1 − x̄2) / √{[(n1s1² + n2s2²)/(n1 + n2 − 2)] (1/n1 + 1/n2)}

= (17 − 21) / √{[(178 + 94)/(10 + 7 − 2)] (1/10 + 1/7)}

= −1.906

Since the calculated value |t| = 1.906 is below the table value (2.131 for 15 d.f. at the 5% level), the null hypothesis is accepted.
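The pooled two-sample t statistic for these data can be verified numerically, as in the sketch below (2.131 is the two-tailed 5% table value for 15 degrees of freedom).

```python
from math import sqrt

sample1 = [14, 18, 12, 9, 16, 24, 20, 21, 19, 17]
sample2 = [20, 24, 18, 16, 26, 25, 18]

n1, n2 = len(sample1), len(sample2)
x1, x2 = sum(sample1) / n1, sum(sample2) / n2   # 17, 21

ss1 = sum((x - x1) ** 2 for x in sample1)       # n1*s1^2 = 178
ss2 = sum((x - x2) ** 2 for x in sample2)       # n2*s2^2 = 94

pooled = (ss1 + ss2) / (n1 + n2 - 2)
t = (x1 - x2) / sqrt(pooled * (1 / n1 + 1 / n2))
print(round(t, 3))             # about -1.906

print(abs(t) > 2.131)          # False -> do not reject H0 (t table, 15 d.f., 5%)
```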

3. t test for dependent samples (paired t test)

In the case of some human measurements, like blood sugar, there are two kinds of observations taken on the same subjects, for example fasting and after food, for testing a treatment procedure. Here the difference is that the two samples are drawn from the same subjects.

We wish to test the null hypothesis H0: μd = 0 against a one- or two-sided alternative, H1: μd > 0 or H1: μd ≠ 0.

Here the test statistic is based on the values di = xi − yi, where xi and yi have the same normal distribution.

The test statistic is t = d̄/(sd/√(n − 1))

where d̄ = Σdi/n, sd = √((1/n)Σdi² − d̄²), di = xi − yi, i = 1, 2, 3, …
Example 1

To test the efficiency of sleeping pills, a drug company uses sample of insomniacs. The
time in minits falling asleep is observed for for each them. Few days later the same persons are
given sleeping pills and time until failing the sleep measured. Test the pills are effective.

Person A B C D E
No pills (𝑥𝑖 ) 65 35 80 40 50
With pill (𝑦𝑖 ) 45 15 61 31 20

Solution

We are testing H0: μ = 0 against the two-sided alternative H1: μ ≠ 0, where μ is the mean of the differences.

From the t table, the critical value at α = 0.05 with n − 1 = 4 df is 2.776.

The test statistic is t = d̄ / (sd/√(n − 1))

𝑑𝑖 = 𝑥𝑖 − 𝑦𝑖

Person             A     B     C     D     E      Total
No pill (xi)       65    35    80    40    50
With pill (yi)     45    15    61    31    20
di = xi − yi       20    20    19    9     30     Σdi = 98
di²                400   400   361   81    900    Σdi² = 2142

d̄ = Σdi/n = 98/5 = 19.6

sd² = Σdi²/n − d̄² = 2142/5 − (19.6)² = 44.24, so sd = 6.65

t = 19.6 / (6.65/√4) = 5.89

|t| = 5.89 > 2.776

Since the calculated value is greater than the table value, the null hypothesis is rejected. That is, the sleeping pills are effective.
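A minimal Python sketch of the same paired test using scipy, whose paired t statistic is algebraically identical to d̄/(sd/√(n−1)) with sd computed using divisor n:

from scipy import stats

no_pill   = [65, 35, 80, 40, 50]   # minutes to fall asleep without the pill
with_pill = [45, 15, 61, 31, 20]   # minutes to fall asleep with the pill

# Paired (dependent samples) t test on the differences no_pill - with_pill.
t, p = stats.ttest_rel(no_pill, with_pill)
print(round(t, 2), round(p, 4))    # t ≈ 5.89, p < 0.05, so H0 is rejected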
Non parametric test

In contrast to parametric tests, non-parametric tests do not make any restrictive assumptions about the shape of the population distribution. The only requirement for the test is that the population distribution should be continuous. Such tests are known as distribution-free or non-parametric tests.

Advantages

1. They do not require the assumption that the population is distributed as a normal curve or has any other specified shape.

2. They are generally easier to understand.

3. In some cases even formal ordering of the data is not required.

Limitations

1. They ignore a certain amount of information.

2. They are not as efficient or sharp as parametric tests.

• Chi square test


An important non-parametric test is the chi-square (χ²) test. The χ² test is used to draw inferences about population dispersion, mainly the variance. The important applications of the χ² test are:
1. To test the variance of a normal population (parametric test)
2. To test the goodness of fit (non-parametric test)
3. To test the independence of attributes (non-parametric test)

1. 𝜒 2 Test for population variance.

This is a parametric test. It is conducted when we want to test whether a given normal population has a specified variance, σ² = σ0². The chi-square test for variance is generally a right-tailed test.
So we are testing H0: σ² = σ0² against H1: σ² > σ0².
The test statistic is χ² = ns²/σ0².
For α = 0.05 the best critical region ω ≡ χ² > χα² is obtained by referring to the chi-square table for n − 1 d.f. and probability α. If the calculated value exceeds the table value, reject the null hypothesis; otherwise accept it.
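A minimal Python sketch of this right-tailed variance test (the figures used are those of Example 1 below):

from scipy import stats

def chi2_variance_test(n, s2, sigma0_sq, alpha=0.05):
    """Right-tailed chi-square test for a normal population variance.
    s2 is the sample variance computed with divisor n, as in the text."""
    chi2 = n * s2 / sigma0_sq                   # test statistic ns²/σ0²
    crit = stats.chi2.ppf(1 - alpha, df=n - 1)  # upper-tail critical value
    return chi2, crit

chi2, crit = chi2_variance_test(n=10, s2=6.2, sigma0_sq=5)
print(round(chi2, 2), round(crit, 2))           # 12.4 and 16.92 -> accept H0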
Example 1
A manufacturing process is expected to produce goods of a specified weight with variance less than 5 units. A random sample of 10 items was found to have a variance of 6.2 units. Is there reason to suspect that the process variance has increased?

Solution
Given 𝜎0 2 = 5 𝑛 = 10 𝑠 2 = 6.2

We are testing H0: σ² = 5 against H1: σ² > 5, given α = 0.05.


The test statistic is χ² = ns²/σ0²

χ² = (10 × 6.2)/5 = 12.4
The table value of chi square with 9 df = 16.92
Since the calculated value is less than the table value, we accept the null hypothesis.
That is, there is no reason to suspect that the process variance has increased.

2. Test of goodness of fit

Another application of the χ² distribution is the test of goodness of fit. Here we find a test to determine whether a population follows a specified theoretical distribution; in other words, we are testing the discrepancy between observed frequencies and the corresponding expected (theoretical) frequencies.

Consider a set of possible events 1, 2, 3, …, n arranged in n classes or cells. Let these events occur with frequencies O1, O2, …, On, called the observed frequencies, and let them be expected to occur, using the concept of probability, with frequencies E1, E2, …, En, called the expected frequencies.

A measure of the discrepancy existing between the observed and expected frequencies is found using the test statistic χ², given by

χ² = Σ (Oi − Ei)²/Ei, i = 1, 2, …, n

If N is the total frequency, then we must have ΣOi = ΣEi = N.

In this test the null hypothesis is that there is no discrepancy between the observed frequencies and the corresponding expected frequencies.

The test statistic is χ² = Σ (Oi − Ei)²/Ei.

The table value of χ² is obtained by referring to the χ² table for (n − r − 1) df and probability α. Here n is the number of cells or classes, and r denotes the number of restrictions, if any. Calculate χ²; if the calculated value exceeds the table value, reject the null hypothesis, otherwise accept it.

Example 1

Four coins are tossed 80 times. The distribution of the number of heads is given below. Test whether the coins are unbiased.

Number of 0 1 2 3 4 Total
heads
Frequencies 4 20 32 18 6 80

Solution

Let X be the number of heads obtained when four coins are tossed. Then X ~ B(4, 1/2) if the coins are unbiased. Therefore the expected frequencies are given by

f(x) = nCx p^x q^(n−x) · N,  x = 0, 1, 2, 3, 4

∴ f(0) = 4C0 (1/2)^0 (1/2)^4 × 80 = 5

f(1) = 4C1 (1/2)^1 (1/2)^3 × 80 = 20

f(2) = 4C2 (1/2)^2 (1/2)^2 × 80 = 30

f(3) = 4C3 (1/2)^3 (1/2)^1 × 80 = 20

f(4) = 4C4 (1/2)^4 (1/2)^0 × 80 = 5

Number of heads (x)   Observed Oi   Expected Ei   Oi − Ei   (Oi − Ei)²   (Oi − Ei)²/Ei
0                     4             5             −1        1            1/5 = 0.20
1                     20            20             0        0            0
2                     32            30             2        4            4/30 = 0.13
3                     18            20            −2        4            4/20 = 0.20
4                     6             5              1        1            1/5 = 0.20
n = 5                 ΣOi = 80      ΣEi = 80                             0.73

∴ χ² = 0.73. The table value of chi-square at α = 0.01 with 5 − 1 = 4 df is 13.28.
Since the calculated value is less than the table value, the null hypothesis is accepted. That is, there is no significant difference between the observed and theoretical frequencies; the coins may be regarded as unbiased.
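The same goodness-of-fit calculation can be sketched in Python as below (a minimal illustration; the 1% critical value is used, as in the example):

from scipy import stats

observed = [4, 20, 32, 18, 6]                      # heads in 80 tosses of 4 coins
n_coins, N = 4, 80
expected = [stats.binom.pmf(k, n_coins, 0.5) * N for k in range(n_coins + 1)]  # 5, 20, 30, 20, 5

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
crit = stats.chi2.ppf(0.99, df=len(observed) - 1)  # 1% value for 4 df ≈ 13.28
print(round(chi2, 2), round(crit, 2))              # 0.73 and 13.28 -> accept H0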

3. The χ² test for independence of attributes

We now apply the chi-square test for testing the independence of two attributes, each having two or more levels or categories, whose frequencies are expressed by means of a contingency table having r rows and c columns. Here the null hypothesis is that the attributes are independent. The test statistic is χ² = Σ (Oi − Ei)²/Ei, and the table value of chi-square is obtained by referring to the chi-square table for (r − 1)(c − 1) df at the given significance level. Calculate χ²; if it exceeds the table value, reject the null hypothesis, otherwise accept it.

Example
From the following table showing the number of plants having certain characteristics, test the hypothesis that flower colour is independent of the flatness of the leaves.

Leaves →          Flat      Curled    Total
White flower      99        36        135
Red flower        20        5         25
Total             119       41        160

Solution

On the hypothesis that flower colour is independent of the flatness of the leaves, the expected frequency of plants having white flowers and flat leaves = (135 × 119)/160 ≈ 100. The other expected frequencies follow from the fact that the marginal totals for each row and column remain unchanged.

Leaves →          Flat              Curled            Total
White flower      100               135 − 100 = 35    135
Red flower        119 − 100 = 19    25 − 19 = 6       25
Total             119               41                160

Hence χ² = Σ (Oi − Ei)²/Ei = (99 − 100)²/100 + (36 − 35)²/35 + (20 − 19)²/19 + (5 − 6)²/6 ≈ 0.26

The number of df = (2 − 1)(2 − 1) = 1, and the 5% table value of chi-square at 1 df is 3.841.
Since the calculated value is less than the table value, the null hypothesis is accepted. It follows that flower colour is independent of the flatness of the leaves.
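A minimal Python sketch of the same independence test. Note that scipy computes the expected frequencies without rounding (100.4 rather than 100), so its statistic (about 0.49) differs slightly from the hand calculation above, though the conclusion is the same:

import numpy as np
from scipy import stats

table = np.array([[99, 36],    # white flowers: flat, curled leaves
                  [20,  5]])   # red flowers:   flat, curled leaves

# correction=False gives the plain Pearson chi-square Σ(O−E)²/E, without
# Yates' continuity correction, matching the formula used in the text.
chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
print(round(chi2, 2), dof, round(p, 3))   # ≈ 0.49 with 1 df; far below 3.841, so accept H0
print(expected)                           # unrounded expected frequencies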
Module V: Analysis of Variance

The techniques of Analysis of Variance were first derived by Sir R. A. Fisher. In the tests discussed so far we were comparing two population means. When more than two experimental or field samples are to be compared, a more general method is required. Analysis of Variance (ANOVA) is used when simultaneous comparisons are made of measurements from more than two samples. When the samples are influenced by several kinds of effects operating simultaneously, this statistical technique is adopted to decide which effects are important and to estimate such effects.
ANOVA is a procedure which separates the variation assignable to one set of causes from the variation assignable to another set of causes. For example, suppose four varieties of wheat are sown in a number of plots and their yield per acre is recorded. We wish to test the null hypothesis that the four varieties of wheat have the same yield. This could be done by taking all possible pairs of means and testing the significance of their differences. But the variation in yield may be due to differences in the wheat seeds, in the fertility of the soil of the various plots, and in the kinds of fertilizers used. We are interested in testing whether the variation in yields is due to differences in varieties, to differences in fertilizers, or to both. In ANOVA we can estimate the contribution made by each of these factors to the total variation. The total variation can be split into two components: (i) variation between the samples and (ii) variation within the samples. If the variance between the sample means is significantly greater than the variance within the samples, it can be concluded that the samples are drawn from different populations. However, the total variation may be due to experimental error together with the variation due to differences in the varieties of wheat, or to experimental error together with the variation due to the fertilizer used.
When the data are classified on the basis of the variety of wheat alone, it is known as one-way classification. On the other hand, when we classify the observations on the basis of both the variety of wheat and the type of fertilizer, it is called two-way classification.
1. One way classification
2. Two way classification
The technique of analysing the variance in the case of one factor and of two factors is similar. In the case of one-way ANOVA the total variance is divided into two parts, namely (i) variance between the samples and (ii) variance within the samples. The variance within the samples is also known as the residual variance. In the case of two-factor analysis the total variance is divided into three parts, namely (i) variance due to factor one, (ii) variance due to factor two and (iii) residual variance.
However, irrespective of the type of classification, the analysis of variance is a technique of partitioning the total sum of squared deviations of all the sample values from the grand mean into two parts:
1. the sum of squares between samples, and
2. the sum of squares within the samples.

Assumptions of ANOVA
1. The populations from which the samples have been drawn are normally distributed.
2. The populations from which the samples are drawn have the same variance.
3. The observations in the samples are randomly selected from the populations.
4. The observations are uncorrelated random variables.
5. Any observation is the sum of the effects of the factors influencing it.
6. The random error is normally distributed with mean 0 and common variance σ².
One way classification
The term one-factor analysis of variance refers to the fact that a single variable or factor of interest is controlled and its effect on the elementary units is observed. In other words, in one-way classification the data are classified according to only one criterion. Suppose we have independent samples of n1, n2, …, nk observations from k populations, whose means are denoted by μ1, μ2, …, μk. The one-way analysis of variance is designed to test the null hypothesis

H0: μ1 = μ2 = μ3 = ⋯ = μk

that is, that the arithmetic means of the populations from which the k samples are randomly drawn are equal to one another.

Steps involved in carrying out the analysis

1. Calculate the variance between samples

The sum of squares is a measure of variation. The sum of squares between samples is denoted by SSC. For calculating the variance between samples, we take the total of the squares of the deviations of the means of the various samples from the grand average and divide this total by the degrees of freedom. That is,

• Calculate the mean of each sample: X̄1, X̄2, …, X̄k.

• Calculate the grand average X̿ of all the observations taken together:

X̿ = (N1X̄1 + N2X̄2 + ⋯ + NkX̄k)/(N1 + N2 + ⋯ + Nk)

• Take the differences between the means of the various samples and the grand average.

• Square these deviations (weighting each squared deviation by the corresponding sample size) and obtain the total, which gives the sum of squares between samples; then divide this total by the degrees of freedom. The df will be one less than the number of samples.

2. Calculate the variance within the samples

The variance (sum of squares) within the samples measures those differences among items within each sample which arise due to chance only. It is denoted by SSE. For calculating the variance within the samples, we take the total of the sums of squares of the deviations of the various items from the means of the respective samples and divide this total by the degrees of freedom.
• Calculate the mean value of each sample: X̄1, X̄2, …, X̄k.
• Take the deviations of the various items in each sample from the mean of the respective sample.
• Square these deviations and obtain the total, which gives the sum of squares within the samples.
• Divide the total obtained in the previous step by the degrees of freedom (the total number of observations minus the number of samples).

3. Calculate the F ratio

F = (variance between samples)/(variance within the samples) = S1²/S2²

4. Compare the calculated value of F with the table value

Compare the calculated value of F with the table value of F for the given degrees of freedom at the chosen critical level. If the calculated value is greater than the table value, it indicates that the differences between the sample means are significant.
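A minimal Python sketch of a one-way ANOVA using scipy; the data are those of the light-bulb example that follows:

from scipy import stats

brand_a = [20, 19, 21]
brand_b = [25, 23, 21]
brand_c = [24, 20, 22]
brand_d = [23, 20, 20]

# One-way ANOVA: F = (mean square between samples) / (mean square within samples)
f_stat, p_value = stats.f_oneway(brand_a, brand_b, brand_c, brand_d)
f_crit = stats.f.ppf(0.95, dfn=3, dfd=8)      # 5% point for (3, 8) df ≈ 4.07

print(round(f_stat, 2), round(f_crit, 2))     # ≈ 1.67 and 4.07 -> accept H0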

Example

As the head of department of a consumers' research organisation, you have the responsibility for testing and comparing the lifetimes of light bulbs of 4 brands. Suppose you test the lifetime of 3 bulbs of each of the 4 brands. Your test data are shown below; each entry shows the lifetime of a bulb, measured in hundreds of hours.

A B C D
20 25 24 23
19 23 20 20
21 21 22 20

Test whether the mean lifetimes of the four brands are equal.

Solution

The null hypothesis is that the average lifetimes of the 4 brands are equal:

H0: μ1 = μ2 = μ3 = μ4

Let X̄1, X̄2, X̄3, X̄4 denote the mean lifetimes of the 4 brands A, B, C and D.

𝑋1 𝑋2 𝑋3 𝑋4
20 25 24 23
19 23 20 20
21 21 22 20
𝑋1 =20 𝑋2 =23 𝑋3 =22 𝑋4 =21
X̿ = (X̄1 + X̄2 + X̄3 + X̄4)/4 = (20 + 23 + 22 + 21)/4 = 86/4 = 21.5
The variance between samples

X̄       X̿       X̄ − X̿     (X̄ − X̿)²
20      21.5    −1.5      2.25
23      21.5     1.5      2.25
22      21.5     0.5      0.25
21      21.5    −0.5      0.25
                          Σ(X̄ − X̿)² = 5

S_X̄² = Σ(X̄ − X̿)²/(k − 1) = 5/3, where k = 4 is the number of samples.

n S_X̄² = 3 × 5/3 = 5

Here n denotes the sample size and not the number of samples. Therefore our first estimate of the population variance, based on the variance between the sample means, is given by s1² = 5.

A B C D
X (𝑋 − 𝑋)2 X (𝑋 − 𝑋)2 X (𝑋 − 𝑋)2 X (𝑋 − 𝑋)2
20 0 25 4 24 4 23 4
19 1 23 0 20 4 20 1
21 1 21 4 22 0 20 1
60 2 69 8 66 8 63 6

For brand A, X̄ = 20; sample variance S1² = Σ(X − X̄)²/(n − 1) = 2/2 = 1

For brand B, X̄ = 23; sample variance S2² = 8/2 = 4

For brand C, X̄ = 22; sample variance S3² = 8/2 = 4

For brand D, X̄ = 21; sample variance S4² = 6/2 = 3

Therefore the pooled S² is given by

S² = (S1² + S2² + S3² + S4²)/4 = (1 + 4 + 4 + 3)/4 = 3

Thus the second estimate of the population variance, based on the variation within the samples, is s2² = 3.

F = (variance between samples)/(variance within the samples) = s1²/s2² = 5/3 = 1.67

From the F table, the value of F for (3, 8) d.f. at the 5% level of significance is 4.07. Since the computed value is less than the table value, the null hypothesis is accepted. Hence the differences are not significant, and we can infer that the average lifetimes of the different brands of bulbs are equal.

The ANOVA table

Since there are several steps involved in computing both the between-sample and within-sample variances, the entire set of results may be organised into an analysis of variance table.

Source of variance      Sum of squares SS    Degrees of freedom    Mean sum of squares MS    Variance ratio F
Between samples         SSC                  c − 1                 MSC = SSC/(c − 1)         F = MSC/MSE
Within samples          SSE                  n − c                 MSE = SSE/(n − c)
Total                   SST                  n − 1

Examples

Below are the yields (in kg) per acre for 5 trial plots under each of 4 treatments.

Treatment

Plot No. 1 2 3 4
1 42 48 68 80
2 50 66 52 94
3 62 68 76 78
4 34 78 64 82
5 52 70 70 66

Carry out an analysis of variance and state your conclusion.

Solution

I(𝑋1 ) II(𝑋2) III(𝑋3) IV(𝑋4)


42 48 68 80
50 66 52 94
62 68 76 78
34 78 64 82
52 70 70 66
240 330 330 400

T = sum of all observations = 42 + 50 + ⋯ + 66 = 1300

T²/n = 1300²/20 = 84,500

SST = (sum of squares of all observations) − T²/n = (42² + 50² + ⋯ + 66²) − 84,500 = 4,236

SSC = (Σx1)²/n1 + (Σx2)²/n2 + (Σx3)²/n3 + (Σx4)²/n4 − T²/n
    = 240²/5 + 330²/5 + 330²/5 + 400²/5 − 84,500 = 2,580

SSE = SST − SSC = 4,236 − 2,580 = 1,656

MSC = SSC/(c − 1) = 2580/3 = 860,   MSE = SSE/(n − c) = 1656/(20 − 4) = 103.5

The degrees of freedom = (c − 1, n − c) = (3, 16)

Source of variance      Sum of squares SS    Degrees of freedom    Mean sum of squares MS    Variance ratio F
Between samples         SSC = 2580           c − 1 = 3             MSC = 860                 F = 860/103.5 = 8.3
Within samples          SSE = 1656           n − c = 16            MSE = 103.5
Total                   SST = 4236           n − 1 = 19

The table value of F at the 5% level of significance for (3, 16) df is 3.24.

Since the calculated value is above the table value, the null hypothesis is rejected: the treatments do not all have the same effect.
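The sums of squares in the table above can be reproduced with a short computation; a minimal Python sketch (using numpy and scipy) is given below:

import numpy as np
from scipy import stats

# Yields per acre: 5 plots (rows) under each of 4 treatments (columns).
data = np.array([[42, 48, 68, 80],
                 [50, 66, 52, 94],
                 [62, 68, 76, 78],
                 [34, 78, 64, 82],
                 [52, 70, 70, 66]], dtype=float)

n, c = data.size, data.shape[1]
cf  = data.sum() ** 2 / n                                   # correction factor T²/n
sst = (data ** 2).sum() - cf                                # total sum of squares
ssc = (data.sum(axis=0) ** 2 / data.shape[0]).sum() - cf    # between treatments
sse = sst - ssc                                             # within (residual)

msc, mse = ssc / (c - 1), sse / (n - c)
print(sst, ssc, sse)                                        # 4236, 2580, 1656
print(round(msc / mse, 2), round(stats.f.ppf(0.95, c - 1, n - c), 2))   # ≈ 8.31 and 3.24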
II) TWO WAY CLASSIFICATION

In the one-factor analysis of variance explained above, the treatments constitute different levels of a single factor which is controlled in the experiment. However, there are many situations in which the response variable of interest may be affected by more than one factor. For example, sales of Max Factor cosmetics, in addition to being affected by the point-of-sale display, might be affected by the price charged, the size and location of the store, the number of competing products in the store, and so on.

When it is believed that two independent factors might have an effect on the response variable of interest, it is possible to design the test so that an analysis of variance can be used to test the effects of the two factors simultaneously. Such a test is called two-factor analysis of variance.

Thus, with two-factor analysis of variance we can test two sets of hypotheses with the same data at the same time. We can plan an experiment in such a way as to study the effect of two factors in the same experiment. For each factor there will be a number of classes or levels. The procedure is somewhat different from that of one-way ANOVA. The steps are as follows:

• (a) Assume that the means of all columns are equal. Choose the hypothesis
  H0: α1 = α2 = ⋯ = αc
  (b) Assume that the means of all rows are equal. Choose the hypothesis
  H0: β1 = β2 = ⋯ = βr
• Find T, the sum of all observations.
• Calculate SST = (sum of squares of all observations) − T²/n.
• Find SSR = (Σx1)²/n1 + (Σx2)²/n2 + ⋯ − T²/n, where Σx1, Σx2, … are the row totals and n1, n2, … the number of observations in each row.
• Find SSC = (Σx1)²/n1 + (Σx2)²/n2 + ⋯ − T²/n, where Σx1, Σx2, … are the column totals and n1, n2, … the number of observations in each column.
• SSE = SST − SSC − SSR.
• MSC = SSC/(c − 1), MSR = SSR/(r − 1), MSE = SSE/[(c − 1)(r − 1)], where c = number of columns and r = number of rows.
• Fc = MSC/MSE and Fr = MSR/MSE.
• Determine the table values; if a calculated value is higher than the corresponding table value, reject the null hypothesis.
Source of variation    Sum of squares SS    Degrees of freedom    Mean square MS                  Variance ratio F
Between columns        SSC                  c − 1                 MSC = SSC/(c − 1)               Fc = MSC/MSE
Between rows           SSR                  r − 1                 MSR = SSR/(r − 1)               Fr = MSR/MSE
Residual               SSE                  (c − 1)(r − 1)        MSE = SSE/[(c − 1)(r − 1)]
Total                  SST                  n − 1

Example
The following data represent the number of units of production per day turned out by 4 different workers using 5 different types of machines.
Worker A B C D E Total
1 4 5 3 7 6 25
2 6 8 6 5 4 29
3 7 6 7 8 8 36
4 3 5 4 8 2 22
total 20 24 20 28 20 112

On the basis of this information, can you conclude that:
(i) the mean productivity is the same for the different machines, and
(ii) the workers do not differ with regard to productivity?

Solution
Let us take the hypotheses: (i) the mean productivity of the machines is the same, that is, μ1 = μ2 = μ3 = μ4 = μ5, and (ii) the workers do not differ with regard to productivity.
Correction factor = T²/n = 112²/20 = 627.2

Sum of squares between machines:

SSC = 20²/4 + 24²/4 + 20²/4 + 28²/4 + 20²/4 − CF
    = 100 + 144 + 100 + 196 + 100 − 627.2 = 640 − 627.2 = 12.8

𝑑𝑓 = 𝑣 = 𝑐 − 1 = 5 − 1 = 4

Sum of squares between workers:

SSR = 25²/5 + 29²/5 + 36²/5 + 22²/5 − CF
    = 125 + 168.2 + 259.2 + 96.8 − 627.2 = 649.2 − 627.2 = 22

𝑑𝑓 = 𝑣 = (𝑟 − 1) = 4 − 1 = 3

Total sum of squares:

SST = 4² + 6² + 7² + 3² + ⋯ + 8² + 2² − 627.2 = 692 − 627.2 = 64.8

SSE = SST − SSC − SSR = 64.8 − 12.8 − 22.0 = 30

𝑑𝑓 = 𝑣 = (𝑐 − 1)(𝑟 − 1) = (5 − 1)(4 − 1) = 12

Source of variation       Sum of squares SS    Degrees of freedom       Mean square MS    Variance ratio F
Between columns           SSC = 12.8           c − 1 = 4                MSC = 3.20        Fc = 3.20/2.5 = 1.28
(machines)
Between rows              SSR = 22.0           r − 1 = 3                MSR = 7.33        Fr = 7.33/2.5 = 2.93
(workers)
Residual                  SSE = 30.0           (c − 1)(r − 1) = 12      MSE = 2.50
Total                     SST = 64.8           n − 1 = 19

For (4, 12) df the table value of F at the 5% level of significance is 3.26. The calculated value of Fc is less than the table value, so our hypothesis is accepted: there is no significant difference in the productivity of the machines.

For (3, 12) df the table value of F at the 5% level is 3.49. The calculated value of Fr is less than the table value, so our hypothesis is accepted: there is no significant difference in the productivity of the workers.
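A minimal Python sketch reproducing the two-way calculations above (using numpy and scipy):

import numpy as np
from scipy import stats

# Units produced per day: rows = workers 1-4, columns = machines A-E.
data = np.array([[4, 5, 3, 7, 6],
                 [6, 8, 6, 5, 4],
                 [7, 6, 7, 8, 8],
                 [3, 5, 4, 8, 2]], dtype=float)

r, c = data.shape
n = data.size
cf  = data.sum() ** 2 / n                          # correction factor T²/n = 627.2
sst = (data ** 2).sum() - cf
ssc = (data.sum(axis=0) ** 2).sum() / r - cf       # between machines (columns)
ssr = (data.sum(axis=1) ** 2).sum() / c - cf       # between workers (rows)
sse = sst - ssc - ssr

msc, msr, mse = ssc / (c - 1), ssr / (r - 1), sse / ((c - 1) * (r - 1))
print(round(msc / mse, 2), round(stats.f.ppf(0.95, c - 1, (c - 1) * (r - 1)), 2))  # 1.28 and 3.26
print(round(msr / mse, 2), round(stats.f.ppf(0.95, r - 1, (c - 1) * (r - 1)), 2))  # 2.93 and 3.49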
References
• Taro Yamane: Statistics: An Introductory Analysis, Harper & Row, 3rd Edition, 1973.
• Hoel, P.G.: Introduction to Mathematical Statistics, John Wiley & Sons, 4th Edition, 1971.
• Agarwal, Y.P.: Statistical Methods: Concepts, Application and Computation, Sterling Publishers, 1986.
• Sidney Siegel and N. John Castellan: Nonparametric Statistics for the Behavioural Sciences, 2nd Edition, McGraw-Hill, 1988.
• Tulsian, P.C. and Vishal Pandey: Quantitative Techniques, Pearson Education, New Delhi.
• Gupta, S.P.: Statistical Methods, Sultan Chand and Sons, New Delhi.
• Hooda, R.P.: Statistics for Business and Economics, Macmillan, New Delhi.
• Alpha C. Chiang: Fundamental Methods of Mathematical Economics, 2nd Edition, International Student Edition, McGraw-Hill.
• Edward T. Dowling: Introduction to Mathematical Economics, 3rd Edition, Schaum's Outlines, Tata McGraw-Hill Publishing Co. Ltd, New Delhi.
• Sreenath Baruah: Basic Mathematics and its Applications in Economics, Macmillan India Ltd.
• Joseph, K.X.: Quantitative Techniques, CUCCS Ltd, Calicut University.

Appendix: Z table, t table, F table and chi-square table (statistical tables not reproduced here).
