Module Wise Important Formulae

Module 1

Introduction to Statistics & Data Analysis

Measures of Central Tendency:

1. Mean:

 The arithmetic mean of a set of observations is their sum divided by the number of observations; for example, the arithmetic mean $\bar{x}$ of $n$ observations $x_1, x_2, x_3, \ldots, x_n$ is given by:

$$\bar{x} = \frac{1}{n}(x_1 + x_2 + x_3 + \cdots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i$$

 In the case of a frequency distribution $x_i \mid f_i,\ i = 1, 2, 3, \ldots, n$, where $f_i$ is the frequency of the value $x_i$,

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_n x_n}{f_1 + f_2 + f_3 + \cdots + f_n} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \quad \text{where } N = \sum_{i=1}^{n} f_i$$

 In the case of a grouped or continuous frequency distribution, the arithmetic mean is

$$\bar{x} = A + \frac{h}{N}\sum_{i=1}^{n} f_i d_i$$

where $A$ is the assumed mean, $h$ is the class width, and $d_i = (x_i - A)/h$.

2. Median:

 In case of ungrouped data, if the number of observations is odd then median is


the middle value after the values have been arranged in ascending or descending
order of magnitude.
 In the case of an even number of observations, there are two middle terms, and the median is obtained by taking the arithmetic mean of these two middle terms.
 In the case of a continuous frequency distribution, the class corresponding to the c.f. just greater than $\frac{N}{2}$ is called the median class, and the value of the median is obtained by the following formula:

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right)$$

where $l$ is the lower limit of the median class,
$f$ is the frequency of the median class,
$h$ is the magnitude (width) of the median class,
$c$ is the c.f. of the class preceding the median class, and
$N = \sum f$.
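As a quick check of these definitions, here is a minimal Python sketch (standard library only) that computes the mean and median of ungrouped data:

```python
# Minimal sketch: mean and median of ungrouped data.
def mean(xs):
    return sum(xs) / len(xs)          # x̄ = (1/n) Σ xᵢ

def median(xs):
    s = sorted(xs)                    # arrange in ascending order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the middle value
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even n: mean of the two middle terms

data = [7, 3, 9, 4, 6]
print(mean(data), median(data))       # 5.8 6
```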
3. Geometric mean:

 The geometric mean, usually abbreviated as G.M., of a set of $n$ observations is the $n$th root of their product. Thus, if $X_1, X_2, X_3, \ldots, X_n$ are the $n$ observations, then their G.M. is given by

$$G.M. = \sqrt[n]{X_1 \times X_2 \times X_3 \times \cdots \times X_n} = (X_1 \times X_2 \times X_3 \times \cdots \times X_n)^{1/n}$$

If 𝑛 = 2 i.e., if we take two observations, then 𝐺. 𝑀 = √𝑋1 × 𝑋2

 The logarithm of the G.M. of a set of observations is the arithmetic mean of their logarithms:

$$G.M. = \text{Antilog}\left(\frac{1}{n}\sum \log X\right)$$

4. Harmonic Mean:

 If $X_1, X_2, X_3, \ldots, X_n$ is a given set of $n$ observations, then their harmonic mean, abbreviated as H.M., is

$$H = \frac{1}{\frac{1}{n}\left[\frac{1}{X_1} + \frac{1}{X_2} + \frac{1}{X_3} + \cdots + \frac{1}{X_n}\right]} = \frac{1}{\frac{1}{n}\sum\left(\frac{1}{X}\right)}$$

 In the case of a frequency distribution, we have

$$\frac{1}{H} = \frac{1}{N}\left(\frac{f_1}{X_1} + \frac{f_2}{X_2} + \cdots + \frac{f_n}{X_n}\right)$$

where $N = \sum f$,
$X$ = mid-value of the variable or mid-value of the class, and
$f$ = frequency of $X$.
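The logarithmic form of the G.M. is numerically safer than multiplying many values directly. A minimal sketch of both means, assuming strictly positive data:

```python
import math

# Minimal sketch: geometric and harmonic means (assumes all x > 0).
def geometric_mean(xs):
    # G.M. = antilog((1/n) Σ log x) -- avoids overflow from the raw product
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    # H = n / Σ (1/x)
    return len(xs) / sum(1 / x for x in xs)

data = [2, 4, 8]
print(geometric_mean(data))   # 4.0
print(harmonic_mean(data))    # ≈ 3.43
```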
Measures of variability or Dispersion:

1. Range:
Range is the difference between the greatest (maximum) and the smallest (minimum)
observation of the distribution.
𝑅𝑎𝑛𝑔𝑒 = 𝑋𝑚𝑎𝑥 − 𝑋𝑚𝑖𝑛

2. Quartile deviation:

It is a measure of dispersion based on the upper quartile 𝑄3 and the lower quartile 𝑄1.

$$Q.D. = \frac{Q_3 - Q_1}{2}$$

where $Q_i = l + \frac{h}{f}\left(\frac{iN}{4} - c\right)$, for $i = 1, 2, 3$, and

$Q_1$ = first quartile
$Q_2$ = second quartile
$Q_3$ = third quartile

3. Mean Deviation (or) Absolute mean deviation:

For ungrouped or raw data:

$$M.D. = \frac{1}{n}\sum_i |x_i - \bar{x}|$$

For a frequency distribution:

$$M.D. = \frac{1}{N}\sum_i f_i |x_i - \bar{x}|$$

4. Standard Deviation:
$$\text{Variance} = \frac{1}{n}\sum(x_i - \bar{x})^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2$$

$$\text{Standard Deviation} = S.D. = \sigma = \sqrt{\frac{1}{n}\sum(x_i - \bar{x})^2} \;\text{ (or) }\; \sqrt{\frac{1}{n}\sum x_i^2 - \bar{x}^2}$$

$$\text{Coefficient of dispersion} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

$$\text{Coefficient of variation} = C.V. = 100 \times \frac{\sigma}{\bar{x}}$$
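A minimal sketch of the population variance, standard deviation, mean deviation, and coefficient of variation as defined above:

```python
import math

# Minimal sketch: dispersion measures (population forms, dividing by n).
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)   # (1/n) Σ (xᵢ - x̄)²

def std_dev(xs):
    return math.sqrt(variance(xs))

def mean_deviation(xs):
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)     # (1/n) Σ |xᵢ - x̄|

def coeff_of_variation(xs):
    return 100 * std_dev(xs) / (sum(xs) / len(xs))   # C.V. = 100 σ / x̄

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data), std_dev(data))   # 4.0 2.0
print(mean_deviation(data))            # 1.5
print(coeff_of_variation(data))        # 40.0
```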
Skewness:
 Mean ($M$), median ($M_d$) and mode ($M_0$) fall at different points, i.e., $\text{Mean} \neq \text{Median} \neq \text{Mode}$.
 The quartiles are not equidistant from the median.
 The curve drawn with the help of the given data is not symmetrical but is stretched more to one side than to the other.

Measures of Skewness:

Various measures of Skewness (𝑆𝑘 ) are:

 𝑆𝑘 = 𝑀 − 𝑀𝑑
 𝑆𝑘 = 𝑀 − 𝑀0
 𝑆𝑘 = (𝑄3 − 𝑀𝑑 ) − (𝑀𝑑 − 𝑄1 )

These are the absolute measures of Skewness.


1. Prof. Karl Pearson’s Coefficient of Skewness:

$$S_k = \frac{M - M_0}{\sigma}$$

where $\sigma$ is the standard deviation of the distribution. If the mode is ill-defined, then using the empirical relation $M_0 = 3M_d - 2M$ for a moderately asymmetrical distribution, we get

$$S_k = \frac{3(M - M_d)}{\sigma}$$
$S_k = 0$ if $M = M_0 = M_d$. Hence, for a symmetrical distribution, the mean, median and mode all coincide.

2. Prof. Bowley’s Coefficient of Skewness:

$$S_k = \frac{(Q_3 - M_d) - (M_d - Q_1)}{(Q_3 - M_d) + (M_d - Q_1)} = \frac{Q_3 + Q_1 - 2M_d}{Q_3 - Q_1}$$

$S_k = 0$ if $Q_3 - M_d = M_d - Q_1$. Hence, for a symmetrical distribution, the median is equidistant from the upper and lower quartiles.

3. Based upon the moments, coefficient of skewness:


$$S_k = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}$$

$S_k = 0$ if either $\beta_1 = 0$ or $\beta_2 = -3$.
Kurtosis:

 Prof. Karl Pearson calls it the 'convexity of the frequency curve', or kurtosis.
 Kurtosis measures the flatness or peakedness of the frequency curve.
 It is measured by the coefficient $\beta_2$, or its derived measure $\gamma_2$, given by

$$\beta_2 = \frac{\mu_4}{\mu_2^2}, \qquad \gamma_2 = \beta_2 - 3$$

A: Leptokurtic curve (more peaked than the normal curve; $\beta_2 > 3$, i.e. $\gamma_2 > 0$)

B: Normal or Mesokurtic curve (neither flat nor peaked; $\beta_2 = 3$, i.e. $\gamma_2 = 0$)

C: Platykurtic curve (flatter than the normal curve; $\beta_2 < 3$, i.e. $\gamma_2 < 0$)
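A minimal sketch of the moment-based quantities above, computing $\beta_2$ (and $\gamma_2$) together with Karl Pearson's skewness coefficient from raw ungrouped data:

```python
import math

# Minimal sketch: central moments, Pearson skewness, and kurtosis (β₂, γ₂).
def central_moment(xs, r):
    m = sum(xs) / len(xs)
    return sum((x - m) ** r for x in xs) / len(xs)   # μᵣ = (1/n) Σ (x - x̄)ʳ

def pearson_skewness(xs):
    # Sk = 3(M - M_d)/σ, the form used when the mode is ill-defined
    s = sorted(xs)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    sigma = math.sqrt(central_moment(xs, 2))
    return 3 * (sum(xs) / n - median) / sigma

def kurtosis(xs):
    beta2 = central_moment(xs, 4) / central_moment(xs, 2) ** 2   # β₂ = μ₄/μ₂²
    return beta2, beta2 - 3                                      # (β₂, γ₂)

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
print(pearson_skewness(data))   # > 0: stretched toward the right
print(kurtosis(data))
```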
Module 2
Probability

 Probability:

If a random experiment or a trial results in 'n' exhaustive, mutually exclusive and equally likely outcomes, out of which 'm' are favourable to the occurrence of an event E, then the probability 'p' of the occurrence or happening of E, usually denoted by P(E), is given by

$$p = P(E) = \frac{\text{No. of favourable cases}}{\text{Total no. of exhaustive cases}} = \frac{m}{n}$$

 Conditional Probability:

Let $S$ be the sample space of a random experiment and let $C_1, C_2 \subset S$. Then the conditional probability of $C_2$ given that $C_1$ has already occurred, denoted by $P(C_2/C_1)$, is defined as

$$P(C_2/C_1) = \frac{P(C_2 \cap C_1)}{P(C_1)}, \quad \text{if } P(C_1) \neq 0$$

or

$$P(C_2 \cap C_1) = P(C_1)\,P(C_2/C_1)$$

Note:

If 𝐶1 , 𝐶2 , 𝐶3 are any three events, then


𝑃(𝐶1 ∩ 𝐶2 ∩ 𝐶3 ) = 𝑃(𝐶1 )𝑃(𝐶2 /𝐶1 )𝑃(𝐶3 /𝐶1 ∩ 𝐶2 ), …

 Bayes theorem:

Let $C_1, C_2, C_3, \ldots, C_n$ be a partition of the sample space and let $C$ be any event which is a subset of $\bigcup_{i=1}^{n} C_i$ such that $P(C) > 0$. Then

$$P(C_i/C) = \frac{P(C_i)\,P(C/C_i)}{\sum_{i=1}^{n} P(C_i)\,P(C/C_i)}$$
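A minimal numeric sketch of Bayes' theorem over a two-event partition; the priors and conditional probabilities below are made-up illustration values (two machines producing 60%/40% of items, with defect rates 2% and 5%):

```python
# Minimal sketch: Bayes' theorem over a partition C₁, ..., Cₙ.
priors = [0.6, 0.4]          # P(Cᵢ): share of items from each machine
likelihoods = [0.02, 0.05]   # P(C/Cᵢ): defect rate of each machine

# Denominator: total probability Σ P(Cᵢ) P(C/Cᵢ)
evidence = sum(p * l for p, l in zip(priors, likelihoods))
# Posterior P(Cᵢ/C) for each cause, given a defective item was observed
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]

print(posteriors)   # [0.375, 0.625]
```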

Discrete Random Variable:

A Random Variable which takes on a finite (or) countably infinite number of values is
called a Discrete Random Variable.

Continuous Random Variable:

A Random Variable which takes on an uncountably infinite number of values is called a non-Discrete (or) Continuous Random Variable.

Probability Mass Function (P.M.F):

The set of ordered pairs $(x, f(x))$ is a probability function or Probability Mass Function of a Discrete Random Variable $X$ if, for each possible outcome $x$,

(i) $f(x) \ge 0$
(ii) $\sum f(x) = 1$
(iii) $P(X = x) = f(x)$

The Probability Mass Function is also denoted by 𝑃𝑋 (𝑥) = 𝑃(𝑋 = 𝑥).

Probability Density Function (P.D.F):

The function 𝑓(𝑥) is a Probability Density Function for the Continuous Random
Variable 𝑥 defined over the set of real numbers 𝑅, 𝑖𝑓

(i) $f(x) \ge 0,\ \forall x \in R$
(ii) $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
(iii) $P(a < X < b) = \int_a^b f(x)\,dx$

Cumulative Distribution Function:

The Cumulative Distribution Function of a discrete random variable $X$ with probability distribution function $f(x)$ is defined as

$$F(x) = P(X \le x) = \sum_{t \le x} f(t)$$

Mathematical Expectation, Variance and Standard deviation:

 Let $X$ be a random variable with probability distribution $f(x)$. Then the mean or mathematical expectation of $X$ is denoted by $E(X)$ and is given by

$E(X) = \sum x\,f(x)$, where $X$ is a discrete random variable
$E(X) = \int_{-\infty}^{+\infty} x\,f(x)\,dx$, where $X$ is a continuous random variable

 Let $X$ be a random variable with pdf $f(x)$ and mean $\mu$. Then the variance of $X$ is
 $V(X) = \sigma^2 = E[(X - \mu)^2] = \sum (x - \mu)^2 f(x)$, where $X$ is a discrete random variable
 $V(X) = \sigma^2 = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\,dx$, where $X$ is a continuous random variable
 The positive square root of the variance is the standard deviation of $X$, denoted by $\sigma$ (S.D.).
 $E(X^2) = \sum x^2 f(x)$ (discrete)
 $E(X^2) = \int_{-\infty}^{+\infty} x^2 f(x)\,dx$ (continuous)
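A minimal sketch of $E(X)$, $V(X)$ and $\sigma$ for a discrete random variable given as value/probability pairs:

```python
import math

# Minimal sketch: mean, variance, S.D. of a discrete random variable.
# pmf maps each value x to f(x); the probabilities must sum to 1.
pmf = {0: 0.25, 1: 0.50, 2: 0.25}   # e.g. number of heads in two coin tosses

mean = sum(x * p for x, p in pmf.items())       # E(X) = Σ x f(x)
ex2 = sum(x * x * p for x, p in pmf.items())    # E(X²) = Σ x² f(x)
var = ex2 - mean ** 2                           # V(X) = E(X²) − [E(X)]²
sd = math.sqrt(var)

print(mean, var, sd)   # 1.0 0.5 0.707...
```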

Marginal Probability Distribution

 Let $(X, Y)$ be a two-dimensional discrete random variable. Then the marginal probability function of the random variable $X$ is defined as

$$P(X = x_i) = \sum_{j=1}^{m} P_{ij} = P_{i*}$$

 The marginal probability function of the random variable $Y$ is defined as

$$P(Y = y_j) = \sum_{i=1}^{n} P_{ij} = P_{*j}$$

 The marginal distribution of $X$ is the collection of pairs $(x_i, P_{i*})$ and that of $Y$ is $(y_j, P_{*j})$.

Conditional Probability Distribution

Let (𝑋, 𝑌) be two-dimensional discrete random variable, then

 $P(X = x_i / Y = y_j) = \dfrac{P(X = x_i, Y = y_j)}{P(Y = y_j)} = \dfrac{P_{ij}}{P_{*j}}$

 $P(Y = y_j / X = x_i) = \dfrac{P(X = x_i, Y = y_j)}{P(X = x_i)} = \dfrac{P_{ij}}{P_{i*}}$

Continuous random variables 𝑿 𝒂𝒏𝒅 𝒀:

Joint Probability Density function of (𝑿, 𝒀)

Let $(X, Y)$ be a two-dimensional continuous random variable such that

$$P\left(x - \frac{dx}{2} \le X \le x + \frac{dx}{2},\; y - \frac{dy}{2} \le Y \le y + \frac{dy}{2}\right) = f(x, y)\,dx\,dy$$
Then 𝑓(𝑋, 𝑌) is called the joint density function of (𝑋, 𝑌), if it satisfies the following conditions:

(i) $f(x, y) \ge 0$, for all $(x, y) \in R$, where $R$ is the range space.

(ii) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$

Moreover, if $(a, b), (c, d) \in R$, then

(iii) $P(a \le X \le b,\ c \le Y \le d) = \int_a^b \int_c^d f(x, y)\,dy\,dx$

Marginal Probability Distribution:

When $(X, Y)$ is a two-dimensional continuous random variable, the marginal density function of the random variable $X$ is defined as

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$

The marginal density function of the random variable $Y$ is defined as

$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$

Conditional Probability Distribution

Let (𝑋, 𝑌) be two-dimensional continuous random variable, then


$$f(x/y) = \frac{f(x, y)}{f_Y(y)}$$

is the conditional probability function of $X$ given $Y$.

$$f(y/x) = \frac{f(x, y)}{f_X(x)}$$

is the conditional probability function of $Y$ given $X$.

Moments:
 The $r$th moment about the origin of a random variable $X$, denoted by $\mu_r'$, is $E(X^r)$, i.e.,
$\mu_0' = E(X^0) = E(1) = 1$
$\mu_1' = E(X^1) = E(X) = \mu$
The second central moment (the variance) is $\mu_2 = E(X^2) - (E(X))^2 = E(X^2) - \mu^2$, so that
$E(X^2) = \sigma^2 + \mu^2$

Moment Generating function (MGF):

 The MGF of the distribution of a random variable completely describes the nature of the
distribution.

 Let $X$ be a random variable having pdf $f(x)$. Then the MGF of the distribution of $X$ is denoted by $M(t)$ and is defined as $M(t) = E(e^{tX})$.

Thus, the MGF is
$$M(t) = \begin{cases} \sum e^{tx} f(x), & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{\infty} e^{tx} f(x)\,dx, & \text{if } X \text{ is continuous} \end{cases}$$

We know that $M(t) = E(e^{tX})$, so expanding the exponential,

$$M(t) = \sum_{r=0}^{\infty} \frac{t^r}{r!}\,\mu_r'$$

The coefficient of $\frac{t^r}{r!}$ is $\mu_r'$, the $r$th moment about the origin.

 If $X$ is a continuous random variable, then the MGF is

$$M(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$$

$$M'(t) = \int_{-\infty}^{\infty} x\,e^{tx} f(x)\,dx$$

$$M''(t) = \int_{-\infty}^{\infty} x^2\,e^{tx} f(x)\,dx, \ \ldots$$

Now at $t = 0$:

$M(0) = E(1) = 1$

$M'(0) = E(X) = \mu$

$M''(0) = E(X^2) = \sigma^2 + \mu^2$

Mean: $\mu = M'(0)$

Variance: $\sigma^2 = M''(0) - \left(M'(0)\right)^2$

$$\mu_r' = \left.\frac{\partial^r}{\partial t^r} M(t)\right|_{t=0}; \quad r = 0, 1, 2, \ldots$$
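As an illustration of reading moments off an MGF, here is a small sympy sketch (assuming sympy is available) applied to $M(t) = \lambda/(\lambda - t)$, the MGF of the exponential distribution derived in Module 5:

```python
import sympy as sp

# Sketch: mean and variance from an MGF by differentiating at t = 0.
t = sp.symbols('t')
lam = sp.symbols('lambda', positive=True)

M = lam / (lam - t)                           # M(t) = E(e^{tX})
mean = sp.diff(M, t).subs(t, 0)               # μ = M'(0) = 1/λ
var = sp.diff(M, t, 2).subs(t, 0) - mean**2   # σ² = M''(0) − (M'(0))² = 1/λ²
print(mean, var)                              # 1/lambda, lambda**(-2)
```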

Characteristic function:
 The characteristic function is defined as

$$\phi_X(t) = E(e^{itX}) = \begin{cases} \sum_x e^{itx} f(x), & \text{for a discrete probability distribution} \\ \int e^{itx} f(x)\,dx, & \text{for a continuous probability distribution} \end{cases}$$

 If $F_X(x)$ is the distribution function of a continuous random variable $X$, then

$$\phi_X(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$$

$$\phi_X(t) = \sum_{r=0}^{\infty} \frac{(it)^r}{r!}\,\mu_r'$$

The coefficient of $\frac{(it)^r}{r!}$ is $\mu_r'$, the $r$th moment about the origin.
Module 3

Correlation and Regression

Karl Pearson’s coefficient of Correlation (Covariance method):

 The correlation coefficient between two variables $X$ and $Y$, usually denoted by $r(X, Y)$, $r_{XY}$ or simply $r$, is a numerical measure of the linear relationship between them and is defined as:

$$r_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$$

 If (𝑥1 , 𝑦1 ), (𝑥2 , 𝑦2 ), (𝑥3 , 𝑦3 ), … , (𝑥𝑛 , 𝑦𝑛 ) are 𝑛 pairs of observations of the variables 𝑋 𝑎𝑛𝑑 𝑌
in a bivariate distribution, then

$$Cov(x, y) = \frac{1}{n}\sum(x - \bar{x})(y - \bar{y}); \quad \sigma_x = \sqrt{\frac{1}{n}\sum(x - \bar{x})^2}, \quad \sigma_y = \sqrt{\frac{1}{n}\sum(y - \bar{y})^2}$$

 the summation being taken over the $n$ pairs of observations. Hence

$$r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}}$$

Where, 𝑑𝑥 = 𝑥 − 𝑥̅ 𝑎𝑛𝑑 𝑑𝑦 = 𝑦 − 𝑦̅.

 Equivalently, $r$ can be computed directly from the raw observations as

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right] \times \left[n\sum Y^2 - (\sum Y)^2\right]}}$$
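A minimal sketch of this computational form:

```python
import math

# Minimal sketch: Karl Pearson's correlation coefficient (raw-sums form).
def pearson_r(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return num / den

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))   # ≈ 0.775
```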
Properties of correlation coefficient:

 Pearson coefficient cannot exceed 1 numerically. In other words, it lies between -1


and +1 i.e., −1 ≤ 𝑟 ≤ 1
 The correlation coefficient is independent of the change of origin and scale. Mathematically, if $X$ and $Y$ are the given variables and they are transformed to the new variables $u$ and $v$ by a change of origin and scale,

$$u = \frac{x - A}{h} \quad \text{and} \quad v = \frac{y - B}{k}, \quad h > 0,\ k > 0,$$

where $A$, $B$, $h$ and $k$ are constants, then the correlation coefficient between $x$ and $y$ is the same as the correlation coefficient between $u$ and $v$, i.e., $r(x, y) = r(u, v)$, or

$r_{xy} = r_{uv}$

∑(𝑢 − 𝑢̅)(𝑣 − 𝑣̅ )
𝑟𝑢𝑣 =
√∑(𝑢 − 𝑢̅)2 ∑(𝑣 − 𝑣̅ )2

𝑛 ∑ 𝑢𝑣 − (∑ 𝑢)(∑ 𝑣)
𝑟𝑢𝑣 =
√[𝑛 ∑ 𝑢2 − (∑ 𝑢)2 ] × [𝑛 ∑ 𝑣 2 − (∑ 𝑣)2 ]

 Two independent variables are uncorrelated i.e., 𝑟𝑥𝑦 = 0.


 $r(aX + b,\ cY + d) = \dfrac{a \times c}{|a \times c|}\, r(X, Y)$

Rank Correlation method:


 Spearman’s rank correlation coefficient, usually denoted by 𝜌 (Rho) is given by the
formula

6 ∑ 𝑑2
𝜌=1−
𝑛(𝑛2 −1)

Where, 𝑑 is the difference between the pair of ranks of the same individual in the two
characteristics and 𝑛 is the number of pairs.
Repeated ranks:
 In Spearman's formula, add the factor $\frac{m(m^2 - 1)}{12}$ to $\sum d^2$, where $m$ is the number of times a value is repeated. This correction factor is to be added for each repeated value in both series.
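A minimal sketch of Spearman's ρ for the no-ties case (all ranks distinct), on illustrative data:

```python
# Minimal sketch: Spearman's rank correlation (no repeated ranks).
def spearman_rho(xs, ys):
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank               # rank 1 = smallest value
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))   # Σ d²
    return 1 - 6 * d2 / (n * (n * n - 1))

x = [86, 97, 99, 100, 101, 103, 106, 110, 112, 113]
y = [2, 20, 28, 27, 50, 29, 7, 17, 6, 12]
print(spearman_rho(x, y))   # ≈ -0.176
```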

Linear Regression:
 Let us suppose that in the bivariate distribution $(x_i, y_i);\ i = 1, 2, 3, \ldots, n$, $y$ is the dependent variable and $x$ is the independent variable. Let the line of regression of $y$ on $x$ be

$$y = a + bx$$

 The line of regression of 𝑌 𝑜𝑛 𝑋 passes through the point (𝑥̅ , 𝑦̅ )


𝑦̅ = 𝑎 + 𝑏𝑥̅

Regression coefficients:

 The equation of the line of regression of $x$ on $y$ is

$$x - \bar{x} = b_{xy}(y - \bar{y})$$

 The equation of the line of regression of $y$ on $x$ is

$$y - \bar{y} = b_{yx}(x - \bar{x})$$

where $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$ and $b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$ are the regression coefficients.
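A minimal sketch fitting the line of regression of $y$ on $x$ using $b_{yx} = \sum(x - \bar{x})(y - \bar{y}) / \sum(x - \bar{x})^2$:

```python
# Minimal sketch: line of regression of y on x.
def regression_y_on_x(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b_yx = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    a = my - b_yx * mx        # the line passes through (x̄, ȳ)
    return a, b_yx            # y = a + b_yx · x

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = regression_y_on_x(x, y)
print(a, b)   # 2.2 0.6  →  y = 2.2 + 0.6x
```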

Coefficient of Determination:

 The coefficient is given by the square of the correlation coefficient i.e.,


explained variance
𝑟2 =
total variance

Coefficient of Partial correlation:

 The partial correlation coefficient between 𝑋1 𝑎𝑛𝑑 𝑋2 , usually denoted by 𝑟12.3 is


given by
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{32}}{\sqrt{(1 - r_{12}^2)(1 - r_{32}^2)}}$$

$$r_{23.1} = \frac{r_{23} - r_{21}\,r_{31}}{\sqrt{(1 - r_{21}^2)(1 - r_{31}^2)}}$$

 The multiple correlation in terms of total and partial correlations:

$$1 - R_{1.23}^2 = 1 - \frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2} = \frac{1 - r_{23}^2 - r_{12}^2 - r_{13}^2 + 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}$$

Note:

$$1 - R_{1.23}^2 = \frac{\omega}{\omega_{11}}$$

where $\omega = \begin{vmatrix} 1 & r_{12} & r_{13} \\ r_{21} & 1 & r_{23} \\ r_{31} & r_{32} & 1 \end{vmatrix} = 1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2 r_{12} r_{13} r_{23}$

and $\omega_{11} = \begin{vmatrix} 1 & r_{23} \\ r_{32} & 1 \end{vmatrix} = 1 - r_{23}^2$
Module 4
Discrete Probability Distributions

Bernoulli’s Distribution:

 A random variable $X$ which takes the two values 0 and 1 with probabilities $q$ and $p$ respectively, that is, $P(X = 0) = q$ and $P(X = 1) = p$ with $q = 1 - p$, is called a Bernoulli discrete random variable. The probability function of the Bernoulli distribution can be written as

$$P(X) = p^X q^{1-X} = p^X (1 - p)^{1-X}; \quad X = 0, 1$$


Note:

 The mean of the Bernoulli random variable $X$ is

$$\mu = E(X) = \sum X_i\,P(X_i) = p$$

 Variance of 𝑋 is

𝑉(𝑋) = 𝐸(𝑋 2 ) − 𝐸(𝑋)2 = ∑ 𝑋𝑖 2 𝑃(𝑋𝑖 ) − 𝜇 2

= (02 × 𝑞) + (12 × 𝑝) − 𝑝2 = 𝑝 − 𝑝2 = 𝑝(1 − 𝑝) = 𝑝𝑞

 The standard deviation is 𝜎 = √𝑝𝑞


Binomial Distribution:

$$P(X = x) = \begin{cases} \binom{n}{x} p^x q^{n-x}, & x = 0, 1, 2, 3, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

where $n$ and $p$ are known as the parameters.

 Mean of $X$ is $\mu = E(X) = np$
 Variance is $V(X) = E(X^2) - E(X)^2$, giving $\sigma^2 = V(X) = npq$

MGF Binomial Distribution:

 Let 𝑋~𝐵(𝑛, 𝑝), then


𝑀(𝑡) = 𝑀𝑋 (𝑡) = 𝐸(𝑒 𝑡𝑥 ) = (𝑞 + 𝑝𝑒 𝑡 )𝑛

Characteristic Function of Binomial distribution:

∅𝑋 (𝑡) = 𝐸(𝑒 𝑖𝑡𝑥 ) = (𝑞 + 𝑝𝑒 𝑖𝑡 )𝑛

Cumulative Binomial distribution:

The Binomial probabilities can be obtained from cumulative distribution as follows

𝑏(𝑥; 𝑛, 𝑝) = 𝐵(𝑥; 𝑛, 𝑝) − 𝐵(𝑥 − 1; 𝑛, 𝑝)
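A minimal sketch of the binomial pmf, its CDF, and the difference identity just stated:

```python
from math import comb

# Minimal sketch: binomial pmf b(x; n, p) and CDF B(x; n, p).
def b(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)   # C(n,x) pˣ q^{n−x}

def B(x, n, p):
    return sum(b(k, n, p) for k in range(x + 1))  # Σ_{k ≤ x} b(k; n, p)

n, p = 10, 0.3
print(b(3, n, p))                  # ≈ 0.2668
print(B(3, n, p) - B(2, n, p))     # the same value, via the identity
print(n * p, n * p * (1 - p))      # mean np = 3.0, variance npq = 2.1
```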


Poisson Distribution:
 A random variable X taking on one of the non-negative values with parameter λ, λ >0,
is said to follow Poisson distribution if its probability mass function is given by
$$P(x; \lambda) = P(X = x) = \begin{cases} \dfrac{\lambda^x e^{-\lambda}}{x!}, & x = 0, 1, 2, 3, \ldots \\ 0, & \text{otherwise} \end{cases}$$

 Poisson parameter, λ = np

 Mean
𝜇 = 𝐸(𝑋) = λ

𝜇 = λ = np

 Variance

𝑉(𝑋) = 𝐸(𝑋 2 ) − 𝐸(𝑋)2

𝑉(𝑋) = 𝜎 2 = λ

Cumulative Poisson distribution:


$$F(x; \lambda) = P(X \le x) = \sum_{k=0}^{x} \frac{\lambda^k e^{-\lambda}}{k!}$$

Moment generating function:

$$M(t) = e^{\lambda(e^t - 1)}$$

Characteristic function:

$$\phi(t) = e^{\lambda(e^{it} - 1)}$$
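A minimal sketch of the Poisson pmf and its cumulative distribution:

```python
from math import exp, factorial

# Minimal sketch: Poisson pmf and CDF; mean = variance = λ.
def poisson_pmf(x, lam):
    return lam**x * exp(-lam) / factorial(x)          # λˣ e^{−λ} / x!

def poisson_cdf(x, lam):
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 2.0
print(poisson_pmf(3, lam))   # ≈ 0.1804
print(poisson_cdf(3, lam))   # ≈ 0.8571
```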
Hyper geometric Distribution:

 A discrete random variable $X$ is said to follow the hypergeometric distribution with parameters $N$, $M$ and $n$ if it assumes only non-negative values and its probability mass function is given by

$$P(X = k) = h(k; N, M, n) = \begin{cases} \dfrac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}}, & k = 0, 1, 2, 3, \ldots, \min(n, M) \\ 0, & \text{otherwise} \end{cases}$$

where $N$ is a positive integer, $M$ is a positive integer not exceeding $N$, and $n$ is a positive integer that is at most $N$. In combination notation,

$$P(X = k) = h(k; N, M, n) = \frac{{}^{M}C_k \times {}^{N-M}C_{n-k}}{{}^{N}C_n}$$

 Mean is $E(X) = np = \dfrac{nM}{N}$, where $p = \dfrac{M}{N}$
 Variance is $\mathrm{var}(X) = npq \cdot \dfrac{N-n}{N-1} = \dfrac{nM(N-M)(N-n)}{N^2(N-1)}$, where $q = 1 - p$

Covariance:

 The covariance of two random variables X and Y is


𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸(𝑋𝑌) − 𝐸(𝑋)𝐸(𝑌)
Or
𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸[(𝑋 − 𝐸(𝑋))(𝑌 − 𝐸(𝑌))]
Module 5

Continuous Probability Distribution

Uniform Distribution:

 A random variable $X$ is said to follow the uniform distribution over an interval $(a, b)$ if its probability density function is constant, $= k$ (say), over the entire range of $X$:

$$f(x) = \begin{cases} k, & a < x < b \\ 0, & \text{otherwise} \end{cases}$$

Since the total probability is 1, $k = \frac{1}{b-a}$, so

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a < x < b \\ 0, & \text{otherwise} \end{cases}$$

 $\int_{-\infty}^{\infty} f(x)\,dx = \int_a^b \frac{1}{b-a}\,dx = 1$, $a < b$; here $a$ and $b$ are the two parameters of the uniform distribution on $(a, b)$.

 Since $F(x)$ is not continuous at $x = a$ and $x = b$, it is not differentiable at these points. Thus $\frac{d}{dx}F(x) = f(x) = \frac{1}{b-a} \neq 0$ exists everywhere except at the points $x = a$ and $x = b$.

Moments:

 Mean $= \dfrac{a+b}{2}$
 Variance $= \dfrac{(b-a)^2}{12}$
Normal Distribution:

 A random variable X is said to have a normal distribution, if its density function or


probability distribution is given by
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$$

Where, 𝜇 is the mean and 𝜎 is the standard deviation of 𝑥.

 If a variable $x$ follows the normal distribution with mean $\mu$ and s.d. $\sigma$, the variable $z$ defined as

$$Z = \frac{x - \mu}{\sigma}$$

has the standard normal distribution with mean 0 and standard deviation 1. This is also referred to as the z-score.
 The normal curve is symmetric about mean, the total area under the normal curve is 1,
that is
𝑃(−∞ < 𝑋 < ∞) = 1
Also

𝑃(−∞ < 𝑍 < ∞) = 1


 The standard normal probability in the form of cumulative distribution function
(CDF)

𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) = 𝐹(𝑏) − 𝐹(𝑎)


 When 𝑋 is normal distribution with mean 𝜇 and standard deviation 𝜎

$$P(a < X \le b) = P\left(\frac{a-\mu}{\sigma} < Z \le \frac{b-\mu}{\sigma}\right) = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$$

 $F(-z) = 1 - F(z)$
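A minimal sketch of a normal probability via standardization; Φ is computed from math.erf, one standard way to get the standard normal CDF without external libraries:

```python
import math

# Minimal sketch: P(a < X ≤ b) for X ~ N(μ, σ²) via the z-transformation.
def phi(z):
    # Standard normal CDF: Φ(z) = (1 + erf(z/√2)) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_prob(a, b, mu, sigma):
    za, zb = (a - mu) / sigma, (b - mu) / sigma
    return phi(zb) - phi(za)

print(normal_prob(-1, 1, 0, 1))     # ≈ 0.6827 (within one σ of the mean)
print(phi(-1.96), 1 - phi(1.96))    # F(−z) = 1 − F(z) ≈ 0.025
```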
Exponential Probability Distribution:

 A continuous random variable 𝑋 is said to follow an exponential distribution with


parameter 𝜆 > 0, if its probability density function is given by
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

 The general form of the exponential distribution, with parameter $a$, is

$$f(x) = \frac{1}{a}\, e^{-x/a}, \quad a > 0,\ x \ge 0$$

MGF of Exponential Distribution:

 The MGF is $M_X(t) = \dfrac{\lambda}{\lambda - t}$, for $t < \lambda$
 Mean $= \dfrac{1}{\lambda}$
 Variance $= \dfrac{1}{\lambda^2}$

 The cumulative distribution function is


$$F(x) = P(X \le x) = \int_0^x f(t)\,dt = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}$$

$$F(x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

Exponential Distribution possesses memoryless property:

 $P(X > s + t \,/\, X > t) = P(X > s)$, for any $s, t > 0$:

$$P(X > s + t \,/\, X > t) = \frac{P(X > s + t \cap X > t)}{P(X > t)} = \frac{P(X > s + t)}{P(X > t)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda t}} = e^{-\lambda s} = P(X > s)$$
Gamma Distribution:

 A continuous random variable 𝑋 is said to follow general Gamma distribution with


two parameters 𝜆 > 0 and 𝑘 > 0, if its probability density function is given by

$$f(x) = \begin{cases} \dfrac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
Note:
 When 𝒌 = 𝟏, the distribution is called exponential distribution
 $\int_{-\infty}^{\infty} f(x)\,dx = 1$ $\left(\text{since } \int_0^{\infty} x^{k-1} e^{-ax}\,dx = \dfrac{\Gamma(k)}{a^k}\right)$

MGF of Gamma Distribution:

 The probability density function of the general Gamma random variable $X$ is

$$f(x) = \begin{cases} \dfrac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

where $\lambda$ and $k$ are the parameters.

 The MGF is

$$M_X(t) = \left(\frac{\lambda}{\lambda - t}\right)^k$$

 Mean $= \dfrac{k}{\lambda}$
 Variance $= \dfrac{k}{\lambda^2}$
Beta Distribution:

 A continuous random variable $X$ taking values in the interval from 0 to 1 is said to follow the Beta distribution if its probability density is given by

$$f(x) = \begin{cases} \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1 - x)^{\beta-1}, & 0 < x < 1,\ \alpha > 0,\ \beta > 0 \\ 0, & \text{otherwise} \end{cases}$$

 Mean $\mu = \dfrac{\alpha}{\alpha + \beta}$
 Variance $\sigma^2 = \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$

Note:

 If $\alpha = 1$ and $\beta = 1$, we obtain as a special case the uniform distribution.

Weibull distribution:

 The random variable X is said to follow Weibull distribution, if its probability


distribution is given by
$$f(x) = \begin{cases} \alpha\beta\, x^{\beta-1} e^{-\alpha x^{\beta}}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$

where $\alpha > 0$ and $\beta > 0$ are the two parameters of the Weibull distribution.

Note:

 When 𝛽 = 1, the Weibull distribution reduces to the exponential distribution with


parameter 𝛼.
 Mean $= E(X) = \mu = \alpha^{-1/\beta}\, \Gamma\!\left(1 + \dfrac{1}{\beta}\right)$

 Variance $= \sigma^2 = \alpha^{-2/\beta} \left\{ \Gamma\!\left(1 + \dfrac{2}{\beta}\right) - \left[\Gamma\!\left(1 + \dfrac{1}{\beta}\right)\right]^2 \right\}$
Cumulative distribution function:
$$F(x; \alpha, \beta) = \begin{cases} 1 - e^{-\alpha x^{\beta}}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
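A minimal sketch of the Weibull mean, variance, and CDF using math.gamma, checking the β = 1 reduction to the exponential distribution:

```python
import math

# Minimal sketch: Weibull distribution in the (α, β) parameterization above.
def weibull_mean(alpha, beta):
    return alpha ** (-1 / beta) * math.gamma(1 + 1 / beta)

def weibull_var(alpha, beta):
    g1 = math.gamma(1 + 1 / beta)
    g2 = math.gamma(1 + 2 / beta)
    return alpha ** (-2 / beta) * (g2 - g1 ** 2)

def weibull_cdf(x, alpha, beta):
    return 1 - math.exp(-alpha * x ** beta) if x >= 0 else 0.0

# With β = 1 these reduce to the exponential distribution with parameter α = λ:
print(weibull_mean(2.0, 1.0))       # 0.5  = 1/λ
print(weibull_var(2.0, 1.0))        # 0.25 = 1/λ²
print(weibull_cdf(1.0, 2.0, 1.0))   # 1 − e^{−2} ≈ 0.8647
```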
Module-6
Hypothesis Testing-I

Population Parameters Sample Statistics


Population mean (𝜇) Sample mean (𝑋̅)
Population standard deviation (𝜎) Sample standard deviation (𝑆)
Population size (𝑁) Sample size (𝑛)
Population proportion (𝑃) Sample proportion (𝑝)

Sampling distribution of mean (𝝈 𝒌𝒏𝒐𝒘𝒏):


If $\bar{X}$ is the mean of a random sample of size $n$ taken from a population having the mean $\mu$ and the finite variance $\sigma^2$, then

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

is a random variable whose distribution approaches the standard normal distribution as $n \to \infty$.

Sampling distribution of mean (𝝈 𝒖𝒏𝒌𝒏𝒐𝒘𝒏):

If $\bar{X}$ is the mean of a random sample of size $n$ taken from a normal population having the mean $\mu$ and the finite variance $\sigma^2$, and

$$s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1},$$

then

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

is a random variable having the t-distribution with parameter $\nu = n - 1$.

Hypothesis testing:
Hypothesis testing is a method for testing a claim/hypothesis about a parameter in a
population using data measured in a sample.

Statistical hypothesis:
 In $H_0$, a statement involving equality ($=, \ge, \le$)
 In $H_1$, a statement involving inequality ($\neq, >, <$)
Types of test:
 Suppose we test for population mean, then
Null hypothesis 𝐻0 : 𝜇 = 𝜇0
Alternative hypothesis 𝐻1 : 𝜇 ≠ 𝜇0 𝑜𝑟 𝜇 > 𝜇0 𝑜𝑟 𝜇 < 𝜇0
If $H_1: \mu \neq \mu_0$, the test is called a two-tailed test.
If $H_1: \mu > \mu_0$, the test is called a right-tailed test (one-tailed).
If $H_1: \mu < \mu_0$, the test is called a left-tailed test (one-tailed).

Critical values of Z:

Level of significance (α)    One-tailed           Two-tailed
5% (0.05)                    +1.645 or −1.645     ±1.96
1% (0.01)                    +2.33 or −2.33       ±2.58
0.1% (0.001)                 +3.09 or −3.09       ±3.30

 If $n \ge 30$, the sample is a large sample.

If $n < 30$, the sample is a small sample.

1. Test of single mean condition:

 𝐻0 : 𝜇 = 𝜇0

Test statistic: the statistic for a test concerning the mean ($\sigma$ known) is

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

which follows the standard normal distribution.

 Critical regions for testing $\mu = \mu_0$ (standard normal distribution, $\sigma$ known):

Alternative hypothesis $H_1$     Reject null hypothesis if
$\mu < \mu_0$                    $Z < -Z_{\alpha}$
$\mu > \mu_0$                    $Z > Z_{\alpha}$
$\mu \neq \mu_0$                 $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$
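A minimal sketch of the single-mean Z-test with a two-tailed decision at α = 0.05 (the 1.96 critical value comes from the table above); the sample numbers are illustrative:

```python
import math

# Minimal sketch: Z-test for H₀: μ = μ₀ with σ known (two-tailed).
def z_test_single_mean(xbar, mu0, sigma, n, z_crit=1.96):
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    reject = abs(z) > z_crit    # reject H₀ if Z < −Z_{α/2} or Z > Z_{α/2}
    return z, reject

z, reject = z_test_single_mean(xbar=52.0, mu0=50.0, sigma=5.0, n=30)
print(round(z, 3), reject)      # 2.191 True → reject H₀ at the 5% level
```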
2. Hypothesis test concerning two means:

$$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Inferences concerning Proportions:

1. Test for single Proportion (Large sample):

The test statistic $Z$ is given by

$$Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}}$$

where $Q = 1 - P$ and $p = \dfrac{X}{n}$ is the sample proportion in a random sample of size $n$.

2. Test for the difference between two sample


Proportions:

 Let $p_1$ and $p_2$ be the proportions of successes in two large samples of sizes $n_1$ and $n_2$ respectively, drawn from the same population or from two populations with the same proportion $P_1 = P_2 = P$.

Test statistic:

$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where the population proportion $P$ is known. If $P$ is not known, an unbiased estimate of $P$ based on both samples, given by

$$\hat{P} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2},$$

is used in place of $P$.
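A minimal sketch of the two-proportion Z-test with P estimated from the pooled samples; the counts are illustrative:

```python
import math

# Minimal sketch: Z-test for the difference of two proportions.
def z_test_two_proportions(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    P = (n1 * p1 + n2 * p2) / (n1 + n2)   # pooled estimate of the common P
    Q = 1 - P
    return (p1 - p2) / math.sqrt(P * Q * (1 / n1 + 1 / n2))

print(round(z_test_two_proportions(45, 100, 30, 100), 3))   # ≈ 2.191
```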
Module 7

Hypothesis Testing-II

Student’s t-distribution:
1. Statistic for small sample test concerning one mean:

Null hypothesis: 𝐻0 : 𝜇 = 𝜇0

Test statistic:

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

follows the t-distribution with $n - 1$ degrees of freedom. Here,

$$s^2 = \frac{\sum(X_i - \bar{X})^2}{n - 1}$$

is an unbiased estimator of the population variance $\sigma^2$.

2. Test of difference of means:

Null hypothesis: 𝐻0 : 𝜇1 − 𝜇2 = 𝑑

Test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

follows the t-distribution with $n_1 + n_2 - 2$ degrees of freedom, where

$$s^2 = \frac{\sum(x_{1i} - \bar{x}_1)^2 + \sum(x_{2i} - \bar{x}_2)^2}{n_1 + n_2 - 2}$$

or

$$s^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$$
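A minimal sketch of the pooled two-sample t statistic above (for $d = 0$), on illustrative data:

```python
import math

# Minimal sketch: pooled two-sample t statistic for H₀: μ₁ − μ₂ = 0.
def pooled_t(xs, ys):
    n1, n2 = len(xs), len(ys)
    m1, m2 = sum(xs) / n1, sum(ys) / n2
    ss1 = sum((x - m1) ** 2 for x in xs)
    ss2 = sum((y - m2) ** 2 for y in ys)
    s2 = (ss1 + ss2) / (n1 + n2 - 2)      # pooled variance estimate
    t = (m1 - m2) / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2                 # statistic and degrees of freedom

t, df = pooled_t([20, 22, 23, 25], [18, 19, 21, 22])
print(round(t, 3), df)                    # ≈ 1.806, df = 6
```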

F-distribution:

 F-distribution is used to test the equality of the variances of two populations from
which two samples have been drawn.

Null hypothesis: 𝐻0 : 𝜎1 2 = 𝜎2 2

Test statistic:

$$F = \frac{s_1^2}{s_2^2}$$

where $s_1^2 = \dfrac{\sum(x_{1i} - \bar{x}_1)^2}{n_1 - 1}$ and $s_2^2 = \dfrac{\sum(x_{2i} - \bar{x}_2)^2}{n_2 - 1}$

Note:
 The larger among 𝒔𝟏 𝟐 𝒂𝒏𝒅 𝒔𝟐 𝟐 will be the numerator.
 Here ′𝐹′ follows F-distribution with (𝑛1 − 1, 𝑛2 − 1) degrees of freedom.
 The critical region value is 𝐹(𝑛1 −1,𝑛2 −1) .
Chi-square distribution (or) 𝝌𝟐 − 𝐝𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧:
 Hypothesis concerning one variance
 Goodness of fit
 Test for independence of attributes

1. Hypothesis concerning one variance:

Null hypothesis 𝑯𝟎 : 𝜎 2 = 𝜎0 2

Test statistic:

$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$$

Where 𝑛 is the sample size

𝑠 2 is the sample variance

𝜎0 2 is the value of 𝜎 2 given by null hypothesis.

The degrees of freedom of a 𝜒 2 −distribution is ′𝑛 − 1′.


2. Goodness of fit:

 The chi-square test of goodness of fit is

$$\chi^2 = \sum_{i=1}^{n} \left[\frac{(O_i - E_i)^2}{E_i}\right]$$

The degrees of freedom (df) for the chi-square distribution is $n - 1$.

Note:
If the data is given in series of ′𝑛′ numbers, then

1. In case of Binomial distribution, 𝑑𝑓 = 𝑛 − 1


2. In case of Poisson distribution, 𝑑𝑓 = 𝑛 − 2
3. In case of Normal distribution, 𝑑𝑓 = 𝑛 − 3.
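A minimal sketch of the goodness-of-fit statistic, using a hypothetical fair-die example (60 rolls, so each expected count is 10):

```python
# Minimal sketch: χ² goodness of fit; O = observed, E = expected counts.
def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 9, 19, 5, 8, 11]   # hypothetical counts of faces 1..6
expected = [10] * 6               # fair die: E = 60/6 = 10 per face
print(chi_square(observed, expected))   # 11.6 > 11.07 = χ²₀.₀₅ at df = 5
```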

3. Chi-square test for independence of attributes:

 An attribute may be marked by its presence (possession) or absence in a member of a given population.
 Let us consider two attributes A and B, where A is divided into two classes and B is divided into two classes. The various cell frequencies can be expressed in the following table, known as a 2×2 contingency table.

           B present   B absent   Total
A present      a           b       a + b
A absent       c           d       c + d
Total        a + c       b + d       N

The expected frequencies are given by

$$E(a) = \frac{(a+c)(a+b)}{N} \qquad E(b) = \frac{(b+d)(a+b)}{N}$$

$$E(c) = \frac{(a+c)(c+d)}{N} \qquad E(d) = \frac{(b+d)(c+d)}{N}$$

where $N = a + b + c + d$ is the total frequency.


Design of experiments:

 When comparing means across two samples, we use a Z-test or a t-test.

 If more than two samples are tested for their means, we use ANOVA.

ANOVA:
Analysis of Variance is a hypothesis testing technique used to test the equality of two or more
population means by examining the variances of samples that are taken.

Assumptions of ANOVA:
 All populations involved follow a normal distribution.
 All populations have the same variances.
 The samples are randomly selected and independent of one another or the
observations are independent.

Types of ANOVA:
 One-way ANOVA: Completely Randomized Design (CRD)
 Two-way ANOVA: Randomized Block Design (RBD)
 Three-way ANOVA: Latin Square Design (LSD)

1. Scheme for one-way classification or Completely


Randomized Design (CRD):

$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2 - C$$

$$SSB = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - C$$

where $C$, called the correction factor for the mean, is given by

$$C = \frac{G^2}{N}, \qquad G = \sum_{i=1}^{k} T_i, \qquad T_i = \sum_{j=1}^{n_i} y_{ij}, \qquad N = \sum_{i=1}^{k} n_i$$
Test statistic:

 To test the $H_0$ that the $K$ population means are equal, we compare two estimates of $\sigma^2$:

one based on the variation between the sample means, and

one based on the variation within the samples.

 Each sum of squares is first converted to a mean square:

$$\text{Mean square} = \frac{\text{sum of squares}}{\text{degrees of freedom}}$$

Mean sum of squares between samples:

$$MSB = \frac{SSB}{DF_{between}} = \frac{\sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2}{K - 1}$$

Mean sum of squares within samples:

$$MSW = \frac{SSW}{DF_{within}} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2}{N - K}$$
 Test statistic:

$$F = \frac{MSB}{MSW}$$

$F$ follows the F-distribution with $K - 1$ and $N - K$ degrees of freedom.

ANOVA table:

Source of variation   Degrees of freedom   Sum of squares   Mean squares   F
Between groups        K − 1                SSB              MSB            F = MSB/MSW
Within groups         N − K                SSW              MSW
Total                 N − 1                SST

Decision:
If $F > F_{\alpha,(K-1,\,N-K)}$, reject the null hypothesis $H_0$.
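A minimal sketch of the one-way ANOVA computation following the scheme above, on illustrative data:

```python
# Minimal sketch: one-way ANOVA (CRD) from k independent samples.
def one_way_anova(groups):
    k = len(groups)
    N = sum(len(g) for g in groups)
    G = sum(sum(g) for g in groups)                 # grand total
    C = G * G / N                                   # correction factor G²/N
    sst = sum(y * y for g in groups for y in g) - C
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - C
    ssw = sst - ssb
    msb = ssb / (k - 1)                             # mean square between
    msw = ssw / (N - k)                             # mean square within
    return msb / msw, (k - 1, N - k)                # F and its degrees of freedom

groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
F, df = one_way_anova(groups)
print(round(F, 2), df)   # 9.26 (2, 15) → compare with F₀.₀₅(2, 15) ≈ 3.68
```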
2. Two-way ANOVA classification:

Treatment sum of squares:

$$SS(Tr) = \frac{\sum_{i=1}^{C} T_{i.}^2}{r} - \text{Correction factor}$$

Block sum of squares, $SS(Bl) = C\sum_{j=1}^{r}(\bar{y}_{.j} - \bar{y}_{..})^2$:

$$SS(Bl) = \frac{\sum_{j=1}^{r} T_{.j}^2}{C} - \text{Correction factor}$$

Error sum of squares, $SSE = \sum_{i=1}^{C}\sum_{j=1}^{r}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$

Total sum of squares, $SST = \sum_{i=1}^{C}\sum_{j=1}^{r}(y_{ij} - \bar{y}_{..})^2$:

$$SST = \sum_{i=1}^{C}\sum_{j=1}^{r} y_{ij}^2 - \text{Correction factor}$$

where the correction factor is given by $\dfrac{T_{..}^2}{Cr}$, and

$T_{i.}$ = the sum of the $r$ observations for the $i$th treatment
$T_{.j}$ = the sum of the $C$ observations for the $j$th block
$T_{..}$ = the grand total of all observations

F-ratio for treatments (between samples):

$$F_{Tr} = \frac{MS(Tr)}{MSE} = \frac{SS(Tr)/(C-1)}{SSE/\left[(C-1)(r-1)\right]}$$

Decision:

Reject $H_0$ if $F_{Tr} > F_{\alpha,\,(C-1,\,(C-1)(r-1))}$

F-ratio for blocks:

$$F_{Bl} = \frac{MS(Bl)}{MSE} = \frac{SS(Bl)/(r-1)}{SSE/\left[(C-1)(r-1)\right]}$$

Decision: Reject $H_0$ if $F_{Bl} > F_{\alpha,\,(r-1,\,(C-1)(r-1))}$

Two-way ANOVA table for results:

Source of variation   Degrees of freedom   Sum of squares   Mean squares                 F
Treatments            C − 1                SS(Tr)           MS(Tr) = SS(Tr)/(C − 1)      F_Tr = MS(Tr)/MSE
Blocks                r − 1                SS(Bl)           MS(Bl) = SS(Bl)/(r − 1)      F_Bl = MS(Bl)/MSE
Error                 (C − 1)(r − 1)       SSE              MSE = SSE/[(C − 1)(r − 1)]
Total                 Cr − 1               SST

3. Latin Square Design (LSD) (or) Three-way ANOVA:

Null hypothesis: There is no significant difference in the means of columns


(Groups), rows (Blocks), and treatments.

Alternative hypothesis: There is at least one mean in column which differs


from others. Also, there is at least one mean in the rows which differs from
others. Similarly, for treatments.

Degrees of freedom:
𝑫𝑭𝒓𝒐𝒘𝒔 = 𝒏 − 𝟏

𝑫𝑭𝒄𝒐𝒍𝒖𝒎𝒏𝒔 = 𝒏 − 𝟏

𝑫𝑭𝒕𝒓𝒆𝒂𝒕𝒎𝒆𝒏𝒕𝒔 = 𝒏 − 𝟏

𝑫𝑭𝑬𝒓𝒓𝒐𝒓 = (𝒏 − 𝟏)(𝒏 − 𝟐)
Critical region:

𝐹(𝑛−1,(𝒏−𝟏)(𝒏−𝟐) )

The grand total is $G = \sum\sum x_{ij}$, and the correction factor is $C.F. = \dfrac{G^2}{N}$.

Sum of squares total:

𝑆𝑆𝑇 = ∑ ∑ 𝑥𝑖𝑗 2 − 𝐶. 𝐹

Column sum of squares:

$$SSC = \sum \frac{C_j^2}{n} - C.F.$$
Where, 𝐶𝑗 is the column sum of the jth column.

$$SSR = \sum \frac{R_i^2}{n} - C.F.$$
Where, 𝑅𝑖 is the row sum of the ith row.

$$SSTr = \sum \frac{T_i^2}{n} - C.F.$$
Where, 𝑇𝑖 is called the treatment sum of ith treatment.

𝑆𝑆𝐸 = 𝑆𝑆𝑇 − 𝑆𝑆𝑅 − 𝑆𝑆𝐶 − 𝑆𝑆𝑇𝑟

ANOVA table:

Source of variation   Sum of squares (SS)   Degrees of freedom   Mean squares (MS)            F
Columns               SSC                   n − 1                MSC = SSC/(n − 1)            F₁ = MSC/MSE
Rows                  SSR                   n − 1                MSR = SSR/(n − 1)            F₂ = MSR/MSE
Treatments            SSTr                  n − 1                MSTr = SSTr/(n − 1)          F₃ = MSTr/MSE
Error                 SSE                   (n − 1)(n − 2)       MSE = SSE/[(n − 1)(n − 2)]
