Lectures Ma 2203
Lecture slides
by
Dr. Suchandan Kayal
Department of Mathematics
National Institute of Technology Rourkela
Rourkela - 769008, Odisha, India
Autumn, 2020
Outline (Part-I)
Birth of an offspring:
Experiment
An experiment is the act of observing something happen or conducting
something under certain conditions, which results in some
outcomes.
Example
Rainfall: It is a consequence of several things such as cloud
formation, El Niño occurrence, humidity, atmospheric pressure,
etc. Finally, we observe that there is rainfall. Thus, observing
the weather is an experiment.
Types of experiment
Deterministic experiment: It results in known outcomes under
certain conditions.
Random experiment: Under fixed conditions, the outcomes
are not known.
Basic notions (random experiment)
Random experiment
An experiment is said to be a random experiment if the
following conditions are satisfied.
The set of all possible outcomes of the experiment is known
in advance.
The outcomes of a particular performance (trial) of the
experiment can not be predicted in advance.
The experiment can be repeated under identical conditions.
Sample space
The collection of all possible outcomes of a random experiment
is called the sample space. It is denoted by Ω.
Basic notions (sample space and event)
Sample space/examples
Throwing of a die. Here Ω = {1, 2, 3, 4, 5, 6}.
Throwing of a die and tossing of a coin simultaneously.
Ω = {1, 2, 3, 4, 5, 6} × {H, T }
A coin is flipped repeatedly until a tail is observed.
Ω = {T, HT, HHT, HHHT, · · · }
Lifetime of a battery. Here Ω = [0, 10000].
Event
An event is a set of outcomes of an experiment (a subset of the
sample space) to which a probability is assigned.
Basic notions
Remarks on event
When the sample space is finite, any subset of the sample
space is an event. In this case, all elements of the power set
of the sample space are defined as events.
This approach does not work well in cases where the
sample space is uncountably infinite. So, when defining a
probability space it is possible, and often necessary to
exclude certain subsets of the sample space from being
events.
In general measure theoretic description of probability
spaces an event may be defined as an element of a selected
sigma-field of subsets of the sample space.
Basic notions (impossible and sure events)
Impossible event
An event is said to be impossible if the probability of
occurrence of that event is zero. For example, in the rolling
of a six-faced die, the event that the face 7 occurs.
Sure event
An event with probability of occurrence one is called the sure
event. The sample space of any random experiment is always a
sure event. Another example is the event that the lifetime of a
battery is a nonnegative number.
Basic notions
Various operations
Union:
A ∪ B means occurrence of at least one of A and B.
∪_{i=1}^{n} Ai means occurrence of at least one of A1, · · · , An.
∪_{i=1}^{∞} Ai means occurrence of at least one of A1, A2, · · · .
Intersection:
A ∩ B means simultaneous occurrence of both A and B.
∩_{i=1}^{n} Ai means simultaneous occurrence of A1, · · · , An.
∩_{i=1}^{∞} Ai means simultaneous occurrence of A1, A2, · · · .
Exhaustive events:
If ∪_{i=1}^{n} Ai = Ω, we call A1, · · · , An exhaustive events.
Basic notions
A. Classical approach
Assumptions:
A random experiment results in a finite number of equally
likely outcomes.
Let Ω = {ω1 , · · · , ωn } be a finite sample space with n ∈ N
possible outcomes, N denotes the set of natural numbers.
For a subset E of Ω, |E| denotes the number of elements in
E.
Result:
The probability of occurrence of an event E is given by P(E) = |E|/|Ω| = |E|/n.
Observations
For any event E, P(E) ≥ 0.
For mutually exclusive events E1, · · · , En,
P(∪_{i=1}^{n} Ei) = |∪_{i=1}^{n} Ei| / n = (Σ_{i=1}^{n} |Ei|) / n = Σ_{i=1}^{n} |Ei|/n = Σ_{i=1}^{n} P(Ei).
P(Ω) = |Ω|/|Ω| = 1.
Methods of assigning probabilities/Classical approach
(cont...)
Example-1
Suppose that in your section, we have 150 students born
in the same year. Assume that a year has 365 days. Find
the probability that all the students of your section are
born on different days of the year.
Solution
Denote the event that all the students are born on different
days of the year by E. Here, |Ω| = 365^150 and |E| = 365 × 364 × · · · × (365 − 149). Thus,
P(E) = (365 × 364 × · · · × 216) / 365^150.
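The product above is easy to evaluate numerically; the following short sketch (plain Python, not part of the original slides) computes it for 150 students and a 365-day year.

# Probability that 150 students all have distinct birthdays (365-day year).
def prob_all_distinct(students=150, days=365):
    p = 1.0
    for i in range(students):
        p *= (days - i) / days   # the i-th student must avoid the i birthdays already taken
    return p

print(prob_all_distinct())  # an extremely small number, essentially 0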
Example-2
Find the probability of getting exactly two heads in three tosses of a fair coin.
Solution
Denote the event of getting exactly two heads in three tosses of a fair coin by E. Here,
Ω = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT} and E = {HHT, HTH, THH}.
Thus, P(E) = |E|/|Ω| = 3/8.
Methods of assigning probabilities/Classical approach
(cont...)
Drawbacks
The random experiment must produce equally likely
outcomes.
The total number of outcomes of the random experiment
must be finite.
Methods of assigning probabilities
Observations
For any event E, P (E) ≥ 0
For mutually exclusive events E1 , · · · , En ,
P(∪_{i=1}^{n} Ei) = Σ_{i=1}^{n} P(Ei)
P (Ω) = 1.
Methods of assigning probabilities/Relative frequency
Example-3
After tossing a fair coin, we have the following outputs:
Solution
Let a_n denote the number of heads observed in the first n tosses. Note that
a_n/n = 1/1, 2/2, 2/3, 3/4, 4/5, 4/6, · · · ,
that is,
a_n/n = (2k − 1)/(3k − 2) for n = 3k − 2, k = 1, 2, · · ·
a_n/n = 2k/(3k − 1) for n = 3k − 1, k = 1, 2, · · ·
a_n/n = 2k/(3k) for n = 3k, k = 1, 2, · · · .
Thus, lim_{n→∞} a_n/n = 2/3 = P(H).
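The same ratio can be reproduced by a small script; this is a sketch in Python, where the repeating output pattern H, H, T, H, H, T, · · · is inferred from the sequence a_n/n shown above.

# Relative frequency a_n/n for the repeating output pattern H, H, T, H, H, T, ...
pattern = ["H", "H", "T"]
heads = 0
n_max = 300
for n in range(1, n_max + 1):
    if pattern[(n - 1) % 3] == "H":
        heads += 1
print(heads / n_max)   # approaches 2/3 as n grows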
Methods of assigning probabilities/Relative frequency
approach (cont...)
Drawbacks
The probability has been calculated based on an
approximation.
The random experiment has to be conducted a large
number of times. This is not always possible since some
experiments are costly (launching satellite).
lim_{n→∞} √n / n = 0 ⇒ P(E) = 0 (not correct!).
lim_{n→∞} (n − √n) / n = 1 ⇒ P(E) = 1 (not correct!).
Axiomatic approach to probability
Basic concepts
A set whose elements are themselves sets is called a class of
sets. For example, A = {{2}, {2, 3}}.
A set function is a real-valued function whose domain is a
class of sets.
A sigma-field of subsets of Ω is a class F of subsets of Ω
satisfying the following properties:
(i) Ω ∈ F
(ii) E ∈ F ⇒ E c = Ω − E ∈ F (closed under complement)
(iii) Ei ∈ F, i = 1, 2, · · · ⇒ ∪_{i=1}^{∞} Ei ∈ F (closed under countably infinite unions)
F = {φ, Ω} is a (trivial) sigma-field.
Suppose A ⊂ Ω. Then, F = {φ, Ω, A, Ac } is a sigma field of
subsets of Ω.
Axiomatic approach to probability (cont...)
Definition
Let Ω be a sample space of a random experiment. Let F be the
event space or a sigma field of subsets of Ω. Then, a probability
function or a probability measure is a set function P , defined on
F, satisfying the following three axioms:
For any event E ∈ F, P(E) ≥ 0 (nonnegativity)
P(Ω) = 1 (normalization)
For a countably infinite collection of mutually exclusive
events E1, E2, · · · , we have
P(∪_{i=1}^{∞} Ei) = Σ_{i=1}^{∞} P(Ei) (countable additivity)
Proof
See it during lecture.
Inequalities
Boole's inequality
Let (Ω, F, P) be a probability space and let E1, · · · , En ∈ F, where n ∈ N. Then,
P(∪_{i=1}^{n} Ei) ≤ Σ_{i=1}^{n} P(Ei).
Proof
See it during the lecture.
Note
To prove Boole's inequality for a countable collection of events, we
can use ∪_{i=1}^{n} Ei → ∪_{i=1}^{∞} Ei as n → ∞ along with the continuity
of the probability measure P.
Inequalities (cont...)
Bonferroni’s inequality
Let (Ω, F, P ) be a probability space and let E1 , · · · , En ∈ F,
where n ∈ N. Then,
P(∩_{i=1}^{n} Ei) ≥ Σ_{i=1}^{n} P(Ei) − (n − 1).
Proof
See it during the lecture.
Note
Bonferroni's inequality holds only for the probability of a
finite intersection of events!
Conditional probability
Example
Let us toss two fair coins. Let A denote the event that both coins show the
same face and B the event that at least one coin shows a head. Obtain
the probability that A happens given that B has already
occurred.
Solution
Listen to my lecture.
Definition
Let (Ω, F, P ) be a probability space and B ∈ F be a fixed event
such that P (B) > 0. Then, the conditional probability of event
A given that B has already occurred is defined as
P(A|B) = P(A ∩ B) / P(B).
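For the two-coin example above, the definition can be checked by brute-force enumeration; this is a hedged Python sketch, not the lecture's solution.

# Two fair coins: A = both show the same face, B = at least one head.
# Apply P(A|B) = P(A ∩ B)/P(B) over the four equally likely outcomes.
from itertools import product

omega = list(product("HT", repeat=2))          # [('H','H'), ('H','T'), ('T','H'), ('T','T')]
B = [w for w in omega if "H" in w]             # at least one head
A_and_B = [w for w in B if w[0] == w[1]]       # same face and at least one head
print(len(A_and_B) / len(B))                   # 1/3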
Conditional probability (cont...)
Example
Solution
Clearly,
P(A ∩ B) = P(A) = \binom{13}{6} / \binom{52}{6} and P(B) = [\binom{13}{5}\binom{39}{1} + \binom{13}{6}] / \binom{52}{6}.
Thus, P(A|B) = \binom{13}{6} / [\binom{13}{5}\binom{39}{1} + \binom{13}{6}].
Conditional probability (cont...)
Note
For events E1, E2, · · · , En ∈ F, n ≥ 2, we have
P(E1 ∩ E2) = P(E1)P(E2|E1) if P(E1) > 0
P(E1 ∩ E2 ∩ E3) = P(E1)P(E2|E1)P(E3|E1 ∩ E2) if
P(E1 ∩ E2) > 0. This condition also guarantees that
P(E1) > 0, since E1 ∩ E2 ⊂ E1
P(∩_{i=1}^{n} Ei) = P(E1)P(E2|E1)P(E3|E1 ∩ E2) · · · P(En|E1 ∩ E2 ∩ · · · ∩ En−1),
provided P(E1 ∩ E2 ∩ · · · ∩ En−1) > 0, which also
guarantees that P(E1 ∩ E2 ∩ · · · ∩ Ei) > 0 for
i = 1, 2, · · · , n − 1.
Conditional probability (cont...)
Example
An urn contains four red and six black balls. Two balls are
drawn successively, at random and without replacement,
from the urn. Find the probability that the first draw
resulted in a red ball and the second draw resulted in a
black ball.
Solution
Let A denote the event that the first draw results in a red
ball and B the event that the second draw results in a black ball.
Then,
P(A ∩ B) = P(A)P(B|A) = (4/10) × (6/9) = 12/45.
Total probability
Theorem (total probability)
Let (Ω, F, P) be a probability space and let {Ei : i ∈ A} be a countable collection of mutually exclusive and exhaustive events with P(Ei) > 0 for i ∈ A. Then, for any event E ∈ F,
P(E) = Σ_{i∈A} P(E|Ei)P(Ei).
Proof
Let F = ∪i∈A Ei . Then, P (F ) = P (Ω) = 1 and
P (F c ) = 1 − P (F ) = 0. Again,
E ∩ F c ⊂ F c ⇒ 0 ≤ P (E ∩ F c ) ≤ P (F c ) = 0.
Total probability (cont...)
Proof (cont...)
Thus,
P(E) = P(E ∩ F) + P(E ∩ F^c)
     = P(E ∩ F)
     = P(∪_{i∈A} (E ∩ Ei))
     = Σ_{i∈A} P(E ∩ Ei)
     = Σ_{i∈A} P(E|Ei)P(Ei).
Bayes theorem
Theorem
Let (Ω, F, P ) be a probability space and let {Ei ; i ∈ A} be a
countable collection of mutually exclusive and exhaustive events
with P (Ei ) > 0 for i ∈ A. Then, for any event E ∈ F, with
P (E) > 0, we have
P(Ej|E) = P(E|Ej)P(Ej) / Σ_{i∈A} P(E|Ei)P(Ei), j ∈ A.
Proof
For j ∈ A,
P(Ej|E) = P(Ej ∩ E)/P(E) = P(E|Ej)P(Ej)/P(E) = P(E|Ej)P(Ej) / Σ_{i∈A} P(E|Ei)P(Ei),
where the last equality follows from the theorem of total probability.
Bayes theorem (cont...)
Note
P (Ej ), j ∈ A are known as the prior probabilities.
P (Ej |E) are known as the posterior probabilities.
Bayes theorem (cont...)
Example
Urn U1 contains four white and six black balls and urn U2
contains six white and four black balls. A fair die is cast and
urn U1 is selected if the upper face of die shows 5 or 6 dots,
otherwise urn U2 is selected. A ball is drawn at random from
the selected urn.
Given that the drawn ball is white, what is the conditional
probability that it came from U1 .
Given that the ball is white, find the conditional
probability that it came from urn U2 .
Solution
W → drawn ball is white;
E1 → Urn U1 is selected;
E2 → Urn U2 is selected.
Bayes theorem (cont...)
Solution (contd...)
E1 and E2 are mutually exclusive and exhaustive events.
(i) P(E1|W) = P(W|E1)P(E1) / [P(W|E1)P(E1) + P(W|E2)P(E2)]
            = ((4/10) × (2/6)) / ((4/10) × (2/6) + (6/10) × (4/6)) = 1/4.
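The posterior in (i), and the companion probability for part (ii), can be checked numerically; a hedged Python sketch:

# Bayes' theorem for the two-urn example: U1 chosen if the die shows 5 or 6.
p_E1, p_E2 = 2/6, 4/6            # P(U1 selected), P(U2 selected)
p_W_E1, p_W_E2 = 4/10, 6/10      # P(white | urn)

p_W = p_W_E1 * p_E1 + p_W_E2 * p_E2    # total probability of drawing a white ball
print(p_W_E1 * p_E1 / p_W)             # P(E1 | W) = 1/4
print(p_W_E2 * p_E2 / p_W)             # P(E2 | W) = 3/4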
Note
If P (B) = 0, then P (A ∩ B) = 0 = P (A)P (B) for all
A ∈ F. That is, if P (B) = 0, then any event A ∈ F and B
are independent.
If P (B) > 0, then A and B are said to be independent if
and only if P (A|B) = P (A).
Independence
Let (Ω, F, P ) be a probability space. Let A ⊂ R be an index set
and let {Eα : α ∈ A} be a collection of events in F.
Events {Eα : α ∈ A} are said to be pairwise independent if
any pair of events Eα and Eβ, α ≠ β, in the collection
{Ej : j ∈ A} are independent, that is, if
P(Eα ∩ Eβ) = P(Eα)P(Eβ), α, β ∈ A and α ≠ β.
Let A = {1, 2, · · · , n} for some n ∈ N. The events
E1, · · · , En are said to be independent if for any sub-collection
{Eα1, · · · , Eαk} of {E1, · · · , En} (k = 2, 3, · · · , n),
P(∩_{j=1}^{k} Eαj) = Π_{j=1}^{k} P(Eαj).
Independence
Independence ⇒ pairwise independence.
Pairwise independence ⇏ independence (in general!).
Solution
See during the lecture!
Assignment-I
Problems
Q1. A student prepares for a quiz by studying a list of ten
problems. She only can solve six of them. For the quiz, the
instructor selects five questions at random from the list of
ten. What is the probability that the student can solve all
five problems on the examination?
Q2. A total of n shells is fired at a target. The probability that
the ith shell hits the target is pi, i = 1, · · · , n. Find the
probability that at least two shells out of n hit the target.
Q3. A bag contains 5 white and 2 black balls and balls are
drawn one by one without replacement. What is the
probability of drawing the second white ball before the
second black ball?
Assignment-I (cont...)
Problems
Q4. Balls are drawn repeatedly and with replacement from a
bag consisting of 60 white and 30 black balls. What is the
probability of drawing the third white ball before the
second black ball?
Q5. Let A and B be two events which are independent. Then,
show that A and B c , Ac and B, and Ac and B c are
independent.
Q6. Consider the experiment of tossing a coin three times. Let
Hi , i = 1, 2, 3, denote the event that the ith toss is a head.
Assuming that the coin is fair and has an equal probability
of landing heads or tails on each toss, show that the events H1, H2
and H3 are mutually independent.
Assignment-I (cont...)
Problems
Q7. When coded messages are sent, there are sometimes errors
in transmission. In particular, Morse code uses "dots" and
"dashes", which are known to occur in the proportion of
3 : 4. This means that for any given symbol,
P(dot sent) = 3/7 and P(dash sent) = 4/7.
Suppose there is interference on the transmission line, and
with probability 1/8 a dot is mistakenly received as a dash,
and vice versa. If we receive a dot, can we be sure that a
dot was sent? (Ans. 21/25)
Solve more problems beyond these exercises if you wish to
secure a good grade.
Part-II
Random variable
Motivation
Someone may not be interested in the full physical
description of the sample space or events. Rather, one may
be interested in the numerical characteristic of the event
considered.
For example, suppose some components have been put on a
test. After a certain time t > 0, we may be interested in
how many of these are functioning and how many are not
functioning. Here, we are not interested in which units have
failed.
To study certain phenomena of a random experiment, it is
required to quantify the phenomena. One option is to
associate a real number to every outcome of the random
experiment. This encourages us to develop the concept of
the random variable.
Random variable (cont...)
Definition
Let (Ω, F, P ) be a probability space and let X : Ω → R be a
given function. We say that X is a random variable if X^{−1}(B) = {ω ∈ Ω : X(ω) ∈ B} ∈ F for every Borel set B ⊆ R.
Alternative
Let (Ω, F, P ) be a probability space. Then, a real valued
measurable function defined on the sample space is known as
the random variable.
Random variable (cont...)
Theorem
Let (Ω, F, P ) be a probability space and let X : Ω → R be a
given function. Then, X is a random variable if and only if
{ω ∈ Ω : X(ω) ≤ a} ∈ F for all a ∈ R.
Random variable (cont...)
Example
Consider the experiment of tossing of a coin. Then, the sample
space is Ω = {H, T }. Define X as the number of heads. Then,
X(H) = 1 and X(T) = 0. Consider the sets {ω ∈ Ω : X(ω) ≤ a}, a ∈ R, to verify that X is a random variable.
Definition
A function FX : R → R defined by FX(x) = P(X ≤ x), x ∈ R, is called the distribution function (cumulative distribution function) of X.
Theorem
Let FX be the distribution function of a random variable X.
Then,
FX is non-decreasing.
FX is right continuous.
FX (∞) = 1 and FX (−∞) = 0.
Distribution function (cont...)
Example
Suppose that a fair coin is independently flipped thrice, and let X denote
the number of heads obtained. Then, the sample space is
Ω = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.
Example (cont...)
The distribution function of X is
FX(x) = 0,    x < 0
      = 1/8,  0 ≤ x < 1
      = 1/2,  1 ≤ x < 2
      = 7/8,  2 ≤ x < 3
      = 1,    x ≥ 3.
Note
Let −∞ < a < b < ∞. Then,
P (a < X ≤ b) = P (X ≤ b) − P (X ≤ a)
P (a < X < b) = P (X < b) − P (X ≤ a)
P (a ≤ X < b) = P (X < b) − P (X < a)
P (a ≤ X ≤ b) = P (X ≤ b) − P (X < a)
P (X ≥ a) = 1 − P (X < a)
P (X > a) = 1 − P (X ≤ a)
Theorem
Let G : R → R be a non-decreasing and right continuous
function for which G(−∞) = 0 and G(+∞) = 1. Then, there
exists a random variable X defined on a probability space
(Ω, F, P ) such that the distribution function of X is G.
Distribution function (cont...)
Example
Consider a function G : R → R defined by
G(x) = 0 for x < 0, and G(x) = 1 − e^{−x} for x ≥ 0.
Observations
Clearly, G is nondecreasing, continuous and satisfies
G(−∞) = 0 and G(∞) = 1. Thus, G is a distribution
function for a random variable X.
Since G is continuous, we have
P (X = x) = G(x) − G(x− ) = 0 for all x ∈ R, where G(x− )
is the left hand limit of G at the point x.
Distribution function (cont...)
Example (cont...)
For −∞ < a < b < ∞, P (a < X < b) = P (a ≤ X < b) =
P (a ≤ X ≤ b) = P (a < X ≤ b) = G(b) − G(a).
P (X ≥ a) = P (X > a) = 1 − G(a) and
P (X < a) = P (X ≤ a) = G(a).
P(2 < X ≤ 3) = G(3) − G(2) = e^{−2} − e^{−3}
P(−2 < X ≤ 3) = G(3) − G(−2) = 1 − e^{−3}
P(X ≥ 2) = 1 − G(2) = e^{−2}
P(X > 5) = 1 − G(5) = e^{−5}.
Note that the sum of sizes of jumps of G is 0.
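Since G is the CDF of a standard exponential distribution, these values can be double-checked with scipy; a hedged sketch, assuming scipy is available:

# G(x) = 1 - exp(-x) for x >= 0 is the CDF of an exponential distribution with rate 1.
from scipy.stats import expon
import math

G = expon(scale=1.0).cdf
print(G(3) - G(2), math.exp(-2) - math.exp(-3))   # P(2 < X <= 3)
print(G(3) - G(-2), 1 - math.exp(-3))             # P(-2 < X <= 3); G(-2) = 0
print(1 - G(2), math.exp(-2))                     # P(X >= 2)
print(1 - G(5), math.exp(-5))                     # P(X > 5)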
Types of the random variables
Definition
A random variable X is said to be of discrete type if there
exists a non-empty and countable set SX such that P(X = x) > 0 for every x ∈ SX
and Σ_{x∈SX} P(X = x) = 1. The set SX is called the support of X.
Theorem
Let X be a random variable with distribution function FX and
let DX be the set of discontinuity points of FX . Then, X is of
discrete type if and only if
P (X ∈ DX ) = 1.
Definition
Let X be a discrete type random variable with support SX . The
function fX : R → R defined by
fX(x) = P(X = x) for x ∈ SX, and fX(x) = 0 for x ∈ SX^c,
is called the probability mass function (PMF) of X.
Example
Let us consider a random variable X having the distribution function FX : R → R defined by
FX(x) = 0,    x < 0
      = 1/8,  0 ≤ x < 2
      = 1/4,  2 ≤ x < 3
      = 1/2,  3 ≤ x < 6
      = 4/5,  6 ≤ x < 12
      = 7/8,  12 ≤ x < 15
      = 1,    x ≥ 15.
Show that X is of discrete type.
Solution
The set of discontinuity points of FX is DX = {0, 2, 3, 6, 12, 15}
and P(X ∈ DX) = Σ_{x∈DX} [FX(x) − FX(x^−)] = 1. Thus, the
random variable X is of discrete type.
Remark
The PMF of a discrete type random variable X having support
SX satisfies the following properties:
(i) fX(x) > 0 for all x ∈ SX and fX(x) = 0 for all x ∈ SX^c.
(ii) Σ_{x∈SX} fX(x) = Σ_{x∈SX} P(X = x) = 1.
Conversely, if a function satisfies the above two properties, then
it is a probability mass function.
Continuous and absolutely continuous random variables
Definition
A random variable X is said to be of
continuous type if its distribution function FX is
continuous everywhere.
absolutely continuous type if there exists an integrable
function fX : R → R such that fX(x) ≥ 0 for all x ∈ R and
FX(x) = ∫_{−∞}^{x} fX(t) dt, ∀x ∈ R.
Note
If fX (x) is the probability density function of an absolutely
continuous random variable X, then
(i) fX(x) ≥ 0 for all x ∈ R
(ii) ∫_{−∞}^{∞} fX(t) dt = FX(∞) = 1.
Example
Let X be a random variable having the distribution function
FX(x) = 0 for x < 0, and FX(x) = 1 − e^{−x} for x ≥ 0.
Example (cont...)
Also,
FX(x) = ∫_{−∞}^{x} fX(t) dt, ∀x ∈ R,
where
fX(t) = 0 for t < 0, and fX(t) = e^{−t} for t ≥ 0.
P (X = x) = FX (x) − FX (x− ) = 0, ∀x ∈ R.
Continuous and absolutely continuous random variables
(cont...)
Note
Let X be a random variable of absolutely continuous type.
Then, X is also of continuous type and thus, P (X = x) = 0 for
all x ∈ R. Consequently,
P(X < x) = P(X ≤ x) = FX(x) = ∫_{−∞}^{x} fX(t) dt for all x ∈ R.
P(X ≥ x) = P(X > x) = ∫_{x}^{∞} fX(t) dt for all x ∈ R.
For −∞ < a < b < ∞, P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b) = P(a ≤ X ≤ b)
= FX(b) − FX(a) = ∫_{−∞}^{b} fX(t) dt − ∫_{−∞}^{a} fX(t) dt = ∫_{a}^{b} fX(t) dt.
Continuous and absolutely continuous random variables
(cont...)
Note
Suppose that the distribution function FX of a random variable
X is differentiable at every x ∈ R. Then,
FX(x) = ∫_{−∞}^{x} F′X(t) dt for all x ∈ R.
It follows that if FX is differentiable everywhere, then the
random variable X is of absolutely continuous type and one
may take its probability density function to be fX(x) = F′X(x), x ∈ R.
Example
Let X be a random variable with probability density function
fX(x) = k − |x| for |x| < 1/2, and fX(x) = 0 otherwise,
where k ∈ R. Then,
(i) find the value of k.
(ii) Evaluate P(X < 0), P(X ≤ 0), P(0 < X ≤ 1/4), P(0 ≤ X < 1/4) and P(−1/8 ≤ X ≤ 1/4).
(iii) Find the distribution function of X.
(iii) Find the distribution function of X.
Solution
See it during the lecture!
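A symbolic sketch of the computations expected here (assuming sympy is available; this is not the lecture's solution): k is found from the normalization condition and the probabilities follow by integration.

# Density f(x) = k - |x| on |x| < 1/2: find k, then compute some of the probabilities.
import sympy as sp

x, k = sp.symbols("x k", real=True)
f = k - sp.Abs(x)
half = sp.Rational(1, 2)

k_val = sp.solve(sp.integrate(f, (x, -half, half)) - 1, k)[0]        # total mass must be 1
fk = f.subs(k, k_val)
print(k_val)                                                         # k = 5/4
print(sp.integrate(fk, (x, -half, 0)))                               # P(X < 0) = P(X <= 0) = 1/2
print(sp.integrate(fk, (x, 0, sp.Rational(1, 4))))                   # P(0 < X <= 1/4)
print(sp.integrate(fk, (x, -sp.Rational(1, 8), sp.Rational(1, 4))))  # P(-1/8 <= X <= 1/4)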
Expectation
Definition
Let X be a discrete type random variable with probability
mass function fX and support SX . We say that the
expected value of X (denoted by E(X)) is finite and equals
E(X) = Σ_{x∈SX} x fX(x).
Example
Let X be a random variable with probability mass function
fX(x) = (1/2)^x for x ∈ {1, 2, 3, · · · }, and fX(x) = 0 otherwise.
Example
Let X be a random variable with probability mass function
fX(x) = 6/(π² x²) for x ∈ {−1, +1, −2, +2, −3, +3, · · · }, and fX(x) = 0 otherwise.
Example
Let X be a random variable with probability density function
fX(x) = e^{−|x|}/2, −∞ < x < ∞.
Example
Let X be a random variable with probability density function
fX(x) = (1/π) · 1/(1 + x²), −∞ < x < ∞.
Theorem
Let X be a random variable of discrete type with support
SX and probability mass function fX . Let h : R → R be a
Borel function and let T = h(X). Then,
E(T) = Σ_{x∈SX} h(x) fX(x),
provided it is finite.
Let X be a random variable of (absolutely) continuous
type with probability density function fX . Let h : R → R
be a Borel function and let T = h(X). Then,
E(T) = ∫_{−∞}^{∞} h(x) fX(x) dx,
provided it is finite.
Expectation and moments
Definition
Let X be a random variable defined on some probability space.
µ′1 = E(X), provided it is finite, is called the mean of the random variable X.
For r ∈ {1, 2, · · · }, µ′r = E(X^r), provided it is finite, is called the rth moment of X.
For r ∈ {1, 2, · · · }, µr = E((X − µ′1)^r), provided it is finite, is called the rth central moment of X.
µ2 = E((X − µ′1)²), provided it is finite, is called the variance of X. We denote Var(X) = E((X − µ′1)²). The quantity σ = √µ2 = √(E((X − µ′1)²)) is called the standard deviation of X.
Expectation and moments (cont...)
Theorem
Let X be a random variable.
For real constants a and b, E(aX + b) = aE(X) + b,
provided the involved expectations are finite.
If h1 , · · · , hm are Borel functions, then
E(Σ_{i=1}^{m} hi(X)) = Σ_{i=1}^{m} E(hi(X)).
Theorem
Let X be a random variable with finite first two moments and
let E(X) = µ. Then,
Var(X) = E(X²) − (E(X))²
Var(X) ≥ 0. Moreover, Var(X) = 0 if and only if P(X = µ) = 1
E(X²) ≥ (E(X))² [Cauchy–Schwarz inequality]
For any real constants a, b, Var(aX + b) = a² Var(X).
Expectation and moments (cont...)
Example
Let X be a random variable with probability density function
fX(x) = 1/2,  −2 < x < −1
      = x/9,  0 < x < 3
      = 0,    otherwise.
Solution
See during lecture!
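A symbolic sketch (assuming sympy) of the kind of computation expected here, evaluating E(X) and Var(X) for the density above:

# E(X) and Var(X) for f(x) = 1/2 on (-2, -1) and x/9 on (0, 3).
import sympy as sp

x = sp.symbols("x", real=True)
EX  = sp.integrate(x * sp.Rational(1, 2), (x, -2, -1)) + sp.integrate(x * x / 9, (x, 0, 3))
EX2 = sp.integrate(x**2 * sp.Rational(1, 2), (x, -2, -1)) + sp.integrate(x**2 * x / 9, (x, 0, 3))
print(EX)            # 1/4
print(EX2 - EX**2)   # Var(X) = E(X^2) - (E(X))^2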
Mean, median and mode
Mean
The mean of a random variable X is given by µ′1 = E(X).
The mean of a probability distribution gives us an idea about the
average observed value of X in the long run.
Median
A real number m satisfying FX(m^−) ≤ 1/2 ≤ FX(m), that is,
P(X < m) ≤ 1/2 ≤ P(X ≤ m), is called the median of X.
The median of probability distribution divides SX into two
equal parts each having the same probability of occurrence.
If X is continuous, then the median m is given by FX(m) = 1/2.
For discrete case, the median may not be unique.
Mean, median and mode (cont...)
Mode
The mode m0 of a probability distribution is the value that
occurs with the highest probability (density), that is, the point at which
fX attains its maximum: fX(m0) = sup{fX(x) : x ∈ SX}.
Example
Consider a random variable X with distribution function
FX(x) = 0,   x < 0
      = x³,  x ∈ [0, 1]
      = 1,   x > 1.
Solution
See the lecture.
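A small numerical sketch (plain Python, not the lecture solution) for this CDF: the median solves m³ = 1/2, and since the density 3x² is increasing on [0, 1], its maximum over the support is at the right end point.

# Median m solves F(m) = m**3 = 1/2 on [0, 1]; the density f(x) = 3*x**2 is
# increasing on [0, 1], so it is maximized at x = 1.
m = 0.5 ** (1.0 / 3.0)
print(m, m ** 3)      # median ~ 0.7937; check that F(m) = 0.5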
Measures of skewness and kurtosis
Skewness
Skewness of a probability distribution is a measure of
asymmetry (or lack of symmetry).
A measure of skewness of the probability distribution of X
is defined as
β1 = µ3 / µ2^{3/2}.
For symmetric distribution, β1 = 0.
β1 > 0 indicates that the data is positively skewed and
β1 < 0 indicates that the data is negatively skewed
Measures of skewness and kurtosis (cont...)
Kurtosis
Kurtosis of a probability distribution of X is a measure of
peakedness and thickness of tail of probability density
function of X relative to the peakedness and thickness of
tails of the density function of normal distribution.
A distribution is said to have higher (lower) kurtosis than
the normal distribution if its density function in
comparison with the density function of a normal
distribution, has sharper (rounded) peak and longer, flatter
(shorter, thinner) tails.
Measures of skewness and kurtosis (cont...)
Kurtosis (cont..)
A measure of kurtosis of the probability distribution of X
is defined by
γ1 = µ4 / µ2².
For normal distribution with mean µ and variance σ 2 ,
γ1 = 3. The quantity
γ2 = γ1 − 3 is called the excess kurtosis (the kurtosis relative to the normal distribution).
Definition
Let X be a random variable and let
A = {t ∈ R : E(|e^{tX}|) = E(e^{tX}) is finite}. Define MX : A → R by
MX(t) = E(e^{tX}), t ∈ A.
The function MX is called the moment generating function (MGF) of X.
Theorem
Let X be a random variable with moment generating function MX. Then, for each r ∈ {1, 2, · · · },
µ′r = E(X^r) = MX^{(r)}(0),
where MX^{(r)}(·) denotes the rth derivative of MX(t), evaluated at t = 0.
Examples
(i) Obtain the moment generating function of X, with
probability mass function
fX(x) = e^{−λ} λ^x / x!, λ > 0, x = 0, 1, 2, · · · .
(ii) Obtain the moment generating function of X, with
probability density function
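For part (i), the resulting MGF is MX(t) = exp(λ(e^t − 1)); the following symbolic sketch (assuming sympy) differentiates it to recover the first two moments, as a check.

# Poisson MGF M(t) = exp(lam*(exp(t) - 1)); moments follow from derivatives at t = 0.
import sympy as sp

t, lam = sp.symbols("t lambda", positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))
m1 = sp.diff(M, t, 1).subs(t, 0)            # E(X) = lambda
m2 = sp.diff(M, t, 2).subs(t, 0)            # E(X^2) = lambda + lambda^2
print(sp.simplify(m1), sp.simplify(m2), sp.simplify(m2 - m1**2))   # variance = lambda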
Problems
Q1. Consider a random variable X with probability mass
function
fX(x) = ((x − 1)/16) (3/4)^{x−2} for x = 2, 3, 4, · · · , and fX(x) = 0 otherwise.
Q5. Let us cast two fair dice. Denote the sum of the outcomes
by X. Show that X is a random variable.
Q6. Derive mean and variance of X with the probability mass
function given by
fX(x) = \binom{n}{x} p^x (1 − p)^{n−x} for x = 0, 1, 2, 3, 4, · · · , n, and fX(x) = 0 otherwise.
Bernoulli distribution
A random experiment is said to be a Bernoulli experiment
if its each trial results in just two possible outcomes:
success and failure.
Each replication of a Bernoulli experiment is called a
Bernoulli trial.
A discrete random variable X with support SX = {0, 1} is
said to follow Bernoulli distribution if its probability mass
function is given by
fX(x) = 1 − p,  x = 0
      = p,      x = 1
      = 0,      otherwise;
equivalently, fX(x) = p^x (1 − p)^{1−x} for x = 0, 1, and 0 otherwise.
Binomial distribution
Physical conditions for Binomial distribution: (we get the
binomial distribution under the following experimental
conditions)
Each trial results in two mutually disjoint outcomes,
termed as success and failure.
The number of trials n is finite.
The trials are independent of each other.
The probability of success p is constant for each trial.
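Under these conditions the number of successes follows a Binomial(n, p) distribution; as a hedged scipy sketch (using n = 10 fair coin tosses, matching Example 1 below):

# Binomial(n, p): P(at least 7 heads in 10 tosses of a fair coin).
from scipy.stats import binom

n, p = 10, 0.5
print(binom.sf(6, n, p))                                 # P(X >= 7) = 176/1024 = 0.171875
print(sum(binom.pmf(k, n, p) for k in range(7, 11)))     # the same value by direct summation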
Distributions
Examples
1. Ten coins are thrown simultaneously. Find the probability
of getting at least seven heads. (Ans: 176/1024)
2. The mean and variance of binomial distribution are 4 and
4/3, respectively. Find P (X ≥ 1). (Ans: 0.9986)
3. Let X be binomially distributed with parameters n and p.
What is the distribution of n − X? (Ans: Binomial(n,1-p))
4. The moment generating function of X is (2/3 + (1/3)e^t)^9. Find
P(0.2 < X < 5.6).
5. Let X1, . . . , Xk be a random sample from Binomial(3, 0.4).
Find the distribution of S = Σ_{i=1}^{k} Xi.
Solution
See the lecture.
Distributions
Poisson distribution
A discrete type random variable X with support
SX = {0, 1, 2, . . .} is said to follow the Poisson distribution if its
probability mass function is given by
fX(x) = P(X = x) = e^{−λ} λ^x / x! for x ∈ SX, and 0 for x ∈ SX^c,
where λ > 0.
Some situations
Number of occurrences in a given time interval.
Number of accidents in a particular junction of a city.
Number of deaths from a disease such as heart attack or
due to snake bite.
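For such count data, scipy gives the Poisson probabilities directly; a hedged sketch with an illustrative rate of λ = 2 (a value assumed here for demonstration only):

# Poisson(lambda): a few probabilities for an illustrative rate of 2 events per interval.
from scipy.stats import poisson

lam = 2.0
print(poisson.pmf(0, lam))    # P(X = 0) = e^{-2}
print(poisson.pmf(3, lam))    # P(X = 3) = e^{-2} * 2**3 / 3!
print(poisson.sf(4, lam))     # P(X >= 5)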
Distributions
P (X = 2) = 9P (X = 4) + 90P (X = 6),
Hypergeometric distribution
An urn has 1000 balls: 700 green, 300 blue.
Distributions
Hypergeometric distribution
An urn contains N balls, of which K are green and N − K are blue. A sample of
n balls is drawn without replacement. What is the
probability that there are k green balls in the sample?
Let the random variable X be the number of green balls
drawn. Then,
P(X = k) = \binom{K}{k} \binom{N−K}{n−k} / \binom{N}{n}.
E(X) = np and Var(X) = np(1 − p)(N − n)/(N − 1), where p = K/N.
Distributions
Example
An urn has 1000 balls: 700 green, 300 blue. A sample of 7 balls
is drawn. What is the probability that it has 3 green balls and
4 blue balls?
Sampling with replacement (binomial): Ans: 0.0972405
Sampling without replacement (hypergeometric): Ans:
0.0969179
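Both answers can be reproduced with scipy; a hedged sketch (note that scipy's hypergeom takes the population size, the number of "success" balls, and the sample size):

# 1000 balls (700 green, 300 blue), draw 7, want exactly 3 green.
from scipy.stats import binom, hypergeom

print(binom.pmf(3, 7, 0.7))              # with replacement (binomial):        ~0.0972405
print(hypergeom.pmf(3, 1000, 700, 7))    # without replacement (hypergeometric): ~0.0969179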
Distributions
For a uniform distribution on the interval (a, b):
For r ∈ {1, 2, . . .}, µ′r = E(X^r) = (b^{r+1} − a^{r+1}) / ((r + 1)(b − a)).
For r ∈ {1, 2, . . .}, µr = E((X − µ′1)^r) = (b − a)^r / (2^r (r + 1)) for r = 2, 4, 6, . . ., and 0 for r = 1, 3, 5, . . .
Distributions
MGF:
MX(t) = (e^{bt} − e^{at}) / ((b − a)t) for t ≠ 0, and MX(t) = 1 for t = 0.
Distributions
Exponential distribution
A continuous type random variable X is said to have
exponential distribution with parameter λ > 0 if the probability
density function is given by
fX(x|λ) = λ e^{−λx} for x > 0, and 0 otherwise.
Mean: E(X) = 1/λ. Variance: Var(X) = 1/λ².
CDF:
FX(x) = 1 − e^{−λx} for x > 0, and 0 otherwise.
Distributions
Theorem
The exponential distribution has the memoryless (forgetfulness)
property. A variable X with positive support is memoryless if
for all t > 0 and s > 0 P (X > s + t|X > t) = P (X > s).
Proof
See the lecture!
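A minimal sketch of the standard argument (the lecture proof may differ): for s, t > 0,
P(X > s + t | X > t) = P(X > s + t, X > t)/P(X > t) = P(X > s + t)/P(X > t) = e^{−λ(s+t)}/e^{−λt} = e^{−λs} = P(X > s).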
The idea of the memoryless property, for example, is that the chance a unit survives an additional s units of time does not depend on how long it has already survived.
Exponential distribution
Moment generating function: MX(t) = (1 − θt)^{−1}, t < 1/θ, where θ = 1/λ.
Solution
see the lecture.
Distributions
Gamma distribution
PDF:
fX(x|α, θ) = (1/(θ^α Γ(α))) e^{−x/θ} x^{α−1} for x > 0, and 0 otherwise.
Gamma distribution
Moment generating function: MX(t) = (1 − tθ)^{−α}, t < 1/θ.
Distributions
Normal distribution
An absolutely continuous random variable X is said to
follow normal distribution with mean µ ∈ R and standard
deviation σ > 0 if its probability density function is given
by
fX(x|µ, σ) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞.
We denote X ∼ N (µ, σ 2 ).
The normal distribution with mean 0 and variance 1 is
called standard normal distribution. It is denoted by
N (0, 1). The probability density function of the standard
normal variate Z is given by
φ(z) = (1/√(2π)) e^{−z²/2}, −∞ < z < ∞.
Distributions
Normal distribution
The cumulative distribution function of the standard
normal variate Z is given by
Φ(z) = ∫_{−∞}^{z} φ(x) dx = (1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx, −∞ < z < ∞.
Normal distribution
Many human characteristics, such as height, IQ or
examination scores of a large number of people, follow the
normal distribution.
The model probably originated in 1733 in the work of the
mathematician Abraham de Moivre, who was interested in
laws of chance governing gambling, and it was also
independently derived in 1786 by Pierre-Simon Laplace, an
astronomer and mathematician.
Distributions
Normal distribution
However, the normal curve as a model for error distribution
in scientific theory is most commonly associated with a
German astronomer and mathematician, Carl Friedrich
Gauss, who found a new derivation of the formula for the
curve in 1809. For this reason, the normal curve is
sometimes referred to as the "Gaussian" curve. In 1835
another mathematician and astronomer, Lambert Quetelet,
used the model to describe human physiological and social
traits. Quetelet believed that "normal" meant average and
that deviations from the average were nature's mistakes.
Almost all the scores (0.997 of them) lie within 3 standard
deviations of the mean.
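The 0.997 figure is the usual three-sigma rule for the normal distribution; a hedged scipy check:

# P(mu - 3*sigma < X < mu + 3*sigma) for any normal distribution equals Phi(3) - Phi(-3).
from scipy.stats import norm

print(norm.cdf(3) - norm.cdf(-3))   # ~0.9973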
Distributions
Problems
Q1. Let {Yi}, i = 1, . . . , n, be independent Bernoulli random
variables with parameter p. Obtain the mean and variance
of Y = Σ_{i=1}^{n} Yi.
P(X = x) = (1 − p)^{x−1} p, x = 1, 2, . . . .
Problems
Q4. The PDF of X is
fX(x) = 4x,      0 ≤ x ≤ 1/2
      = 4 − 4x,  1/2 < x ≤ 1
      = 0,       otherwise.
Problems
Q6. The expected number of typos on a page of a new Harry
Potter book is 0.2. What is the probability that the next
page you read contains (i) 0 typos, (ii) 2 or more typos.
(iii) Explain what assumptions you have used.
Q7. An egg carton contains 20 eggs, of which 3 have a double
yolk. To make a pancake, 5 eggs from the carton are picked
at random. What is the probability that at least 2 of them
have a double yolk?
Q8. Suppose X has density function
fX(x) = ax + b for 0 ≤ x ≤ 1, and fX(x) = 0 otherwise.
Problems
Q9. Suppose X has density function
fX(x) = 1/(a − 1) for 1 < x < a, and fX(x) = 0 otherwise.
Problems
Q11. The length of human pregnancies from conception to birth
approximates a normal distribution with a mean of 266
days and a standard deviation of 16 days. What proportion
of all pregnancies will last between 240 and 270 days
(roughly between 8 and 9 months)? What length of time
marks the shortest 70% of all pregnancies? [Ans: 0.5471,
274.32]
Q12. A manufacturing process produces semiconductor chips
with a known failure rate 6.3%. Assume that chip failures
are independent of one another. You will be producing
2000 chips tomorrow.
(a.) Find the expected number of defective chips produced.
(b.) Find the standard deviation of the number of defective
chips.
(c.) Find the probability of producing less than 135 defects.
Part-IV
Two dimensional random variables
Some results
The random variables X and Y are said to be independent
if and only if fXY (x, y) = fX (x)fY (y). Otherwise,
dependent.
Let u(x, y) be a function of two variables. Then,
E(u(X, Y)) = Σ_{(x,y)∈SXY} u(x, y) fXY(x, y),
provided it exists.
E(X) = Σ_{(x,y)∈SXY} x fXY(x, y),
E(Y) = Σ_{(x,y)∈SXY} y fXY(x, y).
Examples
Q1. Suppose we toss a pair of four sided dice, in which one is
red and other is black. Let X and Y denote the outcomes
on the red and black dice, respectively. Obtain the joint
PMF of (X, Y ). Obtain the marginal PMFs of X and Y.
Are X and Y independent? Find E(X), E(Y ), Var(X) and
Var(Y).
Q2. Two dice are thrown simultaneously. Let X be the sum of
the outcomes of two dice. Suppose Y =|difference of the
outcomes of two dice|. Obtain the joint PMF of (X, Y ).
Further, obtain the marginal PMFs of X and Y. Find the
conditional PMF of X given Y = 2.
Two dimensional random variables
Some results
The random variables X and Y are said to be independent
if and only if fXY (x, y) = fX (x)fY (y). Otherwise,
dependent.
Let u(x, y) be a function of two variables. Then,
E(u(X, Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} u(x, y) fXY(x, y) dx dy,
provided it exists.
E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x fXY(x, y) dx dy,
E(Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y fXY(x, y) dx dy.
Var(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − E(X))² fXY(x, y) dx dy,
Var(Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (y − E(Y))² fXY(x, y) dx dy.
Two dimensional random variables
Examples
Q1. Let X and Y have joint PDF
fXY(x, y) = 4xy for 0 < x < 1, 0 < y < 1, and 0 otherwise.
Examples
Q3. Let X and Y have joint PDF
fXY(x, y) = 6xy² for 0 < x < 1, 0 < y < 1, and 0 otherwise.
Examples
Q4. Let the joint PDF of (X, Y ) be
fXY(x, y) = cx + 1 for x, y ≥ 0, x + y < 1, and 0 otherwise.
Examples
Q6. Let the joint PDF of (X, Y ) be
fXY(x, y) = 2x for 0 ≤ x ≤ 1, and 0 otherwise.
Examples
Q7. Let the joint PDF of (X, Y ) be
fXY(x, y) = 6xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ √x, and 0 otherwise.
Definition
Simple random sampling (SRS) is a method of selecting a
sample comprising n sampling units out of a population
of N sampling units such that every
sampling unit has an equal chance of being chosen.
Notation
N : Number of sampling units in the population
(Population size).
n: Number of sampling units in the sample (sample size)
SRSWOR
If n units are selected by SRSWOR, the total number of
possible samples is \binom{N}{n}.
So the probability of selecting any one of these samples is 1/\binom{N}{n}.
Probability of drawing a sample
SRSWOR
Note that a unit can be selected at any one of the n draws.
Let ui be the ith unit selected in the sample. This unit can
be selected in the sample either at first draw, second draw,
..., or nth draw.
Let Pj (i) denote the probability of selection of ui at the jth
draw, j = 1, . . . , n. Then,
SRSWOR
Now if u1 , u2 , . . . , un are the n units selected in the sample,
then the probability of their selection is
SRSWOR
If P(u1) = n/N, then P(u2) = (n − 1)/(N − 1), . . . , P(un) = 1/(N − n + 1).
Thus,
P(u1, u2, . . . , un) = (n/N) × ((n − 1)/(N − 1)) × · · · × (1/(N − n + 1)) = 1/\binom{N}{n}.
Probability of drawing a sample
SRSWR
When n units are selected with SRSWR, the total number
of possible samples is N^n.
The probability of drawing a sample is 1/N^n.
Alternatively, let ui be the ith unit selected in the sample.
This unit can be selected in the sample either at first draw,
second draw, ..., or nth draw. At any stage, there are
always N units in the population in case of SRSWR, so the
probability of selection of ui at any stage is 1/N for all
i = 1, . . . , n.
Probability of drawing a sample
SRSWR
Then the probability of selection of n units u1, u2, . . . , un in
the sample is (1/N)^n = 1/N^n.
SRSWOR
Let Al denote the event that a particular unit uj is not
selected at the lth draw. The probability of selecting, say, the
jth unit at the kth draw is then equal to
(1 − 1/N)(1 − 1/(N − 1))(1 − 1/(N − 2)) · · · (1 − 1/(N − k + 2)) × 1/(N − k + 1),
which is equal to 1/N.
Probability of drawing a unit
SRSWR
P[selection of uj at the kth draw] = 1/N.
Sampling distribution of sample mean and sample variance
Theorem
Let X1, . . . , Xn be a random sample drawn from a normal
population with mean µ and variance σ². Denote by X̄ and S²
the sample mean and sample variance, respectively, where
X̄ = (1/n) Σ_{i=1}^{n} Xi and S² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)². Then,
(i) X̄ ∼ N(µ, σ²/n);
(ii) X̄ and S² are independent;
(iii) (n − 1)S²/σ² follows a chi-square distribution with n − 1 degrees of freedom.
Proof
see the lecture!
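A simulation sketch (assuming numpy) illustrating the stated sampling distributions of X̄ and S²; the values of µ, σ, n and the number of replications are chosen only for illustration.

# Repeatedly draw samples of size n from N(mu, sigma^2) and inspect xbar and (n-1)*S^2/sigma^2.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 50000
samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

print(xbar.mean(), xbar.var())            # close to mu and sigma^2 / n
print(((n - 1) * s2 / sigma**2).mean())   # close to n - 1, the mean of a chi-square_{n-1}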
Part-VI
Statistical inference
In statistical inference, we
draw conclusions about a
population based on the data
obtained from a sample
chosen from it.
Models
Classification of models
Models:
Parametric models
Non-parametric models
An example
(Statistical Inference!)
Modeling the example
Illustration
Consider estimating the population mean of a normally
distributed population, say N (µ, 9).
The most obvious estimate is to simply draw a sample and
calculate the sample mean.
If we repeat this process with a new sample, we would
expect to get a different estimate.
The distribution that results from repeated sampling is
called the sampling distribution of the estimate.
Point estimation
Expected behaviours
We want our estimate to be close to the true value.
Also, we want δn to behave in a nice way as the sample size
n increases.
If we take a large sample, we would like the estimate to be
more accurate than a small sample.
Point estimation
Properties
Consistency: An estimator δn of θ is said to be consistent if
δn converges to θ in probability; that is, for every ε > 0,
P(|δn − θ| > ε) → 0 as n → ∞.
Unbiasedness: An estimator δn of θ is said to be unbiased if E(δn) = θ.
Method of moments (MOM): equate the population moments µk = E(X^k) to the corresponding sample moments µ̂k = (1/n) Σ_{i=1}^{n} Xi^k and solve for the unknown parameters.
Example-1
Let X1 , . . . , Xn be a random sample drawn from a Poisson
population with probability mass function
P(X = x) = e^{−λ} λ^x / x!, x = 0, 1, 2, . . . ,
where λ > 0. Obtain the MOM estimator of λ.
Solution
Here E(X) = λ. Equating µ1 = E(X) = λ to the first sample moment µ̂1 = X̄ gives λ̂ = X̄. Hence, the
method of moments estimator of λ is the sample mean.
MOM estimators
Example-2
Let X1 , . . . , Xn be a random sample drawn from a gamma
population with probability density function
fX(x) = (λ^α / Γ(α)) x^{α−1} e^{−λx}, x > 0, α, λ > 0.
Solution
The first two moments of the gamma distribution are µ1 = α/λ
and µ2 = α(α + 1)/λ². Solving these two equations, we get the
MOM estimators, which are given by
λ̂ = µ̂1 / (µ̂2 − (µ̂1)²) = X̄ / (µ̂2 − X̄²) and α̂ = λ̂ µ̂1 = λ̂ X̄,
where µ̂2 = (1/n) Σ_{i=1}^{n} Xi².
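A numerical sketch (assuming numpy) applying these MOM formulas to simulated gamma data; the true parameter values used below are chosen only for illustration.

# Method-of-moments estimates for Gamma(alpha, rate = lambda) from simulated data.
import numpy as np

rng = np.random.default_rng(1)
alpha_true, lam_true = 3.0, 2.0
x = rng.gamma(shape=alpha_true, scale=1.0 / lam_true, size=10000)

m1, m2 = x.mean(), (x**2).mean()
lam_hat = m1 / (m2 - m1**2)       # lambda-hat = mu1-hat / (mu2-hat - mu1-hat^2)
alpha_hat = lam_hat * m1          # alpha-hat = lambda-hat * mu1-hat
print(alpha_hat, lam_hat)         # close to 3 and 2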
MOM estimators
Example-3
Suppose X is a discrete random variable with the probability
mass function
P(X = x) = 2θ/3,        x = 0
         = θ/3,         x = 1
         = 2(1 − θ)/3,  x = 2
         = (1 − θ)/3,   x = 3.
Example-4
For a population with probability density function fX(x) = (1/(2σ)) e^{−|x|/σ}, −∞ < x < ∞, σ > 0, we have E(X) = 0. Thus, if we try to solve the equation E(X) = X̄, we will not get the
estimator, because E(X) does not contain the unknown
parameter σ.
MOM estimators
Solution (cont...)
Now, let us calculate the second-order theoretical moment. We have
µ2 = E(X²) = ∫_{−∞}^{∞} x² (1/(2σ)) e^{−|x|/σ} dx = 2σ².
The second-order sample moment is
µ̂2 = (1/n) Σ_{i=1}^{n} Xi².
Equating µ2 = µ̂2 gives the MOM estimator σ̂ = √(µ̂2/2).
Likelihood function: L(θ) = fX(x1, . . . , xn | θ).
Example-5
Let X1 , . . . , Xn be a random sample from a population with
probability mass function
P(X = x) = e^{−λ} λ^x / x! for x = 0, 1, 2, . . ., and 0 otherwise,
where λ > 0. Obtain the MLE of λ.
Solution
The log-likelihood function is
l(λ) = log λ Σ_{i=1}^{n} xi − nλ − Σ_{i=1}^{n} log(xi!).
MLE
Solution (cont...)
To find the maximum, we set the first derivative to zero:
l′(λ) = (1/λ) Σ_{i=1}^{n} xi − n = 0,
which gives λ̂ = (1/n) Σ_{i=1}^{n} xi = X̄.
NOTE: This agrees with the MOM estimator (see Example-1).
MLE
Example-6
Let X1 , . . . , Xn be a random sample drawn from a gamma
population with probability density function
fX(x) = (λ^α / Γ(α)) x^{α−1} e^{−λx}, x > 0, α, λ > 0.
Solution
The log likelihood is
l(λ, α) = nα log λ − n log Γ(α) + (α − 1) Σ_{i=1}^{n} log xi − λ Σ_{i=1}^{n} xi.
MLE
Solution (cont...)
In this case we have two parameters so we take the partial
derivatives and set them both to zero.
∂l/∂α = Σ_{i=1}^{n} log xi + n log λ − n Γ′(α)/Γ(α) = 0,
∂l/∂λ = nα/λ − Σ_{i=1}^{n} xi = 0.
MLE
Solution (cont...)
This second equation gives the MLE for λ as
λ̂ = α̂ / X̄.
Substituting this into the first equation, we find that the MLE
for α must satisfy
n log α̂ − n log X̄ + Σ_{i=1}^{n} log xi − n Γ′(α̂)/Γ(α̂) = 0.
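This likelihood equation for α has no closed form, but it can be solved numerically; a hedged sketch using scipy's digamma function (Γ′/Γ) and a root finder, with simulated data whose true parameter values are chosen only for illustration.

# Numerical MLE for Gamma(alpha, rate = lambda): solve for alpha, then lambda = alpha / xbar.
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(2)
x = rng.gamma(shape=3.0, scale=0.5, size=5000)   # simulated data (true alpha = 3, lambda = 2)

n, xbar, logx_sum = len(x), x.mean(), np.log(x).sum()

def score(alpha):
    # n*log(alpha) - n*log(xbar) + sum(log x_i) - n*digamma(alpha) = 0
    return n * np.log(alpha) - n * np.log(xbar) + logx_sum - n * digamma(alpha)

alpha_hat = brentq(score, 1e-3, 1e3)
lam_hat = alpha_hat / xbar
print(alpha_hat, lam_hat)   # close to 3 and 2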
Example-7
Consider Example-5 with the additional assumption that
λ ≤ λ0 . Obtain the MLE of λ.
Solution
In this case, the MLE of λ is
λ̂_{RML} = X̄ if X̄ ≤ λ0, and λ̂_{RML} = λ0 if X̄ > λ0.
MLE
Confidence interval
Let X be a random variable with distribution Pθ , θ ∈ Θ.
Consider a random sample X1 . . . , Xn drawn from this
distribution. Let δ1(X) and δ2(X) be two statistics such that Pθ(δ1(X) ≤ θ ≤ δ2(X)) = 1 − α for all θ ∈ Θ. Then, the random interval (δ1(X), δ2(X)) is called a 100(1 − α)% confidence interval for θ.
Interval estimation