
MA-2203: Introduction to Probability and Statistics

Lecture slides
by
Dr. Suchandan Kayal

Department of Mathematics
National Institute of Technology Rourkela
Rourkela - 769008, Odisha, India
Autumn, 2020
Outline (Part-I)

Historical motivation and preliminary notions


Methods of assigning probabilities
Classical method
Relative frequency method
Axiomatic approach to probability
Some consequences
Boole’s and Bonferroni’s inequalities
Conditional probability
Bayes theorem
Independent events
Assignment-I
Outline (Part-II)

Random variable and distribution function


Types of random variables
Discrete random variable and probability mass function
Absolutely continuous random variable and probability
density function
Expectations, moments
Mean, median, mode
Measures of skewness and kurtosis
Moment generating functions
Historical motivation

Probability has its origin in the study of gambling and


insurance in the seventeenth century.
These days, the theory of probability is an indispensable
tool of both the social and natural sciences.
A gambler’s dispute in 1654 led to the creation of
probability theory by two French mathematicians, Fermat
and Pascal.
Their motivation came from a problem related to gambling
proposed by a nobleman, the Chevalier de Méré. There was an
apparent contradiction concerning a popular dice game.
Historical motivation (cont...)

The game consisted of throwing a pair of dice 24 times.


The problem was to decide whether or not to bet even
money on the occurrence of at least one ‘double six’ during
the 24 throws.
A seemingly well-established gambling rule led de Méré to
believe that betting on a ‘double six’ in 24 throws would be
profitable, but his own calculations indicated just the
opposite.
This problem and others posed by de Méré led to an
exchange of letters between Pascal and Fermat in which
the fundamental principles of probability theory were
formulated for the first time.
Basic notions (statistical regularity)

One of the fundamental features of probability is that the


phenomena we are interested in are random in nature.
Flipping of a coin: In this case, we do not know about the
event which will happen in the next flip. It may be head or
tail. However, in the long run, it is known to us that
approximately 50% heads and 50% tails will occur. So,

P (H) ≈ 0.5 and P (T ) ≈ 0.5

.
Birth of an offspring:

P (B) ≈ 0.5 and P (G) ≈ 0.5.

The long-term behaviour of an event is known as statistical
regularity. This encourages us to study the subject of
probability.
Basic notions (experiment)

Experiment
An experiment is observing something happen or conducting
something under certain conditions which result in some
outcomes.

Example
Rainfall: It is a consequence of several things such as cloud
formation, El Niño occurrence, humidity, atmospheric pressure
etc. Finally, we observe that there is rainfall. Thus, observing
weather is an experiment.

Types of experiment
Deterministic experiment: It results in known outcomes under
certain conditions.
Random experiment: Under fixed conditions, the outcomes
are not known.
Basic notions (random experiment)

Random experiment
An experiment is said to be a random experiment if the
following conditions are satisfied.
The set of all possible outcomes of the experiment is known
in advance.
The outcomes of a particular performance (trial) of the
experiment cannot be predicted in advance.
The experiment can be repeated under identical conditions.

Sample space
The collection of all possible outcomes of a random experiment
is called the sample space. It is denoted by Ω.
Basic notions (sample space and event)

Sample space/examples
Throwing of a die. Here Ω = {1, 2, 3, 4, 5, 6}.
Throwing of a die and tossing of a coin simultaneously.
Ω = {1, 2, 3, 4, 5, 6} × {H, T }
A coin is flipped repeatedly until a tail is observed.
Ω = {T, HT, HHT, HHHT, · · · }
Lifetime of a battery. Here Ω = [0, 10000].

Event
An event is a set of outcomes of an experiment (a subset of the
sample space) to which a probability is assigned.
Basic notions

Remarks on event
When the sample space is finite, any subset of the sample
space is an event. In this case, all elements of the power set
of the sample space are defined as events.
This approach does not work well in cases where the
sample space is uncountably infinite. So, when defining a
probability space it is possible, and often necessary to
exclude certain subsets of the sample space from being
events.
In general measure theoretic description of probability
spaces an event may be defined as an element of a selected
sigma-field of subsets of the sample space.
Basic notions (impossible and sure events)

Impossible event
An event is said to be impossible if the probability of
occurrence of that event is zero. For example, during the rolling
of a six-faced die, the event that the face 7 will occur.

Sure event
An event with probability of occurrence one is called the sure
event. The sample space of any random experiment is always a
sure event. Another example could be that the lifetime of a
battery is a nonnegative number.
Basic notions

Various operations
Union:
A ∪ B means occurrence of at least one of A and B.
∪_{i=1}^{n} A_i means occurrence of at least one of A_i, i = 1, · · · , n.
∪_{i=1}^{∞} A_i means occurrence of at least one of A_i, i = 1, 2, · · · .
Intersection:
A ∩ B means simultaneous occurrence of both A and B.
∩_{i=1}^{n} A_i means simultaneous occurrence of A_i, i = 1, · · · , n.
∩_{i=1}^{∞} A_i means simultaneous occurrence of A_i, i = 1, 2, · · · .
Exhaustive events:
If ∪_{i=1}^{n} A_i = Ω, we call A_1, · · · , A_n exhaustive events.
Basic notions

Various operations (cont...)


Disjoint events:
If A ∩ B = φ, the empty set, that is, A and B cannot occur
simultaneously, then A and B are called disjoint events or
mutually exclusive events. In this case, the happening of
one excludes the happening of the other.
Let {A_n}_{n≥1} be a sequence of events such that
A_i ∩ A_j = φ for i ≠ j. Then A_1, A_2, · · · are said to be
pairwise disjoint or mutually exclusive events.
Complementation and subtraction:
Ac means not happening of the event A.
A − B = A ∩ B c means happening of A but not of B.
Methods of assigning probabilities

A. Classical approach
Assumptions:
A random experiment results in a finite number of equally
likely outcomes.
Let Ω = {ω1 , · · · , ωn } be a finite sample space with n ∈ N
possible outcomes, N denotes the set of natural numbers.
For a subset E of Ω, |E| denotes the number of elements in
E.
Result:
The probability of occurrence of an event E is given by

P(E) = (# of outcomes favourable to E)/(Total # of outcomes in Ω) = |E|/|Ω| = |E|/n.
Methods of assigning probabilities/Classical approach
(cont...)

Observations
For any event E, P (E) ≥ 0
For mutually exclusive events E1 , · · · , En ,
P(∪_{i=1}^{n} E_i) = |∪_{i=1}^{n} E_i|/n = (Σ_{i=1}^{n} |E_i|)/n = Σ_{i=1}^{n} |E_i|/n = Σ_{i=1}^{n} P(E_i)
P(Ω) = |Ω|/|Ω| = 1.
Methods of assigning probabilities/Classical approach
(cont...)
Example-1
Suppose that in your section, we have 150 students born
in the same year. Assume that a year has 365 days. Find
the probability that all the students of your section are
born on different days of the year.

Solution
Denote the event that all the students are born on different
days of the year by E. Here,

|Ω| = 365^150 and |E| = 365 × 364 × · · · × 216 (the number of
permutations of 150 distinct days chosen from 365).
Thus, P(E) = |E|/|Ω| = (365 × 364 × · · · × 216)/365^150.
Methods of assigning probabilities/Classical approach
(cont...)
Example-2
Find the probability of getting exactly two heads in three
tosses of a fair coin.

Solution
Denote the event that getting exactly two heads in three
tosses of a fair coin by E. Here,

Ω = {HHH, HHT, HTH, THH, THT, TTH, HTT, TTT}

and
E = {HHT, HTH, THH}.
Thus, P(E) = |E|/|Ω| = 3/8.
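The following short Python sketch (not part of the original slides) verifies this classical-method answer by enumerating the sample space directly.

# Enumerate the 8 equally likely outcomes of three tosses and count
# those with exactly two heads.
from itertools import product
from fractions import Fraction

omega = list(product("HT", repeat=3))            # all 8 outcomes
event = [w for w in omega if w.count("H") == 2]  # exactly two heads

print(Fraction(len(event), len(omega)))          # 3/8, as computed above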
Methods of assigning probabilities/Classical approach
(cont...)

Drawbacks
The random experiment must produce equally likely
outcomes.
The total number of outcomes of the random experiment
must be finite.
Methods of assigning probabilities

B. Relative frequency approach


Assumptions:
Suppose that a random experiment can be repeated
independently (the outcome of one trial is not affected by
the outcome of another trial) under identical conditions.
Let an denote the number of times (frequency) an event E
occurs in n trials of a random experiment.
Result:
Using weak law of large numbers, under mild conditions, it
can be shown that the relative frequencies an /n stabilize in
certain sense as n gets large.
P(E) = lim_{n→∞} a_n/n, provided the limit exists.
Methods of assigning probabilities/Relative frequency
approach (cont...)

Observations
For any event E, P (E) ≥ 0
For mutually exclusive events E1 , · · · , En ,
P(∪_{i=1}^{n} E_i) = Σ_{i=1}^{n} P(E_i)
P(Ω) = 1.
Methods of assigning probabilities/Relative frequency
Example-3
After tossing a fair coin, we have the following outputs:

HHT HHT HHT HHT · · ·

Using relative frequency approach, find P (H).

Solution
Note that
a_n/n = 1/1, 2/2, 2/3, 3/4, 4/5, 4/6, · · · , that is,
a_n/n = (2k − 1)/(3k − 2) if n = 3k − 2, k = 1, 2, · · ·
      = 2k/(3k − 1)       if n = 3k − 1, k = 1, 2, · · ·
      = 2k/(3k)           if n = 3k,     k = 1, 2, · · ·
Thus, lim_{n→∞} a_n/n = 2/3 = P(H).
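The following is a small numerical check in Python (added for illustration, not from the slides) of the relative frequency a_n/n for the repeating output HHT HHT · · · ; the ratio settles at 2/3.

# Count heads along the repeating pattern H, H, T and print a_n/n
# at a few values of n.
pattern = "HHT"
heads = 0
for n in range(1, 30001):
    if pattern[(n - 1) % 3] == "H":
        heads += 1
    if n in (3, 30, 300, 30000):
        print(n, heads / n)
# The printed ratios tend to 2/3, which the slide identifies as P(H).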
Methods of assigning probabilities/Relative frequency
approach (cont...)

Drawbacks
The probability has been calculated based on an
approximation.
The random experiment has to be conducted a large
number of times. This is not always possible since some
experiments are costly (launching satellite).

lim_{n→∞} √n/n = 0 ⇒ P(E) = 0 (not correct!).
lim_{n→∞} (n − √n)/n = 1 ⇒ P(E) = 1 (not correct!).
Axiomatic approach to probability

Basic concepts
A set whose elements are themselves sets is called a class of
sets. For example, A = {{2}, {2, 3}}.
A set function is a real-valued function whose domain is a
class of sets.
A sigma-field of subsets of Ω is a class F of subsets of Ω
satisfying the following properties:
(i) Ω ∈ F
(ii) E ∈ F ⇒ E c = Ω − E ∈ F (closed under complement)
(iii) E_i ∈ F, i = 1, 2, · · · ⇒ ∪_{i=1}^{∞} E_i ∈ F (closed under countable unions)
F = {φ, Ω} is a sigma (trivial) field.
Suppose A ⊂ Ω. Then, F = {φ, Ω, A, Ac } is a sigma field of
subsets of Ω.
Axiomatic approach to probability (cont...)

Definition
Let Ω be a sample space of a random experiment. Let F be the
event space or a sigma field of subsets of Ω. Then, a probability
function or a probability measure is a set function P , defined on
F, satisfying the following three axioms:
For any event E ∈ F, P (E) ≥ 0 (nonnegativity)
For a countably infinite collection of mutually exclusive
events E_1, E_2, · · · , we have
P(∪_{i=1}^{∞} E_i) = Σ_{i=1}^{∞} P(E_i)
(countable additivity)


P (Ω) = 1 (Probability of the sample space is one)
Axiomatic approach to probability (cont...)

Consequences of the axiomatic definition


Let (Ω, F, P ) be a probability space. Then,
(i) P (φ) = 0
(ii) for all E ∈ F, 0 ≤ P (E) ≤ 1 and P (E c ) = 1 − P (E)
(iii) for n mutually exclusive events E_i, i = 1, · · · , n,
P(∪_{i=1}^{n} E_i) = Σ_{i=1}^{n} P(E_i)

(iv) Let E_1, E_2 ∈ F and E_1 ⊂ E_2. Then,
P(E_2 − E_1) = P(E_2) − P(E_1) and P(E_1) ≤ P(E_2)
(v) For E_1, E_2 ∈ F, P(E_1 ∪ E_2) = P(E_1) + P(E_2) − P(E_1 ∩ E_2)

Proof
See it during lecture.
Inequalities

Boole’s inequality (union bound proposed by George Boole)


Let (Ω, F, P ) be a probability space and let E1 , · · · , En ∈ F,
where n ∈ N. Then,
P(∪_{i=1}^{n} E_i) ≤ Σ_{i=1}^{n} P(E_i).

Proof
See it during the lecture.

Note
To prove Boole’s inequality for a countable collection of events, we
can use ∪_{i=1}^{n} E_i → ∪_{i=1}^{∞} E_i as n → ∞ along with the continuity
of the probability measure P.
Inequalities (cont...)

Bonferroni’s inequality
Let (Ω, F, P ) be a probability space and let E1 , · · · , En ∈ F,
where n ∈ N. Then,
P(∩_{i=1}^{n} E_i) ≥ Σ_{i=1}^{n} P(E_i) − (n − 1).

Proof
See it during the lecture.

Note
The Bonferroni’s inequality holds only for the probability of
finite intersection of events!
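As an illustration (not from the slides), the following Python sketch checks both inequalities on a made-up example: a fair die with the overlapping events E1 = {1, 2, 3}, E2 = {2, 3, 4}, E3 = {3, 4, 5}.

# Classical probabilities on a fair die, kept exact with Fraction.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
P = lambda A: Fraction(len(A), len(omega))

E1, E2, E3 = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
sum_P = P(E1) + P(E2) + P(E3)

print(P(E1 | E2 | E3), "<=", sum_P)            # 5/6 <= 3/2   (Boole)
print(P(E1 & E2 & E3), ">=", sum_P - (3 - 1))  # 1/6 >= -1/2  (Bonferroni)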
Conditional probability

Example
Let us toss two fair coins. Let A denote the event that both coins show the
same face and B the event that at least one coin shows a head. Obtain
the probability of A happening given that B has already
occurred.

Solution
Listen to my lecture.

Definition
Let (Ω, F, P ) be a probability space and B ∈ F be a fixed event
such that P(B) > 0. Then, the conditional probability of the event
A given that B has already occurred is defined as
P(A|B) = P(A ∩ B)/P(B).
Conditional probability (cont...)

Example

Six cards are dealt at random (without replacement) from


a deck of 52 cards. Find the probability that all cards in the
hand are hearts (event A), given that there are at least
5 hearts in the hand (event B).

Solution
Clearly,
P(A ∩ B) = P(A) = C(13,6)/C(52,6) and
P(B) = [C(13,5)·C(39,1) + C(13,6)]/C(52,6),
where C(n, r) denotes the number of ways of choosing r objects from n.
Thus, P(A|B) = C(13,6)/[C(13,5)·C(39,1) + C(13,6)].
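A quick numerical verification of this example (a Python sketch added here, not part of the slides), using binomial coefficients from the standard library:

from math import comb
from fractions import Fraction

p_A = Fraction(comb(13, 6), comb(52, 6))                      # all 6 cards are hearts
p_B = Fraction(comb(13, 5) * comb(39, 1) + comb(13, 6), comb(52, 6))
print(p_A / p_B)   # equals C(13,6) / (C(13,5)*C(39,1) + C(13,6))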
Conditional probability (cont...)

Note
For events E_1, E_2, · · · , E_n ∈ F, n ≥ 2, we have
P(E_1 ∩ E_2) = P(E_1)P(E_2|E_1) if P(E_1) > 0
P(E_1 ∩ E_2 ∩ E_3) = P(E_1)P(E_2|E_1)P(E_3|E_1 ∩ E_2) if
P(E_1 ∩ E_2) > 0. This condition also guarantees that
P(E_1) > 0, since E_1 ∩ E_2 ⊂ E_1
P(∩_{i=1}^{n} E_i) =
P(E_1)P(E_2|E_1)P(E_3|E_1 ∩ E_2) · · · P(E_n|E_1 ∩ E_2 ∩ · · · ∩ E_{n−1}),
provided P(E_1 ∩ E_2 ∩ · · · ∩ E_{n−1}) > 0, which also
guarantees that P(E_1 ∩ E_2 ∩ · · · ∩ E_i) > 0, for
i = 1, 2, · · · , n − 1.
Conditional probability (cont...)

Example
An urn contains four red and six black balls. Two balls are
drawn successively, at random and without replacement,
from the urn. Find the probability that the first draw
resulted in a red ball and the second draw resulted in a
black ball.

Solution
Let A denote the event that the first draw results in a red
ball and B that the second ball results in a black ball.
Then,
P(A ∩ B) = P(A)P(B|A) = (4/10) × (6/9) = 12/45.
Total probability

Theorem of total probability


Let (Ω, F, P) be a probability space and let {E_i ; i ∈ A} be a
countable collection of mutually exclusive and exhaustive events
(that is, E_i ∩ E_j = φ for i ≠ j and P(∪_{i∈A} E_i) = P(Ω) = 1) such
that P(E_i) > 0 for all i ∈ A. Then, for any event E ∈ F,
P(E) = Σ_{i∈A} P(E ∩ E_i) = Σ_{i∈A} P(E|E_i)P(E_i).

Proof
Let F = ∪_{i∈A} E_i. Then, P(F) = P(Ω) = 1 and
P(F^c) = 1 − P(F) = 0. Again,
E ∩ F^c ⊂ F^c ⇒ 0 ≤ P(E ∩ F^c) ≤ P(F^c) = 0.
Total probability (cont...)

Proof (cont...)
Thus,
P(E) = P(E ∩ F) + P(E ∩ F^c)
     = P(E ∩ F)
     = P(∪_{i∈A} (E ∩ E_i))
     = Σ_{i∈A} P(E ∩ E_i)
     = Σ_{i∈A} P(E|E_i) P(E_i),
since the E_i’s are disjoint, which implies that the E_i ∩ E’s are disjoint.


Total probability (cont...)
Example
Urn U1 contains four white and six black balls and urn U2
contains six white and four black balls. A fair die is cast and
urn U1 is selected if the upper face of die shows 5 or 6 dots,
otherwise urn U2 is selected. If a ball is drawn at random from
the selected urn find the probability that the drawn ball is
white.
Solution
W → drawn ball is white; E1 → Urn U1 is selected; E2 → Urn
U2 is selected.
Here, {E1 , E2 } is a collection of mutually exclusive and
exhaustive events. Thus,

P(W) = P(E_1)P(W|E_1) + P(E_2)P(W|E_2)
     = (2/6) × (4/10) + (4/6) × (6/10) = 8/15.
Bayes theorem

Theorem
Let (Ω, F, P ) be a probability space and let {Ei ; i ∈ A} be a
countable collection of mutually exclusive and exhaustive events
with P (Ei ) > 0 for i ∈ A. Then, for any event E ∈ F, with
P (E) > 0, we have

P(E_j|E) = P(E|E_j)P(E_j) / Σ_{i∈A} P(E|E_i)P(E_i),  j ∈ A.

Proof
For j ∈ A,
P(E_j|E) = P(E_j ∩ E)/P(E) = P(E|E_j)P(E_j)/P(E) = P(E|E_j)P(E_j) / Σ_{i∈A} P(E|E_i)P(E_i),
by the theorem of total probability.
Bayes theorem (cont...)

Note
P (Ej ), j ∈ A are known as the prior probabilities.
P (Ej |E) are known as the posterior probabilities.
Bayes theorem (cont...)

Example
Urn U1 contains four white and six black balls and urn U2
contains six white and four black balls. A fair die is cast and
urn U1 is selected if the upper face of die shows 5 or 6 dots,
otherwise urn U2 is selected. A ball is drawn at random from
the selected urn.
Given that the drawn ball is white, what is the conditional
probability that it came from U1 .
Given that the ball is white, find the conditional
probability that it came from urn U2 .

Solution
W → drawn ball is white;
E1 → Urn U1 is selected;
E2 → Urn U2 is selected.
Bayes theorem (cont...)

Solution (contd...)
E1 and E2 are mutually exclusive and exhaustive events.

(i) P(E_1|W) = P(W|E_1)P(E_1) / [P(W|E_1)P(E_1) + P(W|E_2)P(E_2)]
            = [(4/10) × (2/6)] / [(4/10) × (2/6) + (6/10) × (4/6)] = 1/4.
(ii) Since E_1 and E_2 are mutually exclusive and
P(E_1 ∪ E_2 | W) = P(Ω|W) = 1, we have
P(E_2|W) = 1 − P(E_1|W) = 3/4.
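The following Python sketch (added for illustration) recomputes the urn example: the theorem of total probability gives P(W), and Bayes’ theorem gives the posterior probabilities 1/4 and 3/4.

from fractions import Fraction

P_E1, P_E2 = Fraction(2, 6), Fraction(4, 6)          # die shows {5,6} vs {1,2,3,4}
P_W_E1, P_W_E2 = Fraction(4, 10), Fraction(6, 10)    # chance of white from U1, U2

P_W = P_E1 * P_W_E1 + P_E2 * P_W_E2                  # total probability: 8/15
P_E1_W = P_E1 * P_W_E1 / P_W                         # posterior of U1: 1/4
print(P_W, P_E1_W, 1 - P_E1_W)                       # 8/15 1/4 3/4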
Bayes theorem (cont...)

Observations from the previous example


P(E_1|W) = 1/4 < 1/3 = P(E_1): that is, the probability of
occurrence of the event E_1 decreases in the presence of the
information that the outcome will be an element of W.
P(E_2|W) = 3/4 > 2/3 = P(E_2): that is, the probability of
occurrence of the event E_2 increases in the presence of the
information that the outcome will be an element of W.
P(E_1|W) < P(E_1) ⇔ P(E_1 ∩ W)/P(W) < P(E_1) ⇔ P(E_1 ∩ W) < P(E_1)P(W).
P(E_2|W) > P(E_2) ⇔ P(E_2 ∩ W)/P(W) > P(E_2) ⇔ P(E_2 ∩ W) > P(E_2)P(W).
Definition
Let (Ω, F, P ) be a probability space and A and B be two
events. Events A and B are said to be
negatively associated (correlated) if P (A ∩ B) < P (A)P (B)
positively associated (correlated) if P (A ∩ B) > P (A)P (B)
independent if P (A ∩ B) = P (A)P (B)
dependent if they are not independent.

Note
If P (B) = 0, then P (A ∩ B) = 0 = P (A)P (B) for all
A ∈ F. That is, if P (B) = 0, then any event A ∈ F and B
are independent.
If P (B) > 0, then A and B are said to be independent if
and only if P (A|B) = P (A).
Independence
Let (Ω, F, P ) be a probability space. Let A ⊂ R be an index set
and let {Eα : α ∈ A} be a collection of events in F.
Events {E_α : α ∈ A} are said to be pairwise independent if
any pair of events E_α and E_β, α ≠ β, in the collection
{E_j : j ∈ A} are independent, that is, if
P(E_α ∩ E_β) = P(E_α)P(E_β), α, β ∈ A and α ≠ β.
Let A = {1, 2, · · · , n} for some n ∈ N. The events
E_1, · · · , E_n are said to be independent if for any sub-collection
{E_{α_1}, · · · , E_{α_k}} of {E_1, · · · , E_n} (k = 2, 3, · · · , n)
P(∩_{j=1}^{k} E_{α_j}) = Π_{j=1}^{k} P(E_{α_j}).
Independence
Independence ⇒ pairwise independence.
Pairwise independence ⇏ independence (in general).

Example of independent events


Rolling two dice, x1 and x2 . Let A be the event x1 = 3 and
B be the event x2 = 4. Then, A and B are independent.
Example
Take four identical marbles. On the first write symbols
A1 A2 A3 . On each of the other three, write A1 , A2 and A3 ,
respectively. Put the four marbles in an urn and draw one at
random. Let Ei denote the event that the symbol Ai appears
on the drawn marble. Then, show that E1 , E2 and E3 are not
independent though they are pairwise independent.

Solution
See during the lecture!
Assignment-I

Problems
Q1. A student prepares for a quiz by studying a list of ten
problems. She can solve only six of them. For the quiz, the
instructor selects five questions at random from the list of
ten. What is the probability that the student can solve all
five problems on the examination?
Q2. A total of n shells is fired at a target. The probability that
the ith shell hits the target is p_i, i = 1, · · · , n. Find the
probability that at least two shells out of n hit the target.
Q3. A bag contains 5 white and 2 black balls and balls are
drawn one by one without replacement. What is the
probability of drawing the second white ball before the
second black ball?
Assignment-I (cont...)

Problems
Q4. Balls are drawn repeatedly and with replacement from a
bag consisting of 60 white and 30 black balls. What is the
probability of drawing the third white ball before the
second black ball?
Q5. Let A and B be two events which are independent. Then,
show that A and B c , Ac and B, and Ac and B c are
independent.
Q6. Consider the experiment of tossing a coin three times. Let
Hi , i = 1, 2, 3, denote the event that the ith toss is a head.
Assuming that the coin is fair and has an equal probability
of landing heads or tails on each toss, the events H1 , H2
and H3 are mutually independent.
Assignment-I (cont...)

Problems
Q7. When coded messages are sent, there are sometimes errors
in transmission. In particular, Morse code uses “dots" and
“dashes", which are known to occur in the proportion of
3 : 4. This means that for any given symbol,
P(dot sent) = 3/7 and P(dash sent) = 4/7.
Suppose there is interference on the transmission line, and
with probability 1/8 a dot is mistakenly received as a dash,
and vice versa. If we receive a dot, can we be sure that a
dot was sent? (Ans. 21/25)
Solve more problems beyond these exercises if you wish to
earn a good grade.
Part-II
Random variable

Motivation
Someone may not be interested in the full physical
description of the sample space or events. Rather, one may
be interested in the numerical characteristic of the event
considered.
For example, suppose some components have been put on a
test. After a certain time t > 0, we may be interested in
how many of these are functioning and how many are not.
Here, we are not interested in which units have
failed.
To study certain phenomena of a random experiment, it is
required to quantify the phenomena. One option is to
associate a real number to every outcome of the random
experiment. This encourages us to develop the concept of
the random variable.
Random variable (cont...)

Definition
Let (Ω, F, P ) be a probability space and let X : Ω → R be a
given function. We say that X is a random variable if

X −1 (B) ∈ F for all B ∈ B1 ,

where B1 is the Borel sigma-field.

Alternative
Let (Ω, F, P ) be a probability space. Then, a real valued
measurable function defined on the sample space is known as
the random variable.
Random variable (cont...)

Theorem
Let (Ω, F, P ) be a probability space and let X : Ω → R be a
given function. Then, X is a random variable if and only if

X^{−1}((−∞, a]) = {ω ∈ Ω : X(ω) ≤ a} ∈ F

for all a ∈ R.
Random variable (cont...)

Example
Consider the experiment of tossing of a coin. Then, the sample
space is Ω = {H, T }. Define X as the number of heads. Then,
X(H) = 1 and X(T ) = 0. Consider

F = Power set(Ω) = {φ, {H}, {T }, Ω}.

Our goal is to show that X : Ω → R is a random variable. Here,
{ω ∈ Ω : X(ω) ≤ a} = φ for a < 0, {T} for 0 ≤ a < 1, and Ω for a ≥ 1,
which belongs to F. Thus, X is a random variable.


Distribution function

Definition
A function F : R → R defined by

F(x) = P(X ≤ x) = P((−∞, x]), x ∈ R
is called the distribution function of the random variable X. It
is also denoted by F_X(x).

Theorem
Let FX be the distribution function of a random variable X.
Then,
FX is non-decreasing.
FX is right continuous.
FX (∞) = 1 and FX (−∞) = 0.
Distribution function (cont...)

Example
Suppose that a fair coin is independently flipped thrice. Then,
the sample space is

Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

Let X be a random variable, which denotes the number of


heads. Then,
P(X = 0) = 1/8 = P(X = 3), P(X = 1) = 3/8 = P(X = 2).
Distribution function (cont...)

Example (cont...)
The distribution function of X is
F_X(x) = 0 for x < 0,
       = 1/8 for 0 ≤ x < 1,
       = 1/2 for 1 ≤ x < 2,
       = 7/8 for 2 ≤ x < 3,
       = 1 for x ≥ 3.

FX (x) is non-decreasing, right continuous, FX (+∞) = 1


and FX (−∞) = 0. Moreover, FX (x) is a step function
having discontinuities at 0, 1, 2 and 3.
Sum of the sizes of the jumps is equal to 1.
Distribution function (cont...)

Note
Let −∞ < a < b < ∞. Then,
P (a < X ≤ b) = P (X ≤ b) − P (X ≤ a)
P (a < X < b) = P (X < b) − P (X ≤ a)
P (a ≤ X < b) = P (X < b) − P (X < a)
P (a ≤ X ≤ b) = P (X ≤ b) − P (X < a)
P (X ≥ a) = 1 − P (X < a)
P (X > a) = 1 − P (X ≤ a)

Theorem
Let G : R → R be a non-decreasing and right continuous
function for which G(−∞) = 0 and G(+∞) = 1. Then, there
exists a random variable X defined on a probability space
(Ω, F, P ) such that the distribution function of X is G.
Distribution function (cont...)

Example
Consider a function G : R → R defined by
G(x) = 0 for x < 0, and G(x) = 1 − e^{−x} for x ≥ 0.

Observations
Clearly, G is nondecreasing, continuous and satisfies
G(−∞) = 0 and G(∞) = 1. Thus, G is a distribution
function for a random variable X.
Since G is continuous, we have
P (X = x) = G(x) − G(x− ) = 0 for all x ∈ R, where G(x− )
is the left hand limit of G at the point x.
Distribution function (cont...)

Example (cont...)
For −∞ < a < b < ∞, P (a < X < b) = P (a ≤ X < b) =
P (a ≤ X ≤ b) = P (a < X ≤ b) = G(b) − G(a).
P (X ≥ a) = P (X > a) = 1 − G(a) and
P (X < a) = P (X ≤ a) = G(a).
P(2 < X ≤ 3) = G(3) − G(2) = e^{−2} − e^{−3}
P(−2 < X ≤ 3) = G(3) − G(−2) = 1 − e^{−3}
P(X ≥ 2) = 1 − G(2) = e^{−2}
P(X > 5) = 1 − G(5) = e^{−5}.
Note that the sum of sizes of jumps of G is 0.
Types of the random variables

Discrete random variable


Continuous random variable
Absolutely continuous random variable
Mixed random variable

Note: We will only study discrete and (absolutely) continuous


random variables in detail.
Discrete random variables

Definition
A random variable X is said to be of discrete type if there
exists a non-empty and countable set S_X such that
P(X = x) = F_X(x) − F_X(x^−) > 0, ∀ x ∈ S_X
and
P_X(S_X) = Σ_{x∈S_X} P(X = x) = Σ_{x∈S_X} [F_X(x) − F_X(x^−)] = 1.

The set S_X is called the support of the discrete random variable X.
Discrete random variables (cont...)

Theorem
Let X be a random variable with distribution function FX and
let DX be the set of discontinuity points of FX . Then, X is of
discrete type if and only if

P (X ∈ DX ) = 1.

Definition
Let X be a discrete type random variable with support SX . The
function fX : R → R defined by
f_X(x) = P(X = x) for x ∈ S_X, and f_X(x) = 0 for x ∈ S_X^c,

is called the probability mass function of X.


Discrete random variable (cont...)

Example
Let us consider a random variable X having the distribution
function F_X : R → R defined by
F_X(x) = 0 for x < 0,
       = 1/8 for 0 ≤ x < 2,
       = 1/4 for 2 ≤ x < 3,
       = 1/2 for 3 ≤ x < 6,
       = 4/5 for 6 ≤ x < 12,
       = 7/8 for 12 ≤ x < 15,
       = 1 for x ≥ 15.
Is the random variable of discrete type? If yes, find the
probability mass function of X.
Discrete random variable (cont...)

Solution
The set of discontinuity points of FX is DX = {0, 2, 3, 6, 12, 15}
and P (X ∈ DX ) = x∈DX [FX (x) − FX (x− )] = 1. Thus, the
P

random variable X is of discrete type with support


SX = DX = {0, 2, 3, 6, 12, 15}. The probability mass function is

1


 8, x ∈ {0, 2, 15}
 1
4, x=3
( 

FX (x) − FX (x− ), x ∈ SX

fX (x) = 3
= 10
c
, x=6
0, x ∈ SX 
 3
40 , x = 12





0, otherwise.
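A short Python sketch (added here, assuming the step CDF of this example) that recovers the PMF as the jump sizes F_X(x) − F_X(x^−); the left limit is approximated by evaluating F just below x.

from fractions import Fraction

def F(x):
    # the distribution function from the example
    if x < 0:  return Fraction(0)
    if x < 2:  return Fraction(1, 8)
    if x < 3:  return Fraction(1, 4)
    if x < 6:  return Fraction(1, 2)
    if x < 12: return Fraction(4, 5)
    if x < 15: return Fraction(7, 8)
    return Fraction(1)

support = [0, 2, 3, 6, 12, 15]
eps = Fraction(1, 10**9)
for x in support:
    print(x, F(x) - F(x - eps))       # jumps: 1/8, 1/8, 1/4, 3/10, 3/40, 1/8
print(sum(F(x) - F(x - eps) for x in support))   # 1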

Discrete random variables (cont...)

Remark
The PMF of a discrete type random variable X having support
SX satisfies the following properties:
(i) f_X(x) > 0 for all x ∈ S_X and f_X(x) = 0 for all x ∈ S_X^c.
(ii) Σ_{x∈S_X} f_X(x) = Σ_{x∈S_X} P(X = x) = 1.
Conversely, if a function satisfies the above two properties, then
it is a probability mass function.
Continuous and absolutely continuous random variables

Definition
A random variable X is said to be of
continuous type if its distribution function FX is
continuous everywhere.
absolutely continuous type if there exists an integrable
function f_X : R → R such that f_X(x) ≥ 0 for all x ∈ R and
F_X(x) = ∫_{−∞}^{x} f_X(t) dt, ∀ x ∈ R.

The function f_X(x) is said to be the probability density
function of the random variable X and the set
S_X = {x ∈ R : f_X(x) > 0} is called the support of X.
Continuous and absolutely continuous random variables
(cont...)

Note
If fX (x) is the probability density function of an absolutely
continuous random variable X, then
(i) f_X(x) ≥ 0 for all x ∈ R
(ii) ∫_{−∞}^{∞} f_X(t) dt = F_X(∞) = 1.

Absolutely continuous random variable ⇒ continuous random


variable
Continuous and absolutely continuous random variables
(cont...)

Example
Let X be a random variable having the distribution function
F_X(x) = 0 for x < 0, and F_X(x) = 1 − e^{−x} for x ≥ 0.

Clearly, FX is continuous at every x ∈ R and therefore X is


of continuous type.
Continuous and absolutely continuous random variables
(cont...)

Example (cont...)
Also, F_X(x) = ∫_{−∞}^{x} f_X(t) dt, ∀ x ∈ R, where
f_X(t) = 0 for t < 0, and f_X(t) = e^{−t} for t ≥ 0.

It follows that X is also absolutely continuous type random


variable with probability density function fX (t).
Note: Let X be a random variable of continuous type. Then,

P (X = x) = FX (x) − FX (x− ) = 0, ∀x ∈ R.
Continuous and absolutely continuous random variables
(cont...)

Note
Let X be a random variable of absolutely continuous type.
Then, X is also of continuous type and thus, P (X = x) = 0 for
all x ∈ R. Consequently,
P(X < x) = P(X ≤ x) = F_X(x) = ∫_{−∞}^{x} f_X(t) dt for all x ∈ R.
P(X ≥ x) = P(X > x) = ∫_{x}^{∞} f_X(t) dt for all x ∈ R.
For −∞ < a < b < ∞, P(a < X ≤ b) = P(a ≤ X < b) =
P(a < X < b) = P(a ≤ X ≤ b) = F_X(b) − F_X(a) =
∫_{−∞}^{b} f_X(t) dt − ∫_{−∞}^{a} f_X(t) dt = ∫_{a}^{b} f_X(t) dt.
Continuous and absolutely continuous random variables
(cont...)

Note
Suppose that the distribution function FX of a random variable
X is differentiable at every x ∈ R. Then,
F_X(x) = ∫_{−∞}^{x} F′_X(t) dt for all x ∈ R.
It follows that if F_X is differentiable everywhere, then the
random variable X is of absolutely continuous type and one
may take its probability density function to be
f_X(x) = F′_X(x), ∀ x ∈ R.


Continuous and absolutely continuous random variables
(cont...)

Example
Let X be a random variable with probability density function
f_X(x) = k − |x| for |x| < 1/2, and 0 otherwise,
where k ∈ R. Then,
(i) find the value of k.
(ii) Evaluate P(X < 0), P(X ≤ 0), P(0 < X ≤ 1/4),
P(0 ≤ X < 1/4) and P(−1/8 ≤ X ≤ 1/4).
(iii) Find the distribution function of X.

Solution
See it during the lecture!
Expectation

Definition
Let X be a discrete type random variable with probability
mass function f_X and support S_X. We say that the
expected value of X (denoted by E(X)) is finite and equals
E(X) = Σ_{x∈S_X} x f_X(x),
provided Σ_{x∈S_X} |x| f_X(x) < ∞.
Let X be a continuous type random variable with
probability density function f_X. We say that the expected
value of X (denoted by E(X)) is finite and equals
E(X) = ∫_{−∞}^{∞} x f_X(x) dx,
provided ∫_{−∞}^{∞} |x| f_X(x) dx < ∞.
Expectation (cont...)

Example
Let X be a random variable with probability mass function
f_X(x) = (1/2)^x for x ∈ {1, 2, 3, · · · }, and 0 otherwise.

Show that E(X) is finite and find its value. (Ans. 2)


Solution
See during my lecture!
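As a numerical check (Python sketch, not from the slides), the partial sums of Σ x (1/2)^x converge to 2, in line with the stated answer.

# Partial sum of x * (1/2)^x; the absolute series is the same, so E(X) exists.
expected = sum(x * 0.5**x for x in range(1, 200))
print(expected)        # approximately 2.0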
Expectation (cont...)

Example
Let X be a random variable with probability mass function
f_X(x) = 6/(π² x²) for x ∈ {−1, +1, −2, +2, −3, +3, · · · }, and 0 otherwise.

Show that E(X) is not finite.


Solution
See during my lecture!
Expectation (cont...)

Example
Let X be a random variable with probability density function
f_X(x) = e^{−|x|}/2, −∞ < x < ∞.

Show that E(X) is finite and find its value.


Solution
See during my lecture!
Expectation (cont...)

Example
Let X be a random variable with probability density function
f_X(x) = (1/π) · 1/(1 + x²), −∞ < x < ∞.

Show that E(X) is not finite.


Solution
See during my lecture!
Expectation (cont...)

Theorem
Let X be a random variable of discrete type with support
SX and probability mass function fX . Let h : R → R be a
Borel function and let T = h(X). Then,
E(T) = Σ_{x∈S_X} h(x) f_X(x),

provided it is finite.
Let X be a random variable of (absolutely) continuous
type with probability density function fX . Let h : R → R
be a Borel function and let T = h(X). Then,
E(T) = ∫_{−∞}^{∞} h(x) f_X(x) dx,

provided it is finite.
Expectation and moments

Definition
Let X be a random variable defined on some probability space.
µ′_1 = E(X), provided it is finite, is called the mean of the
random variable X.
For r ∈ {1, 2, · · · }, µ′_r = E(X^r), provided it is finite, is
called the rth moment of X.
For r ∈ {1, 2, · · · }, µ_r = E((X − µ′_1)^r), provided it is finite,
is called the rth central moment of X.
µ_2 = E((X − µ′_1)²), provided it is finite, is called the
variance of X. We denote Var(X) = E((X − µ′_1)²). The
quantity σ = √µ_2 = √(E((X − µ′_1)²)) is called the standard
deviation of X.
Expectation and moments (cont...)

Theorem
Let X be a random variable.
For real constants a and b, E(aX + b) = aE(X) + b,
provided the involved expectations are finite.
If h_1, · · · , h_m are Borel functions, then
E(Σ_{i=1}^{m} h_i(X)) = Σ_{i=1}^{m} E(h_i(X)),

provided the involved expectations are finite.


Expectation and moments (cont...)

Theorem
Let X be a random variable with finite first two moments and
let E(X) = µ. Then,
V ar(X) = E(X 2 ) − (E(X))2
V ar(X) ≥ 0. Moreover, V ar(X) = 0 if and only if
P (X = µ) = 1
E(X 2 ) ≥ (E(X))2 [Cauchy-Schwarz inequality]
For any real constants a, b,

V ar(aX + b) = a2 V ar(X).
Expectation and moments (cont...)

Example
Let X be a random variable with probability density function
f_X(x) = 1/2 for −2 < x < −1,
       = x/9 for 0 < x < 3,
       = 0 otherwise.

(i) If Y1 = max(X, 0), find the mean and variance of Y1 .


(ii) If Y_2 = 2X + 3e^{−max(X,0)} + 4, find E(Y_2).

Solution
See during lecture!
Mean, median and mode

Mean
The mean of a random variable X is given by µ′_1 = E(X).
The mean of a probability distribution gives us an idea about the
average observed value of X in the long run.

Median
A real number m satisfying F_X(m^−) ≤ 1/2 ≤ F_X(m), that is,
P(X < m) ≤ 1/2 ≤ P(X ≤ m), is called the median of X.
The median of a probability distribution divides S_X into two
parts, each having the same probability of occurrence.
If X is continuous, then the median m is given by F_X(m) = 1/2.
In the discrete case, the median may not be unique.
Mean, median and mode (cont...)

Mode
The mode m_0 of a probability distribution is the value that
occurs with the highest probability (or probability density). It is
defined as m_0 = arg max{f_X(x) : x ∈ S_X}.

Example
Consider a random variable X with distribution function
F_X(x) = 0 for x < 0, x³ for x ∈ [0, 1], and 1 for x > 1.
Obtain the mean, median and mode of X. (Ans. 3/4, (1/2)^{1/3}, 1)

Solution
See the lecture.
Measures of skewness and kurtosis

Skewness
Skewness of a probability distribution is a measure of
asymmetry (or lack of symmetry).
A measure of skewness of the probability distribution of X
is defined as
β_1 = µ_3/µ_2^{3/2}.
For symmetric distribution, β1 = 0.
β1 > 0 indicates that the data is positively skewed and
β1 < 0 indicates that the data is negatively skewed
Measures of skewness and kurtosis (cont...)

Kurtosis
Kurtosis of a probability distribution of X is a measure of
peakedness and thickness of tail of probability density
function of X relative to the peakedness and thickness of
tails of the density function of normal distribution.
A distribution is said to have higher (lower) kurtosis than
the normal distribution if its density function in
comparison with the density function of a normal
distribution, has sharper (rounded) peak and longer, flatter
(shorter, thinner) tails.
Measures of skewness and kurtosis (cont...)

Kurtosis (cont..)
A measure of kurtosis of the probability distribution of X
is defined by
γ_1 = µ_4/µ_2².
For normal distribution with mean µ and variance σ 2 ,
γ1 = 3. The quantity

γ2 = γ1 − 3

is called the excess kurtosis of the distribution of X. The


distribution with zero excess kurtosis is called mesokurtic.
A distribution with positive (negative) excess kurtosis is
called leptokurtic (platykurtic).
Measures of skewness and kurtosis (cont...)

Exercises (Home work!)


(i) Obtain the skewness and kurtosis of X, with probability
mass function
f_X(x) = e^{−λ} λ^x / x!, λ > 0, x = 0, 1, 2, · · · .
(ii) Obtain the skewness and kurtosis of X, with probability
density function

f_X(x) = λe^{−λx}, x > 0, λ > 0.


Moment generating function

Definition
Let X be a random variable and let
A = {t ∈ R : E(|e^{tX}|) = E(e^{tX}) is finite}. Define M_X : A → R
by
M_X(t) = E(e^{tX}), t ∈ A.

We call the function MX (.) the moment generating


function of X.
We say that the moment generating function exists if there
exists a positive real number a such that (−a, a) ⊂ A, that
is, if MX (t) = E(etX ) is finite in an interval containing 0.
Moment generating function (cont...)

Theorem
Let X be a random variable with moment generating function
M_X. Then, for each r ∈ {1, 2, · · · }, µ′_r = E(X^r) = M_X^{(r)}(0),
where M_X^{(r)}(·) is the rth derivative of M_X(t) evaluated at t = 0.
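The following Python sketch (using sympy, with a small made-up three-point distribution that is not from the slides) illustrates this theorem: differentiating M_X(t) at t = 0 returns the raw moments.

import sympy as sp

t = sp.symbols('t')
# hypothetical PMF: P(X=1)=1/5, P(X=2)=3/10, P(X=3)=1/2 (chosen only for illustration)
pmf = {1: sp.Rational(1, 5), 2: sp.Rational(3, 10), 3: sp.Rational(1, 2)}
M = sum(p * sp.exp(t * x) for x, p in pmf.items())   # M_X(t) = E(e^{tX})

mu1 = sp.diff(M, t, 1).subs(t, 0)    # first raw moment E(X)
mu2 = sp.diff(M, t, 2).subs(t, 0)    # second raw moment E(X^2)
print(mu1, mu2)                      # 23/10 and 59/10, matching direct computation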

Examples
(i) Obtain the moment generating function of X, with
probability mass function
f_X(x) = e^{−λ} λ^x / x!, λ > 0, x = 0, 1, 2, · · · .
(ii) Obtain the moment generating function of X, with
probability density function
f_X(x) = λe^{−λx}, x > 0, λ > 0.


Assignment-II

Problems
Q1. Consider a random variable X with probability mass
function
f_X(x) = ((x − 1)/16) (3/4)^{x−2} for x = 2, 3, 4, · · · , and 0 otherwise.

Obtain the cumulative distribution function of X.


Q2. Consider a random variable X with probability mass
function
f_X(x) = |x|/2550 for x = −1, +1, −2, +2, · · · , −50, +50, and 0 otherwise.

Show that Z = |X| is a random variable. Find its PMF


and distribution function.
Assignment-II (cont...)

Q3. Consider a random variable X with probability mass


function
f_X(x) = C(n, x) p^x (1 − p)^{n−x} for x = 0, 1, 2, · · · , n, and 0 otherwise,

where n is a positive integer and p ∈ (0, 1). Show that


Y = n − X is a random variable. Find its PMF and
distribution function. Find the mean and variance of X.
Let T = eX + 2e−X + 6X 2 + 3X + 4. Then, find E(T ).
Q4. Let X be a random variable with moment generating
function M_X(t) = e^{λ(e^t − 1)}, λ > 0. Obtain the first four
moments of X.
Assignment-II (cont...)

Q5. Let us cast two fair dice. Denote the sum of the outcomes
by X. Show that X is a random variable.
Q6. Derive mean and variance of X with the probability mass
function given by
f_X(x) = C(n, x) p^x (1 − p)^{n−x} for x = 0, 1, 2, · · · , n, and 0 otherwise,

where n is a positive integer and p ∈ (0, 1).


Q7. Obtain the skewness and kurtosis of the normal
distribution with mean µ and variance σ 2 .
Q8. Suppose the moment generating function of X is
M_X(t) = e^{3(e^t − 1)}. Find P(X = 0).
Assignment-II (cont...)

Q9. Suppose X is a discrete random variable and has the


moment generating function
M_X(t) = (1/8)e^{2t} + (3/8)e^{3t} + (3/8)e^{5t} + (1/8)e^{8t}.
What is the probability mass function of X? Find E(X).
Q10. Suppose that you have a fair 4-sided die, and let X be the
random variable representing the value of the number
rolled.
(a) Write down the moment generating function for X.
(b) Use this moment generating function to compute the
1st and second moments of X.
Assignment-II (cont...)

Q11. The continuous type of random variable X has the


following density function:
f_X(x) = a − x for 0 < x < a, and 0 otherwise.

Find a. Obtain mean and variance of X. When Y = X 2 ,


obtain the density function of Y.
Q12. The continuous type of random variable X has the
following density function:
f_X(x) = (1/√(2π)) e^{−x²/2}, −∞ < x < ∞.

Compute mean and variance of X. When Y = X 2 , Z = eX


compute mean and variance of Y and Z.
Part-III
Distributions

Bernoulli distribution
A random experiment is said to be a Bernoulli experiment
if its each trial results in just two possible outcomes:
success and failure.
Each replication of a Bernoulli experiment is called a
Bernoulli trial.
A discrete random variable X with support SX = {0, 1} is
said to follow Bernoulli distribution if its probability mass
function is given by
f_X(x) = 1 − p for x = 0, p for x = 1, and 0 otherwise;
equivalently, f_X(x) = p^x (1 − p)^{1−x} for x = 0, 1, and 0 otherwise,
where 0 < p < 1.


Distributions

Bernoulli distribution (cont...)


The Bernoulli distribution represents the probability of
success or failure of a single Bernoulli trial.
The cumulative distribution function of X is given by
F_X(x) = 0 for x < 0, 1 − p for 0 ≤ x < 1, and 1 for x ≥ 1,
where 0 < p < 1.


E(X) = p and V ar(X) = p(1 − p)
Moment generating function: M_X(t) = 1 − p + p e^t.
Skewness: β_1 = (1 − 2p)/√(p(1 − p)).
Kurtosis: γ_1 = 3 + (1 − 6p(1 − p))/(p(1 − p)).
Distributions

Binomial distribution
Physical conditions for Binomial distribution: (we get the
binomial distribution under the following experimental
conditions)
Each trial results in two mutually disjoint outcomes,
termed as success and failure.
The number of trials n is finite.
The trials are independent of each other.
The probability of success p is constant for each trial.
Distributions

Binomial distribution (cont...)


A discrete type random variable X with support
SX = {0, 1, 2, . . . , n} is said to follow binomial distribution
with parameters n (a natural number) and p ∈ (0, 1) if it
has the probability mass function
f_X(x) = P(X = x) = C(n, x) p^x (1 − p)^{n−x} for x ∈ S_X, and 0 for x ∈ S_X^c.

We denote X ∼ Binomial(n, p).


Distributions

Binomial distribution (cont...)


For r ∈ {1, 2, . . . , n}, we have
E(X(X − 1) · · · (X − r + 1)) = n(n − 1) · · · (n − r + 1) p^r,
which is known as the rth factorial moment of X.
Mean = E(X) = np, E(X²) = n(n − 1)p² + np and
Var(X) = np(1 − p).
Moment generating function M_X(t) = (1 − p + p e^t)^n, t ∈ R.
Distributions

Examples
1. Ten coins are thrown simultaneously. Find the probability
of getting at least seven heads. (Ans: 176/1024)
2. The mean and variance of binomial distribution are 4 and
4/3, respectively. Find P (X ≥ 1). (Ans: 0.9986)
3. Let X be binomially distributed with parameters n and p.
What is the distribution of n − X? (Ans: Binomial(n,1-p))
4. The moment generating function of X is ((2/3) + (1/3)e^t)^9. Find
P(0.2 < X < 5.6).
5. Let X_1, . . . , X_k be a random sample from Binomial(3, 0.4).
Find the distribution of S = Σ_{i=1}^{k} X_i.

Solution
See the lecture.
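As a check of Example 1 above (Python sketch, added here), P(X ≥ 7) for X ∼ Binomial(10, 1/2) equals 176/1024:

from math import comb
from fractions import Fraction

p_at_least_7 = sum(Fraction(comb(10, k), 2**10) for k in range(7, 11))
print(p_at_least_7)        # 11/64, i.e. 176/1024 as quoted on the slide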
Distributions

Poisson distribution
A discrete type random variable X with support
S_X = {0, 1, 2, . . .} is said to follow the Poisson distribution if its
probability mass function is given by
f_X(x) = P(X = x) = e^{−λ} λ^x / x! for x ∈ S_X, and 0 for x ∈ S_X^c,

where λ > 0.

Some situations
Number of occurrences in a given time interval.
Number of accidents in a particular junction of a city.
Number of deaths from a disease such as heart attack or
due to snake bite.
Distributions

Poisson distribution from Binomial distribution


n, the number of trials is infinitely large, that is, n → ∞.
p, the constant probability of success for each trial is
infinitely small, that is, p → 0.
np = λ is finite, where λ is a positive number.
Under the above conditions, binomial distribution follows
Poisson distribution.
Note: For the derivation of Poisson distribution from Binomial
distribution, please see the lecture.
Mean = λ = Var(X), E(X(X − 1) · · · (X − r + 1)) = λ^r.
Moment generating function M_X(t) = e^{λ(e^t − 1)}, t ∈ R.
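The following Python sketch (with n and p chosen here purely for illustration) compares Binomial(n, p) probabilities with the limiting Poisson(λ = np) probabilities.

from math import comb, exp, factorial

n, p = 1000, 0.003          # hypothetical choice; λ = np = 3
lam = n * p
for k in range(6):
    binom_pk = comb(n, k) * p**k * (1 - p)**(n - k)
    pois_pk = exp(-lam) * lam**k / factorial(k)
    print(k, round(binom_pk, 4), round(pois_pk, 4))
# The two columns agree closely, as the limiting argument above suggests.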
Distributions/Poisson distribution
Examples
Q1. If X is a Poisson variate with parameter λ and such that

P (X = 2) = 9P (X = 4) + 90P (X = 6),

then find mean and variance of X. (Ans:1,1)


Q2. Let X and Y be two independent Poisson variates such
that P (X = 1) = P (X = 2) and P (Y = 2) = P (Y = 3).
Find V ar(X − 2Y ). (Ans: 14)
Q3. Consider a telephone operator who on the average handles
five calls in every 3 minutes. What is the probability that
there will be no calls in the next minute, at least two calls?
Q4. Consider a person who plays a series of 2500 games
independently. If the probability of person winning any
game is 0.002, find the probability that the person will win
at least two games.
Distributions

Discrete uniform distribution


The shorthand X ∼ discrete uniform(a, b) is used to
indicate that the random variable X has the discrete
uniform distribution with integer parameters a and b,
where a < b. A discrete uniform random variable X with
parameters a and b has probability mass function
f_X(x) = 1/(b − a + 1), x = a, a + 1, . . . , b.
CDF: F_X(x) = (x − a + 1)/(b − a + 1), x = a, a + 1, . . . , b.
MGF: M_X(t) = (e^{at} − e^{(b+1)t}) / ((b − a + 1)(1 − e^t)), t ≠ 0.
Mean: E(X) = (a + b)/2, Var(X) = ((b − a + 1)² − 1)/12.
Skewness = 0, excess kurtosis = −6((b − a + 1)² + 1) / (5((b − a + 1)² − 1)).
Distributions

Hypergeometric distribution
An urn has 1000 balls: 700 green, 300 blue.

Distributions

Sampling with replacement


Pick one of the 1000 balls. Record color (green or blue).
Put it back in the urn and shake it up. Again pick one of
the 1000 balls and record color. Repeat n times.
On each draw, the probability of green is 700/1000.
The number of green balls drawn has a binomial
distribution, with probability of success

p = 700/1000 = 0.7.
Distributions

Sampling without replacement


Pick one of the 1000 balls, record color, and set it aside.
Pick one of the remaining 999 balls, record color, set it
aside. Pick one of the remaining 998 balls, record color, set
it aside. Repeat n times, never re-using the same ball.
Equivalently, take n balls all at once and count them by
color.
The number of green balls drawn has a hypergeometric
distribution.
Distributions

Hypergeometric distribution
An urn contains N balls, of which K are green and N − K are
blue. A sample of n balls is drawn without replacement. What
is the probability that there are k green balls in the sample?
Let the random variable X be the number of green balls
drawn. Then,
P(X = k) = C(K, k) C(N − K, n − k) / C(N, n).
E(X) = np and Var(X) = np(1 − p)(N − n)/(N − 1), where p = K/N.
Distributions

Example
An urn has 1000 balls: 700 green, 300 blue. A sample of 7 balls
is drawn. What is the probability that it has 3 green balls and
4 blue balls?
Sampling with replacement (binomial): Ans: 0.0972405
Sampling without replacement (hypergeometric): Ans:
0.0969179
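A Python sketch (added here) that reproduces the two quoted answers using only the standard library:

from math import comb

# with replacement: Binomial(7, 0.7), exactly 3 green
binom = comb(7, 3) * 0.7**3 * 0.3**4
# without replacement: hypergeometric, 3 green out of 700 and 4 blue out of 300
hyper = comb(700, 3) * comb(300, 4) / comb(1000, 7)
print(round(binom, 7), round(hyper, 7))   # 0.0972405 and roughly 0.0969179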
Distributions

Uniform or rectangular distribution


A continuous type random variable X is said to have uniform
distribution in the interval (a, b) if the probability density
function is given by
f_X(x) = 1/(b − a) for a < x < b, and 0 otherwise.
For r ∈ {1, 2, . . .}, µ′_r = E(X^r) = (b^{r+1} − a^{r+1}) / ((r + 1)(b − a)).
For r ∈ {1, 2, . . .}, µ_r = E((X − µ′_1)^r) = (b − a)^r / (2^r (r + 1)) if r is even,
and 0 if r is odd.
Distributions

Uniform or rectangular distribution


Mean = E(X) = (a + b)/2 = Median, Var(X) = (b − a)²/12.
Skewness = 0, Kurtosis = 9/5.
CDF: F_X(x) = 0 for x < a, (x − a)/(b − a) for a ≤ x < b, and 1 for x ≥ b.
MGF: M_X(t) = (e^{bt} − e^{at}) / ((b − a)t) for t ≠ 0, and M_X(0) = 1.
Distributions

Exponential distribution
A continuous type random variable X is said to have
exponential distribution with parameter λ > 0 if the probability
density function is given by
f_X(x|λ) = λe^{−λx} for x > 0, and 0 otherwise.
Mean = E(X) = 1/λ. Var(X) = 1/λ².
CDF: F_X(x) = 1 − e^{−λx} for x > 0, and 0 for x ≤ 0.
Distributions

Theorem
The exponential distribution has the memoryless (forgetfulness)
property. A random variable X with positive support is memoryless if,
for all t > 0 and s > 0, P(X > s + t | X > t) = P(X > s).

Proof
See the lecture!
The idea of the memoryless property, for example, is that

P (X > 14|X > 8) = P (X > 6).

Intuitively, thinking about the fact that exponential random


variables are waiting times, this translates into saying that the
probability we have to wait 14 minutes or more, given that we
already waited at least 8 minutes, is equal to the probability
that we wait 6 minutes or more from the start.
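A direct numerical check (Python sketch, with λ chosen arbitrarily) of the statement P(X > 14 | X > 8) = P(X > 6):

from math import exp

lam = 0.2                                   # hypothetical rate
surv = lambda x: exp(-lam * x)              # P(X > x) = e^{-λx}

print(surv(14) / surv(8))                   # P(X > 14 | X > 8)
print(surv(6))                              # P(X > 6), the same number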
Distributions

Exponential distribution
Moment generating function: M_X(t) = (1 − t/λ)^{−1}, t < λ.

Q1. Let X1 , X2 , . . . , Xn be a random sample (iid) from the


exponential distribution with mean 1/λ. Obtain the
cumulative distribution functions of max{X1 , X2 , . . . , Xn }
and min{X1 , X2 , . . . , Xn }.

Solution
see the lecture.
Distributions

Gamma distribution
PDF:
f_X(x|α, θ) = (1/(θ^α Γ(α))) e^{−x/θ} x^{α−1} for x > 0, and 0 otherwise,
where α > 0 and θ > 0.
E(X^r) = (Γ(α + r)/Γ(α)) θ^r for r ∈ {1, 2, 3, ...}
Mean: E(X) = αθ. Variance: Var(X) = αθ².
µ_3 = E(X − µ′_1)³ = 2αθ³
µ_4 = E(X − µ′_1)⁴ = µ′_4 − 4µ′_1µ′_3 + 6(µ′_1)²µ′_2 − 3(µ′_1)⁴ = 3α(α + 2)θ⁴.
β_1 = 2/√α, γ_1 = 3 + 6/α.
Distributions

Gamma distribution
Moment generating function: M_X(t) = (1 − θt)^{−α}, t < 1/θ.
Distributions

Normal distribution
An absolutely continuous random variable X is said to
follow normal distribution with mean µ ∈ R and standard
deviation σ > 0 if its probability density function is given
by
f_X(x|µ, σ) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞.
We denote X ∼ N (µ, σ 2 ).
The normal distribution with mean 0 and variance 1 is
called standard normal distribution. It is denoted by
N (0, 1). The probability density function of the standard
normal variate Z is given by
φ(z) = (1/√(2π)) e^{−z²/2}, −∞ < z < ∞.

Distributions

Normal distribution
The cumulative distribution function of the standard
normal variate Z is given by
Φ(z) = ∫_{−∞}^{z} φ(x) dx = (1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx, −∞ < z < ∞.

The normal distribution N(µ, σ²) is symmetric about the mean µ.
Let X ∼ N(µ, σ²). Then, E(X) = µ and Var(X) = σ².
MGF: M_X(t) = e^{µt + σ²t²/2}. Skewness = 0, Kurtosis = 3.
The normal curve is the beautiful bell shaped curve shown
in Figure 1. It is a very useful curve in statistics because
many attributes, when a large number of measurements are
taken, are approximately distributed in this pattern.
Distributions

Normal distribution
Many human characteristics, such as height, IQ or
examination scores of a large number of people, follow the
normal distribution.
The model probably originated in 1733 in the work of the
mathematician Abraham De Moivre, who was interested in
laws of chance governing gambling, and it was also
independently derived in 1786 by Pierre Laplace, an
astronomer and mathematician.
Distributions

Normal distribution
However, the normal curve as a model for error distribution
in scientific theory is most commonly associated with a
German astronomer and mathematician, Karl Friedrich
Gauss, who found a new derivation of the formula for the
curve in 1809. For this reason, the normal curve is
sometimes referred to as the “Gaussian” curve. In 1835
another mathematician and astronomer, Lambert Quetelet,
used the model to describe human physiological and social
traits. Quetelet believed that “normal” meant average and
that deviations from the average were nature’s mistakes.
Almost all the scores (0.997 of them) lie within 3 standard
deviations of the mean.
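The 0.997 figure can be checked with a short Python sketch (added here) using the standard normal CDF Φ(z) = (1 + erf(z/√2))/2:

from math import erf, sqrt

Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))
print(Phi(3) - Phi(-3))     # about 0.9973: almost all mass lies within 3 standard deviations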
Distributions

Central limit theorem


Informally, the Central Limit Theorem states that if a
random variable is the sum of n independent, identically
distributed, non-normal random variables, then its
distribution approaches normal as n approaches infinity.
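The following simulation sketch in Python (added for illustration, with arbitrary choices of sample size and distribution) shows standardized sums of exponential variables behaving more and more like a standard normal variable.

import random
from math import erf, sqrt

random.seed(1)
Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))

def standardized_sum(n):
    # exponential(1) has mean 1 and variance 1
    s = sum(random.expovariate(1.0) for _ in range(n))
    return (s - n) / sqrt(n)

for n in (2, 30, 200):
    zs = [standardized_sum(n) for _ in range(20000)]
    frac = sum(z <= 1.0 for z in zs) / len(zs)
    print(n, round(frac, 3), "vs", round(Phi(1.0), 3))   # approaches Φ(1) ≈ 0.841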
Assignment-III

Problems
Q1. Let {Y_i}, i = 1, . . . , n, be independent Bernoulli random
variables with parameter p. Obtain the mean and variance
of Y = Σ_{i=1}^{n} Y_i.

Q2. Suppose on average there are 5 homicides per month in a


city. What is the probability that there is at most 1 in a
certain month?
Q3. Let X be a random variable with PMF (geometric
distribution)

P (X = x) = (1 − p)x−1 p, x = 1, 2, . . . .

Obtain the mean and variance of X.


Assignment-III

Problems
Q4. The PDF of X is

f_X(x) = 4x for 0 ≤ x ≤ 1/2,
       = 4 − 4x for 1/2 < x ≤ 1,
       = 0 otherwise.

Find mean and variance of X.


Q5. A ball is drawn from an urn containing 4 blue and 5 red
balls. After the ball is drawn, it is replaced and another
ball is drawn. Suppose this process is done 7 times. What
is the probability that exactly 2 red balls were drawn in the
7 draws? What is the probability that at least 3 blue balls
were drawn in the 7 draws?
Assignment-III

Problems
Q6. The expected number of typos on a page of a new Harry
Potter book is 0.2. What is the probability that the next
page you read contains (i) 0 typos, (ii) 2 or more typos.
(iii) Explain what assumptions you have used.
Q7. An egg carton contains 20 eggs, of which 3 have a double
yolk. To make a pancake, 5 eggs from the carton are picked
at random. What is the probability that at least 2 of them
have a double yolk?
Q8. Suppose X has density function
f_X(x) = ax + b for 0 ≤ x ≤ 1, and 0 otherwise,

and that E(X 2 ) = 1/6. Find the values of a and b.


Assignment-III

Problems
Q9. Suppose X has density function
f_X(x) = 1/(a − 1) for 1 < x < a, and 0 otherwise,

and that E(X) = 6V ar(X). Find the values of a.


Q10. Most graduate schools of business require applicants for
admission to take the Graduate Management Admission
Council’s GMAT examination. Scores on the GMAT are
roughly normally distributed with a mean of 527 and a
standard deviation of 112. What is the probability of an
individual scoring above 500 on the GMAT? How high
must an individual score on the GMAT in order to score in
the highest 5%? [Ans: 0.5948, 711.24]
Assignment-III

Problems
Q11. The length of human pregnancies from conception to birth
approximates a normal distribution with a mean of 266
days and a standard deviation of 16 days. What proportion
of all pregnancies will last between 240 and 270 days
(roughly between 8 and 9 months)? What length of time
marks the shortest 70% of all pregnancies? [Ans: 0.5471,
274.32]
Q12. A manufacturing process produces semiconductor chips
with a known failure rate 6.3%. Assume that chip failures
are independent of one another. You will be producing
2000 chips tomorrow.
(a.) Find the expected number of defective chips produced.
(b.) Find the standard deviation of the number of defective
chips.
(c.) Find the probability of producing less than 135 defects.
Part-IV
Two dimensional random variables

Joint probability mass function


Let (X, Y ) be a two dimensional vector of discrete random
variables with support SXY . Then, the function
fXY (x, y) = P (X = x, Y = y) is said to be a joint probability
mass function if it satisfies the following conditions:
0 ≤ f_XY(x, y) ≤ 1
Σ_{(x,y)∈S_XY} f_XY(x, y) = 1
P[(X, Y) ∈ A] = Σ_{(x,y)∈A} f_XY(x, y), where A is a subset
of the support S_XY.
Two dimensional random variables

Marginal probability mass function


Let X be a discrete random variable with support S_X and Y be
another discrete random variable with support S_Y. Let X and
Y have the joint PMF f_XY(x, y) with support S_XY. Then, the
marginal PMF of X is defined as
f_X(x) = P(X = x) = Σ_{y∈S_Y} f_XY(x, y), x ∈ S_X.
Similarly, the marginal PMF of Y is defined as
f_Y(y) = P(Y = y) = Σ_{x∈S_X} f_XY(x, y), y ∈ S_Y.
Two dimensional random variables

Some results
The random variables X and Y are said to be independent
if and only if fXY (x, y) = fX (x)fY (y). Otherwise,
dependent.
Let u(x, y) be a function of two variables. Then,
E(u(X, Y)) = Σ_{(x,y)∈S_XY} u(x, y) f_XY(x, y),
provided it exists.
E(X) = Σ_{(x,y)∈S_XY} x f_XY(x, y),
E(Y) = Σ_{(x,y)∈S_XY} y f_XY(x, y).
Var(X) = Σ_{(x,y)∈S_XY} (x − E(X))² f_XY(x, y),
Var(Y) = Σ_{(x,y)∈S_XY} (y − E(Y))² f_XY(x, y).
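The following Python sketch (with a small made-up joint PMF, not one of the exercises) illustrates the marginal PMFs, expectations and the independence check defined above.

from fractions import Fraction as F

joint = {                      # hypothetical f_XY(x, y)
    (0, 0): F(1, 8), (0, 1): F(1, 8),
    (1, 0): F(1, 4), (1, 1): F(1, 2),
}

fX = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
fY = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}

EX = sum(x * p for (x, y), p in joint.items())
EY = sum(y * p for (x, y), p in joint.items())
independent = all(joint[(x, y)] == fX[x] * fY[y] for x in (0, 1) for y in (0, 1))
print(fX, fY, EX, EY, independent)   # here the check fails, so X and Y are dependent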
Two dimensional random variables

Examples
Q1. Suppose we toss a pair of four-sided dice, of which one is
red and the other is black. Let X and Y denote the outcomes
on the red and black dice, respectively. Obtain the joint
PMF of (X, Y ). Obtain the marginal PMFs of X and Y.
Are X and Y independent? Find E(X), E(Y ), Var(X) and
Var(Y).
Q2. Two dice are thrown simultaneously. Let X be the sum of
the outcomes of two dice. Suppose Y =|difference of the
outcomes of two dice|. Obtain the joint PMF of (X, Y ).
Further, obtain the marginal PMFs of X and Y. Find the
conditional PMF of X given Y = 2.
Two dimensional random variables

Joint probability density function


Let (X, Y) be a two dimensional vector of absolutely continuous
random variables with support S_XY. Then, the function
f_XY(x, y) is said to be a joint probability density function if it
satisfies the following conditions:
f_XY(x, y) ≥ 0
∫_{−∞}^{∞} ∫_{−∞}^{∞} f_XY(x, y) dx dy = 1
P[(X, Y) ∈ A] = ∫∫_A f_XY(x, y) dx dy, where A is a subset of
the support S_XY.
Two dimensional random variables

Marginal probability density function


Let X be an absolutely continuous type random variable with
support S_X and Y be another absolutely continuous random
variable with support S_Y. Let X and Y have the joint PDF
f_XY(x, y) with support S_XY. Then, the marginal PDF of X is
defined as
f_X(x) = ∫_{−∞}^{∞} f_XY(x, y) dy, x ∈ S_X.
Similarly, the marginal PDF of Y is defined as
f_Y(y) = ∫_{−∞}^{∞} f_XY(x, y) dx, y ∈ S_Y.
Two dimensional random variables

Some results
The random variables X and Y are said to be independent
if and only if fXY (x, y) = fX (x)fY (y). Otherwise,
dependent.
Let u(x, y) be a function of two variables. Then,
E(u(X, Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} u(x, y) f_XY(x, y) dx dy,
provided it exists.
E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f_XY(x, y) dx dy,
E(Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y f_XY(x, y) dx dy.
Var(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − E(X))² f_XY(x, y) dx dy,
Var(Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (y − E(Y))² f_XY(x, y) dx dy.
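A Python sketch (using sympy and a made-up joint PDF, not one of the exercises below) illustrating a marginal PDF and E(X) computed by integration:

import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = x + y                                      # hypothetical joint PDF on (0,1) x (0,1)

total = sp.integrate(f, (x, 0, 1), (y, 0, 1))  # should be 1 for a valid PDF
fX = sp.integrate(f, (y, 0, 1))                # marginal of X: x + 1/2
EX = sp.integrate(x * f, (x, 0, 1), (y, 0, 1)) # E(X) = 7/12
print(total, fX, EX)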
Two dimensional random variables

Examples
Q1. Let X and Y have joint PDF
f_XY(x, y) = 4xy for 0 < x < 1, 0 < y < 1, and 0 otherwise.

Is this a valid PDF? Find P (Y < X). Find the marginal


PDFs of X and Y . Find E(X) and E(Y ),
V ar(X), V ar(Y ). Are X, Y independent?
Q2. Let the joint PDF of (X, Y ) be
f_XY(x, y) = 2 for 0 < x < y < 1, and 0 otherwise.

Are X and Y independent?


Two dimensional random variables

Examples
Q3. Let X and Y have joint PDF
f_XY(x, y) = 6xy² for 0 < x < 1, 0 < y < 1, and 0 otherwise.

Is this a valid PDF? Find P (X + Y ≥ 1). Find the marginal


PDFs of X and Y . Find E(X) and E(Y ), V ar(X), V ar(Y ).
Are X, Y independent? Find P (1/4 < X < 1/2). Also find
the conditional PDFs of X|Y and Y |X. Find the
conditional mean and variance of Y given the value of X.
Two dimensional random variables

Examples
Q4. Let the joint PDF of (X, Y ) be
fXY (x, y) = cx + 1 for x, y ≥ 0, x + y < 1, and 0 otherwise.

Graphically, show the range of (X, Y ) in the xy plane.


Find the constant c. Find the marginal PDFs of X and Y.
Find P (Y < 2X 2 ).
Q5. Let the joint PDF of (X, Y ) be
fXY (x, y) = 6e^{−(2x+3y)} for x, y ≥ 0, and 0 otherwise.

Are X and Y independent? Find E(Y |X > 2). Find


P (X > Y ).
Two dimensional random variables

Examples
Q6. Let the PDF of X be

fX (x) = 2x for 0 ≤ x ≤ 1, and 0 otherwise.

We know that Y |X = x ∼ Uniform[−x, x]. Find the


joint PDF of (X, Y ). Find the marginal PDF of Y. Find
P (|Y | < X^3 ).
Two dimensional random variables

Examples
Q7. Let the joint PDF of (X, Y ) be
fXY (x, y) = 6xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ √x, and 0 otherwise.

Graphically, show the range of (X, Y ) in the xy plane.


Find the marginal PDFs of X and Y . Are they
independent? Find the conditional PDF of X given Y = y.
Find E[X|Y = y] and V ar[X|Y = y] for 0 ≤ y ≤ 1.
Part-V
Simple random sample

Definition
Simple random sampling (SRS) is a method of selecting a
sample of n sampling units out of a population of N sampling
units such that every sampling unit has an equal chance of
being chosen.

The samples can be drawn in two possible ways.


The sampling units are chosen without replacement in the
sense that units, once chosen, are not placed back in the
population.
The sampling units are chosen with replacement in the
sense that the chosen units are placed back in the
population.
Simple random sample

Simple random sampling without replacement (SRSWOR)


SRSWOR is a method of selection of n units out of the N
units one by one such that at any stage of selection, any
one of the remaining units has the same chance of being
selected; unconditionally, each unit has probability 1/N of
being selected at any given draw (shown later).

Simple random sampling with replacement (SRSWR)


SRSWR is a method of selection of n units out of the N
units one by one such that at each stage of selection, each
unit has an equal chance of being selected, i.e., 1/N .
Probability of drawing a sample

Notation
N : Number of sampling units in the population
(Population size).
n: Number of sampling units in the sample (sample size)

SRSWOR
If n units are selected by SRSWOR, the total number of
possible samples is C(N, n) = N !/(n!(N − n)!).
So the probability of selecting any one of these samples is
1/C(N, n).
Probability of drawing a sample

SRSWOR
Note that a unit can be selected at any one of the n draws.
Let ui be the ith unit selected in the sample. This unit can
be selected in the sample either at first draw, second draw,
..., or nth draw.
Let Pj (i) denote the probability of selection of ui at the jth
draw, j = 1, . . . , n. Since Pj (i) = 1/N for each j (as shown
later), the probability that ui is included in the sample is

P1 (i) + P2 (i) + . . . + Pn (i) = 1/N + 1/N + . . . + 1/N = n/N.
Probability of drawing a sample

SRSWOR
Now if u1 , u2 , . . . , un are the n units selected in the sample,
then the probability of their selection is

P (u1 , u2 , . . . , un ) = P (u1 )P (u2 |u1 ) . . . P (un |u1 , . . . , un−1 ).

Note that when the second unit is to be selected, then


there are (n − 1) units left to be selected in the sample
from the population of (N − 1) units. Similarly, when the
third unit is to be selected, then there are (n − 2) units left
to be selected in the sample from the population of (N − 2)
units and so on.
Probability of drawing a sample

SRSWOR
If P (u1 ) = n/N, then P (u2 |u1 ) = (n − 1)/(N − 1), . . . ,
P (un |u1 , . . . , un−1 ) = 1/(N − n + 1). Thus,

P (u1 , u2 , . . . , un ) = (n/N ) × ((n − 1)/(N − 1)) × . . . × (1/(N − n + 1))
                         = 1/C(N, n).
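The two results above (probability 1/C(N, n) for a particular unordered sample and n/N for the inclusion of a particular unit) can be checked empirically; a small simulation sketch with assumed values N = 6 and n = 3:

# Sketch: empirical check of SRSWOR probabilities for a small population.
import random
from math import comb

N, n, reps = 6, 3, 200_000
target = {1, 2, 3}                                  # one particular unordered sample
hits_sample, hits_unit = 0, 0
for _ in range(reps):
    s = set(random.sample(range(1, N + 1), n))      # one SRSWOR draw
    hits_sample += (s == target)
    hits_unit += (1 in s)                           # inclusion of a fixed unit

print(hits_sample / reps, 1 / comb(N, n))           # both close to 0.05
print(hits_unit / reps, n / N)                      # both close to 0.5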
Probability of drawing a sample

SRSWR
When n units are selected with SRSWR, the total number
of possible samples is N^n .
The probability of drawing any particular sample is 1/N^n .
Alternatively, let ui be the ith unit selected in the sample.
This unit can be selected in the sample either at first draw,
second draw, ..., or nth draw. At any stage, there are
always N units in the population in case of SRSWR, so the
probability of selection of ui at any stage is 1/N for all
i = 1, . . . , n.
Probability of drawing a sample

SRSWR
Then the probability of selection of n units u1 , u2 , ..., un in
the sample is

P (u1 , u2 , . . . , un ) = P (u1 )P (u2 ) . . . P (un )
                         = (1/N ) × (1/N ) × . . . × (1/N )
                         = 1/N^n .
Probability of drawing a unit

SRSWOR
Let Al denote an event that a particular unit uj is not
selected at the lth draw. The probability of selecting, say, the
jth unit at the kth draw is

P (selection of uj at kth draw)
= P (A1 ∩ A2 ∩ . . . ∩ Ak−1 ∩ Āk )
= P (A1 )P (A2 |A1 )P (A3 |A1 A2 ) × . . . × P (Ak−1 |A1 A2 . . . Ak−2 ) × P (Āk |A1 A2 . . . Ak−1 ).

This is equal to

(1 − 1/N )(1 − 1/(N − 1))(1 − 1/(N − 2)) . . . (1 − 1/(N − k + 2)) × 1/(N − k + 1),

which simplifies to 1/N .
Probability of drawing a unit

SRSWR
P [selection of uj at kth draw] = 1/N .
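The same 1/N result for SRSWOR can be seen empirically; a minimal simulation sketch (assumed values N = 8, k = 3 and unit j = 5, chosen only for illustration):

# Sketch: under SRSWOR a fixed unit j is drawn at the kth draw with probability 1/N.
import random

N, k, j, reps = 8, 3, 5, 200_000
count = 0
for _ in range(reps):
    order = random.sample(range(1, N + 1), N)   # a random permutation = the sequence of draws
    count += (order[k - 1] == j)                # unit j appears exactly at draw k
print(count / reps, 1 / N)                      # both close to 0.125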
Sampling distribution of sample mean and sample
variance

Theorem
Let X1 , . . . , Xn be a random sample drawn from the normal
population with mean µ and variance σ 2 . Denote X̄ and S 2 as
the sample mean and sample variance, respectively, where
X̄ = (1/n) Σ_{i=1}^{n} Xi and S^2 = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)^2 . Then,

X̄ and S^2 are independent,
X̄ ∼ N (µ, σ^2 /n),
(n − 1)S^2 /σ^2 ∼ χ^2_{n−1} .

Proof
see the lecture!
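Since the proof is deferred to the lecture, the following simulation sketch (assumed values µ = 2, σ = 3, n = 10, chosen only for illustration) is consistent with the three statements of the theorem:

# Sketch: check X-bar ~ N(mu, sigma^2/n) and (n-1)S^2/sigma^2 ~ chi-square(n-1) by simulation.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 3.0, 10, 50_000
samples = rng.normal(mu, sigma, size=(reps, n))

xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)                 # sample variance S^2

print(xbar.mean(), xbar.var())                   # approx mu = 2 and sigma^2/n = 0.9
print(((n - 1) * s2 / sigma**2).mean())          # approx n - 1 = 9, the mean of chi-square(n-1)
print(np.corrcoef(xbar, s2)[0, 1])               # approx 0, consistent with independence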
Part-VI
Statistical inference

Diagram of the Steps

[Figure (b): diagram of the steps of statistical inference]
Statistical inference

Producing data: how data are obtained, and what


considerations affect the data production process.
Exploratory data analysis: tools that help us get a first feel
for the data, by exposing their features using visual
displays and numerical summaries which help us explore
distributions, compare distributions, and investigate
relationships.
We use probability to quantify how much we expect
random samples to vary.
In statistical inference, we infer something about the
population based on what is measured in the sample.
Goal!

In statistical inference, we
draw conclusions about a
population based on the data
obtained from a sample
chosen from it.
Models

Classification of models

Models → Parametric models
         Non-parametric models

A parametric model is associated with a finite-dimensional
parameter.
In a non-parametric model, the number and nature of the
parameters are flexible and not fixed in advance.
An example

Resolving disputed ownership of a painting by tossing a


coin.

An example

Resolving disputed ownership of a painting by tossing a


coin.
Suppose two brothers Ramesh and Suresh agree to resolve
their disputed ownership of a painting by tossing a coin.
Ramesh produces a coin and hands it to Suresh. Suresh tosses the coin.
Ramesh calls Head. The coin comes to rest with Head
facing up. Thus, Ramesh takes possession of the property.
That same evening, a disappointed Suresh decided to
conduct an experiment. He tossed the same coin 100
times and observed 68 Heads.
Suresh therefore suspected that the coin he tossed in the
morning might not be entirely fair. But he was unwilling to
accuse his brother without evidence.
Question!

How will Suresh proceed?

(Statistical Inference!)
Modeling the example

Suresh’s experiment can be modeled as follows:


Here, each toss is a Bernoulli trial and the experiment is a
sequence of n = 100 trials. Let Xi denote the outcome of
the ith toss. Then,

Xi = 1 if Head is observed, and Xi = 0 if Tail is observed.

Thus, we have X1 , . . . , X100 ∼ Bernoulli(p), where p


(probability of occurrence of head) is fixed but unknown to
Suresh. Now,
Z = Σ_{i=1}^{100} Xi ∼ Binomial(100, p).
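A short simulation sketch of this model (the value of p used below is an assumption made only to generate data; it is of course unknown to Suresh):

# Sketch: simulate Suresh's experiment of n = 100 Bernoulli(p) tosses.
import numpy as np

rng = np.random.default_rng(1)
p_true, n = 0.6, 100                        # assumed value, used only for the simulation
tosses = rng.binomial(1, p_true, size=n)    # X_1, ..., X_100
Z = tosses.sum()                            # Z ~ Binomial(100, p)
print(Z, Z / n)                             # observed number of Heads and the sample proportion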
Questions!

Now, Suresh is interested in drawing inferences about this fixed but


unknown quantity. We consider three questions he might ask.
What is the true value of p? More precisely, what is a
reasonable guess as to the true value of p?
What are the possible values of p? In particular, is there a
subset of [0, 1] that Suresh can confidently claim contains
the true value of p?
Is p = 0.5? Specifically, is there any evidence that p ≠ 0.5,
so that Suresh could justifiably accuse Ramesh of providing
an unfair coin?
Point Estimation
Point estimation

Let X1 , . . . , Xn be a random sample drawn from a
parametric model.
Based on this random sample, we wish to come up with a
function that will estimate the unknown model parameters.
Let δn = g(X1 , . . . , Xn ) be an estimator. It is only useful
for inference if we know something about how it behaves.
Note that δn is a function of the random sample. It is a
random variable and its behavior depends on the sample
size n.
Point estimation

Illustration
Consider estimating the population mean of a normally
distributed population, say N (µ, 9).
The most obvious estimate is to simply draw a sample and
calculate the sample mean.
If we repeat this process with a new sample, we would
expect to get a different estimate.
The distribution that results from repeated sampling is
called the sampling distribution of the estimate.
Point estimation

500 estimates of the population mean based on a sample


size 100

[Figure (d)]
Point estimation

Figure d illustrates 500 estimates of the population mean


based on a sample of size 100.
We can see that our estimates are generally centered
around the true value of 10, but there is some variation:
maybe a standard deviation of about 1.
These observations translate to more formal ideas: the
expectation of the estimator and the standard error of the
estimator.
Point estimation

So, what do we look for in a


good estimator?
Point estimation

Expected behaviours
We want our estimate to be close to the true value.
Also, we want δn to behave in a nice way as the sample size
n increases.
If we take a large sample, we would like the estimate to be
more accurate than a small sample.
Point estimation

500 estimates of the population mean based on sample sizes


of 5, 10, 25, 50 and 100

[Figure (e)]
Point estimation

Each histogram in Figure e represents 500 estimates of the


population mean for sample sizes of 5, 10, 25, 50 and 100.
We can see that the standard deviation of the estimate is
smaller as the sample size increases. Formally, this is
embodied in the principle of consistency.
A consistent estimator will converge to the true parameter
value as the sample size increases.
Our estimator X̄ for the population mean of a normal
seems very well behaved.
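The shrinking spread can be reproduced in a few lines; a sketch assuming a N(10, 9) population as in the earlier illustration (the exact appearance of the histograms depends on how the figures were generated):

# Sketch: 500 estimates of the mean of N(10, 9) for increasing sample sizes.
import numpy as np

rng = np.random.default_rng(2)
for n in (5, 10, 25, 50, 100):
    means = rng.normal(10, 3, size=(500, n)).mean(axis=1)
    # The empirical spread should track the theoretical standard error 3/sqrt(n).
    print(n, round(means.std(ddof=1), 3), round(3 / np.sqrt(n), 3))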
Point estimation

Properties
Consistency: An estimator δn of θ is said to be consistent if
δn converges to θ in probability, i.e., for every ε > 0,

P (|δn − θ| > ε) → 0 as n → ∞.

Unbiasedness: An estimator δn is said to be an unbiased
estimator of θ if

E(δn ) = θ.

Efficiency: An estimator is efficient if it has the lowest
possible variance among all unbiased estimators.
Method of moments (MOM) estimators

Let X be a random variable following some distribution,


say f (x|θ), where θ = (θ1 , . . . , θk ). Then the kth moment of
the distribution is defined as

µk = E(X^k ).

The sample moments based on the random sample


X1 , . . . , Xn drawn from this distribution are given by

µ̂k = (1/n) Σ_{i=1}^{n} Xi^k .

The MOM estimator simply equates the moments of the


distribution with the sample moments (µk = µ̂k ) and solves
for the unknown parameters. Note that this implies the
distribution must have finite moments.
MOM estimators

Example-1
Let X1 , . . . , Xn be a random sample drawn from a Poisson
population with probability mass function

P (X = x) = e^{−λ} λ^x / x! , x = 0, 1, 2, . . . ,
where λ > 0. Obtain the MOM estimator of λ.

Solution
Here E(X) = λ, so µ1 = λ. Equating µ1 = µ̂1 = X̄ gives λ̂ = X̄.
Hence, the method of moments estimator of λ is the sample mean.
MOM estimators

Example-2
Let X1 , . . . , Xn be a random sample drawn from a gamma
population with probability density function
fX (x) = (λ^α / Γ(α)) x^{α−1} e^{−λx} , x > 0, α, λ > 0.

Obtain the MOM estimators of λ and α.

Solution
The first two moments of the gamma distribution are µ1 = α/λ
and µ2 = α(α + 1)/λ^2 . After solving these two equations, we get
the MOM estimators, which are given by

λ̂ = µ̂1 / (µ̂2 − (µ̂1 )^2 ) = X̄ / (µ̂2 − (X̄)^2 ) and α̂ = λ̂ µ̂1 ,

where µ̂2 = (1/n) Σ_{i=1}^{n} Xi^2 is the second sample moment.
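A numerical sketch of these MOM formulas on simulated data (assumed true values α = 2 and λ = 3, chosen only for illustration):

# Sketch: MOM estimates for the gamma distribution from the first two sample moments.
import numpy as np

rng = np.random.default_rng(3)
alpha_true, lam_true, n = 2.0, 3.0, 5_000
# numpy's gamma generator uses a scale parameter, so scale = 1/lambda in this parameterization.
x = rng.gamma(shape=alpha_true, scale=1 / lam_true, size=n)

m1, m2 = x.mean(), (x**2).mean()                # sample moments mu-hat_1, mu-hat_2
lam_hat = m1 / (m2 - m1**2)
alpha_hat = lam_hat * m1
print(round(alpha_hat, 3), round(lam_hat, 3))   # close to 2 and 3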
MOM estimators
Example-3
Suppose X is a discrete random variable with the probability
mass function

P (X = x) = 2θ/3 for x = 0,
            θ/3 for x = 1,
            2(1 − θ)/3 for x = 2,
            (1 − θ)/3 for x = 3,
where 0 ≤ θ ≤ 1. The following 10 independent observations


were taken from this distribution: (3, 0, 2, 1, 3, 2, 1, 0, 2, 1). Use
the MOM method to find the estimate of θ.
Solution
Here, the theoretical mean is µ1 = E(X) = 7/3 − 2θ. The sample
mean is µ̂1 = X̄ = 1.5. Equating these two means, we have
θ̂ = 5/12.
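The arithmetic can be verified directly; a short check in Python:

# Sketch: MOM estimate of theta for Example-3.
from fractions import Fraction

data = [3, 0, 2, 1, 3, 2, 1, 0, 2, 1]
xbar = Fraction(sum(data), len(data))        # sample mean = 3/2
# E(X) = 0*(2theta/3) + 1*(theta/3) + 2*(2(1-theta)/3) + 3*((1-theta)/3) = 7/3 - 2*theta
theta_hat = (Fraction(7, 3) - xbar) / 2
print(theta_hat)                             # 5/12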
MOM estimators
Example-4
Let X1 , . . . , Xn be a random sample from a population with
probability density function

fX (x|σ) = (1/(2σ)) e^{−|x|/σ} , −∞ < x < ∞,

where σ > 0. Please use the method of moments to estimate σ.
where σ > 0. Please use the method of moment to estimate σ.


Solution
If we calculate the first order theoretical moment, we would
have

E(X) = ∫_{−∞}^{∞} x (1/(2σ)) e^{−|x|/σ} dx = 0.

Thus, if we try to solve equation E(X) = X̄, we will not get the
estimator, because E(X) does not contain the unknown
parameter σ.
MOM estimators

Solution (cont...)
Now, let us calculate the second order theoretical moment. We
have

µ2 = E(X^2 ) = ∫_{−∞}^{∞} x^2 (1/(2σ)) e^{−|x|/σ} dx = 2σ^2 .

The second order sample moment is

µ̂2 = (1/n) Σ_{i=1}^{n} Xi^2 .

Solving the equation µ2 = µ̂2 , we get the estimate of σ as


σ̂ = √( Σ_{i=1}^{n} Xi^2 / (2n) ) .
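A quick simulation sketch supporting this estimator (assumed true value σ = 2; numpy's Laplace scale parameter coincides with σ in this parameterization):

# Sketch: MOM estimate of sigma for the double-exponential (Laplace) density above.
import numpy as np

rng = np.random.default_rng(4)
sigma_true, n = 2.0, 10_000
x = rng.laplace(loc=0.0, scale=sigma_true, size=n)

sigma_hat = np.sqrt((x**2).sum() / (2 * n))   # sigma-hat = sqrt(sum(x_i^2) / (2n))
print(round(sigma_hat, 3))                    # close to 2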
MOM estimators

Properties of the MOM estimators


Nice properties:
it is consistent
sometimes easier to calculate than other methods, e.g.,
maximum likelihood estimates
Not so nice properties:
sometimes not sufficient. (Sufficiency has a formal
definition but intuitively it means that all the data that are
relevant to estimating the parameter of interest are used.)
sometimes gives estimates outside the parameter space
Maximum likelihood Estimator (MLE)

Let X1 , X2 , . . . , Xn be a random vector of observations


with joint density function fX (x1 , . . . , xn |θ).
Then, the likelihood of θ as a function of the observed
values, Xi = xi , is defined as,

L(θ) = fX (x1 , . . . , xn |θ).

The MLE of the parameter θ is the value of θ that


maximizes the likelihood function.
In general, it is easier to maximize the natural log of the
likelihood. In the case that the Xi are iid, the log-likelihood
is generally of the form

l(θ) = log L(θ) = Σ_{i=1}^{n} log f (xi |θ).
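In practice the maximization is often carried out numerically by minimizing the negative log-likelihood; a generic sketch (my own, using scipy, with a normal model and assumed data-generating values):

# Sketch: numerical MLE by minimizing the negative log-likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.normal(loc=1.5, scale=2.0, size=500)       # assumed data-generating values

def neg_log_lik(params):
    mu, log_sigma = params                         # log-parameterization keeps sigma > 0
    return -norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)).sum()

res = minimize(neg_log_lik, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(round(mu_hat, 3), round(sigma_hat, 3))       # close to the sample mean and sample sd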


MLE

Example-5
Let X1 , . . . , Xn be a random sample from a population with
probability mass function
P (X = x) = e^{−λ} λ^x / x! for x = 0, 1, 2, . . . , and 0 otherwise,

where λ > 0. Please use the method of maximum likelihood to


estimate λ.

Solution
The log-likelihood function is
l(λ) = (log λ) Σ_{i=1}^{n} xi − nλ − Σ_{i=1}^{n} log(xi !).
MLE

Solution (cont...)
To find the maximum we set the first derivative to zero,
l'(λ) = (1/λ) Σ_{i=1}^{n} xi − n = 0.

Solving for λ, we find that the MLE is

λ̂ = (1/n) Σ_{i=1}^{n} xi = X̄.

NOTE: This agrees with the MOM estimator (see
Example-1).
MLE

Example-6
Let X1 , . . . , Xn be a random sample drawn from a gamma
population with probability density function
fX (x) = (λ^α / Γ(α)) x^{α−1} e^{−λx} , x > 0, α, λ > 0.

Obtain the MLEs of λ and α.

Solution
The log likelihood is
l(λ, α) = nα log λ − n log Γ(α) + (α − 1) Σ_{i=1}^{n} log xi − λ Σ_{i=1}^{n} xi .
MLE

Solution (cont...)
In this case we have two parameters so we take the partial
derivatives and set them both to zero.
∂l/∂α = Σ_{i=1}^{n} log xi + n log λ − n Γ'(α)/Γ(α) = 0,

∂l/∂λ = nα/λ − Σ_{i=1}^{n} xi = 0.
MLE

Solution (cont...)
This second equality gives the MLE for λ as
λ̂ = α̂ / X̄ .

Substituting this into the first equation we find that the MLE
for α must satisfy,
n log α̂ − n log X̄ + Σ_{i=1}^{n} log xi − n Γ'(α̂)/Γ(α̂) = 0.

This equation needs to be solved by numerical means.


NOTE: This is, in general, a different estimate from that given
by the method of moments.
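One possible numerical treatment of this equation uses scipy's digamma function together with a bracketing root finder; a sketch on simulated data (assumed true values α = 2, λ = 3):

# Sketch: solve the score equation for alpha-hat, then set lambda-hat = alpha-hat / x-bar.
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma

rng = np.random.default_rng(6)
x = rng.gamma(shape=2.0, scale=1 / 3.0, size=2_000)     # assumed alpha = 2, lambda = 3

c = np.log(x).mean() - np.log(x.mean())                 # data-dependent constant (negative)
score = lambda a: np.log(a) - digamma(a) + c            # equals 0 at the MLE of alpha

alpha_hat = brentq(score, 1e-6, 1e6)                    # bracketed root search
lam_hat = alpha_hat / x.mean()
print(round(alpha_hat, 3), round(lam_hat, 3))           # close to 2 and 3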
MLE

Example-7
Consider Example-5 with the additional assumption that
λ ≤ λ0 . Obtain the MLE of λ.

Solution
In this case, the MLE of λ is
λ̂RML = X̄ if X̄ ≤ λ0 , and λ̂RML = λ0 if X̄ > λ0 .
MLE

MLE

Properties of the MLE


Nice properties:
consistent
unaffected by monotonic transformations of the data
the MLE of a function of the parameters is that function of
the MLE
theory provides large sample properties
asymptotically efficient estimators
Not so nice properties:
may be slightly biased
can be computationally demanding
Interval estimation
Interval estimation

Confidence interval
Let X be a random variable with distribution Pθ , θ ∈ Θ.
Consider a random sample X1 , . . . , Xn drawn from this
distribution. Let δ1 (X) and δ2 (X) be two statistics such that

P (δ1 (X) ≤ g(θ) ≤ δ2 (X)) = 1 − α, ∀ θ ∈ Θ.

Then, if the random sample X = x is observed, we say that


[δ1 (x), δ2 (x)] is a 100 × (1 − α)% confidence interval for g(θ).
Interval estimation

Interval estimation

I. Confidence interval of µ of N (µ, σ 2 ), when σ 2 is known


Let X1 , . . . , Xn be a random sample drawn from normal
population with unknown mean µ and known variance σ 2 .
Then, X̄ ∼ N (µ, σ^2 /n) and Z = (X̄ − µ)/(σ/√n) ∼ N (0, 1).
P (−zα/2 ≤ Z ≤ zα/2 ) = 1 − α.
Then, (x̄ − (σ/√n) zα/2 , x̄ + (σ/√n) zα/2 ) is a 100 × (1 − α)%
confidence interval for µ.
For example, if x̄ = 2, σ = 1, n = 4 and α = 0.05, then
zα/2 = 1.96. Thus, the confidence interval is (1.02, 2.98).
Interval estimation

Interval estimation

II. Confidence interval of µ of N (µ, σ 2 ), when σ 2 is


unknown
Let X1 , . . . , Xn be a random sample drawn from normal
population with unknown mean µ and unknown variance
σ2.
T = (X̄ − µ)/(s/√n) ∼ tn−1 ,
where s^2 is the sample variance, given by
s^2 = (1/(n − 1)) Σ_{i=1}^{n} (xi − x̄)^2 .
P (−tα/2,n−1 ≤ T ≤ tα/2,n−1 ) = 1 − α.
Then, (x̄ − (s/√n) tα/2,n−1 , x̄ + (s/√n) tα/2,n−1 ) is a 100 × (1 − α)%
confidence interval for µ.
For example, if x̄ = 0.0506, s = 0.004, n = 10 and α = 0.05,
then tα/2,9 = 2.262. Thus, the confidence interval is
(0.0477, 0.0535).
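The two numerical examples above can be reproduced with scipy, using only the summary statistics quoted on the slides (a verification sketch):

# Sketch: reproduce the z-interval and t-interval examples above.
import numpy as np
from scipy import stats

# Known-variance case: x-bar = 2, sigma = 1, n = 4, alpha = 0.05.
xbar, sigma, n, alpha = 2.0, 1.0, 4, 0.05
z = stats.norm.ppf(1 - alpha / 2)                                     # approx 1.96
print(xbar - z * sigma / np.sqrt(n), xbar + z * sigma / np.sqrt(n))   # approx (1.02, 2.98)

# Unknown-variance case: x-bar = 0.0506, s = 0.004, n = 10, alpha = 0.05.
xbar, s, n = 0.0506, 0.004, 10
t = stats.t.ppf(1 - alpha / 2, df=n - 1)                              # approx 2.262
print(xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n))           # approx (0.0477, 0.0535)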
Interval estimation

Interval estimation

III. Confidence interval of σ 2 of N (µ, σ 2 ), when µ is known


Let X1 , . . . , Xn be a random sample drawn from normal
population with known mean µ and unknown variance σ 2 .
W = Σ_{i=1}^{n} (Xi − µ)^2 / σ^2 ∼ χ^2_n .
P (χ^2_{1−α/2,n} ≤ W ≤ χ^2_{α/2,n} ) = 1 − α.
Then, ( Σ_{i=1}^{n} (xi − µ)^2 / χ^2_{α/2,n} , Σ_{i=1}^{n} (xi − µ)^2 / χ^2_{1−α/2,n} )
is a 100 × (1 − α)% confidence interval for σ^2 .
Then, ( √( Σ_{i=1}^{n} (xi − µ)^2 / χ^2_{α/2,n} ) , √( Σ_{i=1}^{n} (xi − µ)^2 / χ^2_{1−α/2,n} ) )
is a 100 × (1 − α)% confidence interval for σ.


Interval estimation

Interval estimation

IV. Confidence interval of σ 2 of N (µ, σ 2 ), when µ is


unknown
Let X1 , . . . , Xn be a random sample drawn from normal
population with unknown mean µ and unknown variance
σ^2 .
W* = Σ_{i=1}^{n} (Xi − X̄)^2 / σ^2 = (n − 1)S^2 / σ^2 ∼ χ^2_{n−1} .
P (χ^2_{1−α/2,n−1} ≤ W* ≤ χ^2_{α/2,n−1} ) = 1 − α.
Then, ( (n − 1)s^2 / χ^2_{α/2,n−1} , (n − 1)s^2 / χ^2_{1−α/2,n−1} ) is a
100 × (1 − α)% confidence interval for σ^2 .
Then, ( √( (n − 1)s^2 / χ^2_{α/2,n−1} ) , √( (n − 1)s^2 / χ^2_{1−α/2,n−1} ) ) is a
100 × (1 − α)% confidence interval for σ.
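A matching sketch for the variance interval; the data below are simulated under assumed values (µ = 5, σ = 2) purely for illustration:

# Sketch: 95% confidence interval for sigma^2 when mu is unknown, using chi-square quantiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=5.0, scale=2.0, size=25)             # assumed sample, true sigma^2 = 4
n, alpha = len(x), 0.05
s2 = x.var(ddof=1)                                      # sample variance s^2

lower = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)
print(round(lower, 3), round(upper, 3))                 # interval for sigma^2; take square roots for sigma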
References

(i) An Introduction to Probability and Statistics, Second


Edition, by V. K. Rohatgi and Md. E. Saleh, Wiley.
(ii) Introduction to Probability and Statistics for Engineers
and Scientists by S.M. Ross
(iii) Probability and Distributions, NPTEL Lecture notes,
Neeraj Misra, IIT Kanpur.
(iv) Probability and Statistics, NPTEL Lecture notes, Somesh
Kumar, IIT Kharagpur.
Thank You
