
School of Mathematics and Statistics

UNSW Sydney

Introduction to Probability and Stochastic Processes

OPEN LEARNING
CHAPTER 3

Common Distributions

Outline:

Common Discrete Distributions

3.1 Bernoulli Distribution
3.2 Binomial Distribution
3.3 Geometric Distribution
3.4 Hypergeometric Distribution
3.5 Poisson Distribution
Common Continuous Distributions

3.6 Uniform Distribution
3.7 Exponential Distribution
3.8 Special Functions Arising in Statistics
  ① Gamma Function
  ② Beta Function
  ③ The Digamma and Trigamma Functions
  ④ The Φ Function and its Inverse
3.9 Normal Distribution
3.10 Gamma Distribution
3.11 Beta Distribution

Supplementary Material
In the previous chapters, we saw that random variables
may be characterised by probability mass functions
(discrete case) and probability density functions
(continuous case).

Any non-negative function that sums to one is a legal
probability mass function.

Any non-negative function that integrates to one is a legal
probability density function.

Certain families of probability mass functions and
probability density functions are particularly useful in
Statistics.

This chapter covers the most common distributions.
3.1 Bernoulli Distribution

The Bernoulli distribution is very important to statistics
because it can be used to model responses to any Bernoulli
trial.

Definition
A Bernoulli trial is an experiment with two possible
outcomes. The outcomes are often labelled success or
failure.

Example
Coin tossing is a Bernoulli trial. We may define, for
example,
success = heads
failure = tails.

Some other examples of Bernoulli trials are

dead or alive
sick or not sick
flowering or not flowering
sold or not sold
faulty or not faulty.
In each of these Bernoulli trials, there are only two possible
responses. Such trials are modelled using the Bernoulli
distribution.

Definition
For a Bernoulli trial, we define the random variable X as

X = 1 if the trial results in a success, and X = 0 otherwise.

Then X is said to have a Bernoulli distribution.
Result
If X is a Bernoulli random variable defined according to a
Bernoulli trial (see the definition above) with a probability
of success p ∈ (0, 1), then the probability mass function of
X is

P(X = x) = p^x (1 − p)^{1−x}, x = 0, 1.

Definition
The constant p in the probability mass function is called a
parameter.
Derivation:

P(X = 0) = p^0 (1 − p)^1 = 1 − p
P(X = 1) = p^1 (1 − p)^0 = p.

We denote a Bernoulli random variable X as

X ∼ Bernoulli(p).

Notation
The symbol ∼ is commonly used in Statistics for the phrase
is distributed as or has distribution.
Example
Consider tossing a coin. If the coin is fair, we have p = 1/2
and

P(X = x) = (1/2)^x (1 − 1/2)^{1−x} = 1/2, x = 0, 1.

This is consistent with the notion that heads and tails are
equally likely for a fair coin.
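A quick numerical illustration of this in R (the software used later in these notes): rbinom() with size = 1 draws Bernoulli variates. This is a minimal simulation sketch added for illustration, not part of the original slides.

set.seed(1)                               # for reproducibility
x <- rbinom(10000, size = 1, prob = 0.5)  # 10,000 Bernoulli(1/2) trials
mean(x)                                   # sample proportion of successes, near 0.5
table(x) / length(x)                      # empirical P(X = 0) and P(X = 1), both near 1/2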
3.2 Binomial Distribution

The Binomial distribution arises when several Bernoulli
trials are repeated in succession. Hence, the Bernoulli
distribution is a special case of a Binomial distribution.

Definition
Consider a sequence of n independent Bernoulli trials, each
with a probability of success p. If

X = total number of successes,

then X is a Binomial random variable with parameters n
and p.

A common shorthand to represent a Binomial random
variable is

X ∼ Bin(n, p).
The mathematical expression

X ∼ Bin(29, 0.72)

is usually read as X has a Binomial distribution with
parameters n = 29 and p = 0.72.

We have a binomial distribution when we count the
number of times we observe a particular successful outcome
across n independent trials.
Example
Write down a distribution that could be used to model X.
1. The number of patients who survive a new type of
surgery out of 12 patients who have a 20% chance of
surviving.

Solution:
Here, X is the number of patients who survive a new type
of surgery with parameters n = 12 and the probability of
success p = 0.20. That is

X ∼ Bin(12, 0.20).

Example
Write down a distribution that could be used to model X.
2. The number of patients who visited a doctor and were
sick out of the 36 patients. Assume that, in general,
70% of the people who visit the doctor are sick.

Solution:
Here, X is the number of patients who visited the doctors
and were sick with parameters n = 36 and the probability
of success (being sick) p = 0.70. That is

X ∼ Bin(36, 0.70).

Example
Write down a distribution that could be used to model X.
3. The number of plants that are flowering out of 52
randomly selected plants in the wild, when the
proportion of plants in the wild flowering at the time is
p.

Solution:
Here, X is the number of plants flowering in the wild with
parameters n = 52 and the probability of success
(flowering) p. That is

X ∼ Bin(52, p).

Example
Write down a distribution that could be used to model X.
4. The number of plasma screen televisions a store sells in
a month, out of the store’s total stock of 18, each with
a probability of 0.6 of being sold.

Solution:
Here, X is the number of plasma screen televisions a store
sells monthly with parameters n = 18 and the probability
of success (being sold) p = 0.60. That is

X ∼ Bin(18, 0.60).

Example
Write down a distribution that could be used to model X.
5. The number of plasma screen televisions returned to
the store due to faults, out of the nine televisions sold,
when the return rate for this type of television is 8%.

Solution:
Here, X is the number of plasma screen televisions returned
to the store due to faults with parameters n = 9 and the
probability of success (return rate) p = 0.08. That is

X ∼ Bin(9, 0.08).

In each example, we required the assumption of
independence of responses across the n units to use the
binomial distribution.

This assumption is reasonable if we randomly select units
from a population much larger than the sample (as was
done in the plant example).
Result
If X ∼ Bin(n, p), then its probability mass function is given
by

P(X = x) = \binom{n}{x} p^x (1 − p)^{n−x}, x = 0, 1, . . . , n, 0 < p < 1.

This result follows from the fact that there are \binom{n}{x} ways
in which X can take the value x, and each of these ways
has probability p^x (1 − p)^{n−x} of occurring.
Results
If X ∼ Bin(n, p), then
1 E(X) = np  (See Derivation Here)
2 Var(X) = np(1 − p)  (See Derivation Here)
3 m_X(u) = (p e^u + 1 − p)^n  (See Derivation Here)
Example
Adam pushes ten pieces of toast off a table. Seven of these
landed butter side down.
What distribution could be used to model the number
of slices of toast that landed butter side down? Assume
a 50:50 chance of each slice landing butter side down.

Solution:
Let X be the number of slices of toast that landed butter
side down. So X is a binomial random variable with
parameters n = 10 and the probability of success (slice
landing butter side down) p = 0.5, that is, X ∼ Bin(10, 1/2).
Example
Adam pushes ten pieces of toast off a table. Seven of these
landed butter side down.
What is the expected number of pieces of toast that
land butter side down, and what is the standard
deviation?
Solution:
We know that the expected value of a binomial random
variable is

E(X) = np = 10 × 1/2 = 5

and the standard deviation is

√Var(X) = √(np(1 − p)) = √(10 × 1/2 × (1 − 1/2)) ≈ 1.58.
Example
Adam pushes ten pieces of toast off a table. Seven of these
landed butter side down.
What is the probability that exactly seven slices land
butter side down?

Solution:
Here we want the probability P(X = 7). That is,

P(X = 7) = \binom{10}{7} (1/2)^7 (1 − 1/2)^{10−7} = 120 × (1/2)^{10} ≈ 0.1172.
Example
Adam pushes ten pieces of toast off a table. Seven of these
landed butter side down.
What is the probability that at least seven slices land
butter side down?

Solution:
Here we want the probability P(X ≥ 7). That is,

P(X ≥ 7) = [\binom{10}{7} + \binom{10}{8} + \binom{10}{9} + \binom{10}{10}] (1/2)^{10} ≈ 0.1719.
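Both toast probabilities can be verified with R's built-in binomial functions; a small checking sketch (dbinom() and pbinom() are the standard pmf and cdf):

dbinom(7, size = 10, prob = 0.5)          # P(X = 7) = 120 * (1/2)^10, about 0.1172
sum(dbinom(7:10, size = 10, prob = 0.5))  # P(X >= 7), about 0.1719
1 - pbinom(6, size = 10, prob = 0.5)      # same tail probability via the cdf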
As mentioned earlier, the Binomial distribution generalises
the Bernoulli distribution.

Result
X has a Bernoulli distribution with parameter p if and only
if
X ∼ Bin(1, p).

3.3 Geometric Distribution

The Geometric distribution arises when a Bernoulli trial
is repeated until the first success. In this case,

X = the number of trials until the first success,

and X is said to have a geometric distribution with
parameter p, where p is the probability of success on each
trial.

A common shorthand to represent a geometric random
variable is

X ∼ geometric(p).
Results
If X has a geometric distribution with parameter p ∈ (0, 1),
then
1 X has probability mass function

P(X = x) = (1 − p)^{x−1} p, x = 1, 2, 3, . . .

2 E(X) = 1/p
3 Var(X) = (1 − p)/p^2

The above form of the geometric distribution is used for modelling the number of
trials up to and including the first success.

By contrast, the following form of the geometric distribution is used for modelling
Y, the number of failures before the first success:

P(Y = x) = P(X = x + 1) = (1 − p)^x p, x = 0, 1, 2, . . .

In either case, the sequence of probabilities is geometric.
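Note that R's dgeom() uses the second (failures-before-first-success) form, so the trials form needs a shift of one. A small sketch with an assumed p = 0.3:

p <- 0.3
dgeom(2, prob = p)      # P(Y = 2) = (1 - p)^2 p, failures form
dgeom(3 - 1, prob = p)  # P(X = 3) in the trials form, via Y = X - 1
(1 - p)^2 * p           # direct check: both equal 0.147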
Example
Suppose we repeatedly flip a coin with probability p of
coming up heads and probability 1 − p of coming up tails,
where 0 < p < 1.

Let X be the number of flips until the first head appears.

Then, for x = 1, 2, 3, . . . , X = x if and only if the coin
shows exactly x − 1 tails followed by a head.

The probability of this is equal to (1 − p)^{x−1} p.
3.4 Hypergeometric Distribution

Hypergeometric random variables arise when counting the
number of binary (success/failure or yes/no) responses
when objects are sampled without replacement from a finite
population in which the total number of successes is known.

Suppose that a box contains N balls, of which m are red
and N − m are black. Suppose n balls are drawn at random
without replacement. Let

X = the number of red balls drawn.

Then X has a hypergeometric distribution with
parameters N, m and n, and is denoted as

X ∼ hypergeometric(N, m, n).
Results
If X has a hypergeometric distribution with parameters
N, m and n, then
1 the probability mass function is given by

P(X = x) = \binom{m}{x} \binom{N−m}{n−x} / \binom{N}{n}, max(0, n + m − N) ≤ x ≤ min(m, n)

2 E(X) = n · m/N
3 Var(X) = n · (m/N)(1 − m/N) · (N − n)/(N − 1).
The hypergeometric distribution can be considered a finite
population version of the binomial distribution.

Instead of assuming some constant probability p of success
in the population, we say that there are N units in the
population, of which m are successes.
It can be shown that as N gets large (with m/N held fixed),
a hypergeometric distribution with parameters N, m and n
approaches the distribution of Y ∼ Bin(n, m/N).

A suggestion of this can be seen in E(X) = n · m/N, which
equals its binomial counterpart E(Y ) for Y ∼ Bin(n, m/N).

Also, Var(X) = n · (m/N)(1 − m/N) · (N − n)/(N − 1) only
differs from its binomial counterpart by the finite population
correction factor (N − n)/(N − 1), which tends to one as N
gets large.
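A minimal numerical sketch of this limit in R, with the assumed choice m/N = 0.25 and n = 10; the largest pointwise gap between the two pmfs shrinks as N grows:

n <- 10
for (N in c(40, 400, 4000)) {
  m <- N / 4                         # keep m/N = 0.25 fixed
  hyper <- dhyper(0:n, m, N - m, n)  # hypergeometric pmf at 0..n
  binom <- dbinom(0:n, n, m / N)     # Bin(n, m/N) pmf at 0..n
  cat("N =", N, " max pmf difference:", max(abs(hyper - binom)), "\n")
}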
Example
A box contains balls numbered from one to eighty. You may select ten numbers
from the integers one to eighty. Twenty balls are specified randomly (think of
these as red balls). You win the major prize if all ten numbers are on red balls.
What is the probability of winning the major prize?

Solution:
This hypergeometric distribution has parameters N = 80, n = 10 and m = 20.
That is, let X be the number of your selected numbers that are on red balls. Then

P(X = x) = \binom{20}{x} \binom{60}{10−x} / \binom{80}{10}

and

P(win major prize) = P(X = 10) = \binom{20}{10} \binom{60}{0} / \binom{80}{10} = \binom{20}{10} / \binom{80}{10},

which is about 1 in 8.9 million.
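The same value can be checked with R's hypergeometric pmf, whose argument order is dhyper(x, m, N − m, n):

dhyper(10, 20, 60, 10)           # P(X = 10), about 1.12e-07
choose(20, 10) / choose(80, 10)  # direct check of the same value
1 / dhyper(10, 20, 60, 10)       # about 8.9 million to one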
Example
Write down a distribution that could be used to model X.
1. The number of patients a town doctor sees who are
sick: when 800 people want to see a doctor, 500 of
these are sick, and when the doctor only has time to
see 32 of the 800 people (who are selected effectively at
random from those that want to see the doctor).

Solution:
Let X be the number of sick patients a town doctor sees.
Here we have N = 800, m = 500 and n = 32. That is,

X ∼ hypergeometric(800, 500, 32).

Example
Write down a distribution that could be used to model X.
2. The number of plants that are flowering out of 52
randomly selected plants in the wild, from a
population of 800 plants, of which 650 are flowering.

Solution:
Let X be the number of plants that are flowering, with
N = 800, m = 650 and n = 52. That is,

X ∼ hypergeometric(800, 650, 52).
Example
Write down a distribution that could be used to model X.
3. The number of faulty plasma screen televisions
returned to a store, when five of the store’s eighteen
televisions were faulty, and six of the eighteen were
sold.

Solution:
Let X be the number of faulty plasma screen televisions
returned to a store, with N = 18, m = 5 and n = 6. That is,

X ∼ hypergeometric(18, 5, 6).
3.5 Poisson Distribution

The Poisson distribution often arises when the random
variable of interest is a count.

For example, the number of traffic accidents in a city on
any given day could be well-described by a Poisson random
variable.

The Poisson distribution is particularly useful for modelling
the number of times that rare events occur - events that
can reasonably be assumed to follow what is known as the
Poisson process.
Definition
The random variable X has a Poisson distribution with
parameter λ > 0 if its probability mass function is given by

P(X = x) = e^{−λ} λ^x / x!, x = 0, 1, 2, 3, . . .

A common abbreviation is

X ∼ Poisson(λ).
Results
If X ∼ Poisson(λ), then
1 E(X) = λ
2 Var(X) = λ
3 m_X(u) = e^{λ(e^u − 1)}.
Example
Suggest a distribution that could be useful for studying X.
1. The number of workplace accidents in a month when
the average number of accidents is 1.4.

Solution:
Let X be the number of workplace accidents per month,
with parameter λ = 1.4. That is,

X ∼ Poisson(1.4).
Example
Suggest a distribution that could be useful for studying X.
2. The number of people calling a helpline per day.

Solution:
Let X be the number of people calling a helpline per day,
with parameter λ being the average number of people
calling the helpline daily. That is,

X ∼ Poisson(λ).

Example
Suggest a distribution that could be useful for studying X.
3. The number of ATM customers overnight when a bank
is closed when the average number is 5.6.

Solution:
Let X be the number of ATM customers overnight when a
bank is closed with parameter λ = 5.6. That is

X ∼ Poisson(5.6).

Example
4. If, on average, five servers go offline during the day,
what is the chance that no more than one server will go
offline? Assume independence of servers going offline.

Solution:
Let X be the number of servers that go offline during the day,
with parameter λ = 5. That is, X ∼ Poisson(5). We want

P(no more than one server goes offline during the day)
= P(X ≤ 1)
= P(X = 0) + P(X = 1)
= e^{−5} 5^0/0! + e^{−5} 5^1/1! = e^{−5} + 5e^{−5} = 6e^{−5} ≈ 0.0404.
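A check of this value with R's Poisson functions:

dpois(0, lambda = 5) + dpois(1, lambda = 5)  # P(X = 0) + P(X = 1)
ppois(1, lambda = 5)                         # P(X <= 1) from the cdf
6 * exp(-5)                                  # closed form, about 0.0404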
Common Continuous Distributions

So far, we have considered discrete random variables X for
which P(X = x) > 0 for certain values of x.

Now, we wish to consider continuous random variables.

A continuous random variable takes uncountably many
possible values.

That is, a random variable X is continuous if its possible
values comprise either a single interval on the number
line or a union of disjoint intervals.

Continuous random variables are usually measurements
such as height, weight, the time required to walk to school,
etc.
Definition
A random variable X is continuous if

P (X = x) = 0 for all x ∈ R.

3.6 Uniform Distribution

The uniform distribution is the simplest common
distribution for continuous random variables.

Definition
A continuous random variable X that is equally likely to
take values anywhere in the interval (a, b) is said to have a
uniform distribution on (a, b), and has probability
density function given by

f_X(x; a, b) = 1/(b − a), a < x < b; a < b.

A common shorthand to denote a random variable of the
uniform distribution is

X ∼ Uniform(a, b).
Note that f_X(x; a, b) = 1/(b − a), a < x < b (a < b), is simply
a constant function over the interval (a, b) and zero
otherwise. That is,

f_X(x; a, b) = 1/(b − a) for a < x < b (with a < b), and
f_X(x; a, b) = 0 otherwise.
Graphs of Different Uniform Density Functions

Results
If X ∼ Uniform(a, b), then
1 E(X) = (a + b)/2
2 Var(X) = (b − a)^2/12
3 m_X(u) = (e^{bu} − e^{au}) / ((b − a)u).
Note that there is also a discrete version of the uniform
distribution, which is useful for modelling the outcome of an
event with k equally likely outcomes, such as a roll of a die.

This has a different formula for its expectation and
variance than the continuous case, which can be derived
from first principles, as in the sketch below.
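For instance, a first-principles check in R for a fair six-sided die (an assumed example, not from the slides):

x <- 1:6
p <- rep(1 / 6, 6)           # k = 6 equally likely outcomes
EX <- sum(x * p)             # E(X) = (1 + 6)/2 = 3.5
VX <- sum((x - EX)^2 * p)    # Var(X) = (6^2 - 1)/12, about 2.9167
c(EX, VX)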
3.7 Exponential Distribution

The Exponential distribution is the simplest common
distribution for describing the probability structure of
continuous positive random variables, such as
lifetimes.

Definition
A random variable X is said to have an exponential
distribution with parameter β > 0 if its probability
density function is given by

f_X(x; β) = (1/β) e^{−x/β}, x > 0.

The common abbreviation is

X ∼ exp(β).
Graphs of Various Exponentials

The cumulative distribution function of X is given by

F_X(x; β) = P(X ≤ x) = ∫_0^x (1/β) e^{−y/β} dy = 1 − e^{−x/β}, x > 0,

and the tail probability is

P(X > x) = 1 − P(X ≤ x) = 1 − F_X(x; β) = e^{−x/β}, x > 0.
Results
If X has an exponential distribution with parameter β > 0,
then
1 E(X) = β
2 Var(X) = β^2
3 m_X(u) = 1/(1 − βu), u < 1/β.
The exponential distribution is closely related to the
Poisson distribution.

We know that if events occur according to a Poisson
process, then the number of events counted in a given
period has a Poisson distribution with parameter λ.

It can be shown that the time between two consecutive
events has an exponential distribution with parameter
β = 1/λ.
Example
If, on average, five servers go offline during the day, what is
the chance that no servers will go offline in the next hour?
(Hint: Note that an hour is 1/24 of a day.)

Solution:
Let T be the time, in hours, until the next server goes
offline. Since λ = 5 per day corresponds to λ = 5/24 per
hour, the parameter is β = 1/λ = 24/5 hours. That is,
T ∼ exp(24/5), and we want P(T > 1):

P(T > 1) = 1 − P(T ≤ 1) = 1 − ∫_0^1 (1/β) e^{−x/β} dx
         = 1 − ∫_0^1 (5/24) e^{−5x/24} dx = e^{−5/24} ≈ 0.812.
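The same answer from R; note that pexp() is parameterised by rate = 1/β rather than the scale β used in these notes:

rate <- 5 / 24            # five events per day = 5/24 per hour
1 - pexp(1, rate = rate)  # P(T > 1 hour)
exp(-5 / 24)              # closed form, about 0.812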
An important property of the exponential distribution
is lack of memory, or the memoryless property.

That is, if X has an exponential distribution, then

P(X > s + t | X > s) = P(X > t) for all s, t > 0.

In other words, if the waiting time until the next event is
exponential, then the remaining waiting time is independent
of how long you have already been waiting, as the sketch
below illustrates.
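A numerical sketch of this property, with assumed values s = 1.5, t = 2 and β = 2 (so rate 1/2 in R's parameterisation):

s <- 1.5; t <- 2; rate <- 1 / 2                # rate = 1/beta
(1 - pexp(s + t, rate)) / (1 - pexp(s, rate))  # P(X > s + t | X > s)
1 - pexp(t, rate)                              # P(X > t); identical, about 0.3679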

Note that the exponential distribution is a special case of
the Gamma distribution, which we will discuss later.
3.8 Special Functions Arising in Statistics

There are three more special distributions that we consider:

the Gamma distribution
the Normal distribution
the Beta distribution

However, before discussing these distributions, we need to
introduce some special functions closely related to these
distributions.
3.8.1 Gamma Function

The Gamma function is essentially an extension of the
factorial function (e.g. 4! = 4 × 3 × 2 × 1 = 24) to the
positive real numbers.

Definition
The Gamma function at x > 0 is given by

Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt.
Results
Some basic results for the Gamma function are
1 Γ(x) = (x − 1) Γ(x − 1)
2 Γ(n) = (n − 1)!, n = 1, 2, 3, . . .
3 Γ(1/2) = √π
4 If m is a non-negative integer, then

Γ(m + 1) = ∫_0^∞ x^m e^{−x} dx = m!
Example
Suppose X has a probability density function

f_X(x) = (1/120) x^5 e^{−x}, x > 0.

Find E(X) using the Gamma function.

Solution:

E(X) = ∫_{−∞}^{∞} x f_X(x) dx            (definition)
     = ∫_0^∞ x · (1/120) x^5 e^{−x} dx
     = (1/120) ∫_0^∞ x^6 e^{−x} dx
     = (1/120) Γ(7)                      (definition of the Gamma function)
     = (1/120) × 6! = (6 × 5 × 4 × 3 × 2 × 1)/120 = 6.
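This value can be confirmed in R, either with the built-in gamma() function or by numerical integration (a verification sketch, not part of the original example):

gamma(7) / 120                              # Gamma(7)/120 = 6!/120 = 6
f <- function(x) x * x^5 * exp(-x) / 120    # integrand x * f_X(x)
integrate(f, lower = 0, upper = Inf)$value  # E(X), again 6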
3.8.2 Beta Function

Definition
The Beta function at x, y > 0 is given by

B(x, y) = ∫_0^1 t^{x−1} (1 − t)^{y−1} dt.

Result
For all x, y > 0,

B(x, y) = Γ(x) Γ(y) / Γ(x + y).
Example
Suppose X has a probability density function

f_X(x) = 168 x^2 (1 − x)^5, 0 < x < 1.

Find E(X^2) using the Beta function.

Solution:

E(X^2) = ∫_{−∞}^{∞} x^2 f_X(x) dx        (definition)
       = ∫_0^1 x^2 · 168 x^2 (1 − x)^5 dx
       = 168 ∫_0^1 x^4 (1 − x)^5 dx
       = 168 B(5, 6)                     (definition of the Beta function)
       = 168 Γ(5) Γ(6) / Γ(11) = 168 × (4! × 5!)/10! = 2/15 ≈ 0.133.
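A quick check with R's beta() function:

168 * beta(5, 6)                                   # 168 * B(5, 6) = 2/15, about 0.1333
168 * factorial(4) * factorial(5) / factorial(10)  # same value via Gamma(5)Gamma(6)/Gamma(11)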
3.8.3 The Digamma and Trigamma Functions

The digamma and trigamma functions will be required
later, but we give their definitions here since they are
closely related to the Gamma function.

Definition
For all x ∈ R,

digamma(x) = (d/dx) ln{Γ(x)}
trigamma(x) = (d²/dx²) ln{Γ(x)}.
Unlike other mathematical functions, such as sin(x),
tan^{−1}(x), log_{10}(x), etc., the digamma and trigamma
functions are unavailable on ordinary hand-held calculators.

However, they are just another set of special functions
available in mathematics and statistics computer software
such as Maple, Matlab, and R.

The following figure (constructed using R software)
displays them graphically.
Graphs of Digamma, Trigamma, and Other Gamma Functions

(See the R Documentation in the Supplementary Material.)
3.8.4 The Φ Function and its Inverse

The final special function we will consider is routinely
denoted by the capital phi symbol Φ. It is defined as
follows.

Definition
For all x ∈ R,

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt.

Note that Φ(x) cannot be simplified any further than in the
above expression since e^{−t²/2} does not have a closed-form
anti-derivative. This function is the cumulative distribution
function of the standard normal distribution.
Results
1 lim_{x→−∞} Φ(x) = 0
2 lim_{x→∞} Φ(x) = 1
3 Φ(0) = 1/2
4 Φ is a monotonically increasing function over R.
The previous result shows that the inverse of Φ, denoted by
Φ^{−1}, is well-defined for all 0 < x < 1.

Examples are

Φ^{−1}(1/2) = 0 and Φ^{−1}(0.975) = 1.95996 · · · ≈ 1.96.

We will see later that the function Φ^{−1} plays a particularly
important role in statistical inference.
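In R, Φ and Φ^{−1} are pnorm() and qnorm():

qnorm(0.5)            # 0
qnorm(0.975)          # 1.959964
pnorm(qnorm(0.975))   # recovers 0.975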
3.9 Normal Distribution

A particularly important family of continuous random
variables consists of those following the normal distribution.

Definition
The random variable X is said to have a normal
distribution with parameters µ (−∞ < µ < ∞) and σ² > 0
if X has probability density function

f_X(x; µ, σ) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞.

A common shorthand notation for a normally distributed
random variable is

X ∼ N(µ, σ²).
Normal density functions are bell-shaped curves, symmetric
about µ. The following figures show four different normal
density functions.
The normal distribution is important because of the
central limit theorem, which will be discussed in
Chapter 6.

It is also sometimes useful in practice when studying
variables with an approximately normal distribution (such
as the height of people).

This, however, is not the main use of the distribution. It is
more commonly used in making inferences about statistics
via the central limit theorem, which will be described in
Chapters 8-10.
Results
If X ∼ N(µ, σ²), then
1 E(X) = µ
2 Var(X) = σ²
3 m_X(u) = e^{µu + σ²u²/2}.
The special case of µ = 0 and σ² = 1 is known as the
standard normal distribution.

It is common to use the letter Z to denote standard normal
random variables. The common shorthand notation is
Z ∼ N(0, 1).

The standard normal density function is

f_Z(x) = (1/√(2π)) e^{−x²/2}, −∞ < x < ∞,

and the cumulative distribution function of a standard
normal random variable is

Φ(x) = F_Z(x) = P(Z ≤ x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt.
3.9.1 Computing Normal Distribution Probabilities

Consider the problem

P(Z ≤ 0.47) = (1/√(2π)) ∫_{−∞}^{0.47} e^{−x²/2} dx.

The standard normal density function does not have a
closed-form anti-derivative, so this integral cannot be
evaluated in the usual way.

However, tables for Φ(x) = P(Z ≤ x) are available, or we
can use the statistical package RStudio by typing pnorm
at the prompt. That is,

> pnorm(0.47)
[1] 0.6808225

That is,

P(Z ≤ 0.47) ≈ 0.6808.
Some other examples are

P(Z ≤ 1) = Φ(1) ≈ 0.8413 and P(Z ≤ 1.54) = Φ(1.54) ≈ 0.9382.

In RStudio, type pnorm(1) and pnorm(1.54), respectively.

To find a probability such as P(Z > 0.81), we need to
work with the complement, i.e.,

P(Z > 0.81) = 1 − P(Z ≤ 0.81) = 1 − Φ(0.81) ≈ 1 − 0.7910 = 0.2090.
How about probabilities concerning non-standard normal
random variables?

For example, how do we find P(X ≤ 12), where X ∼ N(10, 9)?

Result
If X ∼ N(µ, σ²), then

Z = (X − µ)/σ ∼ N(0, 1).
Example
Find P(X ≤ 12), where X ∼ N(10, 9).

Solution:

P(X ≤ 12) = P((X − 10)/3 ≤ (12 − 10)/3)
          = P(Z ≤ 0.67), where Z ∼ N(0, 1)
          = Φ(0.67) ≈ 0.7486.

We can also use RStudio and type at the prompt
pnorm(12, 10, 3), which gives us 0.7475075.

In general, using RStudio, type at the prompt
pnorm(x, µ, σ).
Example
Young men’s heights are approximately normal, with a mean
of 174 cm and a standard deviation of 6.4 cm.
1 What percentage of these men are taller than six feet
(i.e. 182.9 cm)?
2 What is the chance that a randomly selected young
man is 170-something cm tall?
3 Find a range of heights that contains 95% of young
men.
Solution:
Let X be the height of a young man; we are given that
X ∼ N(174, 6.4²).
➊ We want P(X > 182.9). That is,

P(X > 182.9) = P((X − 174)/6.4 > (182.9 − 174)/6.4)
             = P(Z > 1.3906)
             = 1 − P(Z ≤ 1.3906) ≈ 0.0822.

In RStudio, type at the prompt
1 - pnorm(182.9, 174, 6.4),
which gives 0.08216959.
➋ We want P(170 < X < 180). That is,

P(170 < X < 180) = P((170 − 174)/6.4 < (X − 174)/6.4 < (180 − 174)/6.4)
                 = P(−0.625 < Z < 0.9375)
                 = Φ(0.9375) − Φ(−0.625) ≈ 0.5598.

In RStudio, type at the prompt
pnorm(180, 174, 6.4) - pnorm(170, 174, 6.4),
which gives 0.5597638.
➌ We want to find c such that

0.95 = P(|X − 174|/6.4 < c) = P(|Z| < c)
⟹ c = Φ^{−1}(0.975) = 1.96
⟹ Range: 174 ± 1.96 × 6.4 = (161.456, 186.544)

In RStudio, type at the prompt
qnorm(0.025, 174, 6.4) and qnorm(0.975, 174, 6.4),
separately, which gives (161.4562, 186.5438).
3.10 Gamma Distribution

Definition
A random variable X is said to have a Gamma
distribution with parameters α > 0 (shape parameter)
and β > 0 (scale parameter) if X has probability density
function

f_X(x; α, β) = x^{α−1} e^{−x/β} / (Γ(α) β^α), x > 0.

A common shorthand notation is

X ∼ Gamma(α, β) or X ∼ Γ(α, β).
Gamma density functions are skewed curves on the positive
half-line.

The following figure shows four different Gamma density
functions.
Graphs of Various Gamma Density Functions
Results
If X ∼ Gamma(α, β), then
1 E(X) = αβ
2 Var(X) = αβ^2
3 m_X(u) = (1/(1 − βu))^α, u < 1/β.
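A numerical sketch checking E(X) = αβ for one assumed case, α = 3 and β = 2; note that R's dgamma() takes shape and scale arguments matching α and β here:

a <- 3; b <- 2
integrate(function(x) x * dgamma(x, shape = a, scale = b), 0, Inf)$value  # 6
a * b                                                                     # alpha * beta = 6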
As mentioned previously, the Gamma distribution
generalises the Exponential distribution.

The following result makes this explicit.

Result
X has an exponential distribution (i.e. X ∼ exp(β)) if
and only if

X ∼ Gamma(1, β).
3.11 Beta Distribution

For completeness, we conclude the chapter with the Beta
distribution.

This generalises the Uniform(0, 1) distribution, which
can be thought of as a Beta distribution with α = β = 1.
Definition
A random variable X is said to have a Beta distribution
with parameters α > 0 and β > 0 if its probability density
function is

f_X(x; α, β) = x^{α−1} (1 − x)^{β−1} / B(α, β), 0 < x < 1.

Here B(α, β) is the Beta function. (See the definition of the
Beta function in Section 3.8.2.)
Results
If X has a Beta distribution with parameters α > 0 and
β > 0, then
1 E(X) = α/(α + β)
2 Var(X) = αβ / ((α + β)² (α + β + 1)).
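A numerical check of the mean formula for an assumed case α = 2, β = 3, using R's dbeta():

a <- 2; b <- 3
integrate(function(x) x * dbeta(x, a, b), 0, 1)$value  # 0.4
a / (a + b)                                            # alpha/(alpha + beta) = 0.4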
Supplementary Material

Supplementary Material - Binomial Mean

E(X) = Σ_{x=0}^{n} x \binom{n}{x} p^x (1 − p)^{n−x}
     = Σ_{x=0}^{n} x · n!/(x!(n − x)!) · p^x (1 − p)^{n−x}
     = Σ_{x=1}^{n} n!/((x − 1)!(n − x)!) · p^x (1 − p)^{n−x}
       (since the x = 0 term vanishes)

Let y = x − 1 and m = n − 1. Substituting x = y + 1 and
n = m + 1 into the last sum (and using the fact that the
limits x = 1 and x = n correspond to y = 0 and
y = n − 1 = m, respectively):
Supplementary Material - Binomial Mean (continued)

E(X) = Σ_{y=0}^{m} (m + 1)!/(y!(m − y)!) · p^{y+1} (1 − p)^{m−y}
     = (m + 1) p Σ_{y=0}^{m} m!/(y!(m − y)!) · p^y (1 − p)^{m−y}
     = np Σ_{y=0}^{m} m!/(y!(m − y)!) · p^y (1 − p)^{m−y}.
Supplementary Material - Binomial Mean (continued)

The binomial theorem says that

(a + b)^m = Σ_{y=0}^{m} m!/(y!(m − y)!) · a^y b^{m−y}.

Setting a = p and b = 1 − p, we see that

Σ_{y=0}^{m} m!/(y!(m − y)!) · p^y (1 − p)^{m−y} = (p + 1 − p)^m = 1.

With the above calculation, we conclude that

E(X) = np.
Supplementary Material - Binomial Variance

Similarly, but this time using y = x − 2 and m = n − 2, we
have

E(X(X − 1)) = Σ_{x=0}^{n} x(x − 1) \binom{n}{x} p^x (1 − p)^{n−x}
            = Σ_{x=2}^{n} n!/((x − 2)!(n − x)!) · p^x (1 − p)^{n−x}
            = n(n − 1) p² Σ_{x=2}^{n} (n − 2)!/((x − 2)!(n − x)!) · p^{x−2} (1 − p)^{n−x}
            = n(n − 1) p² Σ_{y=0}^{m} m!/(y!(m − y)!) · p^y (1 − p)^{m−y}
            = n(n − 1) p² (p + (1 − p))^m    (by the binomial theorem)
            = n(n − 1) p².

So, the variance of X ∼ Bin(n, p) is

Var(X) = E(X(X − 1)) + E(X) − [E(X)]²
       = n(n − 1)p² + np − (np)²
       = np − np² = np(1 − p).
Supplementary Material - Binomial Moment Generating Function

m_X(u) = E(e^{uX})                              (definition)
       = Σ_{x=0}^{n} e^{ux} \binom{n}{x} p^x (1 − p)^{n−x}
       = Σ_{x=0}^{n} \binom{n}{x} (p e^u)^x (1 − p)^{n−x}
       = (p e^u + 1 − p)^n,

by the Binomial Theorem with a = p e^u and b = 1 − p.
Supplementary Material - Binomial Theorem

Binomial Theorem
Let a and b be constants. Then

(a + b)^m = Σ_{y=0}^{m} \binom{m}{y} a^y b^{m−y}
          = Σ_{y=0}^{m} m!/(y!(m − y)!) · a^y b^{m−y}.
Supplementary Material - R Documentation on Digamma and Other Gamma Functions

Description
Special mathematical functions related to the beta and gamma functions.

Usage
beta(a, b)
lbeta(a, b)

gamma(x)
lgamma(x)
psigamma(x, deriv = 0)
digamma(x)
trigamma(x)

Arguments
a, b       non-negative numeric vectors.
x, n       numeric vectors.
k, deriv   integer vectors.

The functions beta and lbeta return the beta function and the natural logarithm of the beta
function,

B(a, b) = Γ(a) Γ(b) / Γ(a + b).

The functions gamma and lgamma return Γ(x) and the natural logarithm of the absolute
value of the gamma function. The gamma function (see Abramowitz and Stegun, Section 6.2.1,
page 255) is defined by

Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt,

for all real x except zero and negative integers (when NaN is returned).
Supplementary Material - R Documentation on Digamma and Other Gamma Functions (continued)

The functions digamma and trigamma return the first and second derivatives of the logarithm
of the gamma function; psigamma(x, deriv) (deriv >= 0) computes the deriv-th derivative of
Ψ(x). That is,

digamma(x) = Ψ(x) = (d/dx) ln Γ(x) = Γ′(x)/Γ(x),

where Ψ and its derivatives, the psigamma() functions, are often called the polygamma
functions (see Abramowitz and Stegun, page 260).

References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language.
Wadsworth & Brooks/Cole. (For gamma and lgamma.)

Abramowitz, M. and Stegun, I. A. (1972). Handbook of Mathematical Functions. New York:
Dover. Chapter 6: Gamma and Related Functions.
