In this and the next few lectures, we will review the notion of random
variables. Random variables (r.v.) will be denoted X, Y, ... . You may remember
that there are two main types of random variables, discrete random variables and
continuous random variables. Let us start with the discrete r.v.
A discrete random variable X takes values in a finite or countable set {x1, x2, ...}, with probabilities
P(X = xi) = f(xi), i = 1, 2, ... .
The function f(xi) is called the probability mass function (pmf) of the random variable X. The collection of all probabilities P(X = xi) is called the distribution of the random variable X.
Remark 1. By our definition, the distribution P(X = xi ) of any random variable is
uniquely defined. However, such a distribution does not uniquely define the random
variable X, as there may be many other random variables Y, Z, ..., with the same
distribution.
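To fix ideas, a pmf can be stored as a simple lookup table from values to probabilities; here is a minimal Python sketch, where the particular values and probabilities are an arbitrary illustrative choice:

```python
# A pmf of a discrete random variable, stored as a map from values to probabilities.
# The values 0, 1, 2 and the probabilities below are an arbitrary illustrative choice.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

# A valid pmf is nonnegative and sums to 1.
assert all(prob >= 0 for prob in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12

print(pmf[1])           # P(X = 1) = 0.5
print(pmf[1] + pmf[2])  # P(X >= 1) = 0.8, by additivity over disjoint events
```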
Suppose we have two discrete random variables: random variable X, taking values
in a set {x1 , x2 , ... }, with pmf fX (xi ), and another random variable Y , taking values
in a set {y1 , y2 , ... }, with pmf fY (yj ). For all possible i and j, one can consider the
following probabilities
P(X = xi, Y = yj) = f(xi, yj).
The random variables X and Y are called independent if, for all i and j,
P(X = xi, Y = yj) = P(X = xi)P(Y = yj),
or, equivalently,
f (xi , yj ) = fX (xi )fY (yj ).
This definition is easy to generalize for any (finite) number of random variables.
Indeed, suppose we have a discrete random variable X, taking
values in a set {x1 , x2 , ... }, with pmf fX (xi ); another random variable Y , taking
values in a set {y1 , y2 , ... }, with pmf fY (yj ) ...; and finally a random variable Z,
taking values in a set {z1 , z2 , ... }, with pmf fZ (zk ). For all possible i, j, ..., k, one
can consider the following probabilities
P(X = xi, Y = yj, ..., Z = zk) = f(xi, yj, ..., zk).
The random variables X, Y, ..., Z are called independent if, for all i, j, ..., k,
P(X = xi, Y = yj, ..., Z = zk) = P(X = xi)P(Y = yj) · · · P(Z = zk),
or, equivalently,
f(xi, yj, ..., zk) = fX(xi)fY(yj) · · · fZ(zk).
The function f (xi , yj , ..., zk ) is called the joint pmf of the random variables X, Y, ..., Z.
The set of probabilities P(X = xi , Y = yj , ..., Z = zk ) is called the joint distribution
of the variables X, Y, ..., Z. The pmf's fX(xi), fY(yj), ..., fZ(zk) are called the marginal pmf's of X, Y, ..., Z, respectively. One can rephrase Definition 3 as follows: the joint pmf of independent rv's factorizes into the product of their marginal pmf's.
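As a small numerical illustration of this factorization, the following Python sketch builds a joint pmf from two arbitrarily chosen marginal pmf's and checks that summing out one variable recovers the other marginal:

```python
# Marginal pmf's of two independent discrete r.v.'s X and Y (illustrative numbers).
f_X = {0: 0.3, 1: 0.7}
f_Y = {1: 0.5, 2: 0.25, 3: 0.25}

# For independent X and Y the joint pmf factorizes: f(x, y) = f_X(x) * f_Y(y).
joint = {(x, y): f_X[x] * f_Y[y] for x in f_X for y in f_Y}

# The joint probabilities still sum to 1 ...
assert abs(sum(joint.values()) - 1.0) < 1e-12

# ... and summing the joint pmf over y recovers the marginal pmf of X.
for x in f_X:
    marginal_x = sum(joint[(x, y)] for y in f_Y)
    assert abs(marginal_x - f_X[x]) < 1e-12
```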
Definition 5. A random variable X taking values in the set {0, 1}, with probability P(X = 1) = p, is called a Bernoulli random variable, or a Bernoulli trial, with
probability of success p. We will write symbolically X ∼ B(p).
Remark 2. The word success, the opposite of a failure, is used here in a generic
sense, and may have nothing to do with an actual success. We may abbreviate them
as success=S, failure=F; depending on a particular interpretation of outcomes, we
may also identify them with H=head, T=tail; or with numbers 1,0; etc.
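In simulation terms, a Bernoulli trial X ∼ B(p) amounts to comparing a uniform random number with p. A minimal Python sketch (the value p = 0.3 and the number of repetitions are arbitrary choices):

```python
import random

def bernoulli(p: float) -> int:
    """One Bernoulli trial: return 1 (success) with probability p, else 0 (failure)."""
    return 1 if random.random() < p else 0

# The long-run fraction of successes should be close to p.
p = 0.3
trials = [bernoulli(p) for _ in range(100_000)]
print(sum(trials) / len(trials))  # approximately 0.3
```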
Binomial distribution
Consider n independent Bernoulli trials X1, X2, ..., Xn ∼ B(p), and let
X = X1 + · · · + Xn.
Thus, X is a random variable taking values in the set {0, 1, ..., n}.
Let us find its probability mass function (pmf ), i.e., the function f (k) = P(X =
k), k = 0, 1, ..., n. Obviously, the event (X = k) means that there are exactly k
successes in the series of n trials. These k successes may happen either at the first k trials, or at the next k trials, or, in fact, at any k of the n trials. This gives a
disjoint partition of the event (X = k).
Note that the number of events in such a partition equals the number of ways to
choose k places out of n, which is the number of different combinations of k elements
out of n (from combinatorics). As we know, it is given by the binomial coefficient
\binom{n}{k} = \frac{n!}{k!(n−k)!}.
Moreover, the probability of getting exactly k successes, at any given k places, and failures at all remaining n − k places, equals p^k (1 − p)^{n−k}. Thus,
f(k) = P(X = k) = \sum_{\text{all combinations of } k \text{ places out of } n} p^k (1 − p)^{n−k} = \binom{n}{k} \, p^k (1 − p)^{n−k} = \frac{n!}{k!(n−k)!} \, p^k (1 − p)^{n−k}.   (1)
Here, we are summing up equal numbers p^k (1 − p)^{n−k}, over all \binom{n}{k} possible combinations of k places n1 < ... < nk at which the successes happen.
Obviously, f (k) ≥ 0, k = 0, 1, ..., n. Does f (k) sum up to 1, as expected? Well, by
the binomial theorem,
\sum_{k=0}^{n} f(k) = \sum_{k=0}^{n} \binom{n}{k} p^k (1 − p)^{n−k} = (p + (1 − p))^n = 1^n = 1.
Definition 6. Any random variable X having the pmf (1) is called a binomial random
variable, with n trials and probability of success p. We will write: X ∼ bin(n, p).
Note that such a random variable has the same distribution as the sum of n inde-
pendent Bernoulli trials, X1 + X2 + · · · + Xn .
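Formula (1) and this last observation can be checked numerically; the sketch below evaluates the pmf (1) via math.comb and compares it with the empirical frequencies of a simulated sum of n independent Bernoulli trials (the choices n = 10, p = 0.3, and 100,000 repetitions are arbitrary):

```python
import math
import random
from collections import Counter

n, p = 10, 0.3

def binom_pmf(k: int) -> float:
    """pmf (1): C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# The pmf sums to 1, by the binomial theorem.
assert abs(sum(binom_pmf(k) for k in range(n + 1)) - 1.0) < 1e-12

# X has the same distribution as a sum of n independent Bernoulli trials.
reps = 100_000
counts = Counter(sum(random.random() < p for _ in range(n)) for _ in range(reps))
for k in range(n + 1):
    print(k, round(binom_pmf(k), 4), round(counts[k] / reps, 4))  # formula vs. frequency
```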
Remark 3. The argument in (1) relies on one of the main axioms of probability,
the additivity axiom, by which the probability of a union of mutually exclusive events
is the sum of their probabilities.
Definition 7. A random variable X ∈ {0, 1, 2, ...} with the pmf given by
f(k) = P(X = k) = \frac{λ^k}{k!} e^{−λ}, k = 0, 1, ... ,
is called a Poisson random variable with parameter λ (λ > 0). Shorthand
notation: X ∼ P(λ).
The Poisson distribution can be very useful in applications where one may assume a
large number n of independent Bernoulli trials, with a small probability of success
p. Here are some typical examples.
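One standard manifestation is the Poisson approximation to the binomial: for large n and small p, the pmf of bin(n, p) is close to the pmf of P(λ) with λ = np. A minimal numerical sketch of this approximation (the values n = 1000, p = 0.002, hence λ = 2, are arbitrary illustrative choices):

```python
import math

n, p = 1000, 0.002           # many trials, small probability of success
lam = n * p                   # Poisson parameter λ = np

def binom_pmf(k: int) -> float:
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k: int) -> float:
    return lam**k / math.factorial(k) * math.exp(-lam)

for k in range(6):
    print(k, round(binom_pmf(k), 6), round(poisson_pmf(k), 6))
# The two columns agree to several decimal places.
```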
Geometric distribution
Suppose that independent Bernoulli trials, with a probability of success 0 < p < 1,
are repeated indefinitely:
Ω = {all infinite sequences of S's and F's}.
Let T denote the first moment of success, like in the following example, where
T = 4:
ω = (F F F S · · · ).
It is known that the cardinality of the sample space Ω is continuum (see explanation
below). Thus, here T is a discrete random variable defined on the sample space Ω
of cardinality continuum! The possible values of T are {1, 2, ...}.
Let us find the pmf f(k) = P(T = k), for k ≥ 1. By independence of the trials, the event (T = k) means that the first k − 1 trials are failures and the k-th trial is a success, so its probability equals (1 − p)^{k−1} p. Thus,
f(k) = p(1 − p)^{k−1}, k = 1, 2, ... .   (3)
Definition 8. We say that a random variable T , whose pmf is given by (3),
has geometric distribution, with probability of success 0 < p < 1. The
shorthand notation: T ∼ G(p).
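The pmf (3) can also be checked by simulation: run independent Bernoulli(p) trials until the first success and record its position. A minimal Python sketch (p = 0.25 and the number of repetitions are arbitrary choices):

```python
import random
from collections import Counter

p = 0.25

def first_success(p: float) -> int:
    """Run independent Bernoulli(p) trials and return the index of the first success."""
    t = 1
    while random.random() >= p:   # failure: keep going
        t += 1
    return t

reps = 100_000
counts = Counter(first_success(p) for _ in range(reps))
for k in range(1, 8):
    print(k, round(p * (1 - p)**(k - 1), 4), round(counts[k] / reps, 4))  # pmf (3) vs. frequency
```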
Do the probabilities f (k) sum up to 1, as expected? We will need the following
property of the geometric series: for any real a, and any −1 < q < 1, called the
ratio,
\sum_{k=n}^{\infty} a q^k = \frac{a q^n}{1 − q}.   (4)
Mnemonically: the sum of a geometric series equals the first term of the series divided by 1 minus its ratio. In our case, by (4),
\sum_{k=1}^{\infty} f(k) = \sum_{k=1}^{\infty} p(1 − p)^{k−1} = \frac{p}{1 − (1 − p)} = \frac{p}{p} = 1.
Using additivity of probability, we can calculate probabilities of various other events
T ∈ A, where A is a subset of {1, 2, ...}. Here are some examples where we again
use the property (4):
P(T is an even number) = \sum_{k=1}^{\infty} f(2k) = \sum_{k=1}^{\infty} p(1 − p)^{2k−1} = \frac{p(1 − p)}{1 − (1 − p)^2} = \frac{p(1 − p)}{p(2 − p)} = \frac{1 − p}{2 − p},

P(T is an odd number) = \sum_{k=0}^{\infty} f(2k + 1) = \sum_{k=0}^{\infty} p(1 − p)^{2k} = \frac{p}{1 − (1 − p)^2} = \frac{p}{p(2 − p)} = \frac{1}{2 − p}.
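Both values can be sanity-checked numerically by truncating the two series; the tails are geometrically small, so a couple of hundred terms suffice (p = 0.4 is an arbitrary choice):

```python
p = 0.4

def f(k: int) -> float:
    """Geometric pmf (3): probability that the first success occurs at trial k."""
    return p * (1 - p)**(k - 1)

even = sum(f(2 * k) for k in range(1, 200))       # P(T even), truncated series
odd = sum(f(2 * k + 1) for k in range(0, 200))    # P(T odd), truncated series

print(even, (1 - p) / (2 - p))   # both approximately 0.375
print(odd, 1 / (2 - p))          # both approximately 0.625
print(even + odd)                # approximately 1, as it should be
```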
Cardinality of sets
The meaning of the words countable set and continuum set is made precise in set theory. Let us fix our terminology, without going into too
much detail.
i) If Ω = {ω1 , ..., ωn } is a finite set, its cardinality equals the number n of elements
contained in Ω.
ii) A set Ω = {ω} is said to be countable, if it is either finite, or its elements can
be counted, or enumerated, as ω1 , ω2 , ω3 , ... . In the latter case, we can say that
there is a one-to-one correspondence between all the elements of Ω and the natural
numbers 1, 2, 3, ... . We can, for instance, associate each element ωi with the integer
i, and vice versa.
iii) If Ω = {ω1 , ω2 , ω3 ...} is an infinite countable set, its cardinality is denoted ℵ0
(aleph-nought). All other infinite sets are said to be uncountable.
iv) Of all uncountable sets, the only ones used in practice are the continuum sets. We say that a set Ω = {ω} is a continuum if there exists a one-to-one
correspondence between the elements ω ∈ Ω and the real numbers 0 ≤ x ≤ 1. The
cardinality of all continuum sets is denoted c.