
Introduction to Probability and Random Variables
Orhan Gazi
Electrical and Electronics Engineering Department
Ankara Medipol University
Altındağ/Ankara, Türkiye

ISBN 978-3-031-31815-3 ISBN 978-3-031-31816-0 (eBook)


https://doi.org/10.1007/978-3-031-31816-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by
similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The first book about probability and random variables was written in 1937. Although probability has been known for a long time in history, the compilation of probability and random variables as written material does not go back more than a hundred years. In fact, most of the scientific developments in human history have taken place in the last 100 years; it is not wrong to say that humanity has put its intelligence to full use only in the last century. Developments, especially in the basic sciences, took a long time in human history.
The findings in random variables and probability affected the other sciences as well. Scientists dealing with physics focused on deterministic modeling for a long time. As the improvements in random variables and probability showed themselves, the modeling of physical events came to be performed using probabilistic modeling. Beforehand, physicists modeled the flow of electrons around an atom as deterministic circular paths, but the developments in probability and random variables led physicists to think about probabilistic models for the movement of electrons. It is seen that improvements in the basic sciences directly affect the other sciences as well. Modern electronic devices owe their existence to the science of probability and random variables. Without the concept of probability, it would not be possible to develop the subject of information communication. Modern electronic communication devices are developed using the fundamental concepts of probability and random variables. Shannon published his famous paper about information theory in 1948 using probabilistic modeling, and it led to the development of modern communication devices. The developments in probability science caused the science of statistics to emerge. Many disciplines, from the medical sciences to engineering, benefit from statistics. Medical doctors measure the effects of tablets by extracting statistical data from patients. Engineers model some physical phenomena using statistical measures.
In this book, we explain the fundamental concepts of probability and random variables in a clear manner. We cover the basic topics of probability and random variables. The first chapter is devoted to the explanation of experiments, sample spaces, events, and probability laws. The first chapter can be considered the foundation of the random variables topic; it is not possible to comprehend the concept of random variables without mastering the concept of events, the definition of probability, and the probability axioms.
The probability topic has always been considered by students to be a difficult subject compared to other mathematical subjects. We believe that the reason for this perception is the unclear and overloaded explanation of the subject. Considering this, we tried to be brief and clear while explaining the topics. The concepts of joint experiments, writing the sample spaces of joint experiments, and determining the events from a given problem statement are important for solving probability problems.
In Chap. 2, using the basic concepts introduced in Chap. 1, we introduce some classical probability subjects such as the total probability theorem, independence, permutation and combination, the multiplication rule, the partition rule, etc.
Chapter 3 introduces discrete random variables. We introduce the probability mass function of discrete random variables using the event concept. Expected value and variance calculation are the other topics covered in Chap. 3. Some well-known probability mass functions are also introduced in this chapter. It is easier to deal with discrete random variables than with continuous random variables. We advise the reader to study discrete random variables before continuous random variables. Functions of random variables are explained in Chap. 4, where the joint probability mass function, cumulative distribution function, conditional probability mass function, and conditional mean value concepts are covered as well.
Continuous random variables are covered in Chap. 5. Continuous random variables can be considered as the integral form of discrete random variables. If the reader comprehends the discrete random variables covered in Chaps. 3 and 4, it will not be hard to understand the subjects covered in Chap. 5. In Chap. 6, we mainly explain the calculation of the probability density, cumulative distribution, conditional probability density, and conditional mean value, and related topics, considering the case of more than one random variable. The correlation and covariance of two random variables are also covered in Chap. 6.
This book can be used as a textbook for a one-semester probability and random variables course, and it can be read by anyone interested in probability and random variables. While writing this book, we have drawn on many years of teaching experience. We tried to provide original examples while explaining the basic concepts, and we chose examples that are as simple as possible while still providing succinct information. We reduced the textual part of the book as much as possible, since the inclusion of long text passages decreases the reader's concentration. Considering this, we tried to be as brief as possible and aimed to provide the fundamental concepts to the reader in a quick and short way without getting lost in details.
I dedicate this book to my lovely daughter Vera Gazi.

Altındağ/Ankara, Türkiye Orhan Gazi


Contents

1 Experiments, Sample Spaces, Events, and Probability Laws . . . . . . . 1


1.1 Fundamental Definitions: Experiment, Sample Space, Event . . . . 1
1.2 Operations on Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Probability and Probabilistic Law . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Discrete Probability Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Joint Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Properties of the Probability Function . . . . . . . . . . . . . . . . . . . . . 12
1.7 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2 Total Probability Theorem, Independence, Combinatorial . . . . . . . . 29
2.1 Total Probability Theorem, and Bayes’ Rule . . . . . . . . . . . . . . . . 29
2.1.1 Total Probability Theorem . . . . . . . . . . . . . . . . . . . . . . . 29
2.1.2 Bayes’ Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2 Multiplication Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3.1 Independence of Several Events . . . . . . . . . . . . . . . . . . . 38
2.4 Conditional Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5 Independent Trials and Binomial Probabilities . . . . . . . . . . . . . . 42
2.6 The Counting Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.7 Permutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.8 Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.9 Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.10 Case Study: Modeling of Binary Communication Channel . . . . . . 61
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3 Discrete Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.1 Discrete Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.2 Defining Events Using Random Variables . . . . . . . . . . . . . . . . . 69
3.3 Probability Mass Function for Discrete Random Variables . . . . . . 76
3.4 Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . 80


3.5 Expected Value (Mean Value), Variance, and Standard Deviation . . . . . . . . . 87
3.5.1 Expected Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.5.2 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.5.3 Standard Deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.6 Expected Value and Variance of Functions of a Random
Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.7 Some Well-Known Discrete Random Variables
in Mathematic Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.7.1 Binomial Random Variable . . . . . . . . . . . . . . . . . . . . . . 100
3.7.2 Geometric Random Variable . . . . . . . . . . . . . . . . . . . . . . 100
3.7.3 Poisson Random Variable . . . . . . . . . . . . . . . . . . . . . . . . 101
3.7.4 Bernoulli Random Variable . . . . . . . . . . . . . . . . . . . . . . 102
3.7.5 Discrete Uniform Random Variable . . . . . . . . . . . . . . . . 103
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4 Functions of Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.1 Probability Mass Function for Functions of a Discrete
Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.2 Joint Probability Mass Function . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.3 Conditional Probability Mass Function . . . . . . . . . . . . . . . . . . . . 118
4.4 Joint Probability Mass Function of Three or More Random
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.5 Functions of Two Random Variables . . . . . . . . . . . . . . . . . . . . . 122
4.6 Conditional Probability Mass Function . . . . . . . . . . . . . . . . . . . . 124
4.7 Conditional Mean Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.8 Independence of Random Variables . . . . . . . . . . . . . . . . . . . . . . 131
4.8.1 Independence of a Random Variable from an Event . . . . . 131
4.8.2 Independence of Several Random Variables . . . . . . . . . . . 136
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5 Continuous Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.1 Continuous Probability Density Function . . . . . . . . . . . . . . . . . . 141
5.2 Continuous Uniform Random Variable . . . . . . . . . . . . . . . . . . . . 143
5.3 Expectation and Variance for Continuous Random
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.4 Expectation and Variance for Functions of Random
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
5.5 Gaussian or Normal Random Variable . . . . . . . . . . . . . . . . . . . . 148
5.5.1 Standard Random Variable . . . . . . . . . . . . . . . . . . . . . . . 149
5.6 Exponential Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . 152
5.7 Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . 154
5.7.1 Properties of Cumulative Distribution Function . . . . . . . . 154
5.8 Impulse Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.9 The Unit Step Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

5.10 Conditional Probability Density Function . . . . . . . . . . . . . . . . . . 162


5.11 Conditional Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
5.12 Conditional Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
6 More Than One Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.1 More Than One Continuous Random Variable for the Same
Continuous Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.2 Conditional Probability Density Function . . . . . . . . . . . . . . . . . . 187
6.3 Conditional Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.3.1 Bayes’ Rule for Continuous Distribution . . . . . . . . . . . . . 191
6.4 Conditional Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
6.5 Conditional Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.6 Independence of Continuous Random Variables . . . . . . . . . . . . . 199
6.7 Joint Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . 201
6.7.1 Three or More Random Variables . . . . . . . . . . . . . . . . . . 202
6.7.2 Background Information: Reminder for Double
Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
6.7.3 Covariance and Correlation . . . . . . . . . . . . . . . . . . . . . . 206
6.7.4 Correlation Coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . 206
6.8 Distribution for Functions of Random Variables . . . . . . . . . . . . . 206
6.9 Probability Density Function for Function of Two Random
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
6.10 Alternative Formula for the Probability Density Function
of a Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
6.11 Probability Density Function Calculation for the Functions
of Two Random Variables Using Cumulative Distribution
Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.12 Two Functions of Two Random Variables . . . . . . . . . . . . . . . . . 221
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Chapter 1
Experiments, Sample Spaces, Events,
and Probability Laws

1.1 Fundamental Definitions: Experiment, Sample Space, Event

In this section, we provide some definitions very widely used in probability theory.
We first consider the discrete probability experiments and give definitions of discrete
sample spaces to understand the concept of probability in an easy manner. Later, we
consider continuous experiments.
Set
A set in its most general form is a collection of objects. These objects can be physical objects, like pencils or chairs, or nonphysical objects, like integers, real numbers, etc.
Experiment
An experiment is a process used to measure a physical phenomenon.
Trial
A trial is a single performance of an experiment. If we perform an experiment once,
then we have a trial of the experiment.
Outcome, Simple Event, Sample Point
After a trial of an experiment, we have an outcome, which can be called a simple event, sample point, or simple outcome.
Sample Space
A sample space is defined for an experiment, and it is a set consisting of all the
possible outcomes of an experiment.
Event
A sample space is a set, and it has subsets. A subset of a sample space is called an event. A discrete sample space, i.e., a countable sample space, consisting of N outcomes, or simple events, has 2^N events, i.e., subsets.


Example 1.1: Consider the coin toss experiment. This experiment is a discrete experiment, i.e., we have a countable number of different outcomes for this experiment. Then, we have the following items for this experiment.

Experiment: Coin Toss.


Simple Events, or Experiment Outcomes:
{H}, and {T} where H indicates “head”, and T denotes “tail.”
Sample Space: S = {H, T}
Events: Events are nothing but the subsets of the sample space. Thus, we have the
events, {H}, {T}, {H, T}, ϕ. That is, we have 4 events for this experiment.

Example 1.2: Consider a rolling-a-die experiment. We have the following identities for this experiment.
Experiment: Rolling a die.
Simple Events: {1}, {2}, {3}, {4}, {5}, {6}.
Sample Space: S = {1, 2, 3, 4, 5, 6}.
Events: Events are nothing but the subsets of the sample space. Thus, we have
2^6 = 64 events for this experiment.
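
Since events are just subsets of the sample space, the 2^N count can be checked by brute-force enumeration. The following minimal Python sketch (ours, not the book's) lists every event of a finite sample space:

    from itertools import combinations

    def all_events(sample_space):
        """Enumerate every event (subset) of a finite sample space."""
        outcomes = list(sample_space)
        events = []
        for k in range(len(outcomes) + 1):            # subset sizes 0..N
            events.extend(combinations(outcomes, k))  # all subsets of size k
        return events

    events = all_events({1, 2, 3, 4, 5, 6})
    print(len(events))  # 64, i.e., 2^6 subsets for the rolling-a-die experiment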
We wrote that an event is nothing but a subset of the sample space. A subset is also a set, and it may include more than one simple event. Let's assume that A is an event for an experiment including a number of simple events such that A = {a, b, ⋯}. After a trial of the experiment, if a simple outcome x appears such that x ∈ A, then we say that the event A occurs.
Example 1.3: For the rolling-a-die experiment, the sample space is S = {1, 2, 3, 4, 5, 6}. Let's define two events of this experiment as A = {1, 3, 5} and B = {2, 4, 6}. Assume that we roll a die and "3" appears at the top face of the die; since 3 ∈ A, we say that the event A has occurred.
Example 1.4: For the rolling-a-die experiment, the sample space is S = {1, 2, 3, 4, 5, 6}. Let A = {1, 2, x}, B = {2, 4, 7}. Are the sets A and B events for the die experiment?
Solution 1.4: An event is a subset of the sample space of an experiment. For the given sample space, it is obvious that

A ⊄ S    B ⊄ S

and then we can say that A and B are not events for the rolling-a-die experiment.
Example 1.5: For the rolling-a-die experiment, the sample space is
S = {1, 2, 3, 4, 5, 6}. Write three events for the rolling-a-die experiment.
Solution 1.5: We can write any three subsets of the sample space since events are
nothing but the subsets of the sample space. Then, we can write three arbitrary
events as
A = {1, 2, 4}    B = {5}    C = {1, 6}.

1.2 Operations on Events

Since events are nothing but subsets of the sample space, the operations defined on
sets are also valid on events. If A and B are two events, then we can define the
following operations on the events:

A ∪ B = A + B → Union of A and B    A ∩ B = AB → Intersection of A and B    Aᶜ → Complement of A.

The complement of A, i.e., Aᶜ, is calculated as

Aᶜ = S - A.

Note: A - B = A ∩ Bᶜ
Mutually Exclusive Events or Disjoint Events
Let A and B be two events. If A ∩ B = ϕ, then A and B are called mutually exclusive events, or disjoint events.

1.3 Probability and Probabilistic Law

Probability is a real valued function, and it is usually denoted by P(). The inputs of
the probability function are the events of experiments, and the outputs are the real
numbers between 0 and 1. Thus, we can say that the probability function is nothing
but a mapping between events and real numbers in the range of 0–1. The use of
probability function P() is illustrated in Fig. 1.1.
Probabilistic Law
The probability function P() is not an ordinary real valued function. For a real
valued function to be used as a probability function, it should obey some axioms, and
these axioms are named probabilistic law axioms, which are outlined as follows:
Probability Axioms
Let S be the sample space of an experiment, and A and B be two events for which
the probability function P() is used such that
Fig. 1.1 The mapping of the events by the probability function

P(A) → Probability of Event A    P(B) → Probability of Event B.

Then, we have the following axioms:

1. For every event A, the probability function is non-negative, i.e.,

P(A) ≥ 0.    (1.1)

2. If A ∩ B = ϕ, i.e., A and B are disjoint sets, then the probability of A ∪ B satisfies

P(A ∪ B) = P(A) + P(B)    (1.2)

which is called the additivity axiom of the probability function.

3. The probability of the sample space equals 1, i.e.,

P(S) = 1.    (1.3)

This property is called the normalization axiom.

1.4 Discrete Probability Law

For a discrete experiment, assume that the sample space is

S = {s1, s2, ⋯, sN}.

Let A be an event of this discrete experiment, i.e., A ⊂ S, such that

A = {a1, a2, ⋯, ak}.

The probability of the event A can be calculated as

P(A) = P({a1, a2, ⋯, ak})

where, employing (1.2), since the simple events are also disjoint events, we get

P(A) = P(a1) + P(a2) + ⋯ + P(ak).    (1.4)

If the simple events are equally probable events, i.e., P(si) = p, then according to (1.3), we have

P(S) = 1 → P(s1) + P(s2) + ⋯ + P(sN) = 1 → Np = 1 → p = 1/N.

That is, the probability of a simple event happens to be

P(si) = 1/N.

Then, in this case, the probability of the event given in (1.4) can be calculated as

P(A) = P(a1) + P(a2) + ⋯ + P(ak) → P(A) = 1/N + 1/N + ⋯ + 1/N → P(A) = k/N

which can also be stated as

P(A) = (Number of elements in event A)/(Number of elements in sample space S).    (1.5)

Note: Equation (1.5) is valid only if the simple events are all equally likely, i.e., if the simple events have equal probabilities of occurrence.
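
For equally likely outcomes, (1.5) is a pure counting ratio. A minimal Python sketch of this rule (the helper name prob is ours, not the book's):

    def prob(event, sample_space):
        """P(A) = N(A) / N(S) for a fair discrete experiment, as in (1.5)."""
        event, sample_space = set(event), set(sample_space)
        assert event <= sample_space, "an event must be a subset of S"
        return len(event) / len(sample_space)

    S = {1, 2, 3, 4, 5, 6}   # fair rolling-a-die experiment
    A = {1, 3, 5}            # an odd number appears
    print(prob(A, S))        # 0.5, i.e., 3/6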

1.5 Joint Experiment

Assume that we perform two different experiments. Let experiment-1 have the sample space S1 and experiment-2 have the sample space S2. If both experiments are performed at the same time, we can consider the two together as a single experiment, i.e., a joint experiment. In this case, the sample space of the joint experiment becomes

S = S1 × S2

i.e., the Cartesian product of S1 and S2. Similarly, if more than two experiments with sample spaces S1, S2, ⋯ are performed at the same time, then the sample space of the joint experiment can be calculated as

S = S1 × S2 × ⋯

If

S1 = {a1, a2, a3, ⋯}    S2 = {b1, b2, b3, ⋯}    S3 = {c1, c2, c3, ⋯}    ⋯

then a single element of S will be in the form si = aj bl cm ⋯, and, assuming the experiments do not affect each other, the probability of si can be calculated as

P(si) = P(aj)P(bl)P(cm)⋯    (1.6)

That is, the probability of a simple event of the combined experiment equals the product of the probabilities of the simple events appearing in it.
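
The construction of a joint sample space and the product rule in (1.6) are easy to mechanize. A minimal Python sketch (ours, not the book's), assuming the two experiments do not affect each other; the outcome labels match Example 1.10 below:

    from itertools import product

    # Component experiments given as {outcome: probability} maps.
    coin = {"H": 1 / 2, "T": 1 / 2}
    die = {"f" + str(k): 1 / 6 for k in range(1, 7)}

    # S = S1 × S2; by (1.6), each joint outcome's probability is the
    # product of the probabilities of its component outcomes.
    joint = {a + b: pa * pb for (a, pa), (b, pb) in product(coin.items(), die.items())}

    print(len(joint))           # 12 simple events
    print(joint["Hf1"])         # 0.0833... = 1/12
    print(sum(joint.values()))  # 1.0, so the normalization axiom (1.3) holds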
Example 1.6: For the fair coin toss experiment, the sample space is S = {H, T}. The simple events are {H} and {T}. The probabilities of the simple events are

P(H) = 1/2    P(T) = 1/2.

Example 1.7: We toss a coin twice. Find the sample space of this experiment.
Solution 1.7: For a single toss of the coin, the sample space is S1 = {H, T}. If we toss the coin twice, we can consider it as a combined experiment, and the sample space of the combined experiment can be calculated as

S = S1 × S1 → S = {H, T} × {H, T} → S = {HH, HT, TH, TT}.

Example 1.8: We toss a coin three times. Find the sample space of this experiment.
Solution 1.8: The three tosses of the coin can be considered a combined experiment. For a single toss of the coin, the sample space is S1 = {H, T}. For three tosses, the sample space can be calculated as

S = S1 × S1 × S1 → S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.


Example 1.9: For the fair die toss experiment, the sample space is S = {f1, f2, f3, f4, f5, f6}. The simple events are {f1}, {f2}, {f3}, {f4}, {f5}, {f6}. The probabilities of the simple events are

P(f1) = P(f2) = P(f3) = P(f4) = P(f5) = P(f6) = 1/6.

Example 1.10: We flip a fair coin and toss a fair die at the same time. Find the
sample space of the combined experiment, and find the probabilities of the simple
events of the combined experiment.
Solution 1.10: For the coin flip experiment, we have the sample space

S1 = {H, T}

where H denotes the head and T denotes the tail.

For the fair die flip experiment, we have the sample space

S2 = {f1, f2, f3, f4, f5, f6}

where the integers indicate the faces of the die. For the combined experiment, the sample space S can be calculated as

S = S1 × S2 → S = {Hf1, Hf2, Hf3, Hf4, Hf5, Hf6, Tf1, Tf2, Tf3, Tf4, Tf5, Tf6}.

The simple events of the combined experiment are

{Hf1} {Hf2} {Hf3} {Hf4} {Hf5} {Hf6} {Tf1} {Tf2} {Tf3} {Tf4} {Tf5} {Tf6}.

The probabilities of the simple events of the combined experiment according to (1.6) can be calculated as

P(Hf1) = P(H)P(f1) → P(Hf1) = 1/2 × 1/6 → P(Hf1) = 1/12
P(Hf2) = P(H)P(f2) → P(Hf2) = 1/2 × 1/6 → P(Hf2) = 1/12
P(Hf3) = P(H)P(f3) → P(Hf3) = 1/2 × 1/6 → P(Hf3) = 1/12
P(Hf4) = P(H)P(f4) → P(Hf4) = 1/2 × 1/6 → P(Hf4) = 1/12
P(Hf5) = P(H)P(f5) → P(Hf5) = 1/2 × 1/6 → P(Hf5) = 1/12
P(Hf6) = P(H)P(f6) → P(Hf6) = 1/2 × 1/6 → P(Hf6) = 1/12
P(Tf1) = P(T)P(f1) → P(Tf1) = 1/2 × 1/6 → P(Tf1) = 1/12
P(Tf2) = P(T)P(f2) → P(Tf2) = 1/2 × 1/6 → P(Tf2) = 1/12
P(Tf3) = P(T)P(f3) → P(Tf3) = 1/2 × 1/6 → P(Tf3) = 1/12
P(Tf4) = P(T)P(f4) → P(Tf4) = 1/2 × 1/6 → P(Tf4) = 1/12
P(Tf5) = P(T)P(f5) → P(Tf5) = 1/2 × 1/6 → P(Tf5) = 1/12
P(Tf6) = P(T)P(f6) → P(Tf6) = 1/2 × 1/6 → P(Tf6) = 1/12.

Example 1.11: A biased coin is flipped. The sample space is S1 = {Hb, Tb}. The
probabilities of the simple events are

P(Hb) = 2/3    P(Tb) = 1/3.

Assume that the biased coin is flipped twice. Consider the two flips as a single experiment. Find the sample space of the combined experiment, and determine the probabilities of the simple events for the combined experiment.
Solution 1.11: The sample space of the combined experiment can be found using S = S1 × S1 as

S = {HbHb, HbTb, TbHb, TbTb}.

The simple events for the combined experiment are

{HbHb} {HbTb} {TbHb} {TbTb}.

The probabilities of the simple events of the combined experiment are calculated as

P(HbHb) = P(Hb)P(Hb) → P(HbHb) = 2/3 × 2/3 → P(HbHb) = 4/9
P(HbTb) = P(Hb)P(Tb) → P(HbTb) = 2/3 × 1/3 → P(HbTb) = 2/9
P(TbHb) = P(Tb)P(Hb) → P(TbHb) = 1/3 × 2/3 → P(TbHb) = 2/9
P(TbTb) = P(Tb)P(Tb) → P(TbTb) = 1/3 × 1/3 → P(TbTb) = 1/9.

Example 1.12: We have a three-faced biased die and a biased coin. For the three-
faced biased die, the sample space is S1 = {f1, f2, f3}, and the probabilities of the
simple events are

P(f1) = 1/6    P(f2) = 1/6    P(f3) = 2/3.

For the biased coin, the sample space is S2 = {Hb, Tb}, and the probabilities of the simple events are

P(Hb) = 1/3    P(Tb) = 2/3.

We flip the coin and toss the die at the same time. Find the sample space of the combined experiment, and calculate the probabilities of the simple events.
Solution 1.12: For the combined experiment, the sample space can be calculated using

S = S1 × S2

as

S = {f1, f2, f3} × {Hb, Tb} → S = {f1Hb, f1Tb, f2Hb, f2Tb, f3Hb, f3Tb}.

The probabilities of the simple events of the combined experiment can be computed as

P(f1Hb) = P(f1)P(Hb) → P(f1Hb) = 1/6 × 1/3 → P(f1Hb) = 1/18
P(f1Tb) = P(f1)P(Tb) → P(f1Tb) = 1/6 × 2/3 → P(f1Tb) = 2/18
P(f2Hb) = P(f2)P(Hb) → P(f2Hb) = 1/6 × 1/3 → P(f2Hb) = 1/18
P(f2Tb) = P(f2)P(Tb) → P(f2Tb) = 1/6 × 2/3 → P(f2Tb) = 2/18
P(f3Hb) = P(f3)P(Hb) → P(f3Hb) = 2/3 × 1/3 → P(f3Hb) = 2/9
P(f3Tb) = P(f3)P(Tb) → P(f3Tb) = 2/3 × 2/3 → P(f3Tb) = 4/9.

Example 1.13: We toss a fair coin three times. Find the probabilities of the following events.
(a) A = {ρi ∈ S | ρi includes at least two heads}.
(b) B = {ρi ∈ S | ρi includes at least one tail}.
Solution 1.13: For three tosses, the sample space can be calculated as

S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.

The events A and B can be written explicitly as

A = {HHH, HHT, HTH, THH}

B = {HHT, HTH, THH, HTT, THT, TTH, TTT}.

The probability of the event A can be computed as

P(A) = P(HHH) + P(HHT) + P(HTH) + P(THH) → P(A) = 1/8 + 1/8 + 1/8 + 1/8 → P(A) = 4/8.

In a similar manner, the probability of the event B can be found as

P(B) = 7/8.
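
The event probabilities in this example can be verified by enumerating all eight outcomes. A minimal Python sketch (ours, not the book's):

    from itertools import product

    S = ["".join(t) for t in product("HT", repeat=3)]  # 8 equally likely outcomes

    A = [s for s in S if s.count("H") >= 2]  # at least two heads
    B = [s for s in S if s.count("T") >= 1]  # at least one tail

    print(len(A) / len(S))  # 0.5   = 4/8
    print(len(B) / len(S))  # 0.875 = 7/8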

Example 1.14: For a biased coin, the sample space is S1 = {Hb, Tb}. The probabilities of the simple events for the biased coin flip experiment are

P(Hb) = 2/3    P(Tb) = 1/3.

Assume that a biased coin and a fair coin are flipped together. Consider the two flips as a single experiment. Find the sample space of the combined experiment. Determine the probabilities of the simple events for the combined experiment, and determine the probabilities of the following events.
(a) A = {Biased head appears in the simple event.}
(b) B = {At least one head appears.}
Solution 1.14: For the fair coin flip experiment, the sample space is

S2 = {H, T}

and the probabilities of the simple events are

P(H) = 1/2    P(T) = 1/2.

For the flip of the biased and fair coin experiment, the sample space can be calculated as

S = S1 × S2 → S = {HbH, HbT, TbH, TbT}.

The probabilities of the simple events for the combined experiment are calculated as

P(HbH) = P(Hb)P(H) → P(HbH) = 2/3 × 1/2 → P(HbH) = 1/3
P(HbT) = P(Hb)P(T) → P(HbT) = 2/3 × 1/2 → P(HbT) = 1/3
P(TbH) = P(Tb)P(H) → P(TbH) = 1/3 × 1/2 → P(TbH) = 1/6
P(TbT) = P(Tb)P(T) → P(TbT) = 1/3 × 1/2 → P(TbT) = 1/6.

The events A and B can be explicitly written as

A = {HbH, HbT}    B = {HbH, HbT, TbH}

whose probabilities can be calculated as

P(A) = P(HbH) + P(HbT) → P(A) = 1/3 + 1/3 → P(A) = 2/3

P(B) = P(HbH) + P(HbT) + P(TbH) → P(B) = 1/3 + 1/3 + 1/6 → P(B) = 5/6.

Exercises:
1. For a biased coin, the sample space is S1 = {Hb, Tb}. The probabilities of the simple events for the biased coin toss experiment are

P(Hb) = 2/3    P(Tb) = 1/3.

Assume that a biased coin is flipped and a fair die is tossed together. Consider the combined experiment, and find the sample space of the combined experiment. Determine the probabilities of the simple events for the combined experiment, and determine the probabilities of the events:
(a) A = {Biased head and odd numbers appear in the simple event.}
(b) B = {Biased tail and a number divisible by 3 appear in the simple event.}
2. For a biased coin, the sample space is S1 = {Hb, Tb}. The probabilities of the simple events for the biased coin toss experiment are

P(Hb) = 2/3    P(Tb) = 1/3.

Assume that the biased coin is tossed three times. Find the sample space, and find the probabilities of the simple events. Calculate the probabilities of the events
A = {At least two heads appear in the simple event.}
B = {At most two tails appear in the simple event.}

1.6 Properties of the Probability Function

Let A, B, and C be the events for an experiment, and P() be the probability function
defined on the events of the experiment. We have the following properties for the
probability function P().
(a) If A ⊂ B, then P(A) ≤ P(B)
(b) P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
(c) P(A ∪ B) ≤ P(A) + P(B)
(d) P(A ∪ B ∪ C) = P(A) + P(Aᶜ ∩ B) + P(Aᶜ ∩ Bᶜ ∩ C)
We will prove some of these properties in examples.
Example 1.15: Prove the property P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
Proof 1.15: We should keep in mind that events are nothing but subsets. Then, any operation that can be performed on sets is also valid on events.
Let S be the sample space. The event A ∪ B can be written as

A ∪ B = S ∩ (A ∪ B)

in which using S = A ∪ Aᶜ, we get

A ∪ B = (A ∪ Aᶜ) ∩ (A ∪ B)

which can be written as

A ∪ B = A ∪ (Aᶜ ∩ B)    (1.7)

where the events A and Aᶜ ∩ B are disjoint events, i.e., A ∩ (Aᶜ ∩ B) = ϕ. According to probability axiom-2, the probability of the event A ∪ B in (1.7) can be written as

P(A ∪ B) = P(A) + P(Aᶜ ∩ B).    (1.8)

The event B can be written as

B = S ∩ B

in which using S = A ∪ Aᶜ, we obtain

B = (A ∪ Aᶜ) ∩ B

which can be written as

B = (A ∩ B) ∪ (Aᶜ ∩ B)    (1.9)

where A ∩ B and Aᶜ ∩ B are disjoint events, i.e., (A ∩ B) ∩ (Aᶜ ∩ B) = ϕ. According to probability axiom-2 in (1.2), the probability of the event B in (1.9) can be written as

P(B) = P(A ∩ B) + P(Aᶜ ∩ B)    (1.10)

from which we get

P(Aᶜ ∩ B) = P(B) - P(A ∩ B).    (1.11)

Substituting (1.11) into (1.8), we obtain

P(A ∪ B) = P(A) + P(B) - P(A ∩ B).    (1.12)

Note: If A and B are disjoint, i.e., mutually exclusive events, then we have

P(A ∪ B) = P(A) + P(B).

This is due to A ∩ B = ϕ → P(A ∩ B) = 0.


Fig. 1.2 Venn diagram illustration of the events

Since events of an experiment are nothing but subsets of the sample space of the
experiment, it may sometimes be easier to manipulate the events using Venn
diagrams.
Venn Diagram Illustration of Events
In Fig. 1.2, Venn diagram illustrations of the events are depicted. As can be seen
from Fig. 1.2, we can take the intersection and union of the events.
Example 1.16: Show that

P(A ∪ B) ≤ P(A) + P(B).

Proof 1.16: We showed that

P(A ∪ B) = P(A) + P(B) - P(A ∩ B).    (1.13)

According to probability axiom-1 in (1.1), the probability function is a non-negative function, and we have

P(A ∩ B) ≥ 0.    (1.14)

If we omit P(A ∩ B) from the right-hand side of (1.13), we can write

P(A ∪ B) ≤ P(A) + P(B).

Example 1.17: A is an event of an experiment, and Aᶜ is the complement of A. Show that

P(Aᶜ) = 1 - P(A).

Proof 1.17: We know that

A ∪ Aᶜ = S    (1.15)

where S is the sample space, and A ∩ Aᶜ = ϕ.

According to probability law axioms 3 and 2 in (1.3) and (1.2), we have

P(S) = 1

and

P(A ∪ Aᶜ) = P(A) + P(Aᶜ).

Then, from (1.15), we can write

P(A) + P(Aᶜ) = 1

which leads to

P(Aᶜ) = 1 - P(A).

Theorem 1.1: If the events A1, A2, ⋯, Am are mutually exclusive events, then we have

P(A1 ∪ A2 ∪ ⋯ ∪ Am) = P(A1) + P(A2) + ⋯ + P(Am).

Example 1.18: For a biased die, the probabilities of the simple events are given as

P(f1) = 1/12    P(f2) = 1/12    P(f3) = 1/6    P(f4) = 1/6    P(f5) = 2/6    P(f6) = 1/6.

The events A and B are defined as

A = {Even numbers appear}    B = {Numbers that are powers of 2 appear}.

Find P(A), P(B), P(A ∪ B), P(A ∩ B).

Solution 1.18: The events A, B, A ∪ B, and A ∩ B can be written as

A = {f2, f4, f6}    B = {f1, f2, f4}    A ∪ B = {f1, f2, f4, f6}    A ∩ B = {f2, f4}.

The probabilities of the events A, B, A ∪ B, and A ∩ B can be computed as

P(A) = P(f2) + P(f4) + P(f6) → P(A) = 1/12 + 1/6 + 1/6 → P(A) = 5/12

P(B) = P(f1) + P(f2) + P(f4) → P(B) = 1/12 + 1/12 + 1/6 → P(B) = 4/12

P(A ∪ B) = P(f1) + P(f2) + P(f4) + P(f6) → P(A ∪ B) = 1/12 + 1/12 + 1/6 + 1/6 → P(A ∪ B) = 6/12

P(A ∩ B) = P(f2) + P(f4) → P(A ∩ B) = 1/12 + 1/6 → P(A ∩ B) = 3/12.

The probability of A ∪ B can also be calculated as

P(A ∪ B) = P(A) + P(B) - P(A ∩ B) → P(A ∪ B) = 5/12 + 4/12 - 3/12 → P(A ∪ B) = 6/12.

Example 1.19: A and B are two events of an experiment. Show that

if A ⊂ B, then P(A) ≤ P(B).

Proof 1.19: If A ⊂ B, then we have

B = A ∪ B

which can be written as

B = (A ∪ B) ∩ S

in which substituting A ∪ Aᶜ for the sample space, we get

B = (A ∪ B) ∩ (A ∪ Aᶜ)

which can be expressed as

B = A ∪ (Aᶜ ∩ B)    (1.16)

where the events A and Aᶜ ∩ B are disjoint events, i.e., A ∩ (Aᶜ ∩ B) = ϕ. Using probability law axiom-2 in (1.2) and equation (1.16), we have

P(B) = P(A) + P(Aᶜ ∩ B).    (1.17)

Since probability is a non-negative quantity, (1.17) implies that

P(A) ≤ P(B).
1.7 Conditional Probability

Assume that we perform an experiment and we get an outcome of the experiment. Let the outcome of the experiment belong to an event B, and consider the question: What is the probability that the outcome of the experiment also belongs to another event A? To calculate this probability, we should first determine the sample space, then identify the event and calculate the probability of the event. Assume that the experiment is a fair one, so that

P(Event) = (Number of elements in Event)/(Number of elements in Sample Space) → P(Event) = N(Event)/N(Sample Space).

Let's show the conditional event E which implies that

{The outcome of the experiment belongs to A given that it also belongs to B}

where the condition "given that it also belongs to B" implies that the sample space equals B, i.e.,

S′ = B.

Then, the probability of the event E can be calculated using

P(E) = N(E)/N(S′) → P(E) = N(A ∩ B)/N(B)    (1.18)

which can be written as

P(E) = (N(A ∩ B)/N(S)) / (N(B)/N(S))

leading to

P(E) = P(A ∩ B)/P(B).

If we show this special event E by the special notation A|B, then the conditional event probability can be written as

P(A|B) = P(A ∩ B)/P(B)    (1.19)

which can be called conditional probability in short instead of conditional event probability. In fact, we will use the term "conditional probability" for (1.19) throughout the book.
From the conditional probability expression in (1.19), we can have the following identities:

P(A ∩ B) = P(A|B)P(B)    P(A ∩ B) = P(B|A)P(A).    (1.20)
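
For a fair experiment, (1.18) reduces conditional probability to counting inside the restricted sample space B. A minimal Python sketch (the helper name cond_prob is ours, not the book's):

    def cond_prob(A, B, S):
        """P(A|B) = N(A ∩ B) / N(B) for equally likely outcomes, as in (1.18)."""
        A, B, S = set(A), set(B), set(S)
        assert A <= S and B <= S, "events must be subsets of the sample space"
        return len(A & B) / len(B)

    # Fair die: probability that an odd number appears (A),
    # given that the outcome is at most 3 (B).
    S = {1, 2, 3, 4, 5, 6}
    A = {1, 3, 5}
    B = {1, 2, 3}
    print(cond_prob(A, B, S))  # 2/3, since A ∩ B = {1, 3} and N(B) = 3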

Properties
1. If A1 and A2 are disjoint events, then we have

P(A1 ∪ A2|B) = P(A1|B) + P(A2|B).

2. If A1 and A2 are not disjoint events, then we have

P(A1 ∪ A2|B) ≤ P(A1|B) + P(A2|B).

Let's now see the proof of these properties.

Proof 1: The conditional probability P(A1 ∪ A2|B) can be written as

P(A1 ∪ A2|B) = P((A1 ∪ A2) ∩ B)/P(B)

where using (A1 ∪ A2) ∩ B = (A1 ∩ B) ∪ (A2 ∩ B), we obtain

P(A1 ∪ A2|B) = P((A1 ∩ B) ∪ (A2 ∩ B))/P(B).    (1.21)

Since the events A1 and A2 are disjoint, we have

A1 ∩ A2 = ϕ

which also implies that

(A1 ∩ B) ∩ (A2 ∩ B) = ϕ

leading to

P((A1 ∩ B) ∪ (A2 ∩ B)) = P(A1 ∩ B) + P(A2 ∩ B).    (1.22)

Substituting (1.22) into (1.21), we get

P(A1 ∪ A2|B) = (P(A1 ∩ B) + P(A2 ∩ B))/P(B)

leading to

P(A1 ∪ A2|B) = P(A1 ∩ B)/P(B) + P(A2 ∩ B)/P(B)

which can be written as

P(A1 ∪ A2|B) = P(A1|B) + P(A2|B).

Proof 2: In (1.21), we got

P(A1 ∪ A2|B) = P((A1 ∩ B) ∪ (A2 ∩ B))/P(B)    (1.23)

in which employing the property

P(A ∪ B) ≤ P(A) + P(B)

for the numerator of (1.23), we get

P(A1 ∪ A2|B) ≤ (P(A1 ∩ B) + P(A2 ∩ B))/P(B)

leading to

P(A1 ∪ A2|B) ≤ P(A1 ∩ B)/P(B) + P(A2 ∩ B)/P(B)

which can be written as

P(A1 ∪ A2|B) ≤ P(A1|B) + P(A2|B).

Example 1.20: There are two students, A and B, having an exam. The following information is available about the students.

(a) The probability that student A is successful in the exam is 5/8.
(b) The probability that student B is successful in the exam is 1/2.
(c) The probability that at least one student is successful is 3/4.

After the exam, it was announced that only one student was successful in the exam. What is the probability that student A was successful in the exam?
Solution 1.20: For each student, having an exam can be considered as an experiment. The sample spaces of the individual experiments are

SA = {As, Af}    SB = {Bs, Bf}

where As, Af are the success and fail outcomes for student A, and Bs, Bf are the success and fail outcomes for student B. If we consider both students having an exam together, i.e., a joint experiment, the sample space can be formed as

S = SA × SB → S = {AsBs, AsBf, AfBs, AfBf}.

Let's define the events

EA = {Student A is successful} → EA = {AsBs, AsBf}
EB = {Student B is successful} → EB = {AsBs, AfBs}
E1 = {At least one student is successful} → E1 = {AsBs, AsBf, AfBs}.

From the given information in the question, we can write the following equations:

P(EA) = 5/8 → P(AsBs) + P(AsBf) = 5/8
P(EB) = 1/2 → P(AsBs) + P(AfBs) = 4/8
P(E1) = 3/4 → P(AsBs) + P(AsBf) + P(AfBs) = 6/8

which can be solved for P(AsBs), P(AsBf), P(AfBs) as

P(AsBs) = 3/8    P(AsBf) = 2/8    P(AfBs) = 1/8.

Now, let's define the event

Eo = {Only one student is successful in the exam} → Eo = {AsBf, AfBs}.

In our question, P(EA|Eo) is asked. We can calculate P(EA|Eo) as

P(EA|Eo) = P(EA ∩ Eo)/P(Eo)

where P(Eo) and P(EA ∩ Eo) can be calculated as

P(Eo) = P(AsBf) + P(AfBs) → P(Eo) = 2/8 + 1/8 → P(Eo) = 3/8

P(EA ∩ Eo) = P(AsBf) → P(EA ∩ Eo) = 2/8.

Then, P(EA|Eo) is evaluated as

P(EA|Eo) = (2/8)/(3/8) → P(EA|Eo) = 2/3.

Example 1.21: A fair coin is tossed three times. The events A and B are defined as

A = {The first two tosses are different from each other}

B = {The second toss is a tail}.

Find P(A), P(B), P(A|B), and P(B|A).

Solution 1.21: For a single toss of the coin, the sample space is S1 = {H, T}. For three tosses, the sample space is found using S = S1 × S1 × S1 as

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

Then, the events A and B described in the question can be written as

A = {HTH, HTT, THH, THT}

B = {HTH, HTT, TTH, TTT}.

Since the coin is a fair one and the simple events have the same probability, the probabilities of the events A and B can be calculated using

P(A) = N(A)/N(S)    P(B) = N(B)/N(S)

where N(A), N(B), and N(S) indicate the number of elements in the events A, B, and S, respectively. Then, P(A) and P(B) are found as

P(A) = N(A)/N(S) → P(A) = 4/8    P(B) = N(B)/N(S) → P(B) = 4/8.

The conditional probability P(A|B) can be calculated using

P(A|B) = P(A ∩ B)/P(B)    (1.24)

where evaluating P(A ∩ B) as

P(A ∩ B) = N(A ∩ B)/N(S) → P(A ∩ B) = 2/8    (1.25)

and employing (1.25) in (1.24), we get

P(A|B) = P(A ∩ B)/P(B) → P(A|B) = (2/8)/(4/8) → P(A|B) = 2/4.

In a similar manner, P(B|A) = P(A ∩ B)/P(A) → P(B|A) = (2/8)/(4/8) → P(B|A) = 2/4.

Example 1.22: Consider a metal detector security system in an airport. The probability of the security system giving an alarm in the absence of a metal is 0.02, the probability of the security system giving an alarm in the presence of a metal is 0.95, and the probability of the security system not giving an alarm in the presence of a metal is 0.03. The probability of availability of metal is 0.02.

(a) Express the miss detection event mathematically, and calculate the probability of miss detection.
(b) Express the missed detection event mathematically, and calculate the probability of missed detection.

Solution 1.22: Considering the given information in the question, we can define the events and their probabilities as

A = {Metal exists}    Aᶜ = {Metal does not exist}

B = {Alarm}    C = {Miss Detection}    D = {Missed Detection}

(a) The miss detection event can be written as

C = Aᶜ ∩ B

whose probability can be calculated as

P(C) = P(Aᶜ ∩ B) → P(C) = P(B|Aᶜ)P(Aᶜ) → P(C) = 0.02 × 0.98 → P(C) = 0.0196.

(b) The missed detection event can be written as

D = A ∩ Bᶜ

whose probability can be calculated as

P(D) = P(A ∩ Bᶜ) → P(D) = P(Bᶜ|A)P(A) → P(D) = 0.03 × 0.02 → P(D) = 0.0006.
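
The two products in Solution 1.22 follow directly from the identities in (1.20). A minimal Python sketch with the given numbers (the variable names are ours, not the book's):

    p_metal = 0.02           # P(A), metal exists
    p_alarm_no_metal = 0.02  # P(B|Aᶜ), alarm in the absence of metal
    p_no_alarm_metal = 0.03  # P(Bᶜ|A), no alarm in the presence of metal

    p_miss = p_alarm_no_metal * (1 - p_metal)  # P(C) = P(B|Aᶜ)P(Aᶜ)
    p_missed = p_no_alarm_metal * p_metal      # P(D) = P(Bᶜ|A)P(A)

    print(p_miss)    # ≈ 0.0196
    print(p_missed)  # ≈ 0.0006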

Example 1.23: A box contains three white and two black balls. We pick a ball from this box. Find the sample space of this experiment and write the events for this sample space.
Solution 1.23: The sample space can be written as

S = {w1, w2, w3, b1, b2}.

The events are subsets of S, and there are in total 2^5 = 32 events. These events are

{}, {w1}, {w2}, {w3}, {b1}, {b2}

{w1, w2}, {w1, w3}, {w1, b1}, {w1, b2}, {w2, w3}, {w2, b1}, {w2, b2}, {w3, b1}, {w3, b2}, {b1, b2}

{w1, w2, w3}, {w1, w2, b1}, {w1, w2, b2}, {w2, w3, b1}, {w2, w3, b2}, {w1, w3, b1}, {w1, w3, b2}, {w1, b1, b2}, {w2, b1, b2}, {w3, b1, b2}

{w1, w2, w3, b1}, {w1, w2, w3, b2}, {w2, w3, b1, b2}, {w1, w3, b1, b2}, {w1, w2, b1, b2}

{w1, w2, w3, b1, b2}

Example 1.24: A box contains two white and two black balls. We pick two balls from this box without replacement. Find the sample space of this experiment.
Solution 1.24: We perform two experiments consecutively. The sample space of the first experiment can be written as

S1 = {w1, w2, b1, b2}.

The sample space of the second experiment depends on the outcome of the first experiment.
If the outcome of the first experiment is w1, the sample space of the second experiment is

S21 = {w2, b1, b2}.

If the outcome of the first experiment is w2, the sample space of the second experiment is

S22 = {w1, b1, b2}.

If the outcome of the first experiment is b1, the sample space of the second experiment is

S23 = {w1, w2, b2}.

If the outcome of the first experiment is b2, the sample space of the second experiment is

S24 = {w1, w2, b1}.

If the outcome of the first experiment is w1, the sample space of the combined experiment is

S = S1 × S21.

If the outcome of the first experiment is w2, the sample space of the combined experiment is

S = S1 × S22.

If the outcome of the first experiment is b1, the sample space of the combined experiment is

S = S1 × S23.

If the outcome of the first experiment is b2, the sample space of the combined experiment is

S = S1 × S24.
Continuous Experiment
For continuous experiments, the sample space includes an uncountable number of simple events. For this reason, for continuous experiments, the sample space is usually expressed either as an interval, if a one-dimensional representation is sufficient, or as an area in the two-dimensional plane.
Let's illustrate the concept with an example.
Example: A telephone call may occur at a time t which is a random point in the interval [8 18].

(a) Find the probabilities of the following events:

A = {A call occurs between 10 and 16}

B = {A call occurs between 8 and 16}.

(b) Calculate P(B|A).

Solution: The sample space of the experiment is the interval [8 18], i.e.,

S = [8 18].

The events A and B are subsets of S, and they are nothing but the intervals

A = [10 16]    B = [8 16].

(a) The probabilities of the events can be calculated as

P(A) = Length(A)/Length(S) → P(A) = (16 - 10)/(18 - 8) → P(A) = 6/10

P(B) = Length(B)/Length(S) → P(B) = (16 - 8)/(18 - 8) → P(B) = 8/10.

(b) P(B|A) can be calculated as

P(B|A) = P(B ∩ A)/P(A) → P(B|A) = P([8 16] ∩ [10 16])/P([10 16]) → P(B|A) = P([10 16])/P([10 16]) → P(B|A) = 1

since A ⊂ B, i.e., every call time in A already lies in B.
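
For such one-dimensional continuous experiments, probabilities are ratios of interval lengths. A minimal Python sketch (ours, not the book's), with intervals given as (start, end) pairs:

    def interval_prob(event, sample):
        """P = Length(event) / Length(sample) for a uniformly random point."""
        return (event[1] - event[0]) / (sample[1] - sample[0])

    S = (8, 18)   # the call time t is uniform on [8 18]
    A = (10, 16)
    B = (8, 16)

    print(interval_prob(A, S))  # 0.6 = 6/10
    print(interval_prob(B, S))  # 0.8 = 8/10
    # Since A ⊂ B, B ∩ A = A, and hence P(B|A) = P(A)/P(A) = 1.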
Problems

1. State the three probability axioms.
2. What is the probability function? Is it an ordinary real valued function?
3. What do mutually exclusive events mean?
4. The sample space of an experiment is given as

S = {s1, s2, s3, s4, s5, s6}.

Find three mutually exclusive events E1, E2, E3 such that S = E1 ∪ E2 ∪ E3. Find the probability of each mutually exclusive event.
5. The sample space of an experiment is given as

S = {s1, s2, s3, s4, s5, s6, s7, s8}.

The event E is defined as

E = {s1, s3, s5, s6, s8}.

Write the event E as the union of two mutually exclusive events E1 and E2, i.e.,

E = E1 ∪ E2.

6. The sample space of an experiment is given as

S = {s1, s2, s3}

where the probabilities of the simple events are provided as

P(s1) = 1/4    P(s2) = 2/4    P(s3) = 1/4.

Write all the events for this sample space, and calculate the probability of each event.
7. The sample space of an experiment is given as

S = {s1, s2, s3}

where the probabilities of the simple events are provided as

P(s1) = 1/3    P(s2) = 1/6    P(s3) = 1/2.

We perform the experiment twice. Consider the two performances of the same experiment as a single experiment, i.e., a combined experiment. Find the simple events of the combined experiment, and calculate the probability of each simple event of the combined experiment.
8. The sample spaces of two experiments are given as

S1 = {a, b, c}    S2 = {d, e}

where the probabilities of the simple events are provided as

P(a) = 1/3    P(b) = 1/6    P(c) = 1/2

P(d) = 3/4    P(e) = 1/4.

We perform the first experiment once and the second experiment twice. Consider the three trials of the experiment as a single experiment, i.e., a combined experiment. Find the simple events of the combined experiment, and calculate the probability of each simple event of the combined experiment.
Chapter 2
Total Probability Theorem, Independence,
Combinatorial

2.1 Total Probability Theorem, and Bayes’ Rule

Definition
Partition: Let A1, A2, ⋯, AN be events of a sample space such that Ai ∩ Aj = ϕ for i ≠ j, i, j ∈ {1, 2, ⋯, N}, and S = A1 ∪ A2 ∪ ⋯ ∪ AN. We say that the events A1, A2, ⋯, AN form a partition of S.
The partition of a sample space is graphically illustrated in Fig. 2.1.

2.1.1 Total Probability Theorem

Let A1, A2, ⋯, AN be disjoint events that form a partition of a sample space S, and let B be any event. Then, the probability of the event B can be written as

P(B) = P(A1)P(B|A1) + P(A2)P(B|A2) + ⋯ + P(AN)P(B|AN).    (2.1)

The partition theorem is illustrated in Fig. 2.2.


Proof: If A1, A2, ⋯, AN are disjoint events that form a partition of a sample space S, then we have

S = A1 ∪ A2 ∪ ⋯ ∪ AN.

For any event B, we can write

B = B ∩ S

Fig. 2.1 The partition of a sample space

Fig. 2.2 Illustration of the total probability theorem

in which substituting S = A1 ∪ A2 ∪ ⋯ ∪ AN, we get

B = B ∩ (A1 ∪ A2 ∪ ⋯ ∪ AN)

where distributing ∩ over ∪, we obtain

B = (B ∩ A1) ∪ (B ∩ A2) ∪ ⋯ ∪ (B ∩ AN).    (2.2)

In (2.2), the events (B ∩ Ai) and (B ∩ Aj), i, j ∈ {1, 2, ⋯, N}, i ≠ j, are disjoint events. Then, according to probability law axiom-2 in (1.2), P(B) can be written as

P(B) = P(B ∩ A1) + P(B ∩ A2) + ⋯ + P(B ∩ AN)

in which employing the property P(B ∩ Ai) = P(Ai)P(B|Ai), we get

P(B) = P(A1)P(B|A1) + P(A2)P(B|A2) + ⋯ + P(AN)P(B|AN)    (2.3)

which is the total probability equation.
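
Equation (2.3) is a weighted average of the conditional probabilities, with the partition probabilities as weights. A minimal Python sketch (ours, not the book's); the numbers anticipate Example 2.2 below:

    def total_probability(priors, conditionals):
        """P(B) = sum of P(Ai)P(B|Ai) over a partition A1, ..., AN, as in (2.3)."""
        assert abs(sum(priors) - 1.0) < 1e-9, "the Ai must partition S"
        return sum(p * c for p, c in zip(priors, conditionals))

    # P(A1) = 0.2, P(A2) = 0.3, P(A3) = 0.5 and the given P(B|Ai).
    print(total_probability([0.2, 0.3, 0.5], [0.2, 0.5, 0.7]))  # 0.54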


Example 2.1: In a chess tournament, there are 100 players. Of these 100 players,
20 of them are at an advanced level, 30 of them are at an intermediate level, and 50 of
them are at a beginner level. You randomly choose an opponent and play a game.
(a) What is the probability that you will play against an advanced player?
(b) What is the probability that you will play against an intermediate player?
(c) What is the probability that you will play against a beginner player?
Fig. 2.3 Partition of the sample space for Example 2.1

Solution 2.1: The experiment here can be considered as playing a chess game against an opponent. The sample space is

S = {100 players}

and the events are

A = {20 advanced players}    B = {30 intermediate players}    C = {50 beginner players}.

The probabilities P(A), P(B), and P(C) can be calculated as

P(A) = N(A)/N(S) → P(A) = 20/100

P(B) = N(B)/N(S) → P(B) = 30/100

P(C) = N(C)/N(S) → P(C) = 50/100.

The sample space and its partition are depicted in Fig. 2.3.
Example 2.2: In a chess tournament, there are 100 players. Of these 100 players,
20 of them are at an advanced level, 30 of them are at an intermediate level, and 50 of
them are at a beginner level.
Your probability of winning against an advanced player is 0.2, and it is 0.5
against an intermediate player, and it is 0.7 against a beginner player.
You randomly choose an opponent and play a game. What is the probability of
winning?
Solution 2.2: The sample space is S = {100 players}, and the events are

A = {20 advanced players}    B = {30 intermediate players}

C = {50 beginner players}    W = {You win the game}.

It is clear that

S = A ∪ B ∪ C

and the events A, B, and C are disjoint events.

In the previous example, the probabilities P(A), P(B), and P(C) were calculated as

P(A) = 20/100    P(B) = 30/100    P(C) = 50/100.

And in this example, the following information is given:

P(W|A) = 0.2    P(W|B) = 0.5    P(W|C) = 0.7.

Using the total probability law

P(W) = P(A)P(W|A) + P(B)P(W|B) + P(C)P(W|C)

the probability of winning against a randomly chosen opponent can be calculated as

P(W) = 0.2 × 0.2 + 0.3 × 0.5 + 0.5 × 0.7 → P(W) = 0.54.

Exercise: There is a box, and inside the box there are 100 question cards. Of these 100 mathematics questions, 10 of them are difficult, 50 of them are normal, and 40 of them are easy. Your probability of solving a difficult question is 0.2, it is 0.4 for a normal question, and it is 0.6 for an easy question. You randomly choose a card; what is the probability that you can solve the question on the card?

2.1.2 Bayes’ Rule

Let A1, A2, ⋯, AN be disjoint events that form a partition of a sample space S. The conditional probability P(Ai|B) can be calculated using

P(Ai|B) = P(Ai ∩ B)/P(B)

which can be written as

P(Ai|B) = P(Ai)P(B|Ai)/P(B)

in which using the total probability theorem for P(B), we obtain

P(Ai|B) = P(Ai)P(B|Ai) / (P(A1)P(B|A1) + P(A2)P(B|A2) + ⋯ + P(AN)P(B|AN))    (2.4)

which is called Bayes’ rule.
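
Bayes' rule in (2.4) simply divides one term of the total probability sum by the whole sum. A minimal Python sketch (ours, not the book's), reusing the numbers above:

    def bayes(i, priors, conditionals):
        """P(Ai|B) from (2.4): prior times likelihood over the total probability."""
        total = sum(p * c for p, c in zip(priors, conditionals))  # P(B), by (2.3)
        return priors[i] * conditionals[i] / total

    # P(A1|B) for P(A1) = 0.2 and P(B|A1) = 0.2, as in Example 2.3 below.
    print(bayes(0, [0.2, 0.3, 0.5], [0.2, 0.5, 0.7]))  # ≈ 0.074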


Example 2.3: In a chess tournament, there are 100 players. Of these 100 players,
20 of them are at an advanced level, 30 of them are at an intermediate level, and 50 of
them are at a beginner level.
Your probability of winning against an advanced player is 0.2, and it is 0.5
against an intermediate player, and it is 0.7 against a beginner player.
You randomly choose an opponent and play a game and you win. What is the
probability that you won against an advanced player?
Solution 2.3: The sample space is S = {100 players}, and the events are

A = f20 advanced playersg B = f30 intermediate playersg


C = f50 beginner playersg W = fThe number of players you can beatg:

In the question, we are required to find P(A|W), which can be calculated using

P(A|W) = P(A)P(W|A) / P(W)

in which using

P(W) = P(A)P(W|A) + P(B)P(W|B) + P(C)P(W|C)

with

P(A) = 20/100   P(B) = 30/100   P(C) = 50/100

P(W|A) = 0.2   P(W|B) = 0.5   P(W|C) = 0.7

we obtain

P(A|W) = (0.2 × 0.2)/0.54 → P(A|W) ≈ 0.074

which shows that, given a win, it is unlikely that the opponent was an advanced player.
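The same posterior can be computed in a few lines of Python, reusing the total probability denominator; the names below are illustrative.

```python
# A minimal sketch of Bayes' rule for Solution 2.3 (names are illustrative).
priors = {"A": 0.2, "B": 0.3, "C": 0.5}        # P(A), P(B), P(C)
win_given = {"A": 0.2, "B": 0.5, "C": 0.7}     # P(W|A), P(W|B), P(W|C)

p_win = sum(priors[e] * win_given[e] for e in priors)   # total probability: 0.54
p_A_given_W = priors["A"] * win_given["A"] / p_win      # Bayes' rule, Eq. (2.4)
print(round(p_A_given_W, 3))  # 0.074
```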



Exercise
1. An electronic device is produced by three factories: F1, F2, and F3. The factories
F1, F2, and F3 have market shares of 30%, 30%, and 40%, respectively, and the
probabilities of F1, F2, and F3 producing a defective device are 0.02, 0.04,
and 0.01, respectively. Assume that you purchased an electronic device produced by
one of these factories, and you found that the device is defective. What is the
probability that the defective device was produced by the second factory, i.e., by F2?
Example 2.4: A box contains two regular coins and one two-headed coin, i.e.,
biased coin. You pick a coin and flip it, and a head shows up. What is the probability
that the chosen coin is the two-headed coin?
Solution 2.4: The experiment for this example can be considered as choosing a coin
and flipping it. Since the box contains two fair coins and one two-headed coin, we can
write the sample space as

S = {H1, T1, H2, T2, Hb1, Hb2}

where H1, T1, H2, T2 correspond to the fair coins, and Hb1, Hb2 correspond to the
two-headed coin. Let's define the events

A = {a head shows up} → A = {H1, H2, Hb1, Hb2}

B = {a biased head shows up} → B = {Hb1, Hb2}.

In our example, the conditional probability P(B|A) is asked. We can calculate P(B|A) as

P(B|A) = P(B ∩ A)/P(A) → P(B|A) = P({Hb1, Hb2} ∩ {H1, H2, Hb1, Hb2}) / P({H1, H2, Hb1, Hb2})
→ P(B|A) = P({Hb1, Hb2}) / P({H1, H2, Hb1, Hb2}) → P(B|A) = (2/6)/(4/6) → P(B|A) = 2/4.

In fact, if we inspect the event A = {H1, H2, Hb1, Hb2}, we see that half of the
heads are biased.

2.2 Multiplication Rule

For N events of an experiment, we have

P(A1 ∩ A2 ∩ ⋯ ∩ AN)
= P(A1)P(A2|A1)P(A3|A1 ∩ A2)⋯P(AN|A1 ∩ A2 ∩ ⋯ ∩ A_{N-1})   (2.5)

which can be written mathematically in a more compact manner as

P(∩_{i=1}^{N} Ai) = ∏_{i=1}^{N} P(Ai | ∩_{j=1}^{i-1} Aj).   (2.6)

Proof: We can show the correctness of (2.5) using the definition of the conditional
probability as in

P(A1 ∩ A2 ∩ ⋯ ∩ AN) = P(A1)
× [P(A1 ∩ A2)/P(A1)] × [P(A3 ∩ A1 ∩ A2)/P(A1 ∩ A2)] × ⋯
× [P(AN ∩ A1 ∩ A2 ∩ ⋯ ∩ A_{N-1})/P(A1 ∩ A2 ∩ ⋯ ∩ A_{N-1})]

in which canceling the common terms, we get

P(A1 ∩ A2 ∩ ⋯ ∩ AN) = P(AN ∩ A1 ∩ A2 ∩ ⋯ ∩ A_{N-1})

which is a correct equality.


Example 2.5: There is a box containing 6 white and 6 black balls. We pick 3 balls
from the box without replacement, i.e., without putting them back to the box, in a
sequential manner. What is the probability that all the drawn balls are white in color?
Solution 2.5: Let's define the events

A1 = {The first drawn ball is white}
A2|A1 = {The second drawn ball is white given that the first drawn ball is white}
A3|A1, A2 = {The third drawn ball is white given that the first and second drawn balls are white}.

Note here that A2|A1 or A3|A1, A2 are just notations; they are used to express
the conditional occurrence of events in a short way.

Before starting the draws, the initial sample space is

S1 = {6 White Balls, 6 Black Balls}.

Since the experiment is a fair one, the probability of the event A1 can be calculated as

P(A1) = N(A1)/N(S1)   (2.7)

where N(A1) and N(S1) are the numbers of simple events in A1 and S1, respectively.
The probability (2.7) can be calculated as

P(A1) = N(A1)/N(S1) → P(A1) = 6/12.

After the first draw, the sample space has one missing element, and it can be written as

S2 = {5 White Balls, 6 Black Balls}.

The probability of A2 given A1 can be calculated as

P(A2|A1) = N(A2|A1)/N(S2) = 5/11.

Similarly, the probability P(A3|A1 ∩ A2) is calculated as

P(A3|A1 ∩ A2) = 4/10.

In the question, P(A1 ∩ A2 ∩ A3) is asked. We can calculate it as

P(A1 ∩ A2 ∩ A3) = P(A1)P(A2|A1)P(A3|A1 ∩ A2)
→ P(A1 ∩ A2 ∩ A3) = (6/12) × (5/11) × (4/10).
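The multiplication rule result can also be confirmed by brute-force enumeration. The following Python sketch is illustrative, not part of the text.

```python
from itertools import permutations

# Brute-force check of Solution 2.5: ordered draws of 3 balls without
# replacement from 6 white ("W") and 6 black ("B") balls.
balls = ["W"] * 6 + ["B"] * 6
draws = list(permutations(range(12), 3))              # all ordered 3-ball draws
all_white = sum(all(balls[i] == "W" for i in d) for d in draws)
print(all_white / len(draws))                         # (6/12)*(5/11)*(4/10) ≈ 0.0909
```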

2.3 Independence

The events A and B are said to be independent events if the occurrence of the event
B does not change the probability of the occurrence of event A. That is, if

P(A|B) = P(A)   (2.8)

then the events A and B are said to be independent events. The independence
condition in (2.8) can alternatively be expressed as

P(A|B) = P(A) → P(A ∩ B)/P(B) = P(A) → P(A ∩ B) = P(A)P(B).

Namely, the events A and B are independent of each other, if

P(A ∩ B) = P(A)P(B)

is satisfied.
Note: For disjoint events A and B, we have P(A ∩ B) = 0, and for independent
events A and B, we have P(A ∩ B) = P(A)P(B).
Example 2.6: Show that two disjoint events A and B can never be independent
events.
Proof 2.6: Let A and B be two disjoint events such that

P(A) > 0   P(B) > 0

and

P(A ∩ B) = 0.

It is clear that

P(A)P(B) > 0.

This means that

P(A ∩ B) ≠ P(A)P(B).

Thus, two disjoint events can never be independent.


Example 2.7: A three-sided fair die is tossed twice.
(a) Write the sample space of this experiment.
(b) Consider the following events

A = {The first flip shows up f1}

B = {The second flip shows up f3}.

Decide whether the events A and B are independent events or not.



Solution 2.7: The sample space of the first toss is S1 = {f1, f2, f3}. The sample space
of the two tosses can be calculated as

S = S1 × S1 → S = {f1f1, f1f2, f1f3, f2f1, f2f2, f2f3, f3f1, f3f2, f3f3}.

The events A and B can be written as

A = {f1f1, f1f2, f1f3}   B = {f1f3, f2f3, f3f3}

whose probabilities are evaluated as

P(A) = 3/9   P(B) = 3/9.   (2.9)

The event A ∩ B can be found as

A ∩ B = {f1f3}

whose probability is

P(A ∩ B) = 1/9.   (2.10)

Since

P(A ∩ B) = P(A) × P(B)

is satisfied, we can conclude that the events A and B are independent of each other.
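The independence condition can be checked exactly by enumerating the fair sample space, as in the following illustrative Python sketch.

```python
from fractions import Fraction
from itertools import product

# Enumerative check of Solution 2.7: two tosses of a fair three-sided die.
S = list(product(["f1", "f2", "f3"], repeat=2))   # 9 equally likely outcomes
A = {s for s in S if s[0] == "f1"}                # first toss shows f1
B = {s for s in S if s[1] == "f3"}                # second toss shows f3

P = lambda E: Fraction(len(E), len(S))            # counting measure on a fair space
print(P(A & B) == P(A) * P(B))                    # True -> A and B are independent
```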

2.3.1 Independence of Several Events

Let A1, A2, ⋯, AN be the events of an experiment. The events A1, A2, ⋯, AN are
independent of each other, if

P(∩_{i∈B} Ai) = ∏_{i∈B} P(Ai)  for every subset B of {1, 2, ⋯, N}.   (2.11)

Example 2.8: If the events A1, A2, and A3 are independent of each other, then all of
the following equalities must be satisfied.
1. P(A1 ∩ A2) = P(A1)P(A2)
2. P(A1 ∩ A3) = P(A1)P(A3)

3. P(A2 ∩ A3) = P(A2)P(A3)
4. P(A1 ∩ A2 ∩ A3) = P(A1)P(A2)P(A3)
Exercise: For the two independent tosses of a fair die, we have the following events
defined:

A = {First flip shows up 1 or 2}

B = {Second flip shows up 2 or 4}

C = {The sum of the two numbers is 8}.

Decide whether the events A, B, and C are independent of each other or not.

2.4 Conditional Independence

The events A and B are said to be conditionally independent, if for a given event C

P(A ∩ B|C) = P(A|C)P(B|C)   (2.12)

is satisfied.
The left side of the conditional independence condition in (2.12) can be written as

P(A ∩ B|C) = P(A ∩ B ∩ C)/P(C)

in which using the property

P(A ∩ B ∩ C) = P(C)P(B|C)P(A|B ∩ C)

we obtain

P(A ∩ B|C) = P(C)P(B|C)P(A|B ∩ C)/P(C)

leading to

P(A ∩ B|C) = P(B|C)P(A|B ∩ C).   (2.13)



Substituting (2.13) for the left-hand side of (2.12), we get

P(B|C)P(A|B ∩ C) = P(A|C)P(B|C)

where canceling the common terms from both sides, we get

P(A|B ∩ C) = P(A|C).   (2.14)

The conditional independence implies that, if the event C did occur, the additional
occurrence of the event B does not have any effect on the probability of occurrence
of event A.
Example 2.9: For the two tosses of a fair coin experiment, the following events are
defined

A = {First flip shows up a Head}

B = {Second flip shows up a Tail}

C = {In both flips, at least one Head appears}.

Decide whether the events A and B are conditionally independent given the
event C.
Solution 2.9: The events A, B, and C can be written as

A = {HH, HT}   B = {HT, TT}   C = {HT, TH, HH}

S = {HH, HT, TH, TT}.

For the conditional independence of A and B given C, we must have

P(A|B ∩ C) = P(A|C)

which can be written as

P(A ∩ B ∩ C)/P(B ∩ C) = P(A ∩ C)/P(C).   (2.15)

Using the given events, the probabilities in (2.15) can be calculated as

P(A ∩ B ∩ C) = P({HT}) → P(A ∩ B ∩ C) = 1/4

P(B ∩ C) = P({HT}) → P(B ∩ C) = 1/4

P(A ∩ C) = P({HH, HT}) → P(A ∩ C) = 2/4.

Then, from (2.15) we have

(1/4)/(1/4) = (2/4)/(3/4) → 1 = 2/3

which is not correct. Thus, for the given events, we have

P(A|B ∩ C) ≠ P(A|C)

which means that the events A and B are not conditionally independent given the
event C.
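The inequality can be verified by direct enumeration; the following Python sketch is illustrative.

```python
from fractions import Fraction

# Enumerative check of Solution 2.9 on S = {HH, HT, TH, TT}.
S = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT"}          # first flip is a head
B = {"HT", "TT"}          # second flip is a tail
C = {"HT", "TH", "HH"}    # at least one head

P = lambda E: Fraction(len(E), len(S))
lhs = P(A & B & C) / P(C)                        # P(A ∩ B | C) = 1/3
rhs = (P(A & C) / P(C)) * (P(B & C) / P(C))      # P(A|C) P(B|C) = 2/3 * 1/3 = 2/9
print(lhs, rhs, lhs == rhs)                      # 1/3 2/9 False
```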
Example 2.10: Show that if A and B are independent events, so are A and Bc.
Proof 2.10: If A and B are independent events, then we have

P(A ∩ B) = P(A)P(B).

The event A can be written as

A = A ∩ S

in which substituting S = B ∪ Bc, we get

A = A ∩ (B ∪ Bc) → A = (A ∩ B) ∪ (A ∩ Bc)

where employing probability law axiom-2, we obtain

P(A) = P(A ∩ B) + P(A ∩ Bc)

in which using P(A ∩ B) = P(A)P(B), we get

P(A) = P(A)P(B) + P(A ∩ Bc)

leading to

P(A) - P(A)P(B) = P(A ∩ Bc) → P(A)(1 - P(B)) = P(A ∩ Bc) → P(A ∩ Bc) = P(A)P(Bc)

since 1 - P(B) = P(Bc).

Exercise: Show that if A and B are independent events, so are Ac and Bc.
Hint: Ac = Ac ∩ S and S = B ∪ Bc.
Exercise: Show that if A and B are independent events, so are Ac and B.
Hint: Ac = Ac ∩ S and S = B ∪ Bc, and use the result of the previous example.

2.5 Independent Trials and Binomial Probabilities

Assume that we perform an experiment, and at the end of the experiment, we wonder
whether an event has occurred or not, for example, flip of a fair coin and occurrence
of head, success or failure from an exam, winning or losing a game, it rains or does
not rain, toss of a die and occurrence of an even number, etc. Let’s assume that such
experiments are repeated N times in a sequential manner, for instance, flipping a fair
coin ten times, playing 10 chess games, etc. We wonder about the probability of the
same event occurring k times out of N trials. Let’s explain the topic with an example.
Example 2.11: Consider the flip of a biased coin experiment. The sample space is
S1 = {H, T} and the simple events have the probabilities

P(H) = p   P(T) = 1 - p.

Let’s say that we flip the coin 5 times. In this case, sample space is calculated by
taking the 5 Cartesian product of S1 by itself, i.e.,

S = S1 × S 1 × S 1 × S 1 × S 1

which includes 32 elements, and each element of S contains 5 simple events, for
instance, HHHHH, HHHHT, ⋯ etc. Now think about the question, what is the
probability of seeing 3 heads and 2 tails after 5 flips of the coin?
Consider the event A having 3 heads and 2 tails; the event A can be written as

A ¼ fHHHTT; HHTTH; HTTHH; TT HHH,


THTHH; HTHTH; HHTHT; THHTH; HTHHT; THHHTg

The probability of any simple event containing 3 heads and 2 tails equals p3(1 - p)2,
for instance, P(HHHTT) can be calculated as

PðHHHTT Þ = PðH Þ PðH Þ PðH Þ PðT Þ PðT Þ → PðHHHTT Þ = p3 ð1 - pÞ2


p p p 1-p 1-p

The probability of the event A can be calculated by summing the probabilities of the
simple events appearing in A. Since there are 10 simple events in A, each having
probability of occurrence p^3(1 - p)^2, the probability of A can be calculated as

P(A) = p^3(1 - p)^2 + p^3(1 - p)^2 + ⋯ + p^3(1 - p)^2 → P(A) = 10 × p^3(1 - p)^2

which can be written as

P(A) = C(5, 3) p^3(1 - p)^2

where C(N, k) = N!/((N - k)!k!) denotes the binomial coefficient, i.e., "N choose k."
Thus, the probability of seeing 3 heads and 2 tails after 5 tosses of the coin is

C(5, 3) p^3(1 - p)^2.

Now consider the events

A0 = {0 Heads, 5 Tails}
A1 = {1 Head, 4 Tails}
A2 = {2 Heads, 3 Tails}
A3 = {3 Heads, 2 Tails}
A4 = {4 Heads, 1 Tail}
A5 = {5 Heads, 0 Tails}.

It is obvious that the events A0, A1, A2, A3, A4, A5 are disjoint events, i.e.,
Ai ∩ Aj = ϕ, i, j = 0, 1, ⋯, 5, i ≠ j, and we have

S = A0 ∪ A1 ∪ A2 ∪ A3 ∪ A4 ∪ A5.

According to the probability law axioms 2 and 3 in (1.2) and (1.3), we have

P(S) = 1 → P(A0) + P(A1) + P(A2) + P(A3) + P(A4) + P(A5) = 1

leading to

C(5, 0) p^0(1 - p)^5 + C(5, 1) p^1(1 - p)^4 + C(5, 2) p^2(1 - p)^3
+ C(5, 3) p^3(1 - p)^2 + C(5, 4) p^4(1 - p)^1 + C(5, 5) p^5(1 - p)^0 = 1

which can be written as

Σ_{k=0}^{5} C(5, k) p^k(1 - p)^{5-k} = 1.
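These binomial probabilities are easy to tabulate in code. The sketch below uses an arbitrary illustrative value of p.

```python
from math import comb

# Binomial probabilities of Example 2.11 for an arbitrary illustrative p.
p, N = 0.6, 5
pmf = [comb(N, k) * p**k * (1 - p) ** (N - k) for k in range(N + 1)]
print(pmf[3])      # P(3 heads, 2 tails) = C(5,3) p^3 (1-p)^2
print(sum(pmf))    # 1.0 (up to floating point), matching the sum above
```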

Example 2.12: Consider the flip of a coin with P(H) = p. The sample space of a
single flip is S1 = {H, T}. Let's say that we flip the coin N times. In this case, the
sample space is calculated by taking the N-fold Cartesian product of S1 with itself, i.e.,

S = S1 × S1 × ⋯ × S1.

What is the probability of seeing k heads at the end of N trials? Following a similar
approach as in the previous example, we can write

P(k heads appear in N flips) → P(Ak) = C(N, k) p^k(1 - p)^{N-k}.   (2.16)

Considering the disjoint events A0, A1, ⋯, AN such that S = A0 ∪ A1 ∪ ⋯ ∪ AN,
we can write

Σ_{k=0}^{N} C(N, k) p^k(1 - p)^{N-k} = 1.

Now consider the event A that the number of heads appearing in N tosses is a number
between k1 and k2. The event A can be written as

A = A_{k1} ∪ A_{k1+1} ∪ ⋯ ∪ A_{k2}

where the events A_{k1}, A_{k1+1}, ⋯, A_{k2} are disjoint events.
The probability of A can be calculated as

P(the number of heads in N flips is a number between k1 and k2) →
P(A) = P(A_{k1} ∪ A_{k1+1} ∪ ⋯ ∪ A_{k2}) = P(A_{k1}) + P(A_{k1+1}) + ⋯ + P(A_{k2})
= Σ_{k=k1}^{k2} C(N, k) p^k(1 - p)^{N-k}.   (2.17)

Note: Let x and y be two simple events of a sample space; then we have

x ∪ y = {x, y}

and for the Cartesian product, we can write

(x ∪ y) × (x ∪ y) → {x, y} × {x, y} = {xx, xy, yx, yy}

thus

(x ∪ y) × (x ∪ y) = {xx, xy, yx, yy}.

A similar approach can be considered for two events A and B of an experiment.
Example 2.13: A fair die is tossed 5 times. What is the probability that a number
divisible by 3 appears 4 times?
Solution 2.13: For the fair die toss experiment, the sample space is

S1 = {1, 2, 3, 4, 5, 6}.

The event "a number divisible by 3 appears" can be written as

A = {3, 6}.

We can write the sample space S2 for our experimental outcomes as

S2 = A ∪ B

where B = {1, 2, 4, 5}. The probabilities of the events A and B are

P(A) = 2/6   P(B) = 4/6.

When the fair die is tossed 5 times, the sample space happens to be

S = S2 × S2 × S2 × S2 × S2

which includes 32 elements, i.e.,

S = {AAAAA, AAAAB, AAABA, ⋯, BBBBB}.

The event of S in which A appears 4 times, i.e., a number divisible by 3 appears
4 times, is

C = {AAAAB, AAABA, AABAA, ABAAA, BAAAA}.

The probability of the event C can be calculated as

P(C) = P(AAAAB) + P(AAABA) + P(AABAA) + P(ABAAA) + P(BAAAA)

where each term equals (1/3)^4 (2/3), leading to

P(C) = 5 × (1/3)^4 × (2/3)

which can be written as

P(C) = C(5, 4) (1/3)^4 (2/3)^1.

In fact, using the formula

C(N, k) p^k(1 - p)^{N-k}

directly for the given example, we can get the same result.
Theorem 2.1: Let S be the sample space of an experiment, and let A be an event.
Assume that the experiment is performed N times. The probability of the event A
occurring k times in the N trials can be calculated as

P_N(A_k) = C(N, k) p^k(1 - p)^{N-k}   (2.18)

where p = Prob(A).
Example 2.14: A biased coin has the simple event probabilities

P(H) = 2/3   P(T) = 1/3.

Assume that the biased coin is flipped and a fair die is tossed together 8 times.
What is the probability that a tail and an even number appear together 5 times?
Solution 2.14: The sample space of the biased coin toss experiment is

S1 = {H, T}.

The sample space of the flipping-a-die experiment is



S2 = {1, 2, 3, 4, 5, 6}.

The combined experiment has the sample space

S = S1 × S2 → S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}.

The event

A = {A tail and an even number appear}

can be written as

A = {T2, T4, T6}

and the sample space S3, considering the experimental outcomes given in the
question, is

S3 = A ∪ B

where

B = {H1, H2, H3, H4, H5, H6, T1, T3, T5}.

The probability of the event A can be calculated using

P(A) = P(T2) + P(T4) + P(T6) → P(A) = P(T)P(2) + P(T)P(4) + P(T)P(6)

leading to

P(A) = (1/3) × (1/6) + (1/3) × (1/6) + (1/3) × (1/6) → P(A) = 1/6.

Now consider the combined experiment, i.e., the biased coin is flipped and a fair die
is tossed together 8 times. The probability that event A occurs 5 times in 8 trials of the
experiment can be calculated using

P_N(A_k) = C(N, k) p^k(1 - p)^{N-k} → P_8(A_5) = C(8, 5) (1/6)^5 (5/6)^3.

Example 2.15: We flip a biased coin and draw a ball with replacement from a box
that contains 2 red, 3 yellow, and 2 blue balls. For the biased coin, the probabilities
of the head and tail are

P(Hb) = 1/4   P(Tb) = 3/4.

If we repeat the experiment 8 times, what is the probability of seeing a tail and
drawing a blue ball together 5 times?
Solution 2.15: For the biased coin flip experiment, the sample space can be written as

S1 = {Hb, Tb}

and for the ball drawing experiment, we can write the sample space as

S2 = {R1, R2, Y1, Y2, Y3, B1, B2}.

If we consider the two experiments at the same time, i.e., the combined experiment,
the sample space can be formed as

S = S1 × S2 → S = {Hb, Tb} × {R1, R2, Y1, Y2, Y3, B1, B2} →
S = {HbR1, HbR2, HbY1, HbY2, HbY3, HbB1, HbB2,
     TbR1, TbR2, TbY1, TbY2, TbY3, TbB1, TbB2}.

Let's define the event A as

A = {seeing a tail and drawing a blue ball}.

We can write the elements of A explicitly as

A = {TbB1, TbB2}.

The probability of A can be calculated as

P(A) = P(TbB1) + P(TbB2) → P(A) = P(Tb)P(B1) + P(Tb)P(B2)
→ P(A) = (3/4) × (1/7) + (3/4) × (1/7) → P(A) = 3/14.

In our question, the experiment is repeated 8 times, and the probability of seeing a
tail and drawing a blue ball together 5 times is asked. We can calculate the asked
probability as

P(A_5) = C(8, 5) (3/14)^5 (11/14)^3.

Exercise: A biased coin is flipped and a 4-sided biased die is tossed together
8 times. The probabilities of the simple events for the separate experiments are

P(H) = 2/3   P(T) = 1/3

P(f1) = 2/6   P(f2) = 1/6   P(f3) = 2/6   P(f4) = 1/6.

What is the probability of seeing a head and an odd number 3 times in 8 tosses?
Example 2.16: An electronic device is produced by a factory. The probability that
the produced device is defective equals 0.1. We purchase 1000 of these devices.
What is the probability that the total number of defective devices is a number
between 50 and 150?
Solution 2.16: Consider the coin toss experiment with sample space S1 = {H, T}.
If you flip the coin N times, you calculate the sample space using the N-fold Cartesian
product

S = S1 × S1 × ⋯ × S1.

The solution of the given question can be considered in a similar manner.
Purchasing an electronic device can be considered an experiment. The simple
events of this experiment are defective and non-defective devices, i.e.,

S2 = {D, N}

where D and N refer to the purchase of defective and non-defective devices, respec-
tively. Purchasing 1000 electronic devices can be considered as repeating the
experiment 1000 times, and the sample space of this combined experiment can be
calculated in a similar manner to the coin toss experiment as

S = S2 × S2 × ⋯ × S2.

The probability of a simple event of S, i.e., a sequence of N letters, in which D
appears k times can be calculated as

p^k(1 - p)^{N-k}

and the probability of the event including the simple events of S in which D appears
k times is calculated as

C(N, k) p^k(1 - p)^{N-k}.

And if k is a number between k1 and k2, the sum of the probabilities of all these
events equals

P(A) = Σ_{k=k1}^{k2} C(N, k) p^k(1 - p)^{N-k}.

For our question, the probability that the total number of defective devices is a
number between 50 and 150 can be calculated as

P(A) = Σ_{k=50}^{150} C(1000, k) 0.1^k × 0.9^{1000-k}.
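This sum is unwieldy by hand but direct in code; the following Python sketch evaluates it numerically.

```python
from math import comb

# Solution 2.16: P(50 <= number of defective devices <= 150) for N = 1000, p = 0.1.
N, p = 1000, 0.1
prob = sum(comb(N, k) * p**k * (1 - p) ** (N - k) for k in range(50, 151))
print(prob)   # close to 1, since the expected count is N*p = 100
```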

2.6 The Counting Principle

Assume that there are M experiments with sample spaces S1, S2, ⋯, SM. The
numbers of elements in the sample spaces S1, S2, ⋯, SM are N1, N2, ⋯, NM,
respectively. If the experiments are all considered together as a single experiment,
then the sample space of the combined experiment is calculated as

S = S1 × S2 × ⋯ × SM   (2.19)

and the number of elements in the sample space S equals

N = N1 × N2 × ⋯ × NM.   (2.20)

Example 2.17: Consider the integer set Fq = {0, 1, 2, ⋯, q - 1}. Assume that we
construct integer vectors v = [v1 v2 ⋯ vL] using the integers in Fq. How many
different integer vectors can we have?
Solution 2.17: Selecting a number from the integer set Fq = {0, 1, 2, ⋯, q - 1} can
be considered as an experiment, and the sample space of this experiment is

S1 = {0, 1, 2, ⋯, q - 1}.

To construct the integer vector v including L integers, we need to repeat the
experiment L times. And the sample space of the combined experiment can be
obtained by taking the L-fold Cartesian product of S1 with itself as

S = S1 × S1 × ⋯ × S1.

The elements of S are integer vectors containing L numbers. The number of
vectors in S is calculated as

q × q × ⋯ × q (L times) → q^L.

Example 2.18: Consider the integer set F3 = {0, 1, 2}. Assume that we construct
integer vectors v = [v1 v2 ⋯ v10] including 10 integers using the elements of F3.
How many different integer vectors can we have?

Solution 2.18: The answer is

3 × 3 × ⋯ × 3 (10 times) = 3^10.

2.7 Permutation

Consider the integer set S1 = {1, 2, ⋯, N}. Assume that we draw an integer from the
set without replacement, and we repeat this experiment k times in total. The
sample space of the kth draw, i.e., the kth experiment, is indicated by Sk. The sample
space of the combined experiment

S = S1 × S2 × ⋯ × Sk

contains

N × (N - 1) × ⋯ × (N - k + 1)

k-digit integer sequences; this number is read as the "k permutation of N," and it is
shortly expressed as

N!/(N - k)!.   (2.21)

The sample space S of the combined experiment contains simple events consisting
of k distinct integers chosen from S1. Thus, at the end of the kth trial, we obtain a
sequence of k distinct integers. And the number N × (N - 1) × ⋯ × (N - k + 1)
indicates the total number of integer sequences containing k distinct integers, i.e., the
number of elements in the sample space S.

The discussion given above can be extended to any set containing objects rather
than integers. In that case, while forming the distinct combination of objects, we pay
attention to the index of the objects.
Example 2.19: The set S1 = {1, 2, 3} is given. We draw 2 integers from the set
without replacement. Write the possible generated sequences.
Solution 2.19: Assume that at the first trial 1 is selected; then at the end of the
second trial, we can get the sequences

1 × {2, 3} → 12, 13.

If at the first trial 2 is selected, then at the end of the second trial, we can get the
sequences

2 × {1, 3} → 21, 23.

If at the first trial 3 is selected, then at the end of the second trial, we can get the
sequences

3 × {1, 2} → 31, 32.

Hence, the possible 2-digit sequences containing distinct elements are

{12, 13, 21, 23, 31, 32}.

The number of 2-digit sequences is 6, which can be obtained by taking the
2 permutation of 3 as

3!/(3 - 2)! = 6.
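The enumeration and the permutation formula can both be reproduced with Python's standard library, as in this illustrative sketch.

```python
from itertools import permutations
from math import factorial

# Solution 2.19 by enumeration: 2-digit sequences of distinct elements of {1, 2, 3}.
seqs = list(permutations([1, 2, 3], 2))
print(seqs)                                            # (1,2) (1,3) (2,1) (2,3) (3,1) (3,2)
print(len(seqs) == factorial(3) // factorial(3 - 2))   # True: 3!/(3-2)! = 6
```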

Example 2.20: In the English language, there are 26 letters. How many words can
be formed consisting of 5 distinct letters?
Solution 2.20: You can consider this question as the draw of letters from an
alphabet box without replacement, where the experiment is repeated 5 times. Then
the number of words that contain 5 distinct letters can be calculated using

26 × 25 × 24 × 23 × 22

which is nothing but the 5 permutation of 26.

2.8 Combinations

Assume that there are N people, and we want to form a group consisting of k persons
selected from N people. How many different groups can we form?
The answer to this question passes through permutation calculation. We can find
the answer by calculating the k permutation of N. However, since humans are
considered while forming the sequences, some of the sequences include the same
persons although their order is different in the sequence. For instance, the sequences
abcd and bcda contain the same persons and they are considered the same.
The elements of a sequence containing k distinct elements can be reordered in k!
different ways.
Example 2.21: The sequence aec can be reordered as

ace, eac, eca, cea, cae.

Hence, for a sequence including 3 distinct elements, it is possible to obtain 3! = 6
orderings (including aec itself), and if each letter indicates a human, then it is
obvious that all these groups are the same as each other.
Considering all the above discussion, we can conclude that in the k permutation of N,
each sequence appears k! times including its reordered versions. Then, the total
number of unique sequences without any reordered replicas equals

N! / ((N - k)! × k!)   (2.22)

which is called the k combination of N, and it is shortly indicated as

C(N, k).   (2.23)

Example 2.22: Consider the sample space S = {a, b, c, d}. The number of different
sequences containing 2 distinct letters from S can be calculated using the
2 permutation of 4 as

4!/(4 - 2)! = 12

and these sequences can be written as

ab ac ad ba bc bd ca cb cd da db dc.

On the other hand, if reordering is not wanted, then the number of sequences
containing 2 distinct letters can be calculated using

4!/((4 - 2)! × 2!) = 6

and the sequences can be written as

ab ac ad bc bd cd.

Example 2.23: A box contains 60 items, and of these 60 items, 15 are
defective. Suppose that we select 23 items randomly. What is the probability that
8 of these 23 items are defective?

Solution 2.23: Let's formulate the solution as follows. The sample space is

S = {Selecting 23 items out of 60 items}.

The sample space contains

N(S) = C(60, 23)

different elements, i.e., sequences. Let's define the event A as

A = {Of the 23 selected items, 8 are defective and 15 are robust}.

In fact, the event A can be written as

A = A1 × A2

where the events A1 and A2 are defined as

A1 = {Choose 8 defective items out of 15 defective items}

A2 = {Choose 15 robust items out of 45 robust items}.

The event A contains

N(A) = C(15, 8) × C(45, 15)

elements. The probability of the event A is calculated as

P(A) = N(A)/N(S) → P(A) = [C(15, 8) × C(45, 15)] / C(60, 23).
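The ratio of binomial coefficients can be evaluated exactly in code; the following Python sketch is illustrative.

```python
from fractions import Fraction
from math import comb

# Solution 2.23: P(8 defective among 23 items drawn from 60 with 15 defective).
numerator = comb(15, 8) * comb(45, 15)   # N(A): choose defective and robust items
denominator = comb(60, 23)               # N(S): all ways to choose 23 of 60
print(Fraction(numerator, denominator))  # exact value
print(numerator / denominator)           # decimal approximation
```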

Example 2.24: An urn contains 3 red and 3 green balls, each of which is labeled by
a different number. A sample of 4 balls is drawn without replacement. Find the
number of elements in the sample space.

Solution 2.24: Let's show the content of the urn by the set {R1, R2, R3, G1, G2, G3}.
After drawing the 4 balls, we can get the sequences

R1R2R3G1   R1R2R3G2   R1R2R3G3   R1R2G1G2   R1R2G1G3
R1R2G2G3   R2R3G1G2   R2R3G1G3   R2R3G2G3   R1R3G1G2
R1R3G1G3   R1R3G2G3   R1G1G2G3   R2G1G2G3   R3G1G2G3

where, for instance, R1R2R3G1 contains 3 red and 1 green ball, and R1R2G1G2
contains 2 red and 2 green balls. The total number of sequences is 15, which is
equal to

C(6, 4)

i.e., the 4 combination of 6. That is,

N(S) = C(6, 4).

Example 2.25: For the previous example, consider the event A defined as

A = {2 red balls are drawn, 2 green balls are drawn}.

Find the probability of event A.

Solution 2.25: The event A can be written as

A = A1 × A2

where the events A1 and A2 are defined as

A1 = {2 red balls are drawn}   A2 = {2 green balls are drawn}.

We have

N(A1) = C(3, 2)   N(A2) = C(3, 2)

and

N(A) = N(A1) × N(A2) → N(A) = C(3, 2) × C(3, 2).

The probability of the event A can be calculated as

P(A) = N(A)/N(S) → P(A) = [C(3, 2) × C(3, 2)] / C(6, 4).

2.9 Partitions

Suppose that we have N distinct objects, and N = N1 + N2 + ⋯ + Nr. We first draw
N1 objects without replacement and make a group with these N1 objects, then draw
N2 objects from the remaining ones without replacement and make another group with
these N2 objects, and go on like this until the formation of the last group containing
Nr objects. Each draw can be considered a separate experiment; let's denote the
sample space of the kth experiment by Sk. The sample space of the combined
experiment can be formed using the Cartesian product

S = S1 × S2 × ⋯ × Sr

and the size of the sample space S, denoted by |S|, which indicates the number of
ways these groups can be formed, can be calculated using

|S| = |S1| × |S2| × ⋯ × |Sr|

leading to

C(N, N1) × C(N - N1, N2) × C(N - N1 - N2, N3) × ⋯ × C(N - N1 - ⋯ - N_{r-1}, Nr)

which can be simplified as

[N! / ((N - N1)! × N1!)] × [(N - N1)! / ((N - N1 - N2)! × N2!)] × ⋯
× [(N - N1 - ⋯ - N_{r-1})! / ((N - N1 - ⋯ - N_{r-1} - Nr)! × Nr!)]

where canceling the same terms in the numerators and denominators, we obtain

N! / (N1! × N2! × ⋯ × Nr!).   (2.24)
N1! × N2! × ⋯ × Nr !

The idea of partitions can also be interpreted in a different way considering the
permutation law. If there are N distinct objects available, the number of N-object
sequences that can be formed from these N objects can be calculated as

N × (N - 1) × ⋯ × 1 = N!.   (2.25)

That is, the total number of permutations of N objects equals N!.
In fact, the result in (2.25) is nothing but the number of elements in the sample
space of the combined experiment; there are N experiments in total, and the sample
space of the kth experiment, k = 1, ⋯, N, contains

N - k + 1

elements.
Note: |S| indicates the number of elements in the set S.
If N1 objects are the same, then the total number of permutations is

N!/N1!.

If N1 objects are the same, and N2 objects are the same, then the total number of
permutations is

N!/(N1!N2!).

In a similar manner, if N1 objects are the same, N2 objects are the same, and so on
until Nr objects are the same, the total number of permutations is

N!/(N1!N2!⋯Nr!) < N!.   (2.26)

Example 2.26: In the English language, there are 26 letters. How many words can
be formed consisting of 5 distinct letters?
Solution 2.26: You can consider this question as the draw of letters from an
alphabet box without replacement, where the experiment is repeated 5 times. Then
the number of words that contain 5 distinct letters can be calculated using

26 × 25 × 24 × 23 × 22

which is nothing but the 5 permutation of 26.



Example 2.27: The total number of permutations of the sequence abc is 3! = 6, and
these sequences are

abc acb bac bca cab cba.

On the other hand, the total number of permutations of the sequence aab is
3!/2! = 3, and these sequences are

aab aba baa

which contains 3 sequences, i.e., 6/2!. The reason for this reduction can be seen by
labeling the two a's as a1 and a2: the 3! = 6 orderings of a1a2b collapse in pairs as

a1a2b, a2a1b → aab   a1ba2, a2ba1 → aba   ba1a2, ba2a1 → baa.

Example 2.28: The total number of permutations of the sequence abcd is 4! = 24.
That is, by reordering the items in abcd, we can write 24 distinct sequences in total.
On the other hand, the total number of permutations of the sequence abac is
4!/2! = 12, and these sequences are

abac acab abca acba aabc aacb cbaa bcaa baca caba caab baac.

Exercise: For the sequences abcde and aaabbc, write all the possible permutations,
and show the relation between the permutation numbers of the two sequences.
Example 2.29: How many different letter sequences can be formed by reordering
the letters of the word TELLME?

Solution 2.29: Since the letters E and L each appear twice, the number of different
letter sequences equals

6!/(2! × 2!).
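The count is quick to evaluate in code; the sketch below is illustrative.

```python
from math import factorial

# Solution 2.29: orderings of TELLME, where E and L each repeat twice.
print(factorial(6) // (factorial(2) * factorial(2)))  # 180
```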

Partitions Continued
Let S be the sample space of an experiment, and let A1, A2, A3, ⋯, Ar be the disjoint
sets forming a partition of S such that Ai ∩ Aj = ϕ, i ≠ j, and A1 ∪ A2 ∪ ⋯ ∪ Ar = S.
The probabilities of the disjoint events A1, A2, A3, ⋯, Ar are

p1 = P(A1)   p2 = P(A2)   ⋯   pr = P(Ar)

such that

p1 + p2 + ⋯ + pr = 1.

The sample space can be written as

S = {A1, A2, ⋯, Ar}.

Assume that we repeat the experiment N times. Consider the event B defined as

B = {A1 occurs N1 times, A2 occurs N2 times, ⋯, Ar occurs Nr times}.

The probability of the event B can be calculated as

P(B) = [N! / (N1! × N2! × ⋯ × Nr!)] × p1^{N1} × p2^{N2} × ⋯ × pr^{Nr}   (2.27)

where

p1^{N1} × p2^{N2} × ⋯ × pr^{Nr}   (2.28)

denotes the probability of a single element of B, and

N! / (N1! × N2! × ⋯ × Nr!)   (2.29)

is the total number of elements in B. Every element of B has the same probability of
occurrence.
Example 2.30: A fair die is tossed 15 times. What is the probability that the
numbers 2 or 4 appear 5 times and 3 appears 4 times?
Solution 2.30: For the given experiment, the sample space is S1 = {1, 2, 3, 4, 5, 6}.
We can define the disjoint events A1, A2, and A3 as

A1 = {2, 4}   A2 = {3}   A3 = {1, 5, 6}.

Considering the experimental output expected in the question, we can write the
sample space of the experiment as

S1 = A1 ∪ A2 ∪ A3.

The probabilities of the events A1, A2, and A3 can be calculated as

P(A1) = 2/6   P(A2) = 1/6   P(A3) = 3/6.

We perform the experiment 15 times. The sample space of the repeated combined
experiment can be found by taking the 15-fold Cartesian product of S1 with itself as

S = S1 × S1 × ⋯ × S1

whose elements consist of 15 letters, i.e.,

S = {A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1, A1A1A1A1A1A1A1A1A1A1A1A1A1A1A2, ⋯}.

Let's define the event B for the combined experiment as

B = {A1 occurs 5 times, A2 occurs 4 times, A3 occurs 6 times}.

The probability of event A1 occurring 5 times, event A2 occurring 4 times, and
event A3 occurring 6 times, i.e., the probability of the event B, can be calculated as

P(B) = [15! / (5! × 4! × 6!)] × (2/6)^5 × (1/6)^4 × (3/6)^6

where the coefficient

15! / (5! × 4! × 6!)

indicates the number of elements in the event B, and the multiplication

(2/6)^5 × (1/6)^4 × (3/6)^6

refers to the probability of an element appearing in B.
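The multinomial probability (2.27) is straightforward to evaluate numerically, as in this illustrative Python sketch.

```python
from math import factorial

# Solution 2.30: multinomial probability of A1, A2, A3 occurring 5, 4, 6 times.
N1, N2, N3 = 5, 4, 6
coeff = factorial(15) // (factorial(N1) * factorial(N2) * factorial(N3))
print(coeff * (2/6)**N1 * (1/6)**N2 * (3/6)**N3)   # P(B), Eq. (2.27)
```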


Example 2.31: Assume that a dart board has 4 regions, and the probabilities for a
thrown dart to fall into these regions are 0.1, 0.4, 0.1, and 0.4, respectively. If we
throw the dart 12 times, find the probability that each region is hit 3 times.

Solution 2.31: Throwing a dart at a dart board can be considered an experiment.
The sample space of this experiment consists of hitting the targeted regions. Let's
indicate hitting the 4 different targeted regions by the letters T1, T2, T3, and T4.
Then the sample space can be written as

S1 = {T1, T2, T3, T4}.

The probabilities of the simple events T1, T2, T3, T4 are given in the question as

P(T1) = 0.1   P(T2) = 0.4   P(T3) = 0.1   P(T4) = 0.4.

Throwing the dart 12 times can be considered as repeating the same experiment
12 times, and the sample space of the combined experiment in this case can be
calculated by taking the 12-fold Cartesian product of S1 with itself, i.e.,

S = S1 × S1 × ⋯ × S1.

The sample space S contains elements consisting of 12 letters, i.e.,

S = {T1T1T1T1T1T1T1T1T1T1T1T1, T1T1T1T1T1T1T1T1T1T1T1T2, ⋯}.

Let's define an event B of S as follows:

B = {T1 appears 3 times, T2 appears 3 times, T3 appears 3 times, T4 appears 3 times}

i.e.,

B = {T1T1T1T2T2T2T3T3T3T4T4T4, T1T2T1T1T2T2T3T3T3T4T4T4, ⋯}.

The probability of the event B can be calculated using

P(B) = [N! / (N1! × N2! × ⋯ × Nr!)] × p1^{N1} × p2^{N2} × ⋯ × pr^{Nr}

as

P(B) = [12! / (3! × 3! × 3! × 3!)] × 0.1^3 × 0.4^3 × 0.1^3 × 0.4^3.

Exercise: A fair die is tossed 8 times. Determine the probability that an odd number
appears 2 times and the number 4 appears 3 times.

2.10 Case Study: Modeling of Binary Communication Channel

The binary communication channel is shown in Fig. 2.4.
We define the following events for the binary symmetric channel:

T0 = {Transmitting a 0}   T1 = {Transmitting a 1}
R0 = {Receiving a 0}   R1 = {Receiving a 1}
E = {Error at receiver}

Fig. 2.4 The binary communication channel: transmitter symbols T0, T1 are mapped to
receiver symbols R0, R1 with transition probabilities P(R0|T0), P(R1|T0), P(R0|T1), P(R1|T1)

The events T0 and T1 are disjoint events, i.e., T0 ∩ T1 = ϕ. The error event E can
be written as

E = (T0 ∩ R1) ∪ (T1 ∩ R0)

where T0 ∩ R1 and T1 ∩ R0 are disjoint events. Then, using probability axiom-2,
we get

P(E) = P(T0 ∩ R1) + P(T1 ∩ R0)

which can be written as

P(E) = P(R1|T0)P(T0) + P(R0|T1)P(T1).   (2.30)

Since

S = T0 ∪ T1

we can write R1 as

R1 = R1 ∩ S → R1 = R1 ∩ (T0 ∪ T1)

leading to

R1 = (R1 ∩ T0) ∪ (R1 ∩ T1)

from which we get

P(R1) = P(R1 ∩ T0) + P(R1 ∩ T1)

which can also be written as

P(R1) = P(R1|T0)P(T0) + P(R1|T1)P(T1).   (2.31)



Fig. 2.5 Binary symmetric channel with P(R0|T0) = 0.95, P(R1|T0) = 0.05,
P(R0|T1) = 0.1, P(R1|T1) = 0.90

Proceeding in a similar manner, we can write

P(R0) = P(R0|T0)P(T0) + P(R0|T1)P(T1).   (2.32)

Example 2.32: For the binary symmetric channel shown in Fig. 2.5, if the bits "0"
and "1" have an equal probability of transmission, calculate the probability of error
at the receiver side.

Solution 2.32: Since the bits "0" and "1" have equal transmission probabilities,
we have

P(T0) = P(T1) = 1/2.

Using the channel transition probabilities, the transmission error probability can be
calculated as

P(E) = P(R1|T0)P(T0) + P(R0|T1)P(T1)

leading to

P(E) = 0.05 × (1/2) + 0.1 × (1/2) → P(E) = 0.075.

Example 2.33: For the binary symmetric channel shown in Fig. 2.6, the bits "0"
and "1" have equal probability of transmission.
If a "1" is received at the receiver side:
(a) What is the probability that a "1" was sent?
(b) What is the probability that a "0" was sent?

Solution 2.33:
(a) We are asked to find P(T1|R1), which can be calculated as

P(T1|R1) = P(T1 ∩ R1)/P(R1)
         = P(R1|T1)P(T1) / [P(R1|T1)P(T1) + P(R1|T0)P(T0)]
         = (0.90 × 0.5) / (0.90 × 0.5 + 0.05 × 0.5)
         ≈ 0.947.

Fig. 2.6 Binary symmetric channel for Example 2.33 with P(R0|T0) = 0.95,
P(R1|T0) = 0.05, P(R0|T1) = 0.1, P(R1|T1) = 0.90

(b) We are asked to find P(T0|R1), which can be calculated as

P(T0|R1) = P(T0 ∩ R1)/P(R1)
         = P(R1|T0)P(T0) / [P(R1|T1)P(T1) + P(R1|T0)P(T0)]
         = (0.05 × 0.5) / (0.90 × 0.5 + 0.05 × 0.5)
         ≈ 0.0526.
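Both posteriors follow from one total probability computation, as in this illustrative Python sketch.

```python
# Solution 2.33 as code: posteriors at the receiver via Bayes' rule.
P_T0 = P_T1 = 0.5                        # equal transmission probabilities
P_R1_T1, P_R1_T0 = 0.90, 0.05            # transition probabilities from Fig. 2.6

P_R1 = P_R1_T1 * P_T1 + P_R1_T0 * P_T0   # total probability, Eq. (2.31)
print(P_R1_T1 * P_T1 / P_R1)             # P(T1|R1) ≈ 0.947
print(P_R1_T0 * P_T0 / P_R1)             # P(T0|R1) ≈ 0.0526
```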

Problems

1. The sample space of an experiment is given as

S = {s1, s2, s3, s4, s5, s6, s7, s8}.

Partition S as the union of three disjoint events.

2. The sample space of an experiment is given as

S = {s1, s2, s3, s4, s5, s6, s7, s8}

where the simple events have equal probability of occurrence.
The events A, B, C are given as

A = {s1, s3, s6}   B = {s2, s4, s5}   C = {s7, s8}

such that

S = A ∪ B ∪ C.

The event D is defined as

D = {s1, s4, s5, s7}.

Verify that

P(D) = P(A)P(D|A) + P(B)P(D|B) + P(C)P(D|C).

3. In a tennis tournament, there are 80 players. Of these 80 players, 20 of them are
at an advanced level, 40 of them are at an intermediate level, and 20 of them are at a
beginner level. You randomly choose an opponent and play a game.
(a) What is the probability that you will play against an advanced player?
(b) What is the probability that you will play against an intermediate player?
(c) What is the probability that you will play against a beginner player?
(d) You randomly choose an opponent and play a game. What is the probability
of winning?
(e) You randomly choose an opponent and play a game and you win. What is the
probability that you won against an advanced player?
4. A box contains two regular coins, one two-headed coin, and three two-tailed
coins. You pick a coin and flip it, and a head shows up. What is the probability
that the chosen coin is a regular coin?
5. A box contains a regular coin and a two-headed coin. We randomly select a coin
with replacement and flip it. We repeat this procedure twice.
(a) Write the sample space of the flipping-a-single-coin experiment.
(b) Write the sample space of the experiment of flipping two coins in a
sequential manner.
(c) The events A, B, and C are defined as

A = {First flip results in a head}   B = {Second flip results in a head}

C = {In both flips, the regular coin is selected}.

Write these events explicitly, and decide whether the events A and B are
independent of each other. Decide whether the events A and B are conditionally
independent of each other given the event C.
6. Assume that you get up early and go to the bus service for your job every
morning. The probability that you miss the bus service is 0.1. Calculate the
probability that you miss the bus service 5 times in 30 days, i.e., in a month.
7. A three-sided biased die is tossed. The sample space of this experiment is given as

S = {f1, f2, f3}

where the simple events have the probabilities

P(f1) = 1/4   P(f2) = 2/4   P(f3) = 1/4.

Assume that we toss the die 8 times. What is the probability that f1 appears
5 times out of these 8 tosses?
8. Using the integers in the integer set F4 = {0, 1, 2, 3}, how many different integer
vectors consisting of 12 integers can be formed?
9. From a group of 10 men and 8 women, 6 people will be selected to form a jury
for a court. It is required that the jury contain at least 2 men. In how many
different ways can we form the jury?
Chapter 3
Discrete Random Variables

3.1 Discrete Random Variables

Let S = {s1, s2, ⋯, sN} be the sample space of a discrete experiment, and let X(·) be a
real-valued function that maps the simple events of the sample space to real numbers.
This is illustrated in Fig. 3.1.

Example 3.1: Consider the coin flip experiment. The sample space is S = {H, T}.
A random variable X(·) can be defined on the simple events, i.e., outcomes, as

X(H) = 3.2   X(T) = -2.4.

Then, X is called a discrete random variable.


Example 3.2: One fair and one biased coin are flipped together. The sample space
of the combined experiment can be written as

S = {HHb, HTb, THb, TTb}.

Let's define a real-valued function X(·) on the simple outcomes of the combined
experiment as

X(si) = 1 if si contains Hb,  X(si) = 3 if si contains Tb.   (3.1)

According to (3.1), we can write

X(HHb) = 1   X(HTb) = 3   X(THb) = 1   X(TTb) = 3.


Fig. 3.1 The operation of the random variable function X(·): the simple events
s1, s2, ⋯, sN of the sample space S are mapped to real numbers

Fig. 3.2 The graph of the random variable function X(s) for Example 3.2

If we denote the simple events HHb, HTb, THb, TTb by s1, s2, s3, s4, we can draw
the graph of X(·) as in Fig. 3.2.
Example 3.3: Consider the toss of a fair die experiment. The sample space of this
experiment can be written as S = {s1, s2, s3, s4, s5, s6}. Let's define the random
variable X(·) on the simple events of S as

X(si) = 2i - 1 if i is odd,  X(si) = 2i + 1 if i is even.

The random variable function can be explicitly written as

X(s1) = 2 × 1 - 1 → X(s1) = 1
X(s2) = 2 × 2 + 1 → X(s2) = 5
X(s3) = 2 × 3 - 1 → X(s3) = 5
X(s4) = 2 × 4 + 1 → X(s4) = 9
X(s5) = 2 × 5 - 1 → X(s5) = 9
X(s6) = 2 × 6 + 1 → X(s6) = 13.

Then, we can state that the random variable function X(·) takes values from
the set {1, 5, 9, 13}, which is called the range set of the random variable X and can
be denoted as

R_X = {1, 5, 9, 13}.

3.2 Defining Events Using Random Variables

An event, i.e., a subset of the sample space S, can be defined using

{si | X(si) = x}   (3.2)

which indicates the subset, i.e., event, of S consisting of the si that satisfy X(si) = x.
Example 3.4: For the toss-of-a-die experiment in the previous question, the random
variable is defined as

X(si) = 2i - 1 if i is odd,  X(si) = 2i + 1 if i is even.

Then, the event

A = {si | X(si) = 5}

can be explicitly written as

A = {s2, s3}

since X(s2) = 5 and X(s3) = 5.


Example 3.5: Consider the toss-of-a-fair-die experiment. The sample space is

S = {s1, s2, s3, s4, s5, s6}.

The random variable X(·) is defined on the simple events of S as

X(si) = 1 if i is odd,  X(si) = -1 if i is even.

The event A is defined as

A = {si | X(si) = -1}.

Write the elements of A explicitly.

Solution 3.5: The random variable function can be explicitly written as

X(s1) = 1   X(s2) = -1   X(s3) = 1   X(s4) = -1   X(s5) = 1   X(s6) = -1.

Since

X(s2) = -1   X(s4) = -1   X(s6) = -1

the event

A = {si | X(si) = -1}

can be explicitly written as

A = {s2, s4, s6}.

Example 3.6: Consider two independent flips of a fair coin. The sample space
of this experiment can be written as S = {HH, HT, TH, TT}. Let's define the random
variable X(·) on the simple events of S as

X(si) = {number of heads in si}

where si is one of the simple events of S. The random variable function can be
explicitly written as

X(s1) = X(HH) → X(HH) = 2
X(s2) = X(HT) → X(HT) = 1
X(s3) = X(TH) → X(TH) = 1
X(s4) = X(TT) → X(TT) = 0.

We define the events

A = {si | X(si) = 1}
B = {si | X(si) = -1}
C = {si | X(si) = 2}
D = {si | X(si) = 1 or X(si) = 0}.

Write the events A, B, C, and D explicitly.



Solution 3.6: The expression

{si | X(si) = x}

means finding those si that satisfy X(si) = x and forming an event of S from all
these si.
Since

X(HT) = 1   X(TH) = 1

the event

A = {si | X(si) = 1}

can be explicitly written as

A = {HT, TH}.

For the event

B = {si | X(si) = -1}

since there is no si satisfying X(si) = -1, we can write event B as

B = { }.

In a similar manner, for the event

C = {si | X(si) = 2}

since X(si) = 2 is satisfied only for si = HH, i.e., X(HH) = 2, the event C can be
written as

C = {HH}.

The event

D = {si | X(si) = 1 or X(si) = 0}

can be explicitly written as

D = {HT, TH, TT}

since

X(HT) = 1   X(TH) = 1   X(TT) = 0.

The expression

{si | X(si) = x}

represents an event, and this representation can be shortly written as

{X = x}.   (3.3)

That is, the mathematical expressions {si | X(si) = x} and {X = x} mean the same
thing, i.e.,

{X = x} means {si | X(si) = x}.   (3.4)

The expression

{si | X(si) ≤ x}

means making a subset of S, i.e., an event, from those si satisfying X(si) ≤ x, and the
event

{si | X(si) ≤ x}

can also be represented by

{X ≤ x}.   (3.5)

Example 3.7: Consider the roll-of-a-die experiment. The sample space of this
experiment can be written as S = {s1, s2, s3, s4, s5, s6}. The random variable X(·)
on the simple events of S is defined as

X(si) = 4 × i

which can explicitly be written as

X(s1) = 4 × 1 → X(s1) = 4
X(s2) = 4 × 2 → X(s2) = 8
X(s3) = 4 × 3 → X(s3) = 12
X(s4) = 4 × 4 → X(s4) = 16
X(s5) = 4 × 5 → X(s5) = 20
X(s6) = 4 × 6 → X(s6) = 24.

The events A, B, C, and D are defined as

A = {si | X(si) ≤ 10}
B = {si | X(si) ≤ 14}
C = {si | X(si) ≤ 20}
D = {si | X(si) ≤ 25}.

Write the events A, B, C, and D explicitly.

Solution 3.7: For the event

A = {si | X(si) ≤ 10}

since the simple events s1, s2 satisfy

X(s1) = 4 ≤ 10 and X(s2) = 8 ≤ 10

the event A can be written as

A = {s1, s2}.

Proceeding in a similar manner, we can write the events B, C, and D as

B = {s1, s2, s3}   C = {s1, s2, s3, s4, s5}   D = {s1, s2, s3, s4, s5, s6}.

For ease of illustration, we can use

{X ≤ x}

for the expression

{si | X(si) ≤ x}

i.e.,

{X ≤ x} means {si | X(si) ≤ x}.

Example 3.8: The range set of the random variable X is given as R_X = {-1, 1, 3}.
Verify that

S = {X = -1} ∪ {X = 1} ∪ {X = 3}.

Fig. 3.3 The random variable function mapping the disjoint events {X = 3}, {X = 1},
and {X = -1} of S onto the values 3, 1, and -1

Solution 3.8: The random variable function X(·) is defined on the simple outcomes
of the sample space S, and it assigns a real number to every simple event. The events

{X = -1}   {X = 1}   {X = 3}

are disjoint events, and their union gives S. This is illustrated in Fig. 3.3.
Example 3.9: Consider two independent tosses of a fair coin. The sample space
of this experiment can be written as S = {HH, HT, TH, TT}. Let's define the random
variable X(·) on the simple events of S as

X(si) = {number of heads in si}

where si is one of the simple events of S. The random variable function can be
explicitly written as

X(s1) = X(HH) → X(HH) = 2
X(s2) = X(HT) → X(HT) = 1
X(s3) = X(TH) → X(TH) = 1
X(s4) = X(TT) → X(TT) = 0.

Show that {X = 0}, {X = 1}, and {X = 2} form a partition of S, i.e.,

S = {X = 0} ∪ {X = 1} ∪ {X = 2}

and {X = 0}, {X = 1}, and {X = 2} are disjoint events.



Solution 3.9: The events {X = 0}, {X = 1}, and {X = 2} can be written as

{X = 0} = {TT}   {X = 1} = {HT, TH}   {X = 2} = {HH}

where it is clear that {X = 0}, {X = 1}, and {X = 2} are disjoint events, i.e.,

{X = 0} ∩ {X = 1} = ϕ

{X = 0} ∩ {X = 2} = ϕ

{X = 1} ∩ {X = 2} = ϕ

{X = 0} ∩ {X = 1} ∩ {X = 2} = ϕ

and we have

S = {X = 0} ∪ {X = 1} ∪ {X = 2}.

Example 3.10: The sample space of an experiment is given as

S = {s1, s2, s3, s4, s5, s6}.

The random variable X on S is defined as

X(s1) = -2   X(s2) = -2   X(s3) = -2

X(s4) = 3   X(s5) = 4   X(s6) = 4.

(a) Find the following events:

{X = -2}   {X = 3}   {X = 4}.

(b) Are the events {X = -2}, {X = 3}, and {X = 4} disjoint?

(c) Show that

{X = -2} ∪ {X = 3} ∪ {X = 4} = S.

Solution 3.10:
(a) The events {X = -2}, {X = 3}, and {X = 4} can be explicitly written as

{X = -2} = {s1, s2, s3}

{X = 3} = {s4}

{X = 4} = {s5, s6}.

(b) Considering the explicit forms of the events in part (a), it is obvious that the
events {X = -2}, {X = 3}, and {X = 4} are disjoint from each other, i.e.,

{X = -2} ∩ {X = 3} = ϕ

{X = -2} ∩ {X = 4} = ϕ

{X = 3} ∩ {X = 4} = ϕ

{X = -2} ∩ {X = 3} ∩ {X = 4} = ϕ.

(c) The union of {X = -2}, {X = 3}, and {X = 4} is found as

{X = -2} ∪ {X = 3} ∪ {X = 4} → {s1, s2, s3} ∪ {s4} ∪ {s5, s6} = S

which is the sample space. The partition of the sample space is depicted in
Fig. 3.4.

3.3 Probability Mass Function for Discrete Random Variables

The probability mass function p(x) for a discrete random variable X is defined as

p(x) = Prob{X = x}

where x is a value of the random variable function X(·). The probability mass
function can also be indicated as

p_X(x) = Prob{X = x}   (3.6)

where the subscript of p(x), i.e., X, points to the random variable to which the
probability mass function belongs. For ease of notation, we will not use the
subscript in the probability mass function expression unless otherwise indicated.
Let's illustrate the concept of the probability mass function with an example.

Fig. 3.4 The partition of the sample space for Example 3.10: {X = -2} = {s1, s2, s3},
{X = 3} = {s4}, {X = 4} = {s5, s6}

Example 3.11: Consider the experiment of two independent flips of a fair coin.
The sample space of this experiment can be written as S = {HH, HT, TH, TT}.
Let's define the random variable X(·) on the simple events of S as

X(si) = {number of heads in si}

where si is one of the simple events of S. The random variable function can be
explicitly written as

X(HH) = 2   X(HT) = 1   X(TH) = 1   X(TT) = 0.

(a) Write the range set of the random variable X.
(b) Obtain the probability mass function of the discrete random variable X.

Solution 3.11:
(a) Considering the distinct values generated by the random variable, the range set
of the random variable can be written as

R_X = {0, 1, 2}.

(b) The probability mass function of the random variable X is defined as

p(x) = Prob{X = x}

where x takes one of the values from the set R_X = {0, 1, 2}, i.e., x can be either
0, 1, or 2. We will consider each distinct x value for the calculation of p(x). For
x = 0, the probability mass function p(x) is calculated as
calculation of p(x). For x = 0, the probability mass function p(x) is calculated as

p(x = 0) = Prob{X = 0}

where the event {X = 0} equals {TT}, i.e.,

{X = 0} = {TT}.

Then, we have

p(x = 0) = P({TT}) → p(x = 0) = 1/4.

For x = 1, the probability mass function p(x) is calculated as

p(x = 1) = Prob{X = 1}

where the event {X = 1} equals {HT, TH}, i.e.,

{X = 1} = {HT, TH}.

Then, we have

p(x = 1) = P({HT, TH}) → p(x = 1) = P({HT}) + P({TH}) → p(x = 1) = 1/2.

For x = 2, the probability mass function p(x) is calculated as

p(x = 2) = Prob{X = 2}

where the event {X = 2} equals {HH}, i.e.,

{X = 2} = {HH}.

Then, we have

p(x = 2) = P({HH}) → p(x = 2) = 1/4.

Hence, the values of the probability mass function p(x) are found as

p(x = 0) = 1/4   p(x = 1) = 1/2   p(x = 2) = 1/4.

We can draw the graph of the probability mass function p(x) with respect to x as
in Fig. 3.5.
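The same pmf can be built by enumerating the sample space; the following Python sketch is illustrative.

```python
from fractions import Fraction
from itertools import product

# Example 3.11: pmf of X = number of heads in two fair flips, by enumeration.
S = list(product("HT", repeat=2))       # HH, HT, TH, TT, equally likely
pmf = {}
for outcome in S:
    x = outcome.count("H")              # the value X assigns to this outcome
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(S))
print(pmf)   # {2: 1/4, 1: 1/2, 0: 1/4}
```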

Fig. 3.5 The probability mass function for Example 3.11: p(0) = 1/4, p(1) = 1/2,
p(2) = 1/4

For this example,

p(x = 0) + p(x = 1) + p(x = 2) = 1/4 + 1/2 + 1/4 → p(x = 0) + p(x = 1) + p(x = 2) = 1.

That is,

Σ_x p(x) = 1.

Theorem 3.1: The probability mass function of a discrete random variable X
satisfies

Σ_x p(x) = 1.   (3.7)

Proof 3.1: Let the range set of the random variable X be

R_X = {x1, x2, x3}.

We know that the events {X = x1}, {X = x2}, and {X = x3} form a partition of
the sample space S. That is,

S = {X = x1} ∪ {X = x2} ∪ {X = x3}   (3.8)

and {X = x1}, {X = x2}, and {X = x3} are disjoint events. If the probability of both
sides of (3.8) is calculated, we get

P(S) = P{X = x1} + P{X = x2} + P{X = x3}

where P(S) = 1 and P{X = xi} = p(x = xi), which can be written as

p(x = x1) + p(x = x2) + p(x = x3) = 1 → Σ_x p(x) = 1.

This process can be performed for any range set with any number of elements.

3.4 Cumulative Distribution Function

The cumulative distribution function of a random variable X is defined as

F(x) = Prob{X ≤ x}

which can also be written as

F_X(x) = Prob{X ≤ x}

where x is a real number.
Note that {X ≤ x} is an event, and F(x) is nothing but the probability of the event
{X ≤ x}.
Let the range set of the random variable X be R_X = {a1, a2, ⋯, aN} such that
a1 < a2 < ⋯ < aN. To find the cumulative distribution function F(x), we consider
the following steps:
1. We first form the x-intervals on which the cumulative distribution function F(x)
is to be calculated as

-∞ < x < a1
a1 ≤ x < a2
a2 ≤ x < a3
⋯
a_{N-1} ≤ x < a_N
a_N ≤ x < ∞.

2. In step 2, for each interval decided in step 1, we calculate the cumulative
distribution function

F(x) = Prob{X ≤ x}.   (3.9)



Example 3.12: Consider again the experiment of two independent tosses of a fair
coin. The sample space of this experiment can be written as S = {HH, HT, TH, TT}.
Let's define the random variable X(·) on the simple events of S as

X(si) = {number of heads in si}

where si is one of the simple events of S. The random variable function can be
explicitly written as

X(HH) = 2   X(HT) = 1   X(TH) = 1   X(TT) = 0.

Calculate and draw the cumulative distribution function, i.e., F(x), of the random
variable X.

Solution 3.12: The range set of the random variable X can be written as

R_X = {0, 1, 2}.

To draw the cumulative distribution function F(x), we first determine
the x-intervals considering the values in the range set of X. The x-intervals can be
written as

-∞ < x < 0
0 ≤ x < 1
1 ≤ x < 2
2 ≤ x < ∞.

In the second step, we determine the cumulative distribution function, i.e., CDF,
F(x) for the given intervals. To determine the CDF on an interval, we can pick a
value of x from the interval under concern and calculate the value of F(x). For
our example, we can proceed as follows:

-∞ < x < 0 → F(x) = Prob{X ≤ x} → F(x) = Prob{X ≤ -1}
→ F(x) = Prob{ } → F(x) = 0

0 ≤ x < 1 → F(x) = Prob{X ≤ x} → F(x) = Prob{X ≤ 0.5}
→ F(x) = Prob{TT} → F(x) = 1/4

1 ≤ x < 2 → F(x) = Prob{X ≤ x} → F(x) = Prob{X ≤ 1.6}
→ F(x) = Prob{HT, TH, TT} → F(x) = 3/4

2 ≤ x < ∞ → F(x) = Prob{X ≤ x} → F(x) = Prob{X ≤ 2.4}
→ F(x) = Prob{HT, TH, TT, HH} → F(x) = 1.

Fig. 3.6 The cumulative distribution function for Example 3.12, a staircase rising
from 0 to 1/4 at x = 0, to 3/4 at x = 1, and to 1 at x = 2

Thus, the cumulative distribution function F(x) can be written as

-∞ < x < 0 → F(x) = 0
0 ≤ x < 1 → F(x) = 1/4
1 ≤ x < 2 → F(x) = 3/4
2 ≤ x < ∞ → F(x) = 1.

The graph of the CDF can be drawn as shown in Fig. 3.6.
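The same interval-by-interval evaluation is easy to automate. The following is a minimal sketch (not from the book; the pmf is hard-coded from Example 3.12, and the helper name cdf is hypothetical) that evaluates F(x) at one test point per interval:

```python
# Sketch: evaluate the CDF of Example 3.12 from its pmf.
pmf = {0: 1/4, 1: 1/2, 2: 1/4}

def cdf(x, pmf=pmf):
    # F(x) = sum of p(k) over all range-set values k <= x
    return sum(p for k, p in pmf.items() if k <= x)

for x in (-0.5, 0.5, 1.6, 2.4):
    print(x, cdf(x))  # 0.0, 0.25, 0.75, 1.0
```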


Example 3.13: The range set of the discrete random variable X is given as R_X = {3, 7, 12}. Determine the x-intervals on which the cumulative distribution function F(x) is calculated.

Solution 3.13: The value of the cumulative distribution function, i.e., F(x), is decided on each of the following intervals:

-∞ < x < 3
3 ≤ x < 7
7 ≤ x < 12
12 ≤ x < ∞.

Property 3.1: The value of the cumulative distribution function

F(x) = Prob{X ≤ x}

at a point xᵢ can be calculated by employing the probability mass function

p(x) = Prob{X = x}

as

F(xᵢ) = Σ_{x ≤ xᵢ} p(x).    (3.10)

Example 3.14: The range set of a discrete random variable X is given as

R_X = {1, 2, 3, 4, 5}.

Calculate the value of the cumulative distribution function F(x) at x = 3.4 in terms of its probability mass function p(x).

Solution 3.14: When the cumulative distribution function

F(x) = Prob{X ≤ x}

is evaluated at x = 3.4, we get

F(3.4) = Prob{X ≤ 3.4}

where Prob{X ≤ 3.4} means the probability that the discrete random variable X produces a value less than 3.4. Since the values produced by the discrete random variable X that are less than 3.4 are 1, 2, and 3,

F(3.4) = Prob{X ≤ 3.4}

can be written as

F(3.4) = Prob{X = 1} + Prob{X = 2} + Prob{X = 3}

in which, using the definition of the probability mass function

p(x) = Prob{X = x}

we get

F(3.4) = p(1) + p(2) + p(3).


The event {X ≤ 3.4} can also be considered as the union of the mutually exclusive events {X = 1}, {X = 2}, {X = 3}, i.e.,

{X ≤ 3.4} = {X = 1} ∪ {X = 2} ∪ {X = 3}

from which we can write

Prob{X ≤ 3.4} = Prob{X = 1} + Prob{X = 2} + Prob{X = 3}

and the rest is as explained before.


Example 3.15: The range set of a discrete random variable X is given as

R_X = {x₁, x₂, x₃, x₄} where x₁ < x₂ < x₃ < x₄.

Show that the value of the cumulative distribution function F(x) on the interval x₃ < x < x₄ can be written in terms of the probability mass function values as

F(x) = p(x₁) + p(x₂) + p(x₃).

Solution 3.15: Since the range set of the discrete random variable is given as

R_X = {x₁, x₂, x₃, x₄} where x₁ < x₂ < x₃ < x₄,

the sample space of the random variable can be partitioned as in Fig. 3.7. Considering Fig. 3.7, the event

{X ≤ x} where x₃ < x < x₄

can be written as

{X ≤ x} = {X = x₁} ∪ {X = x₂} ∪ {X = x₃}    (3.11)

where {X = x₁}, {X = x₂}, and {X = x₃} are disjoint events. Taking the probability of both sides of (3.11), we get

Prob{X ≤ x} = Prob{X = x₁} + Prob{X = x₂} + Prob{X = x₃}

which can be written as

F(x) = p(x₁) + p(x₂) + p(x₃).


[Fig. 3.7 Sample space S partitioned into the disjoint subsets {X = x₁}, {X = x₂}, {X = x₃}, {X = x₄} for Example 3.15]

Example 3.16: The range set of a discrete random variable X is given as

R_X = {-2.3, -1.5, 0, 1.4, 2.3, 4.1}.

F(x) and p(x) are the cumulative distribution and probability mass function of the discrete random variable X, respectively. How do we calculate F(0.5) and F(2) using the probability mass function p(x)?

Solution 3.16: Considering the discussion in the previous example, we can write F(0.5) and F(2) as

F(0.5) = Prob{X ≤ 0.5} → F(0.5) = p(-2.3) + p(-1.5) + p(0)
F(2) = Prob{X ≤ 2} → F(2) = p(-2.3) + p(-1.5) + p(0) + p(1.4).

Example 3.17: The probability mass function values of a discrete random variable X are given as

p(-1) = a  p(2) = 2a  p(3.5) = 4a.

Write the range set of the random variable and find the value of a.

Solution 3.17: The range set of the random variable X is

R_X = {-1, 2, 3.5}.

Since

Σ_x p(x) = 1

we have

p(-1) + p(2) + p(3.5) = 1 → a + 2a + 4a = 1 → a = 1/7.


Example 3.18: The probability mass function of a discrete random variable X is given as

p(2) = 1/6  p(5) = 2/6  p(8) = 3/6.

Find the range set, and draw the cumulative distribution function of the discrete random variable X.

Solution 3.18: The range set of the discrete random variable X can be written as

R_X = {2, 5, 8}.

To draw the cumulative distribution function, let's first write the x-intervals as

-∞ < x < 2
2 ≤ x < 5
5 ≤ x < 8
8 ≤ x < ∞.

In the second step, we calculate the value of the cumulative distribution function

F(x) = Prob{X ≤ x}

on the determined intervals. For this purpose, we can select a value on each determined interval and calculate the value of the cumulative distribution function on the concerned interval as follows:

-∞ < x < 2 → F(x) = Prob{X ≤ x} → F(1) = Prob{X ≤ 1} → F(x) = 0
2 ≤ x < 5 → F(x) = Prob{X ≤ x} → F(3) = Prob{X ≤ 3} → F(x) = p(2)
5 ≤ x < 8 → F(x) = Prob{X ≤ x} → F(6) = Prob{X ≤ 6} → F(x) = p(2) + p(5)
8 ≤ x < ∞ → F(x) = Prob{X ≤ x} → F(9) = Prob{X ≤ 9} → F(x) = p(2) + p(5) + p(8).

Hence, we have

-∞ < x < 2 → F(x) = 0
2 ≤ x < 5 → F(x) = 1/6
5 ≤ x < 8 → F(x) = 3/6
8 ≤ x < ∞ → F(x) = 6/6.
[Fig. 3.8 The graph of the cumulative distribution function F(x) for Example 3.18: steps of height 1/6, 3/6, 6/6 at x = 2, 5, 8]

The graph of the cumulative distribution function F(x) happens to be as in Fig. 3.8.

Exercise: A fair coin and a biased coin are tossed together. For the fair coin, we have

Prob(H) = 1/2  Prob(T) = 1/2.

For the biased coin, we have

Prob(H_b) = 2/3  Prob(T_b) = 1/3.

(a) Find the sample space of the combined experiment.
(b) The random variable function X for the combined experiment is defined as

X(sᵢ) = {2 × number of fair heads in sᵢ - number of biased tails in sᵢ}.

(c) Find the probability mass function p(x) and the cumulative distribution function F(x) of the discrete random variable X(·), and draw the graphs of p(x) and F(x).

3.5 Expected Value (Mean Value), Variance, and Standard Deviation

Expected value and variance are two important parameters of a random variable.
Expected or mean value is also called the probabilistic average.

3.5.1 Expected Value

Before introducing the fundamental formulas for mean and variance calculations,
let’s consider the average value calculations via some examples. Assume that we
have a digit generator machine and the generated digits are 1, 2, and 6. Besides, each

digit has the same probability of occurrence. Let’s say that 60 digits are generated,
i.e., each digit is generated 20 times. The arithmetic average of the generated digit
sequence can be calculated as

(20 × 1 + 20 × 2 + 20 × 6)/60 → (1 + 2 + 6)/3 → 3.    (3.12)

Now assume that the probability of occurrence of digits 1 and 2 are equal to each
other and it equals half of the probability of occurrence of digit 6. In this case, out of
60 generated digits, 15 of them are 1, the other 15 of them are 2, and 30 of them are
6. The arithmetic average of the digit sequence can be calculated as

(15 × 1 + 15 × 2 + 30 × 6)/60 → (1/4) × 1 + (1/4) × 2 + (1/2) × 6 → 3.75.    (3.13)

When (3.12) and (3.13) are compared to each other, we see in (3.13) we have a
larger number. This is due to the higher probability of occurrence of digit 6. Now it is
time to state the expected value concept.
The probabilistic average value, i.e., expected or mean value, of a discrete
random variable X with probability mass function p(x) is calculated using

E[X] = Σ_x x p(x)

which can also be shown as

m = Σ_x x p(x).    (3.14)

If the range set of the discrete random variable X contains N values, and if the
probability of occurrence of values in the range set is equal to 1/N, then the mean
value expression in (3.14) happens to be the arithmetic average expression, i.e., if
p(x) = 1/N, then we get

m = Σ_x x/N → m = (1/N) Σ_x x.

3.5.2 Variance

If the variance of a random variable is a small number, it means that the generated
values are close to the mean value of the random variable; on the other hand, if the
variance of a random variable is a large number, then it means that the spread of the

generated values is very wide and the generated values are neither close to each other
nor close to the mean value.
Example 3.19: Assume that the sequences

v₁ = [1 2 3 2 1 3 1 4]  v₂ = [-10 12 3 0 87 34 5 -2]

are generated by two different random variables X₁ and X₂. Compare the variances of X₁ and X₂.

Solution 3.19: Comparing the spreads of the sequences v₁ and v₂, we can write that

Var(X₁) < Var(X₂).

Now let's state the variance formula. The variance of a discrete random variable X with mean value m and probability mass function p(x) is calculated as

Var(X) = E[X²] - m²

where

E[X²] = Σ_x x² p(x).

The variance of a discrete random variable X can also be found using

Var(X) = E[(X - m)²]

whose explicit form can be written as

Var(X) = Σ_x (x - m)² p(x).

3.5.3 Standard Deviation

The standard deviation of a random variable is nothing but the square root of its variance, i.e.,

standard deviation of X = √(variance of X)

which can be written as

σ = √(Var(X)).    (3.15)

Example 3.20: The range set of a discrete random variable X is given as

R_X = {-1, 0, 1, 2, 3}.

The probability mass function p(x) of the discrete random variable X is given as

p(x = -1) = 2/8  p(x = 0) = 1/8  p(x = 1) = 2/8  p(x = 2) = 2/8  p(x = 3) = 1/8.

Find the mean value, i.e., probabilistic average or expected value, variance, and standard deviation of the discrete random variable X.
Solution 3.20: The mean value of the discrete random variable X is calculated as

E[X] = Σ_x x p(x) →
E[X] = -1 × p(-1) + 0 × p(0) + 1 × p(1) + 2 × p(2) + 3 × p(3) →
E[X] = -1 × 2/8 + 0 × 1/8 + 1 × 2/8 + 2 × 2/8 + 3 × 1/8

leading to

E[X] = m = 7/8.

The variance of the random variable X can be calculated using

Var(X) = Σ_x x² p(x) - m²    (3.16)

where

Σ_x x² p(x)

is computed as

Σ_x x² p(x) = (-1)² × 2/8 + (0)² × 1/8 + (1)² × 2/8 + (2)² × 2/8 + (3)² × 1/8

resulting in

Σ_x x² p(x) = 21/8.    (3.17)

Using (3.17) in (3.16), we get

Var(X) = 21/8 - (7/8)² → Var(X) = 119/64 → Var(X) ≈ 1.86.

Since the standard deviation is nothing but the square root of the variance, we get it as

σ = √(Var(X)) → σ = √119/8 → σ ≈ 1.36.

The variance of the random variable can also be calculated using

Var(X) = Σ_x (x - m)² p(x)

as

Var(X) = (-1 - 7/8)² × 2/8 + (0 - 7/8)² × 1/8 + (1 - 7/8)² × 2/8 + (2 - 7/8)² × 2/8 + (3 - 7/8)² × 1/8

leading to the same result

Var(X) ≈ 1.86.
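These calculations are mechanical enough to script. Below is a minimal Python sketch (not from the book; it simply hard-codes the pmf of Example 3.20) reproducing the mean, variance, and standard deviation:

```python
# Sketch reproducing Example 3.20 numerically.
pmf = {-1: 2/8, 0: 1/8, 1: 2/8, 2: 2/8, 3: 1/8}

m = sum(x * p for x, p in pmf.items())        # E[X] = 7/8
ex2 = sum(x**2 * p for x, p in pmf.items())   # E[X^2] = 21/8
var = ex2 - m**2                              # 119/64 ≈ 1.86
std = var ** 0.5                              # ≈ 1.36
print(m, ex2, var, std)
```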

3.6 Expected Value and Variance of Functions of a Random Variable

Let X be a discrete random variable, and let g(X) be a function of this random variable. The function of a random variable is another random variable. The mean or expected, i.e., probabilistic average, value of g(X) is calculated using

E[g(X)] = Σ_x g(x) p(x)

which is usually denoted by m, i.e.,

m = Σ_x g(x) p(x).

The variance of g(X) is computed using

Var(g(X)) = E[(g(X))²] - m²

where

E[(g(X))²] = Σ_x [g(x)]² p(x).

The variance of g(X) can also be found using

Var(g(X)) = E[(g(X) - m)²]

which is computed as

Var(g(X)) = Σ_x [g(x) - m]² p(x).

Example 3.21: The range set of a discrete random variable X is given as

R_X = {-1, 1, 2}.

The probability mass function p(x) of the discrete random variable X is specified as

p(x = -1) = 1/4  p(x = 1) = 1/4  p(x = 2) = 1/2.

Find the following:

(a) E[X³]  (b) E[X² - 1]  (c) E[X² + 1].
Solution 3.21: We know that

E[g(X)] = Σ_x g(x) p(x).

(a) For

g(X) = X³

we calculate E[g(X)] as

E[X³] = Σ_x x³ p(x) → E[X³] = (-1)³ × 1/4 + (1)³ × 1/4 + (2)³ × 1/2

resulting in

E[X³] = 4.

(b) For

g(X) = X² - 1

we calculate E[g(X)] as

E[X² - 1] = Σ_x (x² - 1) p(x) →
E[X² - 1] = ((-1)² - 1) × 1/4 + (1² - 1) × 1/4 + (2² - 1) × 1/2

resulting in

E[X² - 1] = 3/2.

(c) For

g(X) = X² + 1

we calculate E[g(X)] as

E[X² + 1] = Σ_x (x² + 1) p(x) →
E[X² + 1] = ((-1)² + 1) × 1/4 + (1² + 1) × 1/4 + (2² + 1) × 1/2

resulting in

E[X² + 1] = 7/2.
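All three parts follow the same pattern, so they can be checked with one small helper. This is a sketch only (the function name expect is hypothetical, and the pmf is hard-coded from Example 3.21):

```python
# Sketch for Example 3.21: E[g(X)] = sum over x of g(x) p(x).
pmf = {-1: 1/4, 1: 1/4, 2: 1/2}

def expect(g, pmf=pmf):
    return sum(g(x) * p for x, p in pmf.items())

print(expect(lambda x: x**3))      # 4.0
print(expect(lambda x: x**2 - 1))  # 1.5
print(expect(lambda x: x**2 + 1))  # 3.5
```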

Example 3.22: The range set of a discrete random variable X is given as

R_X = {-1, 1, 2}.

The probability mass function p(x) of the discrete random variable X is specified as

p(x = -1) = 1/4  p(x = 1) = 1/4  p(x = 2) = 1/2.

Find the variance of 2X + 1.


Solution 3.22: First, let's calculate the mean value of 2X + 1. For

g(X) = 2X + 1

we calculate E[g(X)] as

E[2X + 1] = Σ_x (2x + 1) p(x) →
E[2X + 1] = (2 × (-1) + 1) × 1/4 + (2 × 1 + 1) × 1/4 + (2 × 2 + 1) × 1/2

resulting in

E[2X + 1] = 12/4 → E[2X + 1] = 3.

We know that

Var(g(X)) = E[(g(X))²] - m²

where

E[(g(X))²] = Σ_x [g(x)]² p(x).

For

g(X) = 2X + 1

we calculate E[(g(X))²] as

E[(2X + 1)²] = Σ_x (2x + 1)² p(x) →
E[(2X + 1)²] = (2 × (-1) + 1)² × 1/4 + (2 × 1 + 1)² × 1/4 + (2 × 2 + 1)² × 1/2

resulting in

E[(2X + 1)²] = 60/4.

Finally, the variance can be calculated as

Var(2X + 1) = E[(2X + 1)²] - m² → Var(2X + 1) = 60/4 - 3² → Var(2X + 1) = 6.

Example 3.23: The probability mass function of the discrete random variable X is given as

p(x) = 0.1 for x = 0.3
       0.2 for x = 0.5
       0.2 for x = 0.7
       0.2 for x = 0.9
       0.3 for x = 1.5.

(a) Find the range set R_X.
(b) Find Prob{X ≤ 0.6}.
(c) Find Prob{X ≥ 0.7}.
(d) Find Prob{0.4 ≤ X ≤ 1.1}.
(e) Find Prob{X ≥ 0.5 | X < 0.9}.
Solution 3.23:

(a) The range set is R_X = {0.3, 0.5, 0.7, 0.9, 1.5}.
(b) Prob{X ≤ 0.6} = Prob{X = 0.3} + Prob{X = 0.5} → Prob{X ≤ 0.6} = p(0.3) + p(0.5) → Prob{X ≤ 0.6} = 0.3.
(c) Prob{X ≥ 0.7} = Prob{X = 0.7} + Prob{X = 0.9} + Prob{X = 1.5} → Prob{X ≥ 0.7} = 0.2 + 0.2 + 0.3 → Prob{X ≥ 0.7} = 0.7.
(d) Prob{0.4 ≤ X ≤ 1.1} = Prob{X = 0.5} + Prob{X = 0.7} + Prob{X = 0.9} → Prob{0.4 ≤ X ≤ 1.1} = 0.2 + 0.2 + 0.2 → Prob{0.4 ≤ X ≤ 1.1} = 0.6.
(e) Prob{X ≥ 0.5 | X < 0.9} = Prob({X ≥ 0.5} ∩ {X < 0.9})/Prob({X < 0.9})
    = (Prob{X = 0.5} + Prob{X = 0.7})/(Prob{X = 0.3} + Prob{X = 0.5} + Prob{X = 0.7})
    = (0.2 + 0.2)/(0.1 + 0.2 + 0.2) = 4/5.

Example 3.24: Show that the right-hand side of

Var(X) = E[(X - m)²]

equals E[X²] - m².
Proof 3.24: If we expand the square expression in

E[(X - m)²]

we get

E[X² - 2mX + m²]

which can be evaluated as

E[X² - 2mX + m²] = Σ_x (x² - 2mx + m²) p(x).    (3.18)

Expanding the right-hand side of (3.18), we obtain

E[X² - 2mX + m²] = Σ_x (x² - 2mx + m²) p(x)
                 = Σ_x x² p(x) - 2m Σ_x x p(x) + m² Σ_x p(x)
                 = E[X²] - 2mE[X] + m²
                 = E[X²] - 2m × m + m²
                 = E[X²] - m².

Thus, we showed that

E[(X - m)²] = E[X²] - m².

Example 3.25: The probability mass function of a random variable is given as

p(x) = 1/4  x = -2
       1/2  x = 0
       1/4  x = 3.

Find E[X], Var(X), and σ, i.e., the standard deviation.


Solution 3.25: The mean value is calculated as

E[X] = Σ_x x p(x) → E[X] = -2 × 1/4 + 0 × 1/2 + 3 × 1/4 → E[X] = 1/4.

The variance can be calculated using

Var(X) = E[(X - m)²] → E[(X - m)²] = Σ_x (x - m)² p(x)

leading to

E[(X - m)²] = (-2 - 1/4)² × 1/4 + (0 - 1/4)² × 1/2 + (3 - 1/4)² × 1/4 = 204/64.

The standard deviation σ is nothing but the square root of the variance; then we have

σ = √(204/64) → σ ≈ 1.79.

Property 3.2: X is a random variable and m_x = E[X]. If Y = aX + b, then we have

E[Y] = aE[X] + b.    (3.19)

Proof: Let E[X] = m_x. Since

E[g(X)] = Σ_x g(x) p(x)

for Y = aX + b, we have

E[Y] = E[aX + b]
     = Σ_x (ax + b) p(x)
     = a Σ_x x p(x) + b Σ_x p(x)
     = aE[X] + b
     = am_x + b.

Thus, we obtained

m_y = am_x + b.

Property 3.3: X is a random variable and σ² = Var(X). If Y = aX or Y = aX + b, then we have

Var(Y) = a² Var(X).

Proof: Let E[X] = m_x. If Y = aX + b, then

m_y = am_x + b.

The variance of Y can be calculated using

Var(Y) = E[(Y - m_y)²]

in which substituting

m_y = am_x + b

we get

Var(Y) = E[(Y - am_x - b)²]
       = Σ_x (ax + b - am_x - b)² p(x)
       = a² Σ_x (x - m_x)² p(x)
       = a² Var(X).

Hence, we showed that

Var(Y) = a² Var(X).    (3.20)
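Properties 3.2 and 3.3 are easy to confirm numerically for any particular pmf. The following sketch (not from the book; the pmf and the constants a, b are arbitrary illustrative choices) checks (3.19) and (3.20) directly from the definitions:

```python
# Sketch checking (3.19) and (3.20) for Y = aX + b.
pmf = {-1: 1/4, 1: 1/4, 2: 1/2}
a, b = 2.0, 1.0

mx = sum(x * p for x, p in pmf.items())
vx = sum((x - mx)**2 * p for x, p in pmf.items())
my = sum((a*x + b) * p for x, p in pmf.items())
vy = sum((a*x + b - my)**2 * p for x, p in pmf.items())

print(my, a*mx + b)    # equal: E[Y] = aE[X] + b
print(vy, a**2 * vx)   # equal: Var(Y) = a^2 Var(X)
```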

3.7 Some Well-Known Discrete Random Variables in the Mathematics Literature

We can define a countless number of random variables. However, some random variables appear so often in practical systems that special names are assigned to them in the literature. The same names are also used for the probability mass functions of these specific random variables. In this subsection, we explain some of these specific discrete random variables.

3.7.1 Binomial Random Variable

Let X be a discrete random variable and x be an integer such that x ∈ {0, 1, ⋯, N}, i.e., x is an integer taking values from the integer set {0, 1, ⋯, N}. If the random variable X has the probability mass function

p(x) = C(N, x) pˣ (1 - p)^(N-x),  0 ≤ p ≤ 1,  x = 0, 1, ⋯, N    (3.21)

where C(N, x) = N!/(x!(N - x)!) is the binomial coefficient, then X is called a binomial random variable with parameters N and p. The mass function p(x) is called the binomial distribution or binomial probability mass function. Since

Σ_x p(x) = 1    (3.22)

if we substitute (3.21) into (3.22), we get

Σ_x C(N, x) pˣ (1 - p)^(N-x) = 1.    (3.23)

The graphs of the binomial distribution, i.e., the binomial probability mass function p(x), for N = 80 and p = 0.1, p = 0.5, p = 0.9 are drawn in Fig. 3.9.
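A direct implementation of (3.21) is short; the sketch below (not from the book) uses Python's math.comb for the binomial coefficient and checks the normalization (3.23) for the N = 80, p = 0.1 case shown in Fig. 3.9:

```python
from math import comb

# Sketch of (3.21): binomial pmf with parameters N and p.
def binom_pmf(x, N, p):
    return comb(N, x) * p**x * (1 - p)**(N - x)

N, p = 80, 0.1
pmf = [binom_pmf(x, N, p) for x in range(N + 1)]
print(sum(pmf))                                  # 1.0, as required by (3.23)
print(max(range(N + 1), key=lambda x: pmf[x]))   # mode near N*p = 8
```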

3.7.2 Geometric Random Variable

Let X be a discrete random variable and x be a positive integer, i.e., x ∈ {1, 2, ⋯}. If the random variable X has the probability mass function

p(x) = (1 - p)^(x-1) p,  0 ≤ p ≤ 1,  x = 1, 2, ⋯    (3.24)

then X is called a geometric random variable with parameter p. The mass function p(x) is called the geometric distribution or geometric probability mass function.
The graphs of the geometric distribution, i.e., the geometric probability mass function p(x), for p = 0.2 and p = 0.4 over x = 1, ⋯, N = 20 are drawn in Fig. 3.10.

Fig. 3.9 Binomial distribution for N = 80, p = 0.1, p = 0.5, and p = 0.9.

3.7.3 Poisson Random Variable

Let X be a discrete random variable and x be a non-negative integer, i.e., x ∈ {0, 1, ⋯}. If the random variable X has the probability mass function

p(x) = e^(-λ) λˣ/x!,  x = 0, 1, 2, ⋯    (3.25)

then X is called a Poisson random variable with parameter λ. The mass function p(x) is called the Poisson distribution or Poisson probability mass function.
The graphs of the Poisson distributions, i.e., Poisson probability mass functions, for λ = 4 and λ = 10 are drawn in Fig. 3.11.

Fig. 3.10 Geometric distribution for p = 0.2, p = 0.4, and N = 20

Fig. 3.11 Poisson distribution for λ = 1/2

3.7.4 Bernoulli Random Variable

Let X be a discrete random variable and x be an integer such that x ∈ {0, 1}. If the random variable X has the probability mass function

p(x) = p      if x = 1
       1 - p  if x = 0    (3.26)

then X is called a Bernoulli random variable with parameter p. The mass function
p(x) is called the Bernoulli distribution or Bernoulli probability mass function.

3.7.5 Discrete Uniform Random Variable

Let X be a discrete random variable and x be an integer such that x ∈ {k, k + 1, ⋯, k + N - 1}. If the random variable X has the probability mass function

p(x) = 1/N  if x ∈ {k, k + 1, ⋯, k + N - 1}
       0    otherwise    (3.27)

then X is called a discrete uniform random variable with parameters k and N. The mass function p(x) is called the discrete uniform distribution or discrete uniform probability mass function.
Example 3.26: Calculate the mean and variance of the Bernoulli random variable.

Solution 3.26: We can calculate the mean value of the Bernoulli random variable X using its probability mass function

p(x) = p      if x = 1
       1 - p  if x = 0

as

E[X] = Σ_x x p(x) → E[X] = 1 × p + 0 × (1 - p) → E[X] = p.

The variance of the Bernoulli random variable X can be calculated using

Var(X) = E[X²] - (E[X])²

where E[X²] is computed as

E[X²] = Σ_x x² p(x) → E[X²] = 1² × p + 0² × (1 - p) → E[X²] = p.

Then, the variance is found as

Var(X) = p - p² → Var(X) = p(1 - p).


Example 3.27: The probability mass function of a discrete uniform random variable X is given as

p(x) = 1/6  if -2 ≤ x ≤ 3 (x integer)
       0    otherwise.

Find the mean and variance of the discrete uniform random variable X.

Solution 3.27: The mean value can be calculated as

E[X] = Σ_x x p(x) → E[X] = -2 × 1/6 + (-1) × 1/6 + 0 × 1/6 + 1 × 1/6 + 2 × 1/6 + 3 × 1/6

leading to

E[X] = (-2 - 1 + 0 + 1 + 2 + 3) × 1/6 → E[X] = 3/6.

For the variance calculation, we first compute E[X²] as

E[X²] = Σ_x x² p(x) → E[X²] = ((-2)² + (-1)² + 0² + 1² + 2² + 3²) × 1/6 → E[X²] = 19/6.

Then, the variance is found as

Var(X) = E[X²] - (E[X])² → Var(X) = 19/6 - 1/4 → Var(X) = 35/12.
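With exact rational arithmetic the same numbers fall out directly; this is a sketch (not from the book) using Python's fractions module on the pmf of Example 3.27:

```python
from fractions import Fraction

# Sketch reproducing Example 3.27 exactly.
xs = range(-2, 4)            # the integers -2, ..., 3
p = Fraction(1, 6)

m = sum(x * p for x in xs)          # 1/2, i.e., 3/6
ex2 = sum(x**2 * p for x in xs)     # 19/6
print(m, ex2 - m**2)                # 1/2, 35/12
```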

Example 3.28: Calculate the mean value of the Poisson random variable.

Solution 3.28: We can calculate the mean value of the Poisson random variable X using its probability mass function

p(x) = e^(-λ) λˣ/x!,  x = 0, 1, 2, ⋯

as
E[X] = Σ_x x p(x)
     = Σ_{x=0}^{∞} x e^(-λ) λˣ/x!
     = Σ_{x=1}^{∞} x e^(-λ) λˣ/x!
     = Σ_{x=1}^{∞} e^(-λ) λˣ/(x - 1)!
     = λ e^(-λ) Σ_{x=1}^{∞} λ^(x-1)/(x - 1)!
     = λ e^(-λ) Σ_{m=0}^{∞} λᵐ/m!    (substituting m = x - 1)
     = λ e^(-λ) e^λ.

Thus, the mean value of the Poisson random variable is λ, i.e.,

E[X] = λ.    (3.28)
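The infinite sum converges fast, so (3.28) can also be confirmed numerically by truncating the series; this is an approximate sketch (not from the book), with the cutoff 60 chosen so the neglected tail is negligible for λ = 4:

```python
from math import exp, factorial

# Sketch: numerically confirm E[X] = lambda for a Poisson pmf,
# truncating the infinite sum (an approximation, not a proof).
lam = 4.0
mean = sum(x * exp(-lam) * lam**x / factorial(x) for x in range(60))
print(mean)  # ≈ 4.0
```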

Problems

1. A fair three-sided die is tossed twice. Write the sample space for the combined experiment. Let sᵢ be a simple outcome of the experiment, i.e., sᵢ denotes one of the integer pairs 11, 12, 13, 21, ⋯, 33. The random variable function X for the simple events is defined as

X(sᵢ) = {sum of the integers in sᵢ mod 3}.

(a) Draw the graph of the random variable function.
(b) Write the following events explicitly:

A = {sᵢ | X(sᵢ) = 0}  B = {sᵢ | X(sᵢ) = 1}  C = {sᵢ | X(sᵢ) = 2}  D = {sᵢ | X(sᵢ) ≤ 1}.

(c) Verify that the events A, B, and C defined in part (b) are disjoint and form a partition of the sample space, i.e., S = A ∪ B ∪ C.

2. The sample space of an experiment is given as

S = {s₁, s₂, s₃, s₄, s₅, s₆, s₇, s₈}.

The random variable X on S is defined as

X(s₁) = -1  X(s₂) = 0  X(s₃) = 1  X(s₄) = 0
X(s₅) = 1   X(s₆) = 0  X(s₇) = -1 X(s₈) = 1.

(a) Find the following events:

{X = -1}  {X = 0}  {X = 1}.

(b) Are the events {X = -1}, {X = 0}, and {X = 1} disjoint?
(c) Show that

{X = -1} ∪ {X = 0} ∪ {X = 1} = S.

3. The sample space of an experiment is given as

S = {s₁, s₂, s₃, s₄, s₅, s₆, s₇, s₈}.

The random variable X on S is defined as

X(s₁) = -2  X(s₂) = 1  X(s₃) = 1  X(s₄) = 2
X(s₅) = 2   X(s₆) = 1  X(s₇) = -2 X(s₈) = -2.

(a) Write the range set of the random variable.
(b) Calculate and draw the probability mass function p(x) of the discrete random variable X.

4. The range set of a discrete random variable X is given as

R_X = {-1, 0, 2}.

The events A, B, C, D are defined as

A = {X = -1}  B = {X = 0}  C = {X = 2}  D = A ∪ B ∪ C.

Calculate the probabilities

P(A|B)  P(A|D)  P(B|C)  P(D).

5. A fair coin is flipped and a three-sided fair die is tossed at the same time. Let sᵢ be a simple outcome of the combined experiment. The random variable X for the simple events of the sample space is defined as

X(sᵢ) = (-1 + the integer value in sᵢ)        if sᵢ contains a head
        (1 + the integer value in sᵢ) mod 4   if sᵢ contains a tail.

(a) Write the range set of the random variable X.
(b) Calculate and draw the probability mass function of X.

6. The range set of a discrete random variable X is given as

R_X = {-1, 0, 1, 3, 7}.

Write the x-intervals for which the cumulative distribution function F(x) is calculated.
7. The probability mass function p(x) of a discrete random variable X is given as

p(x = -2) = a  p(x = 0) = 2a  p(x = 1) = 2a  p(x = 4) = a.

(a) Find the value of a.
(b) Write the range set of the random variable X.
(c) Draw the graph of p(x).
(d) Calculate the cumulative distribution function F(x) of the random variable X, and draw it.

8. The graph of the probability mass function of a discrete random variable X is depicted in Fig. 3P.1.

(a) Write the range set of the random variable.
(b) Calculate and draw the cumulative distribution function F(x).
(c) Calculate the mean value, variance, and standard deviation of X.

9. The graph of the cumulative distribution function of a discrete random variable X is depicted in Fig. 3P.2.

(a) Write the range set of the random variable.
(b) Calculate and draw the probability mass function p(x).
(c) Calculate the mean value, variance, and standard deviation of X.
[Fig. 3P.1 Probability mass function of a discrete random variable]

[Fig. 3P.2 Cumulative distribution function of a discrete random variable with steps 1/4, 3/8, and 1]

10. The probability mass function p(x) of a discrete random variable X is given as

p(x = -1) = a  p(x = 1) = a  p(x = 2) = 2a.

A function of X is defined as Y = 2X + 2.

(a) Find the value of a.
(b) Write the range set of the random variable X.
(c) Write the range set of Y.
(d) Find the probability mass function of Y and draw it.
(e) Calculate the mean value and variance of Y.

11. The variance of the discrete random variable X equals 2.5.

(a) Find the variance of Y = 2X.
(b) Find the variance of Y = 2X + 1.
[Fig. 3P.3 Probability mass function of a discrete random variable with values 2.5a and 2a]

12. Write the distribution functions of the geometric, binomial, and Poisson discrete random variables.
13. A uniform discrete random variable is defined on the integer interval [-3, -2, ⋯, 4, 5]. Find the mean value and variance of this uniform random variable.
14. The probability mass function p(x) of a discrete random variable X is depicted in Fig. 3P.3. Without mathematically calculating the mean value of this random variable, decide whether the mean value is a positive or a negative number.
Chapter 4
Functions of Random Variables

4.1 Probability Mass Function for Functions of a Discrete Random Variable

Let X be a discrete random variable, and let Y = g(X) be a function of X. If the probability mass function of X is p_x(x), then the probability mass function of Y, i.e., p_y(y), can be calculated from p_x(x) using

p_y(y) = Σ_{x | y = g(x)} p_x(x).    (4.1)

Equation (4.1) can be evaluated in two different ways. In the first method, if R_X is given, for each x value in the range set R_X we calculate y = g(x) and evaluate p_y(y) using (4.1). In the second approach, we solve the equation y = g(x) for x in terms of y and use (4.1) for the evaluation of p_y(y).
Example 4.1: For the discrete random variable X, the probability mass function p_x(x) is given as

p_x(x) = 4/8  x = -1
         1/8  x = 0
         3/8  x = 1.

If Y = X², determine the probability mass function of Y, i.e., p_y(y) = ?

Solution 4.1: If Y = g(X), the relation between the probability mass functions of X and Y is given as

[Fig. 4.1 Probability mass function p_y(y): p_y(0) = 1/8, p_y(1) = 7/8]

p_y(y) = Σ_{x | y = g(x)} p_x(x)

where using y = x², we get

p_y(y) = Σ_{x | y = x²} p_x(x)

which can be calculated for the given p_x(x) as

x = -1 → y = x² → y = 1
x = 0 → y = x² → y = 0
x = 1 → y = x² → y = 1

p_y(y = 1) = p_x(x = -1) + p_x(x = 1) → p_y(y = 1) = 4/8 + 3/8 → p_y(y = 1) = 7/8
p_y(y = 0) = p_x(x = 0) → p_y(y = 0) = 1/8.

The graph of the probability mass function py( y) is shown in Fig. 4.1.
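The "pool the probability of all x mapping to the same y" step in (4.1) is exactly an accumulation, which the following sketch (not from the book; the pmf is hard-coded from Example 4.1) makes explicit:

```python
from collections import defaultdict

# Sketch of (4.1): push the pmf of X through y = g(x) = x**2.
px = {-1: 4/8, 0: 1/8, 1: 3/8}

py = defaultdict(float)
for x, p in px.items():
    py[x**2] += p      # all x with the same g(x) pool their probability

print(dict(py))        # {1: 0.875, 0: 0.125}
```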
Example 4.2: For the discrete random variable X, the probability mass function is p_x(x). If Y = X², determine the probability mass function of Y, i.e., p_y(y), in terms of p_x(x).

Solution 4.2: Since the relation between the random variables X and Y is given as

Y = X²

we first consider the values x and y generated by these random variables and solve the equation

y = x²

for x. The solution is

x = ±√y.

According to (4.1), p_y(y) is calculated as

p_y(y) = p_x(√y) + p_x(-√y).

Exercise: For the discrete random variable X, the probability mass function p_x(x) is given as

p_x(x) = 1/4  -1 ≤ x ≤ 2 (x integer)
         0    otherwise.

If Y = X², determine the probability mass function of Y, i.e., p_y(y) = ?

Exercise: For the discrete random variable X, the probability mass function is p_x(x). If Y = X³, determine the probability mass function of Y, i.e., p_y(y), in terms of p_x(x).

4.2 Joint Probability Mass Function

For a given experiment, let S be the sample space of the experiment, and on this sample space let's define two random variables X and Y. Consider the events {X = x} and {Y = y}. The intersection of these events is

{X = x} ∩ {Y = y}, which means {sᵢ | X(sᵢ) = x} ∩ {sᵢ | Y(sᵢ) = y}.

The joint probability mass function for the discrete random variables X and Y is defined as

p(x, y) = Prob({X = x} ∩ {Y = y})    (4.2)

which can also be written as

p(x, y) = Prob{X = x, Y = y}    (4.3)

or as

p(x, y) = Prob{X = x and Y = y}.    (4.4)

Example 4.3: For the two tosses of a coin experiment, the sample space is

S = {HH, HT, TH, TT}.

Let's define the discrete random variables X and Y as

X(sᵢ) = {2 × number of heads in sᵢ - 1}  Y(sᵢ) = {2 × number of tails in sᵢ - 1}

then we have

X(HH) = 3  X(HT) = 1  X(TH) = 1  X(TT) = -1
Y(HH) = -1 Y(HT) = 1  Y(TH) = 1  Y(TT) = 3.

The range sets of these two random variables are

R_X = {-1, 1, 3}  R_Y = {-1, 1, 3}.

The joint probability mass function p(x, y) of the random variables X and Y can be calculated considering all possible values of the (x, y) pairs as

p(x, y) = Prob{X = x, Y = y} →

x = -1, y = -1 → p(-1, -1) = Prob({TT} ∩ {HH}) → p(-1, -1) = Prob(ϕ) → p(-1, -1) = 0
x = -1, y = 1 → p(-1, 1) = Prob({TT} ∩ {HT, TH}) → p(-1, 1) = Prob(ϕ) → p(-1, 1) = 0
x = -1, y = 3 → p(-1, 3) = Prob({TT} ∩ {TT}) → p(-1, 3) = Prob({TT}) → p(-1, 3) = 1/4
x = 1, y = -1 → p(1, -1) = Prob({HT, TH} ∩ {HH}) → p(1, -1) = Prob(ϕ) → p(1, -1) = 0
x = 1, y = 1 → p(1, 1) = Prob({HT, TH} ∩ {HT, TH}) → p(1, 1) = Prob({HT, TH}) → p(1, 1) = 2/4
x = 1, y = 3 → p(1, 3) = Prob({HT, TH} ∩ {TT}) → p(1, 3) = Prob(ϕ) → p(1, 3) = 0
x = 3, y = -1 → p(3, -1) = Prob({HH} ∩ {HH}) → p(3, -1) = Prob({HH}) → p(3, -1) = 1/4
x = 3, y = 1 → p(3, 1) = Prob({HH} ∩ {HT, TH}) → p(3, 1) = Prob(ϕ) → p(3, 1) = 0
x = 3, y = 3 → p(3, 3) = Prob({HH} ∩ {TT}) → p(3, 3) = Prob(ϕ) → p(3, 3) = 0.

Thus, considering all the calculated values, we can write p(x, y) as

p(x = -1, y = -1) = 0   p(x = -1, y = 1) = 0   p(x = -1, y = 3) = 1/4
p(x = 1, y = -1) = 0    p(x = 1, y = 1) = 2/4  p(x = 1, y = 3) = 0
p(x = 3, y = -1) = 1/4  p(x = 3, y = 1) = 0    p(x = 3, y = 3) = 0.
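Since the four outcomes are equally likely, the whole joint pmf can be built by counting outcomes; this is a sketch only (not from the book, with the sample space of Example 4.3 hard-coded as strings):

```python
from fractions import Fraction
from itertools import product

# Sketch for Example 4.3: build p(x, y) directly from the sample space.
S = ["HH", "HT", "TH", "TT"]          # equally likely outcomes
X = lambda s: 2 * s.count("H") - 1
Y = lambda s: 2 * s.count("T") - 1

p = {(x, y): Fraction(sum(1 for s in S if X(s) == x and Y(s) == y), len(S))
     for x, y in product([-1, 1, 3], repeat=2)}
print({k: v for k, v in p.items() if v})  # {(-1, 3): 1/4, (1, 1): 1/2, (3, -1): 1/4}
```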

Example 4.4: The sample space of an experiment is given as

S = {s₁, s₂, s₃, s₄}

and the discrete random variables X and Y are defined as

X(s₁) = 1  X(s₂) = 1  X(s₃) = 2  X(s₄) = 2
Y(s₁) = -1 Y(s₂) = 0  Y(s₃) = -1 Y(s₄) = 0.

Find the following:

(a) {X = 1}, {Y = -1}
(b) {X = 1, Y = -1}
(c) Prob{X = 1}
(d) Prob{X = 1, Y = -1}

Solution 4.4:

(a) Remembering that {X = x} means {sᵢ | X(sᵢ) = x}, we can find {X = 1} and {Y = -1} as

{X = 1} = {s₁, s₂}  {Y = -1} = {s₁, s₃}.

(b) Knowing that {X = x, Y = y} means {X = x} ∩ {Y = y}, we can calculate {X = 1, Y = -1} as

{X = 1, Y = -1} = {X = 1} ∩ {Y = -1} = {s₁, s₂} ∩ {s₁, s₃} = {s₁}.

Thus, we have

{X = 1, Y = -1} = {s₁}.

(c) Using the result of part (a), we can calculate Prob{X = 1} as

Prob{X = 1} = Prob{s₁, s₂} = 2/4.

(d) Using the result of part (b), we can calculate Prob{X = 1, Y = -1} as

Prob{X = 1, Y = -1} = Prob{s₁} = 1/4.

Theorem 4.1: The joint and marginal probability mass functions for the discrete random variables X and Y are denoted by p(x, y), p_x(x), and p_y(y), respectively. Show that the marginal probability mass functions p_x(x) and p_y(y) can be obtained from the joint probability mass function p(x, y) via

p_x(x) = Σ_y p(x, y)  p_y(y) = Σ_x p(x, y).    (4.5)
Proof 4.1: Let's prove

p_x(x) = Σ_y p(x, y).

For the simplicity of the proof, assume that the range set of the random variable Y is given as

R_Y = {y₁, y₂, y₃}.

The sample space S can be expressed as

S = {Y = y₁} ∪ {Y = y₂} ∪ {Y = y₃}.    (4.6)

The event {X = x} can be written as

{X = x} = {X = x} ∩ S

in which substituting (4.6), we get

{X = x} = {X = x} ∩ ({Y = y₁} ∪ {Y = y₂} ∪ {Y = y₃})

leading to

{X = x} = ({X = x} ∩ {Y = y₁}) ∪ ({X = x} ∩ {Y = y₂}) ∪ ({X = x} ∩ {Y = y₃}).    (4.7)

Taking the probability of both sides of (4.7), we obtain

Prob{X = x} = Prob(({X = x} ∩ {Y = y₁}) ∪ ({X = x} ∩ {Y = y₂}) ∪ ({X = x} ∩ {Y = y₃}))

which can be written as

Prob{X = x} = Prob({X = x} ∩ {Y = y₁}) + Prob({X = x} ∩ {Y = y₂}) + Prob({X = x} ∩ {Y = y₃}).

That is,

Prob{X = x} = Prob{X = x, Y = y₁} + Prob{X = x, Y = y₂} + Prob{X = x, Y = y₃}

which can be expressed in terms of probability mass functions as

p_x(x) = p(x, y₁) + p(x, y₂) + p(x, y₃).    (4.8)

We can generalize (4.8) as

p_x(x) = Σ_y p(x, y).    (4.9)

The proof of

p_y(y) = Σ_x p(x, y)

can be performed in a similar way.

4.3 Conditional Probability Mass Function

The conditional probability mass function p(x|y) is defined as

p(x|y) = Prob{X = x | Y = y}.    (4.10)

Example 4.5: Show that the joint probability mass function p(x, y) can be expressed as

p(x, y) = p(x|y) p_y(y)

or as

p(x, y) = p(y|x) p_x(x).

Solution 4.5: The joint probability mass function p(x, y) is defined as

p(x, y) = Prob{X = x, Y = y}

which can also be written as

p(x, y) = Prob({X = x} ∩ {Y = y})

where employing the conditional probability definition

Prob(A ∩ B) = Prob(A|B) Prob(B)

we obtain

p(x, y) = Prob{X = x | Y = y} Prob{Y = y}

which is equal to

p(x, y) = p(x|y) p_y(y).

Example 4.6: Show that

Σ_x p(x|y) = 1.

Solution 4.6: Substituting the conditional probability mass function expression

p(x|y) = p(x, y)/p_y(y)

into

Σ_x p(x|y)

we get

Σ_x p(x, y)/p_y(y)

which can be written as

(1/p_y(y)) Σ_x p(x, y)

where Σ_x p(x, y) = p_y(y), leading to

(1/p_y(y)) × p_y(y) → 1.

Theorem 4.2: For the joint probability mass function p(x, y), we have

Σ_{x,y} p(x, y) = 1.    (4.11)
[Fig. 4.2 Joint probability mass function p(x, y) of two discrete random variables: p(1, 1) = 3a, p(1, 2) = a, p(2, 1) = a, p(2, 2) = 2a]

Proof 4.2: The mathematical expression

Σ_{x,y} p(x, y)

can be written as

Σ_x Σ_y p(x, y)

where the inner sum Σ_y p(x, y) equals p_x(x), leading to

Σ_x p_x(x) → 1.

Example 4.7: The joint probability mass function p(x, y) of two discrete random variables X and Y is depicted in Fig. 4.2. Determine the value of a, and find the marginal probability mass functions p_x(x) and p_y(y).

Solution 4.7:

(a) Expanding

Σ_{x,y} p(x, y) = 1

for the given p(x, y) in Fig. 4.2, we obtain

Σ_x Σ_y p(x, y) = 1 →
Σ_x [p(x, 1) + p(x, 2)] = 1 →
p(1, 1) + p(1, 2) + p(2, 1) + p(2, 2) = 1 →
3a + a + a + 2a = 1 → 7a = 1 → a = 1/7.
(b) Using

p_x(x) = Σ_y p(x, y)  p_y(y) = Σ_x p(x, y)

we get

p_x(x = 1) = p(x = 1, y = 1) + p(x = 1, y = 2) → p_x(x = 1) = 4a → p_x(x = 1) = 4/7
p_x(x = 2) = p(x = 2, y = 1) + p(x = 2, y = 2) → p_x(x = 2) = 3a → p_x(x = 2) = 3/7

and

p_y(y = 1) = p(x = 1, y = 1) + p(x = 2, y = 1) → p_y(y = 1) = 4a → p_y(y = 1) = 4/7
p_y(y = 2) = p(x = 1, y = 2) + p(x = 2, y = 2) → p_y(y = 2) = 3a → p_y(y = 2) = 3/7.

4.4 Joint Probability Mass Function of Three or More Random Variables

For the sample space of an experiment, we can define any number of random variables. In this case, we can define joint probability mass functions for a group of random variables. Assume that for the sample space S we have four defined random variables W, X, Y, Z; for any group of these random variables we can define a probability mass function. For example, for X, Y, Z we can define p(x, y, z) as

p(x, y, z) = Prob{X = x, Y = y, Z = z}    (4.12)

which can also be written as

p(x, y, z) = Prob({X = x} ∩ {Y = y} ∩ {Z = z}).    (4.13)

For the four random variables W, X, Y, Z, we define p(w, x, y, z) as

p(w, x, y, z) = Prob{W = w, X = x, Y = y, Z = z}.    (4.14)

If a group of random variables is a subset of another group, then the probability mass function of the former can be obtained from that of the latter via summation over the missing random variables. That is:
p(x, y, z) = Σ_w p(w, x, y, z)
p_xy(x, y) = Σ_{w,z} p(w, x, y, z)
p_xz(x, z) = Σ_{w,y} p(w, x, y, z)
p_y(y) = Σ_{x,w,z} p(w, x, y, z)
p_xy(x, y) = Σ_z p(x, y, z)
p_x(x) = Σ_{y,z} p(x, y, z)

Note: The triple summation can be expanded as

Σ_{x,y,z} (⋯) = Σ_x Σ_y Σ_z (⋯).
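When a joint pmf is stored as a multidimensional array, these marginalizations are just sums over the axes of the missing variables. The following sketch (not from the book; the random array stands in for an arbitrary p(w, x, y, z)) illustrates this with NumPy:

```python
import numpy as np

# Sketch: marginalizing a joint pmf stored as an array p[w, x, y, z].
p = np.random.rand(2, 3, 4, 5)
p /= p.sum()                    # normalize so the entries sum to 1

p_xyz = p.sum(axis=0)           # p(x, y, z) = sum_w p(w, x, y, z)
p_xy = p.sum(axis=(0, 3))       # p(x, y)   = sum over w and z
p_y = p.sum(axis=(0, 1, 3))     # p(y)      = sum over w, x, and z
print(p_xyz.shape, p_xy.shape, p_y.shape)  # (3, 4, 5) (3, 4) (4,)
```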

4.5 Functions of Two Random Variables

Let's define a function of two discrete random variables X and Y as

Z = g(X, Y).

Since a function of the two discrete random variables X and Y produces another random variable Z, we can consider the mean and variance of g(X, Y). The mean value of g(X, Y) is calculated as

E[g(X, Y)] = Σ_{x,y} g(x, y) p(x, y)    (4.15)

which can also be written as

E[g(X, Y)] = Σ_x Σ_y g(x, y) p(x, y).    (4.16)

Example 4.8: The range sets of the discrete random variables X and Y are given as

R_X = {-1, 1}  R_Y = {-2, 3}.

The joint probability mass function p(x, y) for X and Y is defined as

p(-1, -2) = 1/6  p(-1, 3) = 2/6  p(1, -2) = 2/6  p(1, 3) = 1/6.

A function of the two random variables X and Y is defined as

g(X, Y) = XY.

Calculate E[g(X, Y)].
Solution 4.8: Employing the formula

E[g(X, Y)] = Σ_{x,y} g(x, y) p(x, y)

for g(X, Y) = XY, we get

E[XY] = Σ_x Σ_y x y p(x, y)

which can be expanded using the y values as

E[XY] = Σ_x [x × (-2) × p(x, -2) + x × 3 × p(x, 3)]

leading to

E[XY] = (-1) × (-2) × p(-1, -2) + (-1) × 3 × p(-1, 3) + 1 × (-2) × p(1, -2) + 1 × 3 × p(1, 3)

in which substituting the values of the joint probability mass function, we get

E[XY] = (-1) × (-2) × 1/6 + (-1) × 3 × 2/6 + 1 × (-2) × 2/6 + 1 × 3 × 1/6

resulting in

E[XY] = (2 - 6 - 4 + 3)/6 → E[XY] = -5/6.
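The arithmetic is easy to get wrong by hand, so a quick exact check is worthwhile; this sketch (not from the book; the joint pmf of Example 4.8 is hard-coded) confirms the value -5/6:

```python
from fractions import Fraction as F

# Sketch checking Example 4.8 with exact arithmetic.
p = {(-1, -2): F(1, 6), (-1, 3): F(2, 6), (1, -2): F(2, 6), (1, 3): F(1, 6)}

exy = sum(x * y * prob for (x, y), prob in p.items())
print(exy)  # -5/6
```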

Example 4.9: Show that

E[aX + bY] = aE[X] + bE[Y].    (4.17)

Solution 4.9: Using the formula

E[g(X, Y)] = Σ_x Σ_y g(x, y) p(x, y)

for g(X, Y) = aX + bY, we obtain

E[aX + bY] = Σ_x Σ_y (ax + by) p(x, y)

which can be written as

E[aX + bY] = Σ_x Σ_y ax p(x, y) + Σ_y Σ_x by p(x, y)

leading to

E[aX + bY] = a Σ_x x Σ_y p(x, y) + b Σ_y y Σ_x p(x, y)

where Σ_y p(x, y) = p_x(x) and Σ_x p(x, y) = p_y(y), resulting in

E[aX + bY] = a Σ_x x p_x(x) + b Σ_y y p_y(y)

which can be written as

E[aX + bY] = aE[X] + bE[Y].
4.6 Conditional Probability Mass Function

For a discrete experiment, let S be the sample space and A be any event. The conditional probability mass function conditioned on the particular event A is defined as

p(x|A) = Prob{X = x | A}    (4.18)

which is equal to

p(x|A) = Prob({X = x} ∩ A)/Prob(A).    (4.19)

Example 4.10: Show that Prob(A) in (4.19) can be written as

Prob(A) = Σ_x Prob({X = x} ∩ A).    (4.20)

Solution 4.10: Assume that the range set of the random variable X is R_X = {x₁, x₂, x₃}. Then, we have

S = {X = x₁} ∪ {X = x₂} ∪ {X = x₃}

where {X = x₁}, {X = x₂}, {X = x₃} are disjoint events, and for the event A we can write

A = S ∩ A → A = ({X = x₁} ∪ {X = x₂} ∪ {X = x₃}) ∩ A

leading to

A = ({X = x₁} ∩ A) ∪ ({X = x₂} ∩ A) ∪ ({X = x₃} ∩ A).    (4.21)

Taking the probability of both sides of (4.21), we get

Prob(A) = Prob({X = x₁} ∩ A) + Prob({X = x₂} ∩ A) + Prob({X = x₃} ∩ A).    (4.22)

Equation (4.22) can be generalized as

Prob(A) = Σ_x Prob({X = x} ∩ A).    (4.23)

Theorem 4.3: The conditional probability mass function satisfies

Σ_x p(x|A) = 1.    (4.24)

Proof 4.3: We defined the conditional probability mass function as

p(x|A) = Prob({X = x} ∩ A)/Prob(A).    (4.25)

If we sum both sides of (4.25) over x, we get

Σ_x p(x|A) = (1/Prob(A)) Σ_x Prob({X = x} ∩ A)

in which substituting (4.23), we get

Σ_x p(x|A) = (1/Prob(A)) × Prob(A) → Σ_x p(x|A) = 1.

Thus, we have

Σ_x p(x|A) = 1.

Example 4.11: The sample space of an experiment is given as

S = {s₁, s₂, s₃, s₄}

and a random variable X on the simple events is defined as

X(s₁) = 3  X(s₂) = 1  X(s₃) = 1  X(s₄) = -1.

An event A is defined as

A = {s₁, s₂, s₃}.

Find the conditional probability mass function p(x|A).

Solution 4.11: The range set of the random variable X can be written as

R_X = {-1, 1, 3}

and the events {X = -1}, {X = 1}, and {X = 3} can be written as

{X = -1} = {s₄}  {X = 1} = {s₂, s₃}  {X = 3} = {s₁}.

The conditional probability mass function

p(x|A) = Prob({X = x} ∩ A)/Prob(A)

can be evaluated for x = -1 as
p(x = -1|A) = Prob({X = -1} ∩ A)/Prob(A) →
p(x = -1|A) = Prob({s₄} ∩ {s₁, s₂, s₃})/Prob({s₁, s₂, s₃}) → p(x = -1|A) = Prob(ϕ)/Prob({s₁, s₂, s₃}) → p(x = -1|A) = 0

and for x = 1 as

p(x = 1|A) = Prob({X = 1} ∩ A)/Prob(A) → p(x = 1|A) = Prob({s₂, s₃} ∩ {s₁, s₂, s₃})/Prob({s₁, s₂, s₃}) → p(x = 1|A) = Prob({s₂, s₃})/Prob({s₁, s₂, s₃}) → p(x = 1|A) = 2/3

and for x = 3 as

p(x = 3|A) = Prob({X = 3} ∩ A)/Prob(A) → p(x = 3|A) = Prob({s₁} ∩ {s₁, s₂, s₃})/Prob({s₁, s₂, s₃}) → p(x = 3|A) = Prob({s₁})/Prob({s₁, s₂, s₃}) → p(x = 3|A) = 1/3.

Thus, we calculated the conditional probability mass function as

p(x = -1|A) = 0  p(x = 1|A) = 2/3  p(x = 3|A) = 1/3.

From the calculated values, it is seen that

Σ_x p(x|A) = 1.

Consider that we have two random variables X and Y defined on the simple events of the same sample space. If the event A is chosen as A = {Y = y}, then the conditional probability mass function p(x|A) happens to be

p(x|y) = Prob{X = x | Y = y}    (4.26)

and it is not difficult to show that

p(x|y) = p(x, y)/p_y(y).    (4.27)

4.7 Conditional Mean Value

Let X and Y be two random variables defined on the simple events of the same sample space, and let A be an event. The conditional expected value of X is defined as

E[X|A] = Σ_x x p(x|A)    (4.28)

and for a function of X, i.e., g(X), E[g(X)|A] is calculated using

E[g(X)|A] = Σ_x g(x) p(x|A).    (4.29)

If the event A is chosen as A = {Y = y}, then E[X|A] is calculated as

E[X|Y = y] = Σ_x x p(x|y).    (4.30)

Theorem 4.4: The expected value of X can be calculated in terms of the conditional expected value as

E[X] = Σ_y p(y) E[X|Y = y].    (4.31)

Proof 4.4: We know that

p(x) = Σ_y p(x, y)

which can be written as

p(x) = Σ_y p(y) p(x|y).    (4.32)

Multiplying both sides of (4.32) by x and summing over x, we get

Σ_x x p(x) = Σ_x x Σ_y p(y) p(x|y)

where the left-hand side is E[X]. The right-hand side can be rearranged as

E[X] = Σ_y p(y) Σ_x x p(x|y)

where the inner sum is E[X|Y = y], so that

E[X] = Σ_y p(y) E[X|Y = y].

Theorem 4.5: If A₁, A₂, ⋯, A_N form a partition of a sample space S, and X is a random variable, then the expected value of X can be calculated as

E[X] = Σ_{i=1}^{N} P(Aᵢ) E[X|Aᵢ].    (4.33)

Proof 4.5: For the simplicity of the proof, assume that N = 3, i.e., there are three disjoint events A₁, A₂, and A₃ such that

S = A₁ ∪ A₂ ∪ A₃.

Then, the event {X = x} can be written as

{X = x} = {X = x} ∩ S → {X = x} = {X = x} ∩ {A₁ ∪ A₂ ∪ A₃} →
{X = x} = ({X = x} ∩ A₁) ∪ ({X = x} ∩ A₂) ∪ ({X = x} ∩ A₃)

from which we get

Prob{X = x} = Prob({X = x} ∩ A₁) + Prob({X = x} ∩ A₂) + Prob({X = x} ∩ A₃)

which can be written as

p(x) = p(x, A₁) + p(x, A₂) + p(x, A₃).    (4.34)

Equation (4.34) can be generalized as

p(x) = Σ_{Aᵢ} p(x, Aᵢ)

which can also be written as

p(x) = Σ_{Aᵢ} p(x|Aᵢ) p(Aᵢ).    (4.35)

Multiplying both sides of (4.35) by x and summing over x, we get

Σ_x x p(x) = Σ_{Aᵢ} [Σ_x x p(x|Aᵢ)] p(Aᵢ)    (4.36)

where the left-hand side is E[X] and the inner sum on the right-hand side is E[X|Aᵢ], resulting in

E[X] = Σ_{Aᵢ} E[X|Aᵢ] p(Aᵢ).    (4.37)

Exercise: The probability mass function of a discrete random variable X is given as

p(x) = 1/4  x = -2
       1/2  x = 1
       1/4  x = 3.

(a) E[X] = ?
(b) Var(X) = ?
(c) Find and draw the cumulative distribution function F(x).
(d) g(X) = X² - 1; E[g(X)] = ? Var(g(X)) = ?
(e) Prob{-2 ≤ X ≤ 2} = ?

Exercise: The sample space of an experiment is given as S = {s₁, s₂, s₃, s₄, s₅}. The random variable X is defined as

X(s₁) = -1  X(s₂) = 1  X(s₃) = -1  X(s₄) = 1  X(s₅) = 2.

The event A is defined as

A = {s₁, s₂, s₅}.

Calculate p(x|A).


Exercise: The sample space of an experiment is given as S = {s₁, s₂, s₃, s₄, s₅}. The random variables X and Y are defined as

X(s₁) = -1  X(s₂) = 1  X(s₃) = -1  X(s₄) = 1  X(s₅) = -1
Y(s₁) = 1   Y(s₂) = 1  Y(s₃) = -1  Y(s₄) = 1  Y(s₅) = -1.

Find the joint probability mass function p(x, y).

4.8 Independence of Random Variables

Two discrete random variables X and Y are independent of each other if their joint probability mass function p(x, y) satisfies

p(x, y) = p_x(x) p_y(y)  for all x, y    (4.38)

which implies that

p(x|y) = p_x(x).    (4.39)

The random variables X and Y are said to be conditionally independent given the event A if the conditional joint probability mass function p(x, y|A) satisfies

p(x, y|A) = p_x(x|A) p_y(y|A)    (4.40)

which implies that

p(x|y, A) = p_x(x|A).    (4.41)

4.8.1 Independence of a Random Variable from an Event

The random variable X is independent of the event A if the joint probability mass function p(x, A) satisfies

p(x, A) = p(x) Prob(A)    (4.42)

where

p(x, A) = Prob({X = x} ∩ A)  and  p(x) = Prob{X = x}    (4.43)

which can also be written as

p(x, A) = Prob{X = x and A}.    (4.44)

The independence condition

p(x, A) = p(x) Prob(A)    (4.45)

implies that

p(x|A) = p(x).    (4.46)

Example 4.12: If p(x, y|A) = p(x|A) p(y|A), show that p(x|y, A) = p(x|A).

Solution: The conditional expression p(x|y, A) can be written as

p(x|y, A) = p(x, y, A)/p(y, A)

which can be manipulated as

p(x|y, A) = p(x, y|A) Prob(A)/p(y, A)

where using

p(x, y|A) = p(x|A) p(y|A)

we get

p(x|y, A) = p(x|A) p(y|A) Prob(A)/p(y, A)

in which substituting p(y|A) Prob(A) = p(y, A), we obtain

p(x|y, A) = p(x|A).

~ and Y~ are independent random variables, then we have


Theorem 4.6: If X and Y are independent random variables, then we have

E[XY] = E[X] E[Y].    (4.47)

Proof 4.6: Employing

E[g(X, Y)] = Σ_{x,y} g(x, y) p(x, y)

for g(X, Y) = XY, we get

E[XY] = Σ_{x,y} x y p(x, y)

in which using p(x, y) = p(x) p(y), we obtain

E[XY] = Σ_{x,y} x y p(x) p(y)

which can be rearranged as

E[XY] = [Σ_x x p(x)] [Σ_y y p(y)]

leading to

E[XY] = E[X] E[Y].

Example 4.13: Show that

Σ_{x,y} x² p(x, y) = Σ_x x² p(x) = E[X²].    (4.48)

Solution 4.13: The double summation

Σ_{x,y} x² p(x, y)

can be written as

Σ_x Σ_y x² p(x, y)

which can be rearranged as

Σ_x x² Σ_y p(x, y)

where the inner sum equals p(x), leading to

Σ_x x² p(x)

which is nothing but

E[X²].
Example 4.14: Using

E[g(X, Y)] = Σ_{x,y} g(x, y) p(x, y)

show that

E[X² + Y² + 2XY] = E[X²] + E[Y²] + E[2XY].    (4.49)

Solution 4.14: Employing

E[g(X, Y)] = Σ_{x,y} g(x, y) p(x, y)

for

g(X, Y) = X² + Y² + 2XY

we obtain

E[X² + Y² + 2XY] = Σ_{x,y} (x² + y² + 2xy) p(x, y)
                 = Σ_{x,y} x² p(x, y) + Σ_{x,y} y² p(x, y) + 2 Σ_{x,y} xy p(x, y)
                 = E[X²] + E[Y²] + 2E[XY].



Theorem 4.7: If X and Y are independent random variables and Z = X + Y, then we have

Var(Z) = Var(X) + Var(Y).    (4.50)

Proof 4.7: If Z = X + Y, then using

E[g(X, Y)] = Σ_{x,y} g(x, y) p(x, y)

it can be shown that

E[Z] = E[X] + E[Y].

If we denote E[X], E[Y], and E[Z] by m_x, m_y, and m_z, respectively, then we can write

m_z = m_x + m_y.

The variance of Z can be calculated using

Var(Z) = E[Z²] - m_z²

in which substituting Z = X + Y and m_z = m_x + m_y, we get

Var(Z) = E[(X + Y)²] - (m_x + m_y)²

which can be written as

Var(Z) = E[X² + Y² + 2XY] - m_x² - m_y² - 2m_x m_y

leading to

Var(Z) = E[X²] + E[Y²] + 2E[XY] - m_x² - m_y² - 2m_x m_y

which can be rearranged as

Var(Z) = (E[X²] - m_x²) + (E[Y²] - m_y²) + 2(E[X]E[Y] - m_x m_y)

where, by independence, E[X]E[Y] = m_x m_y, while E[X²] - m_x² = Var(X) and E[Y²] - m_y² = Var(Y), leading to

Var(Z) = Var(X) + Var(Y).

Theorem 4.8: If X and Y are independent random variables, then functions of these random variables, g(X) and h(Y), are independent of each other, i.e.,

E[g(X) h(Y)] = E[g(X)] E[h(Y)].    (4.51)

4.8.2 Independence of Several Random Variables

The random variables X, Y, and Z are independent of each other if the joint probability mass functions satisfy

p(x, y, z) = p_x(x) p_y(y) p_z(z)
p(x, y) = p_x(x) p_y(y)
p(x, z) = p_x(x) p_z(z)
p(y, z) = p_y(y) p_z(z).    (4.52)

If the random variables X, Y, and Z are independent of each other, then their functions are also independent of each other; for instance,

g(X, Z)

is independent of

h(Y).

Problems

1. The probability mass function p(x) of a discrete random variable X is given as

p(x) = 1/4  x = -1
       2/4  x = 1
       1/4  x = 2.

(a) Write the range set of X, i.e., write R_X.
(b) If Y = X² + 3, determine the range set of Y, i.e., R_Y.
(c) Find the probability mass function of Y.

2. The probability mass function for a discrete random variable X is p_x(x). If Y = X³ + 1, determine the probability mass function of Y, i.e., p_y(y), in terms of p_x(x).
[Fig. 4P.1 Joint probability mass function p(x, y) of two discrete random variables: p(1, 1) = 4a, p(1, 2) = a, p(2, 1) = a, p(2, 2) = 4a]

3. The sample space of an experiment is given as

S = {s₁, s₂, s₃}

and the discrete random variables X and Y are defined as

X(s₁) = -1  X(s₂) = 1  X(s₃) = -1
Y(s₁) = -1  Y(s₂) = 1  Y(s₃) = 1.

Find the following:

(a) Find {X = -1}, {Y = 1}.
(b) Find {X = -1, Y = 1}.
(c) Find Prob{X = -1}.
(d) Prob{X = -1, Y = 1}.
(e) Determine the joint probability mass function p(x, y) for the discrete random variables X and Y.

4. The joint probability mass function p(x, y) of two discrete random variables X and Y is depicted in Fig. 4P.1. Determine the value of a, find the marginal probability mass functions p_x(x) and p_y(y), and also find the conditional probability mass functions p(x|y) and p(y|x).

5. The range sets of the discrete random variables X and Y are given as

R_X = {-1, 1}  R_Y = {1, 2}.

The joint probability mass function p(x, y) for X and Y is defined as
p(-1, 1) = 2/8  p(-1, 2) = 3/8  p(1, 1) = 3/8  p(1, 2) = 2/8.

A function of the two random variables X and Y is defined as

Z = g(X, Y) = XY² + Y³.

(a) Determine the range set of Z.
(b) Determine the probability mass function of Z, i.e., p_z(z), and draw its graph.
(c) Calculate E[g(X, Y)], i.e., calculate E[Z].

6. For two discrete random variables X and Y, we have

E[X] = 2.5  E[Y] = 4.

If Z = 2X + 3Y, calculate E[Z].
7. The sample space of an experiment is given as

S = {s₁, s₂, s₃, s₄, s₅, s₆, s₇, s₈}

and a random variable X on the simple events is defined as

X(s₁) = 1  X(s₂) = -1  X(s₃) = 1  X(s₄) = 2
X(s₅) = 1  X(s₆) = 2   X(s₇) = -1 X(s₈) = 2.

The events A, B, C are defined as

A = {s₁, s₃, s₄}  B = {s₂, s₅, s₇}  C = {s₆, s₈}.

(a) Find the conditional probability mass functions p(x|A), p(x|B), and p(x|C). Determine the result of

p(x|A) + p(x|B) + p(x|C).

(b) Calculate E[X|A], E[X|B], E[X|C], and E[X² + 1|A].
(c) If the events A, B, and C are defined as

A = {s₁, s₃, s₇}  B = {s₂, s₃, s₇}  C = {s₁, s₂, s₃}

find the conditional probability mass functions p(x|A, B) and p(x|B, C).
8. The sample space of an experiment is given as

S = {s₁, s₂, s₃, s₄}

and the discrete random variables X and Y are defined as

X(s₁) = -1  X(s₂) = -1  X(s₃) = 1  X(s₄) = 1
Y(s₁) = 2   Y(s₂) = 3   Y(s₃) = 2  Y(s₄) = 3.

The event A is defined as A = {s₁, s₂}. Show that the random variables X and Y are conditionally independent of each other given the event A.
9. Write the criteria for the independence of four random variables from each other.
10. The variance of the discrete random variable X is 4. Find the variance of Y = 2X.
~
Chapter 5
Continuous Random Variables

5.1 Continuous Probability Density Function

The random variable functions that are used for experiments having sample spaces including an uncountable number of simple outcomes are called continuous random variables. The probability density function f(x) of a continuous random variable X satisfies

Prob{a ≤ X ≤ b} = ∫_a^b f(x) dx.    (5.1)

Note that for a discrete random variable X, we have

Prob{a ≤ X ≤ b} = Σ_{x=a}^{b} p(x).    (5.2)

A typical plot of f(x) is depicted in Fig. 5.1. In Fig. 5.1, the range set of the continuous random variable is the real number interval

R_X = [x₁, x₂].

For continuous random variables, we do not consider a single value of the random variable; instead, we consider intervals on which the random variable can take a value. The probability that the continuous random variable X takes a value in the interval I ⊂ R_X is calculated as

[Fig. 5.1 Probability density function of a random variable, nonzero on the interval [x₁, x₂]]

[Fig. 5.2 A continuous event: the interval I = [a, b] inside the range set [x₁, x₂]]

Prob{X = x ∈ I} = ∫_I f(x) dx    (5.3)

which can be written in a more compact form as

Prob{X ∈ I} = ∫_I f(x) dx.    (5.4)

If the interval I equals [a, b], i.e., I = [a, b], then Prob{X ∈ I} is calculated as

Prob{X ∈ I} = ∫_a^b f(x) dx.    (5.5)

The interval I = [a, b] is depicted in Fig. 5.2.


Some Properties
X is a continuous random variable with probability density function f(x).

1. The probability of a single point equals 0, i.e.,

Prob{X = x₁} = ∫_{x₁}^{x₁} f(x) dx = 0.    (5.6)

2. A single point does not affect the probability of an interval, i.e.,

Prob{x₁ ≤ X ≤ x₂} = Prob{x₁ < X ≤ x₂} = Prob{x₁ ≤ X < x₂} = Prob{x₁ < X < x₂}.
[Fig. 5.3 Approximation for probability calculation: the area over [x₁, x₁ + δ] is approximately δ f(x₁)]

3. The total area under the probability density function equals 1, i.e.,

Prob{-∞ ≤ X ≤ ∞} = ∫_{-∞}^{∞} f(x) dx = 1.    (5.7)

Now let's consider a very short interval I, i.e.,

I = [x₁, x₁ + δ]

where δ is a very small number. The probability

Prob{X = x ∈ I}

can be approximated as

Prob{X = x ∈ I} = Prob{x₁ ≤ X ≤ x₁ + δ} = ∫_{x₁}^{x₁+δ} f(x) dx ≈ δ f(x₁)    (5.8)

which is nothing but the area of the rectangle in Fig. 5.3. We can define the probability density function as

f(x) = lim_{δ→0} (1/δ) Prob{x ≤ X ≤ x + δ}.    (5.9)

5.2 Continuous Uniform Random Variable

The probability density function of a continuous uniform random variable X defined on the interval R_X = [a, b], a < b, is

[Fig. 5.4 Uniform probability density function: constant height 1/(b - a) on [a, b]]

f(x) = 1/(b - a)  if a ≤ x ≤ b
       0          otherwise    (5.10)

which is graphically depicted in Fig. 5.4.


From Fig. 5.4, it is clear that

∫_{-∞}^{∞} f(x) dx = 1.    (5.11)

Example 5.1: The probability density function of a continuous random variable X is given as

f(x) = K/x^(1/3)  0 ≤ x ≤ 1
       0          otherwise.

Find the value of K.

Solution 5.1: Employing

∫_{-∞}^{∞} f(x) dx = 1

for the given distribution, we get

∫_0^1 K/x^(1/3) dx = 1

which is solved for K as

K [(3/2) x^(2/3)] from 0 to 1 = 1 → K × 3/2 = 1 → K = 2/3.
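A quick numeric cross-check (a sketch using SciPy, not from the book) confirms that with K = 2/3 the density integrates to 1; quad copes with the integrable singularity at x = 0:

```python
from scipy.integrate import quad

# Sketch verifying Example 5.1 numerically.
K = 2 / 3
area, _ = quad(lambda x: K * x**(-1/3), 0, 1)
print(area)  # ≈ 1.0
```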
5.3 Expectation and Variance for Continuous Random Variables

The probabilistic average, or expected value, or mean value of the continuous random variable X is calculated as

E[X] = ∫_{-∞}^{∞} x f(x) dx.    (5.12)

Let's denote the mean value of X by m, i.e., m = E[X]. The variance of the random variable X is calculated using

Var(X) = E[(X - m)²]    (5.13)

which is explicitly written as

Var(X) = ∫_{-∞}^{∞} (x - m)² f(x) dx.    (5.14)

The variance of the random variable X can also be calculated using the alternative formula

Var(X) = E[X²] - m²    (5.15)

where E[X²] is computed as

E[X²] = ∫_{-∞}^{∞} x² f(x) dx.    (5.16)

Example 5.2: The probability density function of a continuous random variable X is shown in Fig. 5.5. Calculate the mean value and variance of the random variable X.

[Fig. 5.5 Probability density function of a random variable: f(x) = x/4 for 0 ≤ x ≤ 2 and f(x) = -x/16 + 3/8 for 2 ≤ x ≤ 6]
146 5 Continuous Random Variables

Solution 5.2: Employing the formula


1
E X = xf ðxÞdx
-1

for the probability density function depicted in Fig. 5.5, we calculate the mean value
of the random variable as

2 6
x x 3
m = E X~ = dx þ x - þ dx
0 4 2 16 8
≈ 2:3:

2
For the variance calculation, we first evaluate E X using

1
2
E X = x2 f ðxÞdx
-1

as

2 2 6
x x 3
E X~ 2 = dx þ x2 - þ dx
0 4 2 16 8
≈ 6:7

Finally, the variance is calculated using

2
Var X ≈ E X - m2

as

Var X ≈ 6:7 - 2:32 → Var X ≈ 1:41:

5.4 Expectation and Variance for Functions of Random


Variables

We can define a function of random variable X as Y = g X whose mean value my is


calculated using
5.4 Expectation and Variance for Functions of Random Variables 147

1
my = E g X = gðxÞf ðxÞdx: ð5:17Þ
-1

The variance of g X is calculated using

1
Var g X = ½gðxÞ2 f ðxÞdx - m2y : ð5:18Þ
-1

Example 5.3: Show that

Var X ≥ 0: ð5:19Þ

Solution 5.3: We can calculate the variance of X using


1
Var X = ðx - mÞ2 f ðxÞdx
-1

where (x - m)2 ≥ 0 and f(x) ≥ 0, then it is obvious that Var X ≥ 0:


Example 5.4: Calculate the mean value and variance of the continuous uniform
random variable X whose probability density function is defined as

1
f ðxÞ = if a ≤ x ≤ b ð5:20Þ
b-a
0 otherwise:

Solution 5.4: The mean value E X is calculated as

1 b
1 1 b2 - a2
E X = xf ðxÞdx → E X = xdx → E X =
-1 b-a a b-a 2

resulting in

aþb
m=E X = : ð5:21Þ
2

The variance of X can be obtained using

2
Var X = E X - m2
148 5 Continuous Random Variables

2
where E X is computed as

1 b
2 2 1 2 1 b 3 - a3
E X = x2 f ðxÞdx → E X = x2 dx → E X =
-1 b-a a b-a 3

leading to

2 b2 þ ab þ a2
E X = :
3

Then, Var X is evaluated as

2
2 a2 þ ab þ b2 aþb
Var X = E X - m2 → Var X = -
3 2

resulting in

ðb - aÞ2
Var X = : ð5:22Þ
12

5.5 Gaussian or Normal Random Variable

If the continuous random variable X has the probability density function

1 ðx - mÞ2
f ðxÞ = p e - 2σ2 ð5:23Þ
σ 2π

then X is said to be a Gaussian or normal random variable, and the probability


density function is called the Gaussian or normal distribution, and we use the
notation

X  N m, σ 2 ð5:24Þ

to indicate the Normal random variable with mean m, and variance σ 2. For the
normal random variable X, we have
5.5 Gaussian or Normal Random Variable 149

E X =m

leading to
1 ðx - mÞ2
1
m= p xe - 2σ 2 dx
σ 2π -1

and

Var X = σ 2

leading to
1 ðx - mÞ2
1
σ2 = p ðx - mÞ2 e - 2σ 2 dx: ð5:25Þ
σ 2π -1

5.5.1 Standard Random Variable

The Gaussian random variable X with zero mean and unity variance is called
standard normal random variable, and it is indicated as

X  N ð0, 1Þ:

The Gaussian random variable Y with mean m and variance σ 2, i.e.,

Y  N m, σ 2

can be expressed in terms of the standard random variable as

Y = m þ σX

which implies that

Y -m
X= : ð5:26Þ
σ

The cumulative distribution function F(x) for a continuous random variable X is


calculated using its probability density function f(x) as
150 5 Continuous Random Variables

Fig. 5.6 Normal f (x)


distribution Area I (0.7)

x
0.7

F ðxÞ = f ðt Þdt: ð5:27Þ


-1

For X  N ð0, 1Þ, the probability density function f(x) has the form

1 x2
f ðxÞ = p e - 2 ð5:28Þ

and the cumulative function of X  N ð0, 1Þ can be expressed as

x
1 t2
F ðxÞ = p e - 2 dt: ð5:29Þ

-1

The special cumulative distribution function given in (5.29) is denoted by ϕ(x)


and can be tabulated for different values of x. It indicates the area under the zero
mean unity variance Gaussian probability density function from -1 to x (Fig. 5.6).
Example 5.5: The graphs of the zero mean Gaussian distributions with different σ 2
values are depicted in Fig. 5.7 where it is seen as σ 2 increases we obtain a thinner and
taller Gaussian curve.
Example 5.6: The graphs of the zero mean Gaussian distributions with different σ 2
values are depicted in Fig. 5.8 where it is seen as σ 2 increases we obtain a thinner and
taller Gaussian curve.
Example 5.7: The graphs of the Gaussian distributions with the same non-zero
mean and different σ 2 values are depicted in Fig. 5.9 where it is seen as σ 2 increases
we obtain a thinner and taller Gaussian curve.
Example 5.8: X is a continuous random variable with distribution N(1, 4). If
Y = X þ 2, find the distribution of Y.
5.5 Gaussian or Normal Random Variable 151

f (x)
N (0,1)

N (0,2)

N (0,4)

Fig. 5.7 Normal distributions with mean value m = 0, and variances σ 2 = 1, σ 2 = 2, and σ 2 = 4

f (x)
N (0,1) N (4,1)

x
5 0 9
4

Fig. 5.8 Normal distributions with σ 2 = 1 and mean values m = 0 and m = 4

Solution 5.8: If we add a constant to a random variable, the new random variable
owns the same variance as the added one. Just the mean value of for new random
variable is shifted by the added amount. Thus, the random variable Y has the
distribution N(1 + 2, 4) → N(3, 4).
152 5 Continuous Random Variables

f (x)
N (4,1)

N (4,2)

N (4,4)

x
0 4

Fig. 5.9 Normal distributions with mean value m = 4, and variances σ 2 = 1, σ 2 = 2, and σ 2 = 4

Example 5.9: Find the result of the integral evaluation

0
1 x2
p e - 2 dx:
2π -1

Solution 5.9: The total area under the standard normal distribution equals 1, i.e.,
1 1
1 x2
f ðxÞdx = 1 → p e - 2 dx = 1:
-1 2π -1

The integral expression given in the question corresponds to half of the area under
the Gaussian curve; for this reason, we have

0
1 x2 1
p e - 2 dx = :
2π -1 2

5.6 Exponential Random Variable

The continuous random variable X with probability density function

λe - λx if x ≥ 0
f ðxÞ = ð5:30Þ
0 otherwise
5.6 Exponential Random Variable 153

is called the exponential random variable.


Example 5.10: Calculate the mean and variance of the exponential random
variable.
Solution 5.10: The mean value of the exponential random variable can be calcu-
lated as
1 1
E X = xf ðxÞdx → E X = xλe - λx dx
-1 0

where letting u = x, dv = λe-λxdx and employing integration by parts, i.e.,

udv = uv - vdu

we get
1
1
m = E X~ = - xe - λx 0
þ e - λx dx
0
1
e - λx
=0- λ 0
1
= :
λ
2
For the variance calculation, let’s first calculate E X as follows:

1 1
2 2
E X = x2 f ðxÞdx → E X = x2 λe - λx dx
-1 -1

where letting u = x2, dv = λe-λxdx and employing integration by parts, i.e.,

udv = uv - vdu

we get
1
2 1
E X = - x2 e - λx 0
þ 2xe - λx dx
0
1
2 - λx
=0 þ xλe dx
λ 0

E X

2
= 2:
λ
154 5 Continuous Random Variables

Then, variance can be calculated using

2
Var X = E X - m2

leading to

2 1 1
Var X = - → Var X = 2 :
λ2 λ2 λ

5.7 Cumulative Distribution Function

The cumulative distribution function for the random variable X is defined as

F ðxÞ = Prob X ≤ x ð5:31Þ

which is calculated for discrete and continuous random variables as

F ð xÞ = pð x i Þ ð5:32Þ
xi ≤ x

and
x
F ð xÞ = f ðt Þdt ð5:33Þ
-1

respectively.

5.7.1 Properties of Cumulative Distribution Function

The cumulative distribution function

F ðxÞ = Prob X ≤ x ð5:34Þ

has the following properties:


5.7 Cumulative Distribution Function 155

1. F(x) is a monotonically non-decreasing function, i.e.,

if x ≤ y, then F ðxÞ ≤ F ðyÞ: ð5:35Þ

2. F(x) has the limiting values

F ð- 1Þ = 0 F ð1Þ = 1: ð5:36Þ

3. If the random variable X is a discrete one, then F(x) has a piecewise constant and
staircase shape.
4. If the random variable X is a continuous one, then F(x) has continuous form.
5. For continuous random variable X, the relation between probability density
function f(x) and cumulative distribution function F(x) can be stated as
x
dF ðxÞ
F ð xÞ = f ðxÞdx f ð xÞ = : ð5:37Þ
-1 dx

6. For discrete random variable X with range set R = fx1 , x2 , ⋯, xN g, the relation
X
between probability mass function p(x) and cumulative distribution function F(x)
can be stated as

pðxi Þ = F ðxi Þ - F ðxi - 1 Þ F ðxi Þ = p xj ð5:38Þ


xj ≤ xi

which can also be written as

pðxÞ = F ðxÞ - F ðx - Þ F ð xÞ = p xj : ð5:39Þ


xj ≤ x

Example 5.11: The probability density function of a continuous random variable


X is shown in Fig. 5.10.
(a) Find c.
(b) Calculate and draw the cumulative distribution function F(x) of the random
variable X.

Fig. 5.10 Probability f (x )


density function of a random
variable
c

x
1 3
156 5 Continuous Random Variables

Solution 5.11: Employing


1
f ðxÞdx = 1
-1

for Fig. 5.10, we get

c 2
þ 2c = 1 → c = :
2 5

To draw the cumulative distribution function F(x), let’s first consider the x-
intevals on which F(x) is determined. While determining the x-intevals, we pay
attention to the graph of the f(x), and consider the points at which function changes.
Following this idea, we can determine the x-intevals as

0≤x<1
1 ≤ x ≤ 3:

In the next step, on each interval we calculate the cumulative distribution function
F(x) employing
x
F ð xÞ = f ðt Þdt:
-1

On the interval 0 ≤ x < 1, the probability density function can be written as

2
f ðxÞ = x 0 ≤ x < 1
5

and the cumulative distribution function F(x) is determined as


x x
2 x2
F ðxÞ = f ðt Þdt → F ðxÞ = tdt → F ðxÞ = :
-1 0 5 5

On the interval 1 ≤ x ≤ 3, the probability density function can be written as

2
f ðxÞ = 1≤x≤3
5

and the cumulative distribution function F(x) is determined as

x 1 x
2 2 1 2
F ð xÞ = f ðt Þdt → F ðxÞ = tdt þ dt → F ðxÞ = þ ðx- 1Þ:
-1 0 5 1 5 5 5

Thus, the cumulative distribution function F(x) can be written as


5.8 Impulse Function 157

Fig. 5.11 Cumulative F (x)


distribution function for
Example 5.11
1

1/ 5

x
0 1 3

0 -1<x<0
x2
0≤x<1
5
F ð xÞ = 1 2
þ ð x - 1Þ 1≤x≤3
5 5
1 3 ≤ x < 1:

The graph of F(x) is depicted in Fig. 5.11.

5.8 Impulse Function

The continuous impulse function δ(x) is defined as

1 if x = 0
δðxÞ = ð5:40Þ
0 otherwise

which satisfies
1
δðxÞdx = 1: ð5:41Þ
-1

The shifting operation does not alter the integration property, i.e.,
1
δðx- x0 Þdx = 1: ð5:42Þ
-1

The graph of δ(x - x0) is depicted in Fig. 5.12.


158 5 Continuous Random Variables

Fig. 5.12 Shifted impulse


function
G ( x  x0 )
1

x
0 x0

Fig. 5.13 Unit step u ( x)


function

1
1/ 2
x
0

5.9 The Unit Step Function

The unit step function u(x) can be defined either as

1 if x > 0
1
uð x Þ = if x = 0 ð5:43Þ
2
0 otherwise:

or as

1 if x ≥ 0
uð x Þ = ð5:44Þ
0 otherwise:

The graph of the unit step function is depicted in Fig. 5.13.


The relationship between δ(x) and u(x) is given as

duðxÞ x
δðxÞ = → uðxÞ = δðt Þdt: ð5:45Þ
dx -1

Some functions can be expressed as the sum of the shifted impulses or unit steps.
For instance, the function shown in Fig. 5.14 can be expressed in terms of shifted
unit functions as

gðxÞ = uðx- 1Þ þ 2uðx- 2Þ:

Then, the derivative of g(x) equals to


5.9 The Unit Step Function 159

Fig. 5.14 A staircase g ( x)


function
3
1
x
0 1 2

Fig. 5.15 Derivative of g(x) dg ( x )


in Fig. 5.10 dx

1
x
0 1 2

Fig. 5.16 Cumulative F (x)


distribution function of a
continuous random variable
1

0 .4

x
1 4

dgðxÞ
= δðx- 1Þ þ 2δðx- 2Þ
dx

whose graph is depicted in Fig. 5.15.


When Figs. 5.14 and 5.15 are compared to each other, we see that the probability
density function contains impulses at the points of the cumulative distribution
function where discontinuities are available.
Example 5.12: The cumulative distribution function of a continuous random var-
iable is given in Fig. 5.16. Find the probability density function of the continuous
random variable.
Solution 5.12: The probability density function is calculated by taking the deriva-
tive of cumulative distribution function, i.e.,

dF ðxÞ
f ðxÞ = :
dx

For the given F(x), the calculation of f(x) is depicted in Fig. 5.17.
160 5 Continuous Random Variables

Fig. 5.17 Calculation of F (x)


probability density function
1
0 .6
0 .4

x
1 4

d/dx d/dx d/dx d/dx


f (x)

0 .6
0 .4

x
1 4

Fig. 5.18 Probability f (x )


density function of a
continuous random variable
1/ 2
1/ 4

x
1 0 1 2 3

Example 5.13: The probability density function of a continuous random variable is


depicted in Fig. 5.18. Find and draw the cumulative distribution function of this
random variable.
Solution 5.13: To find the cumulative distribution function of the continuous
random variable, let’s first write the x-intervals on which the cumulative distribution
function is evaluated as

-1<x< -1
-1≤x<1
1≤x<2
2≤x<3
3 ≤ x < 1:

In the second step, employing the formula


5.9 The Unit Step Function 161

x
F ð xÞ = f ðt Þdt
-1

on the determined intervals, we can calculate the cumulative distribution function as


x x
- 1 < x < - 1 → F ð xÞ = f ðt Þdt → F ðxÞ = 0dt → F ðxÞ = 0
-1 -1
x - 1þ
1 1
- 1 ≤ x < 1 → F ðxÞ = f ðt Þdt → F ðxÞ = δðt þ 1Þdt → F ðxÞ =
-1 -1 - 4 4
x - 1þ
1
1 ≤ x < 2 → F ðxÞ = f ðt Þdt → F ðxÞ = δðt þ 1Þdt
-1 - 1- 4

1 3
þ δðt þ 1Þdt → F ðxÞ =
1 - 2 4
x
2 ≤ x < 3!F ðxÞ ¼ f ðt Þdt!F ðxÞ
-1
- 1þ 1þ x
1 1 1 3 1
¼ δðt þ 1Þdt þ δðt þ 1Þdt þ dt!F ðxÞ ¼ þ ðx - 2Þ
-1 - 4 1 - 2 2 4 4 4
x
3 ≤ x < 1 → F ðxÞ = f ðt Þdt → F ðxÞ = 1:
-1

Hence, using the calculated values, we can write the cumulative distribution
function as

0 -1<x< -1
1
-1≤x<1
4
F ð xÞ = 3
1≤x<2
4
x 1
þ 2≤x<3
4 4

whose graph is depicted in Fig. 5.19 with the graph of probability density function.
Exercise: Draw the cumulative distribution function of a random variable whose
probability density function is depicted in Fig. 5.20.
162 5 Continuous Random Variables

Fig. 5.19 Probability f (x )


density and cumulative
distribution functions for
Example 5.13 1/ 2
1/ 4

1 0 1 2 3

F (x )

1
3/ 4

1/ 4

1 0 1 2 3

Fig. 5.20 Probability f (x)


density function for exercise

0.3
0.4

x
1 2 4

5.10 Conditional Probability Density Function

For continuous experiments, sample spaces and events are defined as intervals. We
can indicate the sample spaces and events using random variables. For instance, for a
discrete random variable, let

R = f- 1, 2, 5g
X

be the range set of the random variable. Then, the sample space of the random
variable can be indicated as

S= -1≤X≤5

and an event A can either be characterized by an equality as


5.10 Conditional Probability Density Function 163

A= X = -1

or by an interval as

B= -1≤X<3 :

For continuous random variable, the range set of the random variable is a real
number interval. For instance,

R = ½- 20 60:
X

And similar to the discrete random variables, we can use the continuous random
variable to characterize the sample space of the continuous experiment, and an event
is defined for the given sample space. For instance, using R , we can indicate the
X
sample space of the continuous experiment as

S = - 20 ≤ X ≤ 60

where the sample space S contains an uncountable number of elements, and it


is a real number interval. And an event A of the continuous sample space can be
defined as

A = - 10 ≤ X < 20

where A denotes the interval [-10 20], which is a subset of S, i.e., A ⊂ S.


We know that

x≤X ≤x þ δ

indicates an event. Now consider the events, A= a≤X≤b and


B = x ≤ X ≤ x þ δ , then

A \ B = a ≤ X~ ≤ b \ x ≤ X~ ≤ x þ δ
x ≤ X~ ≤ x þ δ if x 2 ½a b ð5:46Þ
→A \ B=
0 otherwise

That is,

B if x 2 A
A \ B= ð5:47Þ
0 otherwise:
164 5 Continuous Random Variables

Conditional Probability Density Function


Let S be the sample space of a continuous experiment, i.e., an interval, and A be an
event, i.e., a sub-interval contained in S, i.e., A ⊂ S.
Previously we defined the probability density function f(x) in (5.48) as

1
f ðxÞ = lim Prob x ≤ X ≤ x þ δ : ð5:48Þ
δ→0 δ

Similarly, the conditional probability density function conditioned on event


A = a ≤ X ≤ b can be defined as

1
f ðxjAÞ = lim Prob x ≤ X ≤ x þ δjA ð5:49Þ
δ→0 δ

which can be written as

1 Prob x≤X ≤x þ δ \ a≤X ≤b


f ðxjAÞ = lim ð5:50Þ
δ→0 δ ProbðAÞ

where using the result in (5.46), we get

lim Prob x ≤ X~ ≤ x þ δ =δ
δ→0
f ðxjAÞ = if x 2 ½a b ð5:51Þ
ProbðAÞ
0 otherwise:

which can be written as

f ðxÞ
if x 2 A
f ðxjAÞ = ProbðAÞ ð5:52Þ
0 otherwise

where A = [a b].
Example 5.14: Probability density function, i.e., f(x), of a continuous random
variable is depicted in Fig. 5.21.
The events A, B, and C are defined as

Fig. 5.21 Probability f (x )


density function for
Example 5.14
1/3

0 1 2 3
5.10 Conditional Probability Density Function 165

A= 0≤X <1 B= 1≤X <2 C= 2≤X <3 :

Find the conditional distributions

f ðxjAÞ f ðxjBÞ f ðxjC Þ

and verify that

f ðxÞ = f ðxjAÞProbðAÞ þ f ðxjBÞProbðBÞf ðxjCÞProbðC Þ:

Solution 5.14: The events given in the question can be written as intervals, i.e.,

A = ½ 0 1 B = ½1 2 C = ½2 3:

The probabilities of the events A, B, and C can be calculated as

1 1
1 1
ProbðAÞ = f ðxÞdx → ProbðAÞ = dx → ProbðAÞ =
0 0 3 3
3 3
1 1
ProbðBÞ = f ðxÞdx → ProbðBÞ = dx → ProbðBÞ =
2 2 3 3
3 4
1 1
ProbðCÞ = f ðxÞdx → ProbðCÞ = dx → ProbðC Þ =
3 3 3 3

Employing the conditional probability density function definition

f ðxÞ
if x 2 A
f ðxjAÞ = ProbðAÞ
0 otherwise

for the events A, B, and C, we get

f ðxÞ
if x 2 ½0 1 3f ðxÞ if x 2 ½0 1
f ðxjAÞ = ProbðAÞ → f ðxjAÞ =
0 otherwise
0 otherwise
f ðxÞ
if x 2 ½1 2 3f ðxÞ if x 2 ½1 2
f ðxjBÞ = ProbðBÞ → f ðxjBÞ =
0 otherwise
0 otherwise
f ðxÞ
if x 2 ½0 1 3f ðxÞ if x 2 ½2 3
f ðxjAÞ = ProbðC Þ → f ðxjV Þ =
0 otherwise
0 otherwise

The graphs of f(x| A), f(x| B), and f(x| C) are depicted in Fig. 5.15.
166 5 Continuous Random Variables

f ( x | A) f (x | B) f (x | C )

1 1 1

x x x
0 1 0 1 2 0 1 2 3

Fig. 5.22 The graphs of the conditional probability density functions f(x| A), f(x| B), and f(x| C)

Fig. 5.23 The probability f (x)


density function of a
continuous random variable
1

x
0 1 2

It is clear from Figs. 5.21 and 5.22 that the probability density function f(x) can be
written in terms of the conditional probability functions and event probabilities as

f ðxÞ = f ðxjAÞProbðAÞ þ f ðxjBÞProbðBÞ þ f ðxjC ÞProbðC Þ


1 1 1
= f ðxjAÞ þ f ðxjBÞ þ f ðxjC Þ:
3 3 3

5.11 Conditional Expectation

The conditional expected value for the continuous random variable X conditioned on
event A is defined as
1
E XjA = xf ðxjAÞdx ð5:53Þ
-1

and for a function of random variable X, i.e., g X , the conditional expected value is
calculated as
1
E g X jA = gðxÞf ðxjAÞdx ð5:54Þ
-1

Example 5.15: The probability density function of a continuous random variable is


depicted in Fig. 5.23. The event A is defined as A = {0 ≤ x < 1}. Calculate E XjA
2
and E X jA .
5.11 Conditional Expectation 167

Fig. 5.24 The graph of f ( x | A)


conditional probability
density function f(x| A)
2

x
0 1

Solution 5.15: The probability of the event A can be calculated as

1
1
ProbðAÞ = f ðxÞdx → ProbðAÞ = :
0 2

For the event A = [0 1], the conditional probability can be evaluated employing

f ðxÞ
if x 2 A
f ðxjAÞ = ProbðAÞ
0 otherwise

as

f ðxÞ
if x 2 ½0 1 2f ðxÞ if x 2 ½0 1
f ðxjAÞ = 1=2 → f ðxjAÞ =
0 otherwise
0 otherwise

whose graph is depicted in Fig. 5.24.


Using the conditional probability density function in Fig. 5.24, we can calculate
E XjA as

E X~ jA = xf ðxjAÞdx
-1
1

=2 x2 dx
0
2
= :
3
168 5 Continuous Random Variables

2
Similarly, we can evaluate E X jA as

1
E X~ 2 jA = x2 f ðxjAÞdx
-1
1
=2 x3 dx
0
2
= :
4

Theorem 5.1: Let A1, A2, ⋯, AN be the disjoint events, i.e., disjoint intervals, with
P(Ai) ≥ 0, such that

S = A1 [ A2 [ ⋯ [ AN ð5:55Þ

then we have

N
f ðxÞ = ProbðAi Þf ðxjAi Þ: ð5:56Þ
i=1

Proof 5.1: Let’s define the sample space S and the events A, B, and C as

S= a≤X≤d A= a≤X<b B= b≤X<c C= c≤X ≤d

such that A, B, and C are disjoint events and

S = A [ B [ C:

The event D = x ≤ X ≤ x þ δ can be written as

D = D \ S → D = D \ ðA [ B [ C Þ → D = ðD \ AÞ [ ðD \ BÞ [ ðD \ C Þ

leading to

ProbðDÞ = ProbðD \ AÞ þ ProbðD \ BÞ þ ProbðD \ C Þ

which can be written as

ProbðDÞ = ProbðDjAÞProbðAÞ þ ProbðDjBÞProbðBÞ þ ProbðDjC ÞProbðC Þ

where multiplying both sides by 1/δ, and taking the limit as δ → 0, we obtain
5.11 Conditional Expectation 169

f ðxÞ = f ðxjAÞProbðAÞ þ f ðxjBÞProbðBÞ þ f ðxjC ÞProbðCÞ: ð5:57Þ

Equation (5.57) can be generalized as

N
f ðxÞ = ProbðAi Þf ðxjAi Þ: ð5:58Þ
i=1

Theorem 5.2: The expected value E X can be written in terms of conditional


expectation E XjA as

E X = ProbðAi ÞE XjAi ð5:59Þ


i

and in a similar manner we can express E g X as

E g X = ProbðAi ÞE g X jAi : ð5:60Þ


i

Proof 5.2: Multiplying both sides of (5.58) by x, we get

N
xf ðxÞ = x ProbðAi Þf ðxjAi Þ
i=1

which can be written as

N
xf ðxÞ = ProbðAi Þxf ðxjAi Þ ð5:61Þ
i=1

Integrating both sides of (5.61) w.r.t x, we get

1 N 1
xf ðxÞdx = ProbðAi Þ xf ðxjAi Þdx
-1 i=1 -1

which can be expressed as

N
E X = ProbðAi ÞE XjAi :
i=1
170 5 Continuous Random Variables

Fig. 5.25 The probability f (x)


density function of a
continuous random variable
1

x
0 1 2

Example 5.16: The probability density function of a continuous random variable


is depicted in Fig. 5.25. The events A and B are defined as A = 0 ≤ X < 1 and
B = 1 ≤ X ≤ 2 . Calculate E XjA and E B and verify the equalities

N
f ðxÞ = ProbðAi Þf ðxjAi Þ
i=1

and

E X = ProbðAi ÞE XjAi
i

for the given events.


Solution 5.16: If we write the sample space as S = 0 ≤ X ≤ 2 , then the events
A and B form a partition of the sample space, i.e.,

S = A [ B and A \ B = ϕ:

The probabilities of the events A and B can be calculated as

1
1
ProbðAÞ = f ðxÞdx → ProbðAÞ =
0 2
2
1
ProbðBÞ = f ðxÞdx → ProbðBÞ = :
1 2

For the events A = [0 1) and B = [1 2] the conditional probability can be


evaluated employing

f ðxÞ f ðxÞ
if x 2 A if x 2 B
f ðxjAÞ = ProbðAÞ f ðxjBÞ = ProbðBÞ
0 otherwise 0 otherwise

as
5.11 Conditional Expectation 171

Fig. 5.26 The graphs of f ( x | A) f (x | B)


conditional probability
density functions f(x| A) and
f(x| B) 2 2

x x
0 1 0 1 2

f ðxÞ
if x 2 ½0 1Þ 2f ðxÞ if x 2 ½0 1Þ
f ðxjAÞ = 1=2 → f ðxjAÞ =
0 otherwise
0 otherwise
f ðxÞ
if x 2 ½1 2 2f ðxÞ if x 2 ½1 2
f ðxjBÞ = 1=2 → f ðxjBÞ =
0 otherwise:
0 otherwise

leading to

2x if x 2 ½0 1Þ - 2x þ 4 if x 2 ½1 2
f ðxjAÞ = f ðxjBÞ =
0 otherwise 0 otherwise:

whose graphs are depicted in Fig. 5.26.


Using the conditional probability density functions f(x| A) and f(x| B) depicted
in Fig. 5.26, we can calculate the conditional expectations E XjA and E B
employing
1 1
E XjA = xf ðxjAÞdx E XjB = xf ðxjBÞdx
-1 -1

as

1 1
2
E XjA = xf ðxjAÞdx → E XjA = 2 x2 dx → E XjA =
-1 0 3
1 2
4
E XjB = xf ðxjBÞdx → E XjB = - 2x2 þ 4x → E XjB = :
-1 1 3

On the other hand, the mean value of the random variable X can be calculated
using the formulas
172 5 Continuous Random Variables

1
E X = xf ðxÞdx
-1

as

1 1 2
E X = xf ðxÞdx → E X = xf ðxÞdx þ xf ðxÞdx →
-1 0 1
1 2
1 7
E X = 2
x dx þ - x2 þ 2x dx → E X = - þ 3 → E X = 1:
0 1 3 3

Using the probability density function graphs in Figs. 5.25 and 5.26, we can show
that

1 1
f ðxÞ = f ðxjAÞ þ f ðxjBÞ → f ðxÞ = f ðxjAÞProbðAÞ þ f ðxjBÞProbðBÞ:
2 2

Using

1 1
ProbðAÞ = ProbðBÞ =
2 2

and

2 4
E X =1 E XjA = E XjB =
3 3

we can verify that

2 1 4 1
E X = E XjA ProbðAÞ þ E XjB ProbðBÞ → 1 = × þ × → 1 = 1√
3 2 3 2

5.12 Conditional Variance

The conditional variance for random variable X is defined as

2
Var XjA = E X jA - m2xjA ð5:62Þ

where
5.12 Conditional Variance 173

Fig. 5.27 Probability f (x)


density function for
Example 5.17
1/ 2

x
2 3

1 1
2
E X jA = x2 f ðxjAÞdx mxjA = E XjA = xf ðxjAÞdx: ð5:63Þ
-1 -1

Example 5.17: For a continuous random variable X, the probability density func-
tion is depicted in Fig. 5.27.
The events A and B are given as A = 0 ≤ X < 2 , B = 2 ≤ X ≤ 3 .

(a) E X = ? (b) Var X = ? (c) f(x| A) = ? f(x| B) = ?


(d) E XjA = ? (e) Var XjA = ? (f) E XjB = ? (g) Var XjB = ?
(f) Verify the equality

N
E X = ProbðAi ÞE XjAi :
i=1

Solution 5.17:

(a) The mean value of X is calculated as

1 2 3
x 1 8 5
E X = xf ðxÞdx → E X = x dx þ x dx → E X = þ →E X
-1 0 4 2 2 12 4
23
= = m:
12

(b) The variance of X is calculated as

2
Var X = E X - m2

where

1 2 3
2 2 x 1
E X = x2 f ðxÞdx → E X = x2 dx þ x2 dx
-1 0 4 2 2
174 5 Continuous Random Variables

Fig. 5.28 Conditional f ( x | A)


distribution graph

x
2

(c) The probability of the event

A= 0≤X <2

can be calculated as

2 2
ProbðAÞ = f ðxÞdx → ProbðAÞ = f ðxÞdx = 1=2:
0 0

Using

f ðxÞ
if x 2 ½a b
f ðxjAÞ = ProbðAÞ
0 otherwise

we get the probability density function conditioned on the event A as

2f ðxÞ if x 2 ½0 2
f ðxjAÞ =
0 otherwise:

The graph of f(x| A) is depicted in Fig. 5.28.


In a similar manner, the probability of the event

B= 2≤X≤3

can be calculated as

3 3
ProbðBÞ = f ðxÞdx → ProbðBÞ = f ðxÞdx = 1=2:
2 2

Using
5.12 Conditional Variance 175

Fig. 5.29 Conditional f ( x | B)


distribution graph

x
0 2 3

f ðxÞ
if x 2 ½a b
f ðxjBÞ = ProbðBÞ
0 otherwise

we get the probability density function conditioned on the event B as

2f ðxÞ if x 2 ½2 3
f ðxjBÞ =
0 otherwise:

The graph of f(x| B) is depicted in Fig. 5.29.

(d) The conditional expectation conditioned on the event A, i.e., E XjA , can be
calculated using the conditional probability density function f(x| A)

f ( x | A)

x
2

as

1 2
x
E XjA = xf ðxjAÞdx → E XjA = x dx → mxjA = E XjA
-1 0 2
2
1 8
= x2 dx → mxjA =
2 0 6
176 5 Continuous Random Variables

(e) The conditional variance conditioned on the event A, i.e., Var XjA , can be
calculated as

2
Var XjA = E X jA - m2xjA

2
where E X jA is evaluated as

1 2 2 3
2 2 x 2 x
E X jA = x2 f ðxjAÞdx → E X jA = x2 dx → E X jA = dx
-1 0 2 0 2

(f) The conditional expectation conditioned on the event B, i.e., E XjB , can be
calculated using the conditional probability density function f(x| B)

f ( x | B)

x
0 2 3

as

1 3
5
E XjB = xf ðxjBÞdx → E XjB = xdx → mxjB = E XjB =
-1 2 2

(g) The conditional variance conditioned on the event B, i.e., Var XjB , can be
calculated as

2
Var XjB = E X jB - m2xjB

2
where E X jB is evaluated as

1 3
2 2 2 19
E X jB = x2 f ðxjBÞdx → E X jB = x2 dx → E X jB = :
-1 2 3
5.12 Conditional Variance 177

Finally, the conditional variance is calculated as

19 25 1
Var XjB = - → Var XjB = :
3 4 12

(h) Now, let’s verify

N
E X = ProbðAi ÞE XjAi :
i=1

We found that

8 5
E XjA = E XjB =
6 2
1
ProbðAÞ = ProbðBÞ =
2
23
E X = :
12

Expanding

N
E X = ProbðAi ÞE XjAi
i=1

for N = 2, we get

E X = ProbðA1 ÞE XjA1 þ ProbðA2 ÞE XjA1

where using A1 = A and A2 = B for our case, i.e.,

E X = ProbðAÞE XjA þ ProbðBÞE XjB


1 8 1 5
2 6 2 2

we obtain

8 5 23
E X = þ →E X = √
12 4 12

which agrees with the previous mean value calculation result.


178 5 Continuous Random Variables

Fig. 5P.1 Probability f (x)


density function of a
continuous random variable c

x
0 1 3

Fig. 5P.2 Probability f (x)


density function of a
continuous random variable
1

0 x
1 1

Problems

1. The probability density function of a continuous random variable X is depicted


in Fig. 5P.1.
(a) Find the value of constant c.
(b) Calculate the probabilities

P X≤1 , P 1≤X≤3 , P X≤2 , P 1≤X≤2 :

(c) Find and draw the cumulative distribution function F(x) of this random
variable.
2. A continuous uniform random variable is defined on the interval [-2 6]. Draw
the graph of the probability density function of this random variable. Calculate
and draw the cumulative distribution function of this random variable.
3. The probability density function of a continuous random variable X is given as

K
f ð xÞ = 0≤x≤1
x1=40 otherwise:

Find the value of K, and obtain the cumulative distribution function.


Problems 179

Fig. 5P.3 Probability f (x )


density function of a
continuous random variable
1/ 2

x
0 1 4

Fig. 5P.4 Probability f (x)


density function of a
continuous random variable 1/ 2

2 0 2 x

4. The probability density function of a continuous random variable X is depicted


in Fig. 5P.2.
(a) Without mathematically calculating, guess the mean value of this random
variable.
(b) Calculate the mean value mathematically and check your guess you made in
part-a.
5. The probability density function of a continuous random variable X is depicted
in Fig. 5P.3.
(a) Without mathematically calculating, guess the mean value of this random
variable.
(b) Calculate the mean value mathematically and check your guess you made in
part-a.

6. The probability density function of a continuous random variable X is depicted


in Fig. 5P.4.
(a) Without mathematically calculating, guess the variance of this random
variable.
(b) Calculate the variance mathematically and check your guess you made in
part-a.
(c) Calculate and draw the cumulative distribution function for this random
variable.
7. The cumulative distribution function of a continuous random variable X is
depicted in Fig. 5P.5
180 5 Continuous Random Variables

Fig. 5P.5 Cumulative F (x )


distribution function of a
continuous random variable
1
3/ 4

1/ 4
x
1 0 1 2 3

Fig. 5P.6 Cumulative F ( x)


distribution function of a
continuous random variable
1
3/ 4

1/ 4

x
1 0 1 2

(a) Calculate and draw the probability density function for this random variable.
(b) Calculate the probabilities

P -1≤X≤1 P 1≤X ≤2 P 0≤X ≤2 P X≥1 :

.
8. The cumulative distribution function of a continuous random variable X is
depicted in Fig. 5P.6.
(a) Calculate and draw the probability density function for this random variable.
(b) Find the mean value and variance of this random variable.
9. The probability density function of a continuous random variable X is depicted
in Fig. 5P.7.
(a) Find the value of the constant a.
(b) Calculate and draw the cumulative distribution function of this random
variable.
10. The probability density function of a continuous random variable X is depicted
in Fig. 5P.8. The events A, B, and C are defined as
Problems 181

Fig. 5P.7 Probability f (x)


density function of a
continuous random variable 2a
a

x
1 0 1 2 5

Fig. 5P.8 Probability f ( x)


density function of a
continuous random variable
1/ 4

x
0 1 2 3 4 5

A= 1≤X <2 B= 2≤X <4 C= 4≤X <5 :

(a) Calculate the conditional probability density function

f ðxjAÞ, f ðxjBÞ, f ðxjCÞ:

(b) Calculate the conditional expectations

E XjA , E XjB , E XjC :

(c) Calculate the conditional variance

Var XjA , Var XjB , Var XjC :

(d) Are the events A, B, and C disjoint events?


(e) Verify the equalities

f ðxÞ = f ðxjAÞProbðAÞ þ f ðxjBÞProbðBÞ þ f ðxjCÞProbðC Þ,


E ðxÞ = E XjA ProbðAÞ þ E XjB ProbðBÞ þ E XjC ProbðC Þ:
Chapter 6
More Than One Random Variables

6.1 More Than One Continuous Random Variable


for the Same Continuous Experiment

We can define more than one random variable for the sample space of the continuous
~ and Y~ be continuous random variables defined on the same sample
experiment. Let X
~
space. The joint probability density function of the continuous random variables X
~
and Y is defined as

1 ~ ≤ x þ δx , y ≤ Y~ ≤ y þ δy
f ðx, yÞ = lim Prob x ≤ X ð6:1Þ
δx → 0 δx δy
δy → 0

where

~ ≤ x þ δx and y ≤ Y~ ≤ y þ δy
x≤X ð6:2Þ

are events, i.e., subsets of the continuous sample space. Note that for continuous
experiments, events are defined using real number intervals.
~ and Y~ be as
Let the range sets, i.e., intervals, of the random variables X

RX~ = ½xb xe  RY~ = ½yb ye :

Then, it is clear that

~ ≤ xe = yb ≤ Y~ ≤ ye
S = xb ≤ X

and

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 183
O. Gazi, Introduction to Probability and Random Variables,
https://doi.org/10.1007/978-3-031-31816-0_6
184 6 More Than One Random Variables

~
~
X() Y( )

S = [ m n] R ~ = [ xb xe ] S = [m n] R ~ = [ yb ye ]
X Y

Fig. 6.1 Two continuous random variables defined on the same continuous sample space

~ ≤ xe , yb ≤ Y~ ≤ ye = ProbðSÞ = 1:
Prob xb ≤ X

In Fig. 6.1, the concept of continuous random variables on a continuous sample


space is illustrated.
Now consider the events

~ ≤ b B = c ≤ Y~ ≤ d :
A= a≤X

The probability

ProbðA \ BÞ

is calculated as

b d
ProbðA \ BÞ = f ðx, yÞdxdy
a c

which can be written as

b d
~ ≤ b, c ≤ Y~ ≤ d =
Prob a ≤ X f ðx, yÞdxdy ð6:3Þ
a c

The probability expression in (6.3) can also be expressed as

~ Y~ 2 D =
Prob X, f ðx, yÞdxdy ð6:4Þ
ðx, yÞ2D

where D indicates a region in two-dimensional space, as shown in Fig. 6.2.


6.1 More Than One Continuous Random Variable for the Same Continuous Experiment 185

Fig. 6.2 A region D in y


two-dimensional space
d
D
c
x
a b

Fig. 6.3 The triangle region y


on which the probability
function is defined.
1
D
x
1

Properties
1. The total volume of the geometric shape under the function f(x, y) equals 1, i.e.,

1 1
f ðx, yÞdxdy = 1: ð6:5Þ
-1 -1

2. Marginal probability density functions f(x) and f( y) can be obtained from the
joint probability density function as

1 1
f ðxÞ = f ðx, yÞdy f ðyÞ = f ðx, yÞdx: ð6:6Þ
-1 -1

Example 6.1: The joint probability density function f(x, y) of two continuous
random variables X ~ and Y~ is a constant and it is defined on the region shown in
Fig. 6.3. Find f(x, y), f(x), and f( y).
Solution 6.1: The region in Fig. 6.3 is detailed in Fig. 6.4.
Using Fig. 6.4, we can mathematically write f(x, y) as

c for 0 ≤ x ≤ 1 0 ≤ y ≤ 1 - x
f ðx; yÞ =
0 otherwise

where c is a constant. Using the property


186 6 More Than One Random Variables

Fig. 6.4 The triangle region y


in detail
x  y=1
1
y = 1 x

x
x 1

1 1
f ðx, yÞdxdy = 1
-1 -1

we get

1 1-x
cdydx = 1
x=0 y=0

where calculating the inner integral first, we obtain

1
cð1- xÞdx = 1
x=0

leading to

1
c 1- =1
2

from which, we obtain

c = 2:

Then, joint probability density function happens to be

2 for 0 ≤ x ≤ 1 0 ≤ y ≤ 1 - x
f ðx; yÞ =
0 otherwise:

Note that

1 1-x
dxdy
x=0 y=0

is nothing but the area of the triangle in Fig. 6.3.


6.2 Conditional Probability Density Function 187

The marginal probability density function f(x) can be obtained using


1
f ðxÞ = f ðx, yÞdy
-1

in which using x + y = 1 → y = 1 - x → 0 ≤ y ≤ 1 - x, we get

1-x
f ðxÞ = 2dy → f ðxÞ = 2ð1- xÞ 0 ≤ x ≤ 1:
0

The marginal probability density function f( y) in a similar manner can be


obtained using
1
f ðyÞ = f ðx, yÞdx
-1

in which employing x + y = 1 → x = 1 - y → 0 ≤ x ≤ 1 - y, we obtain

1-y
f ðyÞ = 2dx → f ðyÞ = 2ð1- yÞ 0 ≤ y ≤ 1:
0

Thus, we got

f ðxÞ = 2ð1 - xÞ 0 ≤ x ≤ 1,
f ðyÞ = 2ð1 - yÞ 0 ≤ y ≤ 1:

Note that it can be shown for the calculated f(x) and f( y); we have
1 1
f ðxÞdx = 1 f ðyÞdy = 1:
-1 -1

6.2 Conditional Probability Density Function

~
The conditional probability density function of two continuous random variables X
and Y~ is defined as

f ðx, yÞ
f ðxjyÞ = : ð6:7Þ
f ðyÞ
188 6 More Than One Random Variables

Example 6.2: Show that


1
f ðxjyÞdx = 1:
-1

Solution 6.2: Substituting

f ðx, yÞ
f ðxjyÞ =
f ðyÞ

into
1
f ðxjyÞdx
-1

we get
1
f ðx, yÞ
dx
- 1 f ðyÞ

which can be written as


1
1
f ðx, yÞdx
f ðyÞ -1

where employing
1
f ðyÞ = f ðx, yÞdx
-1

we obtain

1
f ðyÞ → 1:
f ð yÞ
6.3 Conditional Expectation 189

Fig. 6.5 A triangle region y


on which the probability
density function of a random
variable is defined 1
D
x
1

6.3 Conditional Expectation

~ on condition Y~ = y is defined as
The conditional expectation of X

1
~ Y~ = y =
E Xj xf ðxjyÞdx ð6:8Þ
-1

~ Y~ = y :
which can be considered as a function of y, i.e., gðyÞ = E Xj
And in a similar manner, the conditional expectation of g X ~ on condition
~
Y = y is defined as

1
~ jY~ = y =
E g X gðxÞf ðxjyÞdx: ð6:9Þ
-1

Example 6.3: The joint probability density function f(x, y) of two continuous
~ and Y~ is defined on the region shown in Fig. 6.5 as f(x, y) = 2.
random variables X
The marginal probability density functions f(x) and f( y) are equal to

f ðxÞ = 2ð1 - xÞ 0 ≤ x ≤ 1,
f ðyÞ = 2ð1 - yÞ 0 ≤ y ≤ 1:

~ Y~ = y , and E Yj
Find f(x| y), E Xj ~X ~ =x :

Solution 6.3: We first calculate the conditional probability functions and then find
the conditional expectations as
190 6 More Than One Random Variables

f ðx; yÞ 2 1
f ðxjyÞ = → f ðxjyÞ = → f ðxjyÞ = 0 ≤ x ≤ 1 - y:
f ðyÞ 2ð 1 - yÞ 1-y
1
E X~ jY~ = y = xf ðxjyÞdx →
-1
1-y
1 1-y
E X~ jY~ = y = dx → E X~ jY~ = y =
x 0 ≤ y ≤ 1:
0 1-y 2
f ðx; yÞ 2 1
f ðyjxÞ = → f ðyjxÞ = → f ðyjxÞ = 0 ≤ y ≤ 1 - x:
f ðxÞ 2ð 1 - xÞ 1-x
1
E Y~ jX~ = x = yf ðyjxÞdy →
-1
1-x
1 1-x
E Y~ jX~ = x = y dy → E X~ jY~ = E Y~ jX~ = x y = 0 ≤ x ≤ 1:
0 1-x 2

Let’s summarize some concepts we have learned up to now as properties.


Properties
~ and g X,
1. Expected values of g X ~ Y~ can be calculated as

1 1 1
~
E g X = ~ Y~
gðxÞf ðxÞdx E g X, = gðx, yÞf ðx, yÞdxdy: ð6:10Þ
-1 -1 -1

~ jY~ = y and E g X,
2. The conditional expected functions E g X ~ Y~ jY~ = y can be
evaluated as

1
E g X~ jY~ = y = gðxÞf ðxjyÞdx
-1 ð6:11Þ
1 1
E g X~ ; Y~ jY~ = y = gðx; yÞf ðxjyÞdxdy:
-1 -1

3. Expectation is a linear function, i.e.,

~ þ bY~ = aE X
E aX ~ þ bE Y~ : ð6:12Þ

4. The joint probability density function satisfies

f ðx, yÞ = f ðxÞf ðyjxÞ f ðx, yÞ = f ðyÞf ðxjyÞ: ð6:13Þ


6.4 Conditional Expectation 191

5. Let A be an interval on x - axis, then

~ 2 Ajy =
Prob X f ðx, yÞdx: ð6:14Þ
A

6. If D is a region on the x – y plane, then

~ Y~ 2 D =
Prob X, f ðx, yÞdxdy: ð6:15Þ
ðx, yÞ2D

6.3.1 Bayes’ Rule for Continuous Distribution

The Bayes rule for the conditional probability density function of continuous
random variables is given as

f ðx, yÞ f ðxÞf ðyjxÞ f ðxÞf ðyjxÞ


f ðxjyÞ = → f ðxjyÞ = → f ðxjyÞ = → ð6:16Þ
f ðyÞ f ð yÞ
f ðx, yÞdx

f ðxÞf ðyjxÞ
f ðxjyÞ = : ð6:17Þ
f ðxÞf ðyjxÞdx

6.4 Conditional Expectation

The conditional expectation

~ Y~ = y
E Xj

is calculated as
1
~ Y~ = y =
E Xj xf ðxjyÞdx: ð6:18Þ
-1
192 6 More Than One Random Variables

The result of

~ Y~ = y
E Xj

~ Y~ = y also changes. Then,


depends on y. That is, as y changes, so the value of E Xj
~ Y~ = y as a function of y, i.e.,
we can denote E Xj

~ Y~ = y :
gðyÞ = E Xj ð6:19Þ

~ i.e., not only y but also the other values in


Now, let’s consider other values of Y,
~ Then, we can write
the range set of Y.

~ Y~ :
g Y~ = E Xj

~ Y~ can be considered as a function of random variable Y,


That is, E Xj ~ i.e., as
~
g Y .
~ is defined as
The conditional expected value for a function of X
1
~ jY~ = y =
E g X gðxÞf ðxjyÞdx: ð6:20Þ
-1

~ 2 jY~ = y is calculated as
Example 6.4: E X

1
~ 2 jY~ = y =
E X x2 f ðxjyÞdx:
-1

~ Y~
Expected Value of E Xj
~ Y~ equals E X
Property: The expected value of E Xj ~ , i.e.,

~ = E E Xj
E X ~ Y~ : ð6:21Þ

~ Y~ can be calculated using the formula


Proof: The mean value of g Y~ = E Xj

1
E g Y~ = gðyÞf ðyÞdy
-1

which can be written as


6.4 Conditional Expectation 193

1
~ Y~
E E Xj = ~ Y~ = y f ðyÞdy:
E Xj
-1

Substituting
1
~ Y~ = y =
E Xj xf ðxjyÞdx
-1

into
1
~ Y~ = y f ðyÞdy
E Xj
-1

we obtain
1 1
xf ðxjyÞdxf ðyÞdy
-1 -1

which can be written as


1 1
xf ðxjyÞf ðyÞdxdy
-1 -1
f ð x, yÞ

leading to

1 1
x f ðx, yÞdy dx
-1 -1
= f ð xÞ

resulting in
1
xf ðxÞdx
-1

which is nothing but

~ :
E X

Hence, we got
194 6 More Than One Random Variables

1
~ =
E X ~ Y~ = y f ðyÞdy
E Xj ð6:22Þ
-1

which can be expressed as

~ = E E Xj
E X ~ Y~ :

~ and functions of X,
The result in (6.22) can be generalized for functions of X ~ Y~ as

1
E g X~ = E g X~ jY~ = y f ðyÞdy
-1 ð6:23Þ
1
E g X~ ; Y~ = E g X~ ; Y~ jY~ = y f ðx; yÞdy:
-1

For discrete random variables, (6.22) can be written as

~ =
E X ~ Y~ = y pðyÞ:
E Xj ð6:24Þ
y

Example 6.5: Calculate the variance of conditional expectation, i.e., find

~ Y~ :
Var E Xj

Solution 6.5: Since g Y~ = E Xj


~ Y~ , Var g Y~ can be calculated as

Var g Y~ g Y~ - E g Y~
2 2
=E

~ Y~ for g Y~ , we obtain
in which substituting E Xj

~ Y~
Var E Xj =E ~ Y~
E Xj
2
- ~ Y~
E E Xj
E ðX

leading to

~ Y~
Var E Xj =E ~ Y~
E Xj
2
~
- E X
2
: ð6:25Þ

Example 6.6: The joint probability density function f(x, y) of two continuous
~ and Y~ is defined on the region shown in Fig. 6.6 as f(x, y) = 2.
random variables X
The marginal probability density functions f(x), f( y) and conditional f(x| y), f(y| x) are
equal to
6.4 Conditional Expectation 195

Fig. 6.6 A triangle region y

1
D
x
1

f ðxÞ = 2ð1 - xÞ 0 ≤ x ≤ 1
f ðyÞ = 2ð1 - yÞ 0 ≤ y ≤ 1
1
f ðxjyÞ = 0≤x≤1-y
1-y
1
f ðyjxÞ = 0 ≤ y ≤ 1 - x:
1-x
~ Y~ and E E Xj
Find E Xj ~ Y~ : Verify that

~ = E E Xj
E X ~ Y~ :

~ Y~ = y as
Solution 6.6: First, let’s calculate the conditional expected term E Xj

1
E X~ jY~ = y = xf ðxjyÞdx →
-1
1-y
1 1-y
E X~ jY~ = y = x dx → E X~ jY~ = y = 0 ≤ x ≤ 1 - y:
0 1-y 2

Then, we can write that

~
~ Y~ = 1 - Y :
E Xj
2
~ Y~
E E Xj can be calculated as

1
E E X~ jY~ = E X~ jy f ðyÞdy →
-1
1
1-y 1
E E X~ jY~ = 2ð1 - yÞdy → E E X~ jY~ = :
0 2 3

~ using
We can evaluate E X
196 6 More Than One Random Variables

1
~ =
E X xf ðxÞdx
-1

as

1
~ =
E X ~ = 1:
x2ð1- xÞdx → E X
0 3

Hence, we see that

~ = E E Xj
E X ~ Y~ :

6.5 Conditional Variance

The conditional variance

~
Var Xjy

is defined as

~ =E X
Var Xjy ~
~ 2 jy - E Xjy 2
ð6:26Þ

~
which can be considered as a function of y, i.e., as y changes the value of Var Xjy
~
changes. Then we can denote Var Xjy as

~
gðyÞ = Var Xjy

from which, we can write

g Y~ = Var Xj
~ Y~ :

That is,

~ Y~ = E X
Var Xj ~ 2 jY~ - E Xj
~ Y~ 2
: ð6:27Þ
6.5 Conditional Variance 197

Example 6.7: Calculate the expected value of conditional variance, i.e., find

~ Y~ :
E Var Xj

Solution 6.7: Since g Y~ = Var Xj


~ Y~ , E g Y~ can be calculated as

1
E g Y~ = gðyÞf ðyÞdy
-1

which can be written as


1
~ Y~
E Var Xj = ~ f ðyÞdy
Var Xjy
-1

~ 2 jy - E Xjy
in which substituting E X ~ 2
~ , we obtain
for Var Xjy

1
~ Y~
E Var Xj = ~ 2 jy - E Xjy
E X ~ 2
f ðyÞdy
-1

which can be written as


1 1
~ Y~
E Var Xj = ~ 2 jy f ðyÞdy -
E X ~
E Xjy
2
f ðyÞdy
-1 -1

from which, we can write

~ Y~
E Var Xj ~ 2 jY~
=E E X -E ~ Y~
E Xj
2

~2
E X

which is simplified as

~ Y~
E Var Xj ~2 - E
=E X ~ Y~
E Xj
2
ð6:28Þ

In (6.29), we obtained that

~ Y~
Var E Xj =E ~ Y~
E Xj
2
~
- E X
2
ð6:29Þ

Summing (6.28) and (6.29), we get


198 6 More Than One Random Variables

Fig. 6.7 A triangle region y

1
D
x
1

~ Y~
E Var Xj ~ Y~
þ Var E Xj ~2 - E X
=E X ~ 2

VarðX

which can be written as

~ = E Var Xj
Var X ~ Y~ ~ Y~ :
þ Var E Xj ð6:30Þ

~ can be expressed using another random variable Y~ as


Theorem 6.1: Variance of X

~ = E Var Xj
Var X ~ Y~ ~ Y~ :
þ Var E Xj ð6:31Þ

Proof of this theorem is provided in the previous example.


Exercise: The joint probability density function f(x, y) of two continuous random
~ and Y~ is defined on the region shown in Fig. 6.7 as f(x, y) = 2. The
variables X
marginal probability density functions f(x) and f( y) are equal to

f ðxÞ = 2ð1 - xÞ 0 ≤ x ≤ 1,
f ðyÞ = 2ð1 - yÞ 0 ≤ y ≤ 1:

Find

~ ~
E Var XjY ~2 - E
=E X ~ ~
E XjY
2

~ Y~
Var E Xj =E ~ Y~
E Xj
2
~
- E X
2

and verify that

~ = E Var Xj
Var X ~ Y~ ~ Y~ :
þ Var E Xj
6.6 Independence of Continuous Random Variables 199

6.6 Independence of Continuous Random Variables

~ and Y~ are independent of each other, then we


If the continuous random variables X
have

f ðx, yÞ = f x ðxÞf y ðyÞ: ð6:32Þ

When f(x, y) = f(x| y)f( y) and f(x, y) = f(y| x)f(x) are substituted into (6.32), we get

f ðxjyÞ = f x ðxÞ ð6:33Þ

and

f ðyjxÞ = f y ðyÞ ð6:34Þ

respectively.
~ and Y~ are independent random variables and A = a ≤ X
If X ~ ≤ b , B = c ≤ Y~ ≤ d
are two events, then we have

b d
Prob a ≤ X~ ≤ b; c ≤ Y~ ≤ d = f ðx; yÞdydx →
a c
b d
Prob a ≤ X~ ≤ b; c ≤ Y~ ≤ d = f x ðxÞf y ðyÞdydx →
a c
d
b
Prob a ≤ X~ ≤ b; c ≤ Y~ ≤ d = f x ðxÞdx f y ðyÞdy →
a
c
Prob a ≤ X~ ≤ b; c ≤ Y~ ≤ d = Prob a ≤ X~ ≤ b Prob c ≤ Y~ ≤ d :

The above equality can be derived in an alternative way as follows. Let A = [a b],
B = [c d] be the event, then we have

Prob X~ 2 A; Y~ 2 B = f ðx; yÞdxdy →


x2A, y2B

Prob X~ 2 A; Y~ 2 B = f x ðxÞf y ðyÞdxdy →


x2A, y2B

Prob X~ 2 A; Y~ 2 B = f x ðxÞdx f y ðyÞdy →


x2A y2B
Prob X~ 2 A; Y~ 2 B = Prob X~ 2 A Prob Y~ 2 B :
200 6 More Than One Random Variables

~ and Y,
For independent X ~ we also have

~ k Y~
E g X ~ E k Y~ :
=E g X ð6:35Þ

~ and Y~ are independent continuous random variables, then we


Theorem 6.2: If X
have

~ þ Y~ = Var X
Var X ~ þ Var Y~ : ð6:36Þ

Proof 6.2: The variance of Z~ = g X,


~ Y~ can be calculated using

~ Y~
Var g X, =E ~ Y~
g X,
2
- m2

where
1 1
E ~ Y~
g X,
2
= ½gðx, yÞ2 f ðx, yÞdxdy
-1 -1

and
1 1
~ Y~
m = E g X, = gðx, yÞf ðx, yÞdxdy:
-1 -1

~ Y~ = X
Let g X, ~ þ Y,
~ then the mean value of g X,
~ Y~ can be calculated as

1 1
m = E X~ þ Y~ = ðx þ yÞf ðx; yÞdxdy →
-1 -1
1 1 1 1
m= xf ðx; yÞdxdy þ xf ðx; yÞdxdy →
-1 -1 -1 -1
1 1
m= xf x ðxÞdx þ yf y ðyÞdy
-1 -1
m= mx þ my :

We can compute E ~ þ Y~
X
2
as
6.7 Joint Cumulative Distribution Function 201

1 1
2
E XþY = ðx þ yÞ2 f ðx, yÞdxdy →
-1 -1
1 1 1 1
= x2 f ðx, yÞdxdy þ y2 f ðx, yÞdxdy
-1 -1 -1 -1
1 1
þ 2xyf x ðxÞf y ðyÞdxdy
-1 -1
1 1 1 1
= x2 f x ðxÞdx þ y2 f y ðyÞdy þ 2 xf x ðxÞdx yf y ðyÞdy
-1 -1 -1 -1
2 2
=E X þE Y þ 2E X E Y :

~ þ Y~ using
Finally, we can calculate Var X

~ þ Y~ = E
Var X ~ þ Y~
X
2
- m2

as

2 2 2
Var X þ Y = E X þE Y þ 2E X E Y - mx þ my
2 2
=E X - m2x þ E Y - m2y þ 2E X E Y - 2mx my
0
Var X Var Y

= Var X þ Var Y :

6.7 Joint Cumulative Distribution Function

~ and Y~ is
The joint cumulative distribution function of continuous random variables X
defined as

~ ≤ x, Y~ ≤ y
F ðx, yÞ = Prob X ð6:37Þ

which can be expressed in terms of the probability density function f(x, y) as


x y
~ ≤ x, Y~ ≤ y =
F ðx, yÞ = Prob X f ðr, sÞdsdr ð6:38Þ
-1 -1

and joint probability density function f(x, y) can be obtained from its joint cumulative
distribution function via

2
∂ F ðx, yÞ
f ðx, yÞ = : ð6:39Þ
∂x∂y
202 6 More Than One Random Variables

6.7.1 Three or More Random Variables

We can define any number of random variables for a continuous experiment.


Assume that we have three random variables X, ~ Y,
~ and Z.
~ For the given intervals
A = [a b], B = [c d], C = [e f], the probability

~ 2 A, Y~ 2 B, Z~ 2 C
Prob X

which can also be expressed as

~ Y,
Prob X, ~ Z~ 2 D

where D indicates the region.

a ≤ x ≤ b, c ≤ y ≤ d, e ≤ z ≤ f

can be calculated as

~ 2 A, Y~ 2 B, Z~ 2 C =
Prob X f ðx, y, zÞdxdydz
x2A, y2B, z2C

or as

~ Y,
Prob X, ~ Z~ 2 D = f ðx, y, zÞdxdydz ð6:40Þ
ðx, y, zÞ2D

or more in detail as

b d f
~ ≤ b, c ≤ Y~ ≤ d, e ≤ Z~ ≤ f
Prob a ≤ X = f ðx, y, zÞdxdydz: ð6:41Þ
a c e

Properties
1. Marginal probability density functions can be calculated from joint distributions
as

f ðxÞ = f ðx, y, zÞdydz f ðyÞ = f ðx, y, zÞdxdy ð6:42Þ


6.7 Joint Cumulative Distribution Function 203

f ðzÞ = f ðx, y, zÞdxdz ð6:43Þ

Joint distributions involving fewer variables can be evaluated from those joint
distributions involving more variables as

f ðx, yÞ = f ðx, y, zÞdz f ðx, zÞ = f ðx, y, zÞdy ð6:44Þ

f ðy, zÞ = f ðx, y, zÞdx ð6:45Þ

2. Between conditional and joint distributions, we have the relations

f ðx, y, zÞ f ðx, y, zÞ
f ðxjy, zÞ = f ðx, yjzÞ = ð6:46Þ
f ðy, zÞ f ðzÞ

f ðx, y, zÞ = f ðxjy, zÞf ðyjzÞf ðzÞ: ð6:47Þ

~ Y,
3. If the random variables X, ~ and Z~ are independent of each other, we have

f ðx, y, zÞ = f x ðxÞf y ðyÞf z ðzÞ


f ðx, yÞ = f x ðxÞf y ðyÞ
ð6:48Þ
f ðx, zÞ = f x ðxÞf z ðzÞ
f ðy, zÞ = f y ðyÞf z ðzÞ:

6.7.2 Background Information: Reminder for Double


Integration

Assume that the function z = f(x, y) is defined on the region D shown in Fig. 6.8, and
we can consider that there is a plane on the region D whose height at point (x, y)
equals f(x, y). The volume of the region whose base is indicated by D is calculated as

b gðxÞ
V= f ðx, yÞdydx: ð6:49Þ
a hðxÞ
204 6 More Than One Random Variables

Fig. 6.8 A horizontal


region surrounded by two g (x)
functions
D

h(x)
a x
b

Fig. 6.9 A vertical region


D
surrounded by two functions
b
h( y )
g ( y)

a
x

If the region D is surrounded by the functions g( y) and h( y) as in Fig. 6.9, then the
volume of the plane with base D region D whose height at point (x, y) equals to f(x, y)
is calculated as

b gðyÞ
V= f ðx, yÞdxdy: ð6:50Þ
a hðyÞ

Note that if A = [a b], then

f ðxÞdx
A

is under f(x) for a ≤ x ≤ b.


Example 6.8: The joint probability density function of two continuous random
variables is defined on the triangle, on which f(x, y) = cxy, as shown in Fig. 6.10.
Find the value of c.
Solution 6.8: If we use the property
1 1
f ðx, yÞdxdy = 1
-1 -1
6.7 Joint Cumulative Distribution Function 205

Fig. 6.10 A triangle region y


for Example 6.8

f ( x, y ) cxy
1

1 x

Fig. 6.11 The triangle y


region with its border
equations
2

g ( x) x 1

f ( x, y ) cxy
1
h( x) x 1

x
1

for the region drawn in detail in Fig. 6.11, we get

1 xþ1
cxydydx = 1
0 - xþ1

leading to

1 xþ1
c x ydy dx = 1
0 - xþ1

from which, we can determine c as 2.


206 6 More Than One Random Variables

6.7.3 Covariance and Correlation

~ Y~ is calculated either using


The covariance of two random variables X,

~ Y~ = E X
Cov X, ~ Y~ - E X
~ E Y~ ð6:51Þ

or using

Cov X, ~ - mx Y~ - my :
~ Y~ = E X ð6:52Þ

Exercise: Show that

~ - mx Y~ - my
E X ~ Y~ - E X
=E X ~ E Y~ : ð6:53Þ

Property
~ Y~ are independent of each other, then we have
If the random variables X,

~ Y~ = E X
Cov X, ~ Y~ - E X
~ E Y~ → Cov X,
~ Y~ = E X
~ E Y~ - E X
~ E Y~ → Cov X,
~ Y~ = 0:

6.7.4 Correlation Coefficient

~ Y~ is calculated as
The correlation coefficient for the random X,

~ Y~
Cov X,
ρ= ð6:54Þ
~ Var Y~
Var X

6.8 Distribution for Functions of Random Variables

~ be a continuous random variable, and Y~ be a function of X,


Let X ~ i.e., Y~ = g X
~ , and
~ The mean value of Y~ is calculated
F(x) is the cumulative distribution function of X.
as
1
E Y~ = gðxÞf ðxÞdx: ð6:55Þ
-1
6.8 Distribution for Functions of Random Variables 207

Fig. 6.12 Uniform f ( x)


distribution on the interval
[0 1]
1

x
1

~ The cumulative distribu-


Let H( y) be the cumulative distribution function of Y.
~
tion function of Y, i.e., H( y), can be found using

H ðyÞ = Prob Y ≤ y
= Prob g X ≤ y
= Prob X ≤ g - 1 ðyÞ
g - 1 ðyÞ
= f ðxÞdx
-1
-1
= F ðg ðyÞÞ:

The probability density function of Y~ can be calculated using

dH ðyÞ dF ðg - 1 ðyÞÞ dg - 1 ðyÞ


f y ð yÞ = → f y ðyÞ = → f y ð yÞ = f x g - 1 ð yÞ ð6:56Þ
dy dy dy

where fx(x) = dF(x)/dx.


Example 6.9: The random variable X ~ is continuously distributed on the interval
~ i.e., F(x) = ?
[0 1]. Find the cumulative distributive function of X,
~ can be written as
Solution 6.9: The probability density function of X

1 0≤x≤1
f ðxÞ = 0 otherwise

which is graphically depicted in Fig. 6.12.


The cumulative distribution function can be calculated as
x
F ð xÞ = f ðt Þdt → F ðxÞ = x 0 ≤ x ≤ 1
0

whose graph is depicted in Fig. 6.13.


Example 6.10: The random variable X ~ is continuously distributed on the interval
p
[0 1]. If Y~ = X ~ i.e., H( y) = ?
~ , find the cumulative distributive function of Y,
208 6 More Than One Random Variables

Fig. 6.13 Cumulative F ( x)


distribution function for the
uniform distribution defined
on the interval [0 1] 1

x
1

p
Solution 6.10: If Y~ = X~ , then we have
p
y= x

~
and for 0 ≤ x ≤ 1, we have 0 ≤ y ≤ 1. The cumulative distribution function of X
equals to

F ðxÞ = x 0 ≤ x ≤ 1:

The cumulative distribution function of Y~ can be calculated as

H ðyÞ = Prob Y ≤ y
= Prob X ≤y
= Prob X ≤ y2
= F ð y2 Þ 0 ≤ y 2 ≤ 1
= y2 0 ≤ y ≤ 1

Hence, we got

y2 0 ≤ y ≤ 1
H ð yÞ =
0 otherwise:

The probability density function of Y~ can be calculated as

dH ðyÞ 2y 0 ≤ y ≤ 1
f y ðyÞ = → f y ð yÞ =
dy 0 otherwise:

Note: The derivative of the combined function F(g( y)) can be calculated as

dF ðgðyÞÞ
= F 0 ðgðyÞÞg0 ðyÞ: ð6:57Þ
dy
6.8 Distribution for Functions of Random Variables 209

Example 6.11: If

dF ðxÞ
f x ðxÞ =
dx

find
p
dF ð yÞ
dy :
p
Solution 6.11: Let gðyÞ = y, employing

dF ðgðyÞÞ
= F 0 ðgðyÞÞg0 ðyÞ
dy

we obtain
p p
dF y p p 0 dF y 1 p
=fx y y → = p fx y :
dy dy 2 y

Example 6.12: If

dF ðxÞ
f x ðxÞ =
dx

find
p
dF ð - yÞ
dy :
p
Solution 6.12: Let gðyÞ = - y, employing

dF ðgðyÞÞ
= F 0 ðgðyÞÞg0 ðyÞ
dy

we obtain
p p
dF ð - yÞ p p 0 dF ð - yÞ p
dy =fx - y - y → dy =- 1
p
2 yfx - y :

Example 6.13: The random variable X ~ is continuously distributed. If Y~ = X


~ 2 , find
~ i.e., fy( y) in terms of the probability density
the probability density function of Y,
function fx(x).
210 6 More Than One Random Variables

Solution 6.13: The cumulative distribution function of Y~ can be calculated as

H ð yÞ = Prob Y~ ≤ y
= Prob X~ 2 ≤ y
p p
= Prob - y ≤ X~ ≤ y
p p
= F y -F - y

Using the property

dF ðgðyÞÞ
= F 0 ðgðyÞÞg0 ðyÞ
dy

the probability density function of Y~ can be calculated as


p p
dH ðyÞ dF y dF - y 1 p 1 p
= - → f y ð yÞ = p f x y - p f x - y :
dy dy dy 2 y 2 y

Example 6.14: The random variable X ~ is continuously distributed. If Y~ = aX


~ þ b,
~ i.e., fy( y) in terms of the probability
find the probability density function of Y,
density function fx(x).
Solution 6.14: For a > 0, the cumulative distribution function of Y~ can be
calculated as

H ðyÞ = Prob Y ≤ y
= Prob aX þ b ≤ y
y-b
= Prob X ≤ if a > 0
a
y-b
=F
a

For a < 0, the cumulative distribution function of Y~ can be calculated as

H ðyÞ = Prob Y ≤ y
= Prob aX þ b ≤ y
y-b
= Prob X ≥ if a < 0
a
y-b
= 1 - Prob X ≤
a
y-b
=1-F
a
6.9 Probability Density Function for Function of Two Random Variables 211

Hence, we got

y-b
F if a > 0
a
H ðyÞ =
y-b
1-F if a < 0
a

Using the property

dF ðgðyÞÞ
= F 0 ðgðyÞÞg0 ðyÞ
dy

the probability density function of Y~ can be calculated by taking the derivative of


H( y) as

1 y-b
fx if a > 0
a a
f y ð yÞ =
1 y-b
- fx if a < 0
a a

which can be written in a more compact manner as

1 y-b
f y ð yÞ = f : ð6:58Þ
jaj x a

6.9 Probability Density Function for Function of Two


Random Variables

~ which is
In this subsection, we will inspect the probability density function of Z,
obtained from two different continuous random variables by a function, i.e.,

Z~ = g X,
~ Y~ : ð6:59Þ

Example 6.15: X ~ and Y~ are two continuous random variables, and Z~ = X


~ þ Y.
~ Find
~
the probability density function of Z.
212 6 More Than One Random Variables

Solution 6.15: The cumulative distribution function of Z~ can be calculated using

F ðzÞ = Prob Z~ ≤ z ð6:60Þ

which can be written as

~ Y~ ≤ z
F ðzÞ = Prob g X, ð6:61Þ

where the right-hand side can be written as

F ðzÞ = f ðx, yÞdxdy: ð6:62Þ


D:gðx, yÞ ≤ z

When (6.62) is used for Z~ = X


~ þ Y,
~ we obtain

F ðzÞ = f ðx, yÞdxdy ð6:63Þ


D:xþy ≤ z

where the region D, on which the integration is evaluated, can be written as

- 1 < x < 1 - 1 < y ≤ z - x:

Then, the integral expression in (6.63) can be written as


1 z-x
F ðzÞ = f ðx, yÞdydx: ð6:64Þ
x= -1 y= -1

If we take the derivative of F(z) in (6.64) w.r.t. z, we get


1
f z ðzÞ = f ðx, z- xÞdx: ð6:65Þ
x= -1

Note: For reminder, Leibniz integral rule is given as

bðxÞ
d dbðxÞ daðxÞ
f ðx, yÞdy = f ðx, bðxÞÞ - f ðx, aðxÞÞ
dx aðxÞ dx dx

bðxÞ
∂f ðx, yÞ
þ dy: ð6:66Þ
aðxÞ ∂x
6.9 Probability Density Function for Function of Two Random Variables 213

Derivative of an integral with respect to the variable parameter is calculated as

d b b
∂f ðx, yÞ
f ðx, yÞdy = dy: ð6:67Þ
dx a a ∂x

Example 6.16: For the previous example, if the random variables X ~ and Y~ are
independent of each, find the probability density function of Z~ = X
~ þ Y.
~

Solution 6.16: In the previous example, we found that


1
f z ðzÞ = f ðx, z- xÞdx
x= -1

which can be written as


1
f z ðzÞ = f x ðxÞf y ðz- xÞdx
x= -1

which is nothing but the convolution of fx(x) and fy( y), i.e.,

f ðzÞ = f x ðxÞ  f y ðyÞ:

Solution-2: In the previous example, we found that


1 z-x
F ðzÞ = f ðx, yÞdydx
x= -1 y= -1

in which using f(x, y) = fx(x)fy( y), we obtain

1 z-x
F ðzÞ = f x ð xÞ f y ðyÞdy dx
x= -1 y= -1

H ð z - xÞ

leading to
1
F ðzÞ = f x ðxÞH ðz- xÞdx:
x= -1
214 6 More Than One Random Variables

Using

dF ðzÞ
f ðzÞ =
dz

we get
1
dH ðz - xÞ
f ðzÞ = f x ðxÞ dx
x= -1 dz

which can be written as


1
f ðzÞ = f x ðxÞH 0 ðz- xÞðz - xÞ0 dx
x= -1

where employing

f y ð yÞ = H 0 ð yÞ

we obtain
1
f ðzÞ = f x ðxÞf y ðz- xÞdx
x= -1

which is nothing but the convolution of fx(x) and fy( y), i.e.,

f ðzÞ = f x ðxÞ  f y ðyÞ:

6.10 Alternative Formula for the Probability Density


Function of a Random Variable

~ be a continuous random variable, and


Let X

Y~ = g X
~ : ð6:68Þ

To find the probability density function of Y~ in terms of the probability density


~ we first solve the equation
function of X,

y = gðxÞ: ð6:69Þ
6.10 Alternative Formula for the Probability Density Function of a Random Variable 215

Let the roots of (6.69) be denoted as x1, x2, ⋯, xN, i.e.,

y = g ð x 1 Þ = g ð x 2 Þ = ⋯ = g ð x N - 1 Þ = gð x N Þ ð6:70Þ

then, the probability density function of Y, ~ i.e., fy( y), can be calculated from the
~
probability density function of X, i.e., fx(x), as

f x ðx1 Þ f ðx Þ
f y ðyÞ = þ ⋯ þ x0 N : ð6:71Þ
j g0 ð x 1 Þ j jg ðxN Þj

Example 6.17: If \tilde{Y} = a\tilde{X} + b, find f_y(y) in terms of the probability density function f_x(x).

Solution 6.17: If we solve

y = ax + b

for x, we get the single root

x_1 = \frac{y-b}{a}.

Since g(x) = ax + b, we have

g'(x) = a.

From (6.71), we can write

f_y(y) = \frac{f_x(x_1)}{|g'(x_1)|}

leading to

f_y(y) = \frac{1}{|a|} f_x\!\left(\frac{y-b}{a}\right).

Example 6.18: If \tilde{Y} = 1/\tilde{X}, find f_y(y) in terms of the probability density function f_x(x).

Solution 6.18: If we solve

y = \frac{1}{x}

for x, we get the single root

x_1 = \frac{1}{y}.

Since g(x) = 1/x, we have

g'(x) = -\frac{1}{x^2}.

From (6.71), we can write

f_y(y) = \frac{f_x(x_1)}{|g'(x_1)|} \;\rightarrow\; f_y(y) = x_1^2 f_x(x_1)

leading to

f_y(y) = \frac{1}{y^2} f_x\!\left(\frac{1}{y}\right).
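Formula (6.71) lends itself to a quick simulation check. The sketch below (Python with NumPy assumed) applies Example 6.18 with \tilde{X} \sim N(0, 1) and compares the predicted f_y(y) = (1/y^2) f_x(1/y) against a histogram of 1/\tilde{X}; the comparison window (0.2, 5.0) is an arbitrary choice that avoids the heavy tails.

```python
# Sketch: checking Example 6.18 with X ~ N(0, 1) (assumes NumPy).
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(2_000_000)
y = 1.0 / x

f_x = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
f_y = lambda t: f_x(1.0 / t) / t**2        # (6.71) with the single root x1 = 1/y

counts, edges = np.histogram(y, bins=100, range=(0.2, 5.0))
centers = (edges[:-1] + edges[1:]) / 2
emp = counts / (len(y) * (edges[1] - edges[0]))   # empirical density on the window
print(np.max(np.abs(emp - f_y(centers))))         # small, up to Monte Carlo noise
```

Normalizing the histogram by the full sample size (rather than using density=True on the windowed samples) keeps the empirical curve directly comparable to f_y on the chosen window.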

6.11 Probability Density Function Calculation for the Functions of Two Random Variables Using Cumulative Distribution Function

In this section, we explain, through examples, how the probability density function of a function of two random variables can be calculated using the cumulative distribution function.

Example 6.19: If \tilde{Z} = \tilde{X}/\tilde{Y}, find f_z(z) in terms of the joint probability density function f(x, y).
Solution 6.19: The cumulative distribution function of \tilde{Z} can be written as

F(z) = \text{Prob}(\tilde{Z} \le z)

leading to

F(z) = \text{Prob}\!\left(\frac{\tilde{X}}{\tilde{Y}} \le z\right) \;\Leftrightarrow\; F(z) = \text{Prob}(g(\tilde{X}, \tilde{Y}) \le z)

which can be calculated using

F(z) = \iint_{D = \{(x,y) \,|\, x/y \le z\}} f(x, y)\, dx\, dy.   (6.72)

Note: \text{Prob}(g(\tilde{X}, \tilde{Y}) \le z) can be calculated as

\text{Prob}(g(\tilde{X}, \tilde{Y}) \le z) = \iint_{D = \{(x,y) \,|\, g(x,y) \le z\}} f(x, y)\, dx\, dy.   (6.73)

The region on which the integration in (6.72) is performed can be elaborated as

D = \{(x,y) \,|\, x/y \le z\} \;\rightarrow\; D_1 = \{(x,y) \,|\, x \le yz,\; y > 0\}, \quad D_2 = \{(x,y) \,|\, x \ge yz,\; y < 0\}

and D = D_1 \cup D_2. Then, the integral expression in (6.72) can be written as

F(z) = \iint_{D_1} f(x, y)\, dx\, dy + \iint_{D_2} f(x, y)\, dx\, dy

which can be written as

F(z) = \int_{y=0}^{\infty} \int_{x=-\infty}^{yz} f(x, y)\, dx\, dy + \int_{y=-\infty}^{0} \int_{x=yz}^{\infty} f(x, y)\, dx\, dy.

The probability density function f_z(z) = dF(z)/dz can be calculated as

f_z(z) = \int_{y=0}^{\infty} \frac{d}{dz}\!\left(\int_{x=-\infty}^{yz} f(x, y)\, dx\right) dy + \int_{y=-\infty}^{0} \frac{d}{dz}\!\left(\int_{x=yz}^{\infty} f(x, y)\, dx\right) dy

leading to

f_z(z) = \int_{y=0}^{\infty} y f(yz, y)\, dy + \int_{y=-\infty}^{0} (-y) f(yz, y)\, dy

which can be written in a more compact form as

f_z(z) = \int_{y=-\infty}^{\infty} |y| f(yz, y)\, dy.
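For independent \tilde{X}, \tilde{Y} \sim N(0, 1), this integral evaluates to the Cauchy density 1/(\pi(1 + z^2)), a standard fact about the ratio of two independent standard normals. The sketch below (Python with NumPy/SciPy assumed) evaluates the integral numerically and compares it with the closed form.

```python
# Sketch: evaluating f_z(z) = integral of |y| f(yz, y) dy for independent
# X, Y ~ N(0, 1); the result should be the Cauchy density 1/(pi(1 + z^2)).
import numpy as np
from scipy.integrate import quad

f_xy = lambda x, y: np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)

def f_z(z):
    integrand = lambda y: y * f_xy(y * z, y)   # |y| f(yz, y) is even in y
    return 2 * quad(integrand, 0, np.inf)[0]

for z in (0.0, 0.5, 2.0):
    print(f_z(z), 1 / (np.pi * (1 + z**2)))    # the two columns agree
```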

Example 6.20: If \tilde{Z} = \tilde{X}^2 + \tilde{Y}^2, find f_z(z) in terms of the joint probability density function f(x, y).

Solution 6.20: The cumulative distribution function of \tilde{Z} can be written as

F(z) = \text{Prob}(\tilde{Z} \le z)

leading to

F(z) = \text{Prob}(\tilde{X}^2 + \tilde{Y}^2 \le z) \;\Leftrightarrow\; F(z) = \text{Prob}(g(\tilde{X}, \tilde{Y}) \le z)

which can be calculated using

F(z) = \iint_{D = \{(x,y) \,|\, x^2+y^2 \le z\}} f(x, y)\, dx\, dy   (6.74)

where the region D on which the integration is performed is the disk of radius \sqrt{z} and can be elaborated as

D = \{(x,y) \,|\, x^2 + y^2 \le z\} \;\rightarrow\; D = \{(x,y) \,|\, -\sqrt{z} \le y \le \sqrt{z},\; -\sqrt{z-y^2} \le x \le \sqrt{z-y^2}\}.

Then, the integral expression in (6.74) can be written as

F(z) = \int_{y=-\sqrt{z}}^{\sqrt{z}} \underbrace{\int_{x=-\sqrt{z-y^2}}^{\sqrt{z-y^2}} f(x, y)\, dx}_{g(z, y)}\, dy

which can be written as

F(z) = \int_{y=-\sqrt{z}}^{\sqrt{z}} g(z, y)\, dy

where

g(z, y) = \int_{x=-\sqrt{z-y^2}}^{\sqrt{z-y^2}} f(x, y)\, dx.

The probability density function of \tilde{Z} can be calculated as

f_z(z) = \frac{dF(z)}{dz}

leading to

f_z(z) = \int_{y=-\sqrt{z}}^{\sqrt{z}} \frac{dg(z, y)}{dz}\, dy.

Note that differentiating the outer limits \pm\sqrt{z} contributes nothing here, since the inner integral vanishes at those points, i.e., g(z, \pm\sqrt{z}) = 0. The derivative

\frac{dg(z, y)}{dz}

can be calculated, via the Leibniz rule, as

\frac{dg(z, y)}{dz} = \left(\sqrt{z-y^2}\right)' f\!\left(\sqrt{z-y^2}, y\right) - \left(-\sqrt{z-y^2}\right)' f\!\left(-\sqrt{z-y^2}, y\right)

leading to

\frac{dg(z, y)}{dz} = \frac{1}{2\sqrt{z-y^2}} f\!\left(\sqrt{z-y^2}, y\right) + \frac{1}{2\sqrt{z-y^2}} f\!\left(-\sqrt{z-y^2}, y\right).

Then, f_z(z) can be found as

f_z(z) = \int_{y=-\sqrt{z}}^{\sqrt{z}} \frac{1}{2\sqrt{z-y^2}} \left[ f\!\left(\sqrt{z-y^2}, y\right) + f\!\left(-\sqrt{z-y^2}, y\right) \right] dy.

Note: Recall the Leibniz integral rule given in (6.66).
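The result of Solution 6.20 can be checked against a known special case: for independent \tilde{X}, \tilde{Y} \sim N(0, 1), the sum \tilde{Z} = \tilde{X}^2 + \tilde{Y}^2 is chi-square distributed with two degrees of freedom, f_z(z) = \frac{1}{2} e^{-z/2}. The sketch below (Python with NumPy/SciPy assumed) evaluates the derived integral; substituting y = \sqrt{z}\sin t removes the 1/\sqrt{z - y^2} endpoint singularity before numerical integration.

```python
# Sketch: checking the f_z(z) integral from Solution 6.20 for independent
# X, Y ~ N(0, 1), where Z = X^2 + Y^2 is chi-square with 2 degrees of
# freedom, f_z(z) = 0.5 * exp(-z/2). Assumes NumPy and SciPy.
import numpy as np
from scipy.integrate import quad

f_xy = lambda x, y: np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)

def f_z(z):
    # With y = sqrt(z) sin(t), the integrand 1/(2 sqrt(z - y^2)) dy becomes dt/2,
    # so f_z(z) = integral over t in (-pi/2, pi/2) of (f(+x, y) + f(-x, y))/2.
    def integrand(t):
        x, y = np.sqrt(z) * np.cos(t), np.sqrt(z) * np.sin(t)
        return 0.5 * (f_xy(x, y) + f_xy(-x, y))
    return quad(integrand, -np.pi / 2, np.pi / 2)[0]

for z in (0.5, 1.0, 4.0):
    print(f_z(z), 0.5 * np.exp(-z / 2))   # the two columns agree
```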

Example 6.21: If \tilde{Z} = \sqrt{\tilde{X}^2 + \tilde{Y}^2}, find f_z(z) in terms of the joint probability density function f(x, y).

Solution 6.21: The cumulative distribution function of \tilde{Z} can be written as

F(z) = \text{Prob}\!\left(\sqrt{\tilde{X}^2 + \tilde{Y}^2} \le z\right)

leading to

F(z) = \text{Prob}(\tilde{X}^2 + \tilde{Y}^2 \le z^2) \;\Leftrightarrow\; F(z) = \text{Prob}(g(\tilde{X}, \tilde{Y}) \le z^2).

Following an approach similar to the previous example, we obtain the probability density function of \tilde{Z} = \sqrt{\tilde{X}^2 + \tilde{Y}^2} as

f_z(z) = \int_{y=-z}^{z} \frac{z}{\sqrt{z^2-y^2}} \left[ f\!\left(\sqrt{z^2-y^2}, y\right) + f\!\left(-\sqrt{z^2-y^2}, y\right) \right] dy.

If \tilde{X} \sim N(0, \sigma^2) and \tilde{Y} \sim N(0, \sigma^2) are normal random variables independent of each other, then we have

f_x(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}}, \quad f_y(y) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{y^2}{2\sigma^2}}

and f_z(z) happens to be

f_z(z) = \int_{y=-z}^{z} \frac{z}{\sqrt{z^2-y^2}} \left[ f_x\!\left(\sqrt{z^2-y^2}\right) f_y(y) + f_x\!\left(-\sqrt{z^2-y^2}\right) f_y(y) \right] dy

yielding

f_z(z) = \int_{y=-z}^{z} \frac{z}{\sqrt{z^2-y^2}} \left[ \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{z^2-y^2}{2\sigma^2}} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{y^2}{2\sigma^2}} + \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{z^2-y^2}{2\sigma^2}} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{y^2}{2\sigma^2}} \right] dy

which can be simplified as

f_z(z) = \int_{y=-z}^{z} \frac{z}{\sqrt{z^2-y^2}} \frac{1}{\pi\sigma^2} e^{-\frac{z^2}{2\sigma^2}}\, dy

where, if we let y = z\cos\theta, we obtain

f_z(z) = -\int_{\theta=\pi}^{0} \frac{z}{\sqrt{z^2 - z^2\cos^2\theta}} \frac{1}{\pi\sigma^2} e^{-\frac{z^2}{2\sigma^2}}\, z\sin\theta\, d\theta

which can be written as

f_z(z) = -\int_{\theta=\pi}^{0} \frac{z}{z\sin\theta} \frac{1}{\pi\sigma^2} e^{-\frac{z^2}{2\sigma^2}}\, z\sin\theta\, d\theta

resulting in

f_z(z) = \frac{z}{\sigma^2} e^{-\frac{z^2}{2\sigma^2}}, \quad z > 0   (6.75)

which is called the Rayleigh distribution. The Rayleigh distribution is widely used in wireless communications: when there is no line of sight between the transmitter and the receiver, the envelope of the received multipath signal is modeled with a Rayleigh distribution.
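The Rayleigh result (6.75) is easy to confirm by simulation. The sketch below (Python with NumPy assumed) forms the envelope \tilde{Z} = \sqrt{\tilde{X}^2 + \tilde{Y}^2} from two independent zero-mean Gaussians and compares its histogram against (6.75).

```python
# Sketch: simulating the Rayleigh result (6.75), assuming NumPy.
# For independent X, Y ~ N(0, sigma^2), Z = sqrt(X^2 + Y^2) should follow
# f_z(z) = (z / sigma^2) * exp(-z^2 / (2 sigma^2)), z > 0.
import numpy as np

rng = np.random.default_rng(3)
sigma = 2.0
x = sigma * rng.standard_normal(1_000_000)
y = sigma * rng.standard_normal(1_000_000)
z = np.hypot(x, y)                           # envelope sqrt(x^2 + y^2)

f_z = lambda t: (t / sigma**2) * np.exp(-t**2 / (2 * sigma**2))
hist, edges = np.histogram(z, bins=200, range=(0, 10), density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - f_z(centers))))   # small Monte Carlo error
```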
If \tilde{X} \sim N(m_x, \sigma^2) and \tilde{Y} \sim N(m_y, \sigma^2) are normal random variables independent of each other, then the probability density function of \tilde{Z} = \sqrt{\tilde{X}^2 + \tilde{Y}^2} equals

f_z(z) = \frac{z}{\sigma^2} e^{-\frac{z^2+m^2}{2\sigma^2}} I_0\!\left(\frac{zm}{\sigma^2}\right)   (6.76)

which is known as the Rician distribution, where m^2 = m_x^2 + m_y^2 and

I_0(x) = \frac{1}{\pi} \int_0^{\pi} e^{x\cos\theta}\, d\theta   (6.77)

is the modified Bessel function of the first kind and zeroth order.
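Formula (6.76), including the Bessel factor, can also be checked by simulation. The sketch below (Python with NumPy/SciPy assumed; the means and \sigma are arbitrary illustrative choices) compares a histogram of the envelope against (6.76), using SciPy's implementation of I_0.

```python
# Sketch: Monte Carlo check of the Rician density (6.76), assuming NumPy/SciPy.
import numpy as np
from scipy.special import i0            # modified Bessel function I_0

rng = np.random.default_rng(4)
sigma, mx, my = 1.0, 1.5, 0.5
m = np.sqrt(mx**2 + my**2)              # m^2 = m_x^2 + m_y^2
x = mx + sigma * rng.standard_normal(1_000_000)
y = my + sigma * rng.standard_normal(1_000_000)
z = np.hypot(x, y)

def f_z(t):                             # formula (6.76)
    return (t / sigma**2) * np.exp(-(t**2 + m**2) / (2 * sigma**2)) * i0(t * m / sigma**2)

hist, edges = np.histogram(z, bins=200, range=(0, 6), density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - f_z(centers))))   # small Monte Carlo error
```

Setting m_x = m_y = 0 reduces (6.76) to the Rayleigh density (6.75), since I_0(0) = 1.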

6.12 Two Functions of Two Random Variables

For two continuous random variables \tilde{X} and \tilde{Y}, we define

\tilde{Z} = g(\tilde{X}, \tilde{Y}), \quad \tilde{W} = h(\tilde{X}, \tilde{Y}).   (6.78)

To find the joint probability density function of \tilde{Z} and \tilde{W}, i.e., f_{zw}(z, w), in terms of the joint probability density function f(x, y), we perform the following steps:

First Method
Step 1: We solve the equations

z = g(x, y), \quad w = h(x, y)

for the unknowns x and y, and denote the roots by x_i, y_i.
Step 2: The joint probability density function of \tilde{Z} and \tilde{W} can be calculated using

f_{zw}(z, w) = \sum_i \frac{1}{|J(x_i, y_i)|} f_{xy}(x_i, y_i)   (6.79)

where

J(x, y) = \begin{vmatrix} \dfrac{\partial z}{\partial x} & \dfrac{\partial z}{\partial y} \\[4pt] \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{vmatrix} \quad \text{and} \quad J(x_i, y_i) = J(x, y)\big|_{x_i, y_i}.   (6.80)

Second Method
The second method can be used if the equation set z = g(x, y), w = h(x, y) has only one pair of roots for x and y.
Step 1: Using the equations

z = g(x, y), \quad w = h(x, y)

we express x and y as

x = g_1(z, w), \quad y = h_1(z, w).

Step 2: The joint probability density function of \tilde{Z} and \tilde{W} can be calculated using

f_{zw}(z, w) = |J(z, w)| f_{xy}(x, y)   (6.81)

where

J(z, w) = \begin{vmatrix} \dfrac{\partial x}{\partial z} & \dfrac{\partial x}{\partial w} \\[4pt] \dfrac{\partial y}{\partial z} & \dfrac{\partial y}{\partial w} \end{vmatrix}.   (6.82)

Example 6.22: \tilde{X} and \tilde{Y} are two continuous random variables. Using these two random variables, we obtain the random variables \tilde{Z} and \tilde{W} as

\tilde{Z} = \tilde{X} + \tilde{Y}, \quad \tilde{W} = \tilde{X} - \tilde{Y}.

Find f_{zw}(z, w) in terms of f_{xy}(x, y).

Solution 6.22: For the solution of this example, we can follow two different approaches. Let's solve the problem initially using the first method, then illustrate the solution using the second method.

First Method
Step 1: If we solve the equations

z = x + y, \quad w = x - y

for the unknowns x and y, we find the roots as

x_1 = \frac{z+w}{2}, \quad y_1 = \frac{z-w}{2}.

Step 2: The determinant of the Jacobian matrix can be calculated using

J(x, y) = \begin{vmatrix} \dfrac{\partial z}{\partial x} & \dfrac{\partial z}{\partial y} \\[4pt] \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{vmatrix}

as

J(x, y) = \begin{vmatrix} 1 & 1 \\ 1 & -1 \end{vmatrix} = -2 \;\rightarrow\; J(x_1, y_1) = -2.

The joint probability density function of \tilde{Z} and \tilde{W} can be found via

f_{zw}(z, w) = \sum_i \frac{1}{|J(x_i, y_i)|} f_{xy}(x_i, y_i)

leading to

f_{zw}(z, w) = \frac{1}{|J(x_1, y_1)|} f_{xy}(x_1, y_1)

resulting in

f_{zw}(z, w) = \frac{1}{2} f_{xy}\!\left(\frac{z+w}{2}, \frac{z-w}{2}\right).

Second Method
Step 1: Using the equations

z = x + y, \quad w = x - y

we express x and y as

x = \frac{z+w}{2}, \quad y = \frac{z-w}{2}.

Step 2: The determinant of the Jacobian matrix can be calculated using

J(z, w) = \begin{vmatrix} \dfrac{\partial x}{\partial z} & \dfrac{\partial x}{\partial w} \\[4pt] \dfrac{\partial y}{\partial z} & \dfrac{\partial y}{\partial w} \end{vmatrix}

as

J(z, w) = \begin{vmatrix} \dfrac{1}{2} & \dfrac{1}{2} \\[4pt] \dfrac{1}{2} & -\dfrac{1}{2} \end{vmatrix} \;\rightarrow\; J(z, w) = -\frac{1}{2}.

The joint probability density function of \tilde{Z} and \tilde{W} can be found via

f_{zw}(z, w) = |J(z, w)| f_{xy}(x, y)

leading to

f_{zw}(z, w) = \frac{1}{2} f_{xy}\!\left(\frac{z+w}{2}, \frac{z-w}{2}\right).
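The result of Example 6.22 can be verified by a two-dimensional histogram. The sketch below (Python with NumPy assumed) uses independent standard normals for \tilde{X} and \tilde{Y} and compares the empirical joint density of (\tilde{Z}, \tilde{W}) against \frac{1}{2} f_{xy}\!\left(\frac{z+w}{2}, \frac{z-w}{2}\right).

```python
# Sketch: Monte Carlo check of f_zw(z, w) = (1/2) f_xy((z+w)/2, (z-w)/2)
# for independent X, Y ~ N(0, 1), assuming NumPy.
import numpy as np

rng = np.random.default_rng(5)
x, y = rng.standard_normal((2, 1_000_000))
z, w = x + y, x - y

f_xy = lambda u, v: np.exp(-(u**2 + v**2) / 2) / (2 * np.pi)
f_zw = lambda z, w: 0.5 * f_xy((z + w) / 2, (z - w) / 2)

H, ze, we = np.histogram2d(z, w, bins=50, range=[[-5, 5], [-5, 5]], density=True)
zc, wc = (ze[:-1] + ze[1:]) / 2, (we[:-1] + we[1:]) / 2
print(np.max(np.abs(H - f_zw(zc[:, None], wc[None, :]))))  # small MC error
```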

Example 6.23: \tilde{X} and \tilde{Y} are two continuous random variables. Using these two random variables, we obtain the random variable \tilde{Z} as

\tilde{Z} = \tilde{X}\tilde{Y}.

Find f_z(z) in terms of the joint probability density function f(x, y).

Solution 6.23: To be able to use the Jacobian approach, we need two equations. The first one is given as

\tilde{Z} = \tilde{X}\tilde{Y}.

For the second one, let's introduce an auxiliary equation as

\tilde{W} = \tilde{X}.

Now we can calculate f_{zw}(z, w) as follows.

First Method
Step 1: If we solve the equations

z = xy, \quad w = x

for the unknowns x and y, we find the roots as

x_1 = w, \quad y_1 = \frac{z}{w}.

Step 2: The determinant of the Jacobian matrix can be calculated using

J(x, y) = \begin{vmatrix} \dfrac{\partial z}{\partial x} & \dfrac{\partial z}{\partial y} \\[4pt] \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{vmatrix}

as

J(x, y) = \begin{vmatrix} y & x \\ 1 & 0 \end{vmatrix} = -x \;\rightarrow\; J(x_1, y_1) = -w.

The joint probability density function of \tilde{Z} and \tilde{W} can be found via

f_{zw}(z, w) = \sum_i \frac{1}{|J(x_i, y_i)|} f_{xy}(x_i, y_i)

leading to

f_{zw}(z, w) = \frac{1}{|J(x_1, y_1)|} f_{xy}(x_1, y_1)

resulting in

f_{zw}(z, w) = \frac{1}{|w|} f_{xy}\!\left(w, \frac{z}{w}\right).

Second Method
Step 1: Using the equations

z = xy, \quad w = x

we express x and y as

x = w, \quad y = \frac{z}{w}.

Step 2: The determinant of the Jacobian matrix can be calculated using

J(z, w) = \begin{vmatrix} \dfrac{\partial x}{\partial z} & \dfrac{\partial x}{\partial w} \\[4pt] \dfrac{\partial y}{\partial z} & \dfrac{\partial y}{\partial w} \end{vmatrix}

as

J(z, w) = \begin{vmatrix} 0 & 1 \\[2pt] \dfrac{1}{w} & -\dfrac{z}{w^2} \end{vmatrix} \;\rightarrow\; J(z, w) = -\frac{1}{w}.

The joint probability density function of \tilde{Z} and \tilde{W} can be found via

f_{zw}(z, w) = |J(z, w)| f_{xy}(x, y)

leading to

f_{zw}(z, w) = \frac{1}{|w|} f_{xy}\!\left(w, \frac{z}{w}\right).

The probability density function f_z(z) can be obtained from f_{zw}(z, w) as the marginal

f_z(z) = \int_{-\infty}^{\infty} f_{zw}(z, w)\, dw \;\rightarrow\; f_z(z) = \int_{-\infty}^{\infty} \frac{1}{|w|} f_{xy}\!\left(w, \frac{z}{w}\right) dw.
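As a concrete application of this marginal formula: for independent \tilde{X}, \tilde{Y} uniform on (0, 1), the product \tilde{Z} = \tilde{X}\tilde{Y} is known to have density f_z(z) = -\ln z on (0, 1). The sketch below (Python with NumPy/SciPy assumed) evaluates the integral on its support and compares with the closed form.

```python
# Sketch: applying f_z(z) = integral of (1/|w|) f_xy(w, z/w) dw to
# independent X, Y ~ Uniform(0, 1); the known result is f_z(z) = -ln(z).
import numpy as np
from scipy.integrate import quad

def f_z(z):
    # f_xy(w, z/w) = 1 exactly when 0 < w < 1 and 0 < z/w < 1, i.e., z < w < 1.
    return quad(lambda w: 1.0 / w, z, 1.0)[0]

for z in (0.1, 0.5, 0.9):
    print(f_z(z), -np.log(z))   # the two columns agree
```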

Example 6.24: If

\tilde{Z} = \tilde{X} + \tilde{Y}, \quad \tilde{W} = \frac{\tilde{X}}{\tilde{Y}}

find f_{zw}(z, w) in terms of f_{xy}(x, y).

Solution 6.24: If we solve the equations

z = x + y, \quad w = \frac{x}{y}

for the unknowns x and y, we find the roots as

x_1 = \frac{zw}{w+1}, \quad y_1 = \frac{z}{w+1}.

Proceeding as in the previous examples, we find

f_{zw}(z, w) = \frac{|z|}{(w+1)^2} f_{xy}\!\left(\frac{zw}{w+1}, \frac{z}{w+1}\right).
Example 6.25: If \tilde{Y} = \sqrt{\tilde{X}}, find f_y(y) in terms of f_x(x).

Solution 6.25: When the equation y = \sqrt{x} is solved for x, we get

x_1 = y^2

i.e., we have a single root. Then, employing the formula

f_y(y) = \frac{f_x(x_1)}{|g'(x_1)|} + \dots + \frac{f_x(x_N)}{|g'(x_N)|}

for our example, we get

f_y(y) = \frac{f_x(x_1)}{|g'(x_1)|}

where f_x(x_1) = f_x(y^2) and

g(x) = \sqrt{x} \;\rightarrow\; g'(x) = \frac{1}{2\sqrt{x}} \;\rightarrow\; g'(x_1) = g'(y^2) = \frac{1}{2y}.

Then, we have

f_y(y) = \frac{f_x(y^2)}{|g'(x_1)|} \;\rightarrow\; f_y(y) = 2y f_x(y^2), \quad y > 0.

Example 6.26: If \tilde{Y} = \tilde{X}^2, find f_y(y) in terms of f_x(x).

Solution 6.26: When the equation y = x^2 is solved for x, we get

x_1 = \sqrt{y}, \quad x_2 = -\sqrt{y}

i.e., we have two roots. Then, employing the formula

f_y(y) = \frac{f_x(x_1)}{|g'(x_1)|} + \dots + \frac{f_x(x_N)}{|g'(x_N)|}

for our example, we get

f_y(y) = \frac{f_x(x_1)}{|g'(x_1)|} + \frac{f_x(x_2)}{|g'(x_2)|}

where f_x(x_1) = f_x(\sqrt{y}), f_x(x_2) = f_x(-\sqrt{y}), and

g(x) = x^2 \;\rightarrow\; g'(x) = 2x \;\rightarrow\; g'(x_1) = 2\sqrt{y}, \quad g'(x_2) = -2\sqrt{y}.

Then, we have

f_y(y) = \frac{f_x(x_1)}{|g'(x_1)|} + \frac{f_x(x_2)}{|g'(x_2)|} \;\rightarrow\; f_y(y) = \frac{f_x(\sqrt{y})}{2\sqrt{y}} + \frac{f_x(-\sqrt{y})}{2\sqrt{y}}, \quad y > 0.

Thus,

f_y(y) = \begin{cases} \dfrac{f_x(\sqrt{y})}{2\sqrt{y}} + \dfrac{f_x(-\sqrt{y})}{2\sqrt{y}} & y > 0 \\[4pt] 0 & \text{otherwise.} \end{cases}
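For \tilde{X} \sim N(0, 1), this two-root formula yields the chi-square density with one degree of freedom. The sketch below (Python with NumPy assumed) checks the formula against a histogram of \tilde{X}^2; the window starts at 0.05 to stay clear of the integrable singularity at y = 0.

```python
# Sketch: Example 6.26 with X ~ N(0, 1); the two-root formula gives the
# chi-square density with one degree of freedom (assumes NumPy).
import numpy as np

rng = np.random.default_rng(6)
y = rng.standard_normal(2_000_000) ** 2

f_x = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
f_y = lambda t: (f_x(np.sqrt(t)) + f_x(-np.sqrt(t))) / (2 * np.sqrt(t))

counts, edges = np.histogram(y, bins=100, range=(0.05, 4.0))
centers = (edges[:-1] + edges[1:]) / 2
emp = counts / (len(y) * (edges[1] - edges[0]))
print(np.max(np.abs(emp - f_y(centers))))    # small Monte Carlo error
```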

Exercises:
1. If \tilde{Y} = e^{\tilde{X}}, find f_y(y) in terms of f_x(x).
2. If \tilde{Y} = -\ln \tilde{X}, find f_y(y) in terms of f_x(x).

Problems

[Fig. 6P.1 — The triangle region on which the joint probability density function is defined: a triangular region D in the x–y plane (the original drawing is not reproduced here).]

1. The joint probability density function f(x, y) of two continuous random variables \tilde{X} and \tilde{Y} is a constant, and it is defined on the region D shown in Fig. 6P.1.
(a) Find f(x, y), f(x), f(y), and f(x|y).
(b) Calculate E[\tilde{X}], E[\tilde{Y}], and Var(\tilde{X}).
(c) Find E[\tilde{X} | \tilde{Y} = y] and E[\tilde{Y} | \tilde{X} = x].
(d) Find E[\tilde{X} | \tilde{Y}] and E[E[\tilde{X} | \tilde{Y}]].
(e) Find Var(\tilde{X} | \tilde{Y} = y) and Var(\tilde{Y} | \tilde{X} = x).
(f) Find Var(\tilde{X} | \tilde{Y}) and verify that Var(\tilde{X}) = E[Var(\tilde{X} | \tilde{Y})] + Var(E[\tilde{X} | \tilde{Y}]).

2. Assume that we have two independent normal random variables \tilde{X}_1 and \tilde{X}_2, i.e.,

\tilde{X}_1 \sim N(0, 1), \quad \tilde{X}_2 \sim N(0, 1).

If \tilde{Y} = \tilde{X}_1 + \tilde{X}_2, what is the variance of \tilde{Y}?

3. Assume that we have independent random variables \tilde{X} and \tilde{Y}, each of which is uniformly distributed on the interval [0, 1]. Find the probabilities

P\!\left(\tilde{X} < \frac{3}{7}\right), \quad P\!\left(\tilde{X} + \tilde{Y} \le \frac{3}{8}\right), \quad P\!\left(\tilde{X}\tilde{Y} \le \frac{1}{4}\right), \quad P\!\left(\frac{\tilde{X}}{\tilde{Y}} \le \frac{1}{4}\right), \quad P\!\left(\max(\tilde{X}, \tilde{Y}) \le \frac{1}{5}\right), \quad P(\tilde{X} < \tilde{Y}).

4. The continuous random variables \tilde{X} and \tilde{Y} have the joint probability density function

f(x, y) = \begin{cases} c & \text{if } x > 0,\; y > 0, \text{ and } \dfrac{x}{2} + y < 1 \\[2pt] 0 & \text{otherwise.} \end{cases}

The events A and B are defined as

A = \{\tilde{X} \le \tilde{Y}\}, \quad B = \{\tilde{Y} \le 0.5\}.

(a) Determine the value of the constant c and find the following:

P(B|A), \quad P(A|B), \quad E[\tilde{X}\tilde{Y}], \quad E[\tilde{X} + \tilde{Y}], \quad E[\tilde{X} | \tilde{Y}], \quad f_x(x|B), \quad \text{Cov}(\tilde{X}, \tilde{Y}).

(b) Determine the probability density function of

\tilde{Z} = \frac{\tilde{Y}}{\tilde{X}}.

5. The continuous random variable \tilde{X} is uniformly distributed on the interval [-1, 1]. Find a function g(x) such that \tilde{Y} = g(\tilde{X}) has the probability density function f_y(y) = 2e^{-2y}, y > 0.

6. The continuous random variable \tilde{X} is uniformly distributed on the interval (0, 1]. If \tilde{Y} = -\ln \tilde{X}, find the probability density function of \tilde{Y}.

7. The relation between the continuous random variables \tilde{X} and \tilde{Y} is given as \tilde{Y} = e^{\tilde{X}}. Find f_y(y) in terms of f_x(x).

8. The relation between the continuous random variables \tilde{X} and \tilde{Y} is given as \tilde{Y} = |\tilde{X}|. Find f_y(y) in terms of f_x(x).

9. The relation between the continuous random variables \tilde{X} and \tilde{Y} is given as \tilde{Y} = \tilde{X}^{1/2}. Find f_y(y) in terms of f_x(x).

10. The relation between the continuous random variables \tilde{X} and \tilde{Y} is given as \tilde{Y} = \tilde{X}^{1/5}. Find f_y(y) in terms of f_x(x).

11. The relation between the continuous random variables \tilde{X} and \tilde{Y} is given as \tilde{Y} = \sqrt[3]{\tilde{X}}. Find f_y(y) in terms of f_x(x).

12. Assume that we have four continuous random variables \tilde{W}, \tilde{X}, \tilde{Y}, \tilde{Z}, and the relation among these random variables is defined as

\tilde{Z} = a\tilde{X} + b\tilde{Y}, \quad \tilde{W} = c\tilde{X} + d\tilde{Y}.

Express the joint probability density function of \tilde{Z} and \tilde{W}, i.e., f_{zw}(z, w), in terms of the joint probability density function of \tilde{X} and \tilde{Y}, i.e., f_{xy}(x, y).

13. Assume that we have three continuous random variables \tilde{X}, \tilde{Y}, \tilde{Z}, and the relation among these random variables is defined as

\tilde{Z} = \tilde{X}\tilde{Y}.

Find the probability density function of \tilde{Z}, i.e., f_z(z), in terms of the joint probability density function of \tilde{X} and \tilde{Y}, i.e., f_{xy}(x, y).

14. Assume the random variables \tilde{X} and \tilde{Y} are normal random variables, i.e.,

\tilde{X} \sim N(0, 1), \quad \tilde{Y} \sim N(0, 1).

If

\tilde{Z} = \tilde{X} + \tilde{Y}, \quad \tilde{W} = \tilde{X} - \tilde{Y}

find the joint and marginal probability density functions of \tilde{Z} and \tilde{W}.
~

Index

B
Bayes' rule, 29–34
  for continuous distribution, 191
Bernoulli random variable, 102
Biased coin, 10
Binomial probabilities, 42–50
Binomial random variable, 100

C
Combinations, 53–56
Combined experiment, 8, 26
Conditional expectation, 166–172, 189–196
Conditional independence, 39–42
Conditional mean value, 128–130
Conditional probability, 17–24
  density function, 162–166, 187
  mass function, 118–120, 124–127
Conditional variance, 172–177, 196–198
Continuous experiment, 25
Continuous random variables, 141–181
Continuous uniform random variable, 143–144
Correlation coefficient, 206
Counting principle, 50–51
Covariance and correlation, 206
Cumulative distribution function, 80–87, 154–157

D
Discrete probability law, 4–5
Discrete random variables, 67–109, 111–113
Discrete uniform random variable, 103–105
Disjoint events, 3
Double integration, 203–205

E
Event, 1–2
Expectation, 146–148
Expected value, 87–91
Experiment, 1–2
Exponential random variable, 152–154

F
Fair coin, 11
Functions of a random variable, 91–99, 137
Functions of two random variables, 122–124

G
Gaussian, 148–152
Geometric random variable, 100

I
Impulse function, 157
Independence, 36–39
  of continuous random variables, 199–201
  of random variables, 131–139
  of several events, 38
Independent trials, 42–50

J
Joint cumulative distribution function, 201–206
Joint experiment, 5–12
Joint probability mass function, 113–118, 121–122

M
Modeling of binary communication channel, 61–64
Multiplication rule, 35–36

N
Normal random variable, 148–152

P
Partitions, 56–61
Permutation, 51–52
Poisson random variable, 101
Probabilistic law, 3–4
Probability, 3
  axioms, 4
  density function for function of two random variables, 211–214
  function, 12–14
  mass function, 76–80
Properties of cumulative distribution function, 154–157

S
Sample space, 1–2, 9
Several random variables, 136–139
Standard deviation, 87–91
Standard random variable, 149–152

T
Total probability theorem, 29–34
Trial, 1
Two continuous random variables, 221–228

U
Unit step function, 158–161

V
Variance, 87–91, 146–148
Venn diagram, 14–16