Stat 115 - Chapter 1


CHAPTER 1

Preliminaries

University of the Philippines School of Statistics | 2nd Semester AY 2022-2023


01 Random Experiment, Sample Space, and Probability
02 Random Variable and its Distribution
03 The Binomial Distribution
04 The Normal Distribution
00 STAT 114 REVIEW
POPULATION vs SAMPLE
Population data: X₁, X₂, …, X_N
Parameter: a summary measure describing a particular characteristic of the population that is computed using population data
Example: population mean, μ = (1/N) Σᵢ₌₁ᴺ Xᵢ
         population variance, σ² = (1/N) Σᵢ₌₁ᴺ (Xᵢ − μ)²

Sample data: X₁, X₂, …, Xₙ
Statistic: a summary measure describing a particular characteristic of the sample that is computed using sample data
Example: sample mean, X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ
         sample variance, s² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1)
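To make the distinction concrete, here is a minimal Python sketch (illustrative values and variable names only, not part of the original slides) that computes the parameters μ and σ² from a toy population and the statistics X̄ and s² from a sample drawn from it:

```python
import random

population = [12, 15, 9, 20, 14, 18, 11, 16]   # toy population data
N = len(population)

# Parameters: computed from the whole population
mu = sum(population) / N
sigma_sq = sum((x - mu) ** 2 for x in population) / N         # divisor N

# Statistics: computed from sample data only, used to infer the parameters
sample = random.sample(population, k=4)                       # SRSWOR, n = 4
n = len(sample)
x_bar = sum(sample) / n
s_sq = sum((x - x_bar) ** 2 for x in sample) / (n - 1)        # divisor n - 1

print(mu, sigma_sq)     # parameters
print(x_bar, s_sq)      # statistics (their values vary from sample to sample)
```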
REMARKS
o Both the parameter and statistic are summary measures that
are computed using data.
o If you have population data, then the computed summary
measure is a parameter. If you only have sample data, then
the computed summary measure is a statistic.
REMARKS
o In a statistical inquiry, the answer to the research problem is based
on the value of the parameter that describes the characteristic of
interest of the population under study.
o However, the value of this parameter can only be computed using
population data.
o If you only have sample data, you cannot compute for the value of
the parameter.
Descriptive vs Inferential
Descriptive Statistics comprise those methods concerned with
collecting, describing, and analyzing a set of data without
drawing conclusions or inferences about a larger group

Inferential Statistics comprise those methods concerned with


the analysis of sample data leading to predictions or
inferences about the population
REMARKS
o Although we cannot compute for the value of the parameter using
sample data, we can use the methods in Inferential Statistics to infer
on the value of this parameter.

o In Inferential Statistics, we compute for the value of the statistic


using sample data not for the purpose of describing the sample but
so that we can infer on the value of the parameter of interest.
REMARKS
o It should be clear that we base our inferences on partial information
about the population.
o Thus, whatever inferences we make will always be subject to some
error.
o A background on probability theory and distribution theory will
help us understand the errors that we commit in Inferential
Statistics.
Remarks on Probability Theory
o The development of probability theory was not originally intended for solving problems in inferential statistics.
o It was first developed to answer professional gamblers' questions about the systematic pattern of outcomes of games involving dice or cards, which would allow them to adjust their bets to the "odds" of success.
o This is why most of the basic examples in probability theory involve die-throwing experiments and the selection of cards from a well-shuffled deck.
Remarks on Probability Theory
o Today, many important phenomena that are of interest to humankind
share something in common with these games of chance.
o It is impossible to predict with certainty when such a phenomenon
will occur.
o By studying patterns, we can learn more about the behavior of the
phenomenon of interest and then be able to predict an occurrence of
a phenomenon with a certain degree of confidence.
01 Random Experiment, Sample
Space and Probability
RANDOM EXPERIMENT
A random experiment is a process that can be repeated under similar
conditions but whose outcome cannot be predicted with certainty
beforehand.
o There are many examples of random experiments. Tossing a pair of dice, tossing a coin, selecting 5
cards from a well-shuffled deck of cards, and selecting a sample of size n from a population of N
using a probability sampling method are some of them.
o Regardless of the number of times we repeat the process, it is still not possible to determine in
advance what the next outcome will be.
SAMPLE SPACE
The sample space, denoted by Ω, is the collection of all
possible outcomes of a random experiment.
An element of the sample space is called a sample point.
The sample space is a set because it is a collection of elements. In set
theory, this set is referred to as the universal set since it contains all
elements under consideration.
Specifying a set
o Roster method: list down all the elements belonging in the set, then enclose them in braces.
o Rule method: state a rule that the elements must satisfy in order to belong in the set, then enclose this rule in braces.
Illustration
Consider the random experiment of tossing a coin twice.
Using H to denote a head and T to denote a tail, we can
specify the sample space by roster method as follows:
Ω = {𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇}
We can also use the rule method to specify the same sample
space as follows:
Ω = {(x, y) | x ∈ {H, T}, y ∈ {H, T}}
Illustration
Consider the random experiment of tossing a coin twice.
Recall that we defined our sample space previously as follows:
Ω = {𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇}
But take note that, in the same experiment, instead of
recording what comes up on the first and second tosses, what
we can also do is to count the number of heads that will come
up in the two tosses. In this case, the sample space is
Ω = {0, 1, 2}
or, by the rule method, Ω = {x | x ∈ {0, 1, 2}}
REMARKS
o The previous illustration shows that the description of the sample
space is NOT UNIQUE.
o There are many ways in which we can specify the collection of all
possible outcomes of the experiment.
Which representation must we choose to use?
o Well, naturally, the choice depends on the characteristic of interest
and whatever will facilitate the assignment and computation of
probabilities.
Example
Simple random sampling is an example of a probability sampling
method.

o In simple random sampling without replacement (SRSWOR), all


possible subsets consisting of n distinct elements selected from
the N elements of the population have the same chances of
selection.

o In simple random sampling with replacement (SRSWR), all


possible ordered n-tuples (coordinates need not be distinct) that
can be formed from the N elements of the population have the
same chances of selection.
Example
Suppose the population consists of N=5 children:
Let a=Janine, b=Josiel, c=Jan, d=Eryl, and e=Eariel.

Suppose a sample of size n=2 will be selected using SRSWOR. Specify


the sample space.

We will denote a sample of size 2 by the set {x1, x2}, where x1 and x2
are the two distinct elements included in the sample.
Ω = {{a,b},{a,c},{a,d},{a,e},{b,c},{b,d},{b,e},{c,d},{c,e},{d,e}}

By definition of SRSWOR, all the 10 sample points (samples) will be


given equal chances of selection.
Example
In general, a sample of size 𝑛 selected using SRSWOR will be
denoted by a set containing 𝑛 distinct elements, {x1,x2,…,xn},
where the xis are the elements selected in the sample.

When the sample of size n is selected from a population of size N using SRSWOR, then the sample space will contain
[N(N − 1)(N − 2) ⋯ (N − n + 1)] / [n(n − 1)(n − 2) ⋯ (2)(1)] = C(N, n)
sets containing n distinct elements, and by definition, all of them will be given equal chances of selection.
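As a quick check of this count, the short Python sketch below (illustrative only) lists every SRSWOR sample of size n = 2 from the N = 5 children in the earlier example and confirms that there are C(5, 2) = 10 of them:

```python
from itertools import combinations
from math import comb

children = ["a", "b", "c", "d", "e"]        # a=Janine, b=Josiel, c=Jan, d=Eryl, e=Eariel
samples = list(combinations(children, 2))   # all possible SRSWOR samples of size 2

print(samples)
print(len(samples), comb(5, 2))             # both are 10
```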
Example
Suppose the population consists of N=5 children:
Let a=Janine, b=Josiel, c=Jan, d=Eryl, and e=Eariel.

Suppose a sample of size n=2 will be selected using SRSWR. Specify the sample
space.
We will denote a sample of size 2 by an ordered pair, (x1,x2), where x1 is the
element selected on the first draw while x2 is the element selected on the second
draw.
Ω = {(a,a), (a,b), (a,c), (a,d), (a,e), (b,a), (b,b), (b,c), (b,d), (b,e), (c,a), (c,b), (c,c), (c,d), (c,e), (d,a), (d,b), (d,c), (d,d), (d,e), (e,a), (e,b), (e,c), (e,d), (e,e)}
By definition of SRSWR, all the 25 sample points (samples) will be given equal
chances of selection.
Example
In general, a sample of size 𝑛 selected using SRSWR will be
denoted by an ordered n-tuple (x1,x2,…,xn) where xi is the
element selected on the ith draw.
When the sample of size n is selected from a population of size N using SRSWR, then the sample space will contain Nⁿ ordered n-tuples and, by definition, all of them will be given equal chances of selection.
EVENT
An event is a subset of the sample space whose probability is
defined.
We say that an event occurred if the outcome of the
experiment is one of the sample points belonging in the event.
Otherwise, the event did not occur.
We will use any capital Latin letter (A,B,C,...) to denote an event of interest.
Illustration
Consider the experiment of rolling a die. The sample space is given to
be:
Ω = {1, 2, 3, 4, 5, 6}
Let A = event of observing odd number of dots in a roll of a die
= {1, 3, 5}
B = event of observing even number of dots in a roll of a die
= {2, 4, 6}
C = event of observing less than 3 dots in a roll of a die
= {1, 2}
Example
The population consists of 𝑁 = 5 children: a=Janine, b=Josiel, c=Jan, d=Eryl, and
e=Eariel. Suppose a sample of size n=2 will be selected using SRSWR.

Ω = {(a,a), (a,b), (a,c), (a,d), (a,e), (b,a), (b,b), (b,c), (b,d), (b,e), (c,a), (c,b), (c,c), (c,d), (c,e), (d,a), (d,b), (d,c), (d,d), (d,e), (e,a), (e,b), (e,c), (e,d), (e,e)}
A = event that Janine is included in the sample
= {(a,a), (a,b), (a,c), (a,d), (a,e), (b,a), (c,a), (d,a), (e,a)}
B = event that Janine and Jan are both included in the sample
= {(a,c), (c,a)}

Suppose the sample selected was (a,d). Did event A occur? Did event B occur?
Impossible and Sure Events
The impossible event is the empty set ∅.
The sure event is the sample space Ω.
§ Two subsets of the sample space that will always be events are the empty set and the
sample space. Their probabilities are always defined.
§ Remember that an event occurs if the outcome of the experiment belongs in it. But ∅
is the empty set so it does not contain any sample point and thus it is impossible for
this event to happen. On the other hand, Ω is the sample space so it contains all
possible outcomes of the experiment and thus we are sure that it will always occur.
Other Events
Aside from the impossible event and sure event, the other subsets of the sample space are
also required to be events.

o Aᶜ ("A complement"): the collection of sample points in the sample space that do not belong in A; this event occurred if event A did not occur.
o A ∪ B ("A union B"): the collection of sample points that belong in at least one of A and B; this event occurred if only event A occurred, only event B, or both A and B.
o A ∩ B ("A intersection B"): the collection of sample points that belong in both A and B; this event occurred if both events A and B occurred simultaneously.
Other Events
o A₁ ∪ A₂ ∪ ... ∪ Aₙ ("the union of n events"): the collection of sample points that belong in at least one of A₁, A₂, ..., Aₙ; this event occurred if at least one of the n events occurred.
o A₁ ∩ A₂ ∩ ... ∩ Aₙ ("the intersection of n events"): the collection of sample points that belong in each one of A₁, A₂, ..., Aₙ; this event occurred if all of the n events occurred.
Example
Consider the experiment of tossing a pair of colored dice, one is green and the other is red.
For each sample point, the first coordinate represents the number of dots that comes up
on the green die while the second coordinate represents the number of dots on the red
die.
Let Ω = {(x,y) | x ∈ {1,2,3,4,5,6} and y ∈ {1,2,3,4,5,6}}.
This sample space contains 36 sample points.
Example
The following are considered events:
A = event of having the same number of dots on both dice
= {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}

B = event of 3 dots on the green die


= {(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)}

C = event of 7 dots on the green die


= { } = ∅.
Example
The other events are:
Aᶜ = { (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,3), (2,4), (2,5), (2,6), (3,1),
(3,2), (3,4), (3,5), (3,6), (4,1), (4,2), (4,3), (4,5), (4,6), (5,1), (5,2),
(5,3), (5,4), (5,6), (6,1), (6,2), (6,3), (6,4), (6,5) }
A ∪ B = {(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(3,1),(3,2),(3,4),(3,5),(3,6)}
A ∩ B = {(3,3)}
A ∪ B ∪ C = {(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(3,1),(3,2),(3,4),(3,5),(3,6)}
A∩B∩C = ∅
Mutually Exclusive Events
Two events A and B are mutually exclusive if and only if
A∩B=∅
that is, A and B have no elements in common.
The concept of mutually exclusive events can be extended to more than two events.
Accordingly, any collection of events is said to be mutually exclusive if the collection is
pairwise disjoint, which means that when one event in the collection occurs then any
one of the other events in the collection cannot occur.
Set Theory & Probability Theory
Set theory                                      Probability theory
Universal set, Ω                                Sure event
The set A                                       Event A will occur
A complement, Aᶜ                                Event A will not occur
A union B, A ∪ B                                Event A or B will occur
A union B union C, A ∪ B ∪ C                    At least one of A, B, and C will occur
A intersection B, A ∩ B                         Events A and B will occur
A intersection B intersection C, A ∩ B ∩ C      All events A, B, and C will occur
A intersection B complement, A ∩ Bᶜ             Only event A will occur but not event B
A and B are disjoint                            Events A and B are mutually exclusive
EXERCISES
Exercise 01: Three new antivirus software packages are being developed to target and clean a computer virus. We then define the following events:
A = event that Antivirus A detects the virus
B = event that Antivirus B detects the virus
C = event that Antivirus C detects the virus
Express the following events in terms of A, B, and C:
1. event where Antivirus A did not detect the virus
2. event where Antivirus B and C both detect the virus
3. event where all three antivirus programs detect the virus
4. event where Antivirus A detects the virus but not Antivirus B
5. event where at least one of Antivirus B or C detects the virus
Exercise 02: Consider the random experiment of tossing a fair coin 4 times.
1. Specify the sample space.
2. Let A be the event of observing heads on the first 2 tosses. What are the elements of A?
3. Let B be the event of observing exactly two heads. What are the elements of B?
4. Suppose you tossed a fair coin 4 times and observed HHTT. Did A occur? Did B occur? Did A ∪ B occur? Did A ∩ B occur?
5. Are A and B mutually exclusive?
PROBABILITY
The probability of an event A, denoted by P(A), is a function that assigns a measure of chance that event A will occur and must satisfy the following properties:
(a) Nonnegativity. 0 ≤ P(A) ≤ 1 for any event A
(b) Norming Axiom. P(Ω) = 1
(c) Finite Additivity. If A can be expressed as the union of n
mutually exclusive events, that is,
A = A1 ∪ A2 ∪ ... ∪ An, then
P(A) = P(A1) + P(A2) + ... + P(An).
INTERPRETATION
o A probability measure that is close to 1 means that the event has a very large chance of
occurrence. On the other hand, if the probability measure is close to 0, then the event has a
very small chance of occurrence.
o A probability of 0.5, the midpoint of the interval [0,1], means that the event has a 50-50
chance of occurrence, that is, the chance that the event will occur is just the same as the
chance that the event will not occur.
o In fact, if you are sure that an event is going to happen, then it must be assigned a
probability of 1. Similarly, the probability of the impossible event must always be equal to 0.

0 = Impossible        0.5 = Hmmm...        1 = Certain
Exercise 03: The definition of the probability function is useful not only in computing for probabilities but also in determining whether our assignment of probabilities is valid or not. Find the errors in each of the following assignments of probabilities:
1. The probabilities that a couple will have 0, 1, 2, 3, or 4 or more children are 0.42, 0.36, 0.25, 0.12, and -0.15, respectively.
2. A person tosses a biased die (with at least one dot on each side) three times. The probability that the sum of the number of dots in all three tosses is 2 is 0.0625.
3. The probability that a selected student will pass the Physics exam is 0.35 and the probability that this student will fail the same exam is 0.62.
4. The probabilities that a salesperson will sell exactly 0, 1, 2, or 3 or more items on any given day are 0.23, 0.42, 0.25, and 0.20, respectively.
Approaches to Assigning Probabilities
1. A priori or Classical
2. A posteriori or Relative Frequency
3. Subjective
Classical Probability
The method of using a priori or classical approach assigns probabilities
to events before the experiment is performed using the following rule:
If an experiment can result in any one of the N different equally likely
outcomes, and if exactly n of these outcomes belong to event A, then
P(A) = (no. of elements in A) / (no. of elements in Ω) = n/N
REMARKS
A priori probability is also referred to as the “classical definition of probability” because it was
the first formula that provided a theoretical computation of probability.
Its use is restricted to experiments whose sample space contains equiprobable outcomes, and
consequently, the sample space must have only a finite number of sample points.
Examples of such experiments are the following:
i. die-throwing experiments where the die used is fair
ii. coin-tossing experiments where the coin used is balanced
iii. selecting n cards at random from a well-shuffled deck of cards
iv. selecting a sample of size n from a population of size N using simple random sampling
P(A) = (no. of elements in A) / (no. of elements in the population) = proportion of elements possessing the characteristic of interest
Steps in Assigning Probabilities

We assign probabilities to events using a priori probability by following these


steps:
Step 1: Specify the sample space. Make sure that the outcomes are equiprobable
and finite. Count the number of sample points in Ω and denote this by
n(Ω).
Step 2: Specify event A whose probability you are interested in. Count the
number of sample points in A and denote this by n(A).
Step 3: Compute for the probability of event A using the formula P(A) = n(A) / n(Ω).
Example
The population consists of 𝑁 = 5 children: a=Janine, b=Josiel, c=Jan, d=Eryl, and
e=Eariel. Suppose a sample of size n=2 will be selected using SRSWR.

Ω = {(a,a), (a,b), (a,c), (a,d), (a,e), (b,a), (b,b), (b,c), (b,d), (b,e), (c,a), (c,b), (c,c), (c,d), (c,e), (d,a), (d,b), (d,c), (d,d), (d,e), (e,a), (e,b), (e,c), (e,d), (e,e)}

By definition of SRSWR, the sample space contains equally likely outcomes.

A = event that Janine is included in the sample


= {(a,a), (a,b), (a,c), (a,d), (a,e), (b,a), (c,a), (d,a), (e,a)}

P(A) = 9/25
Example
The population consists of 𝑁 = 5 children: a=Janine, b=Josiel, c=Jan, d=Eryl, and
e=Eariel. Suppose a sample of size n=2 will be selected using SRSWR.

Ω = {(a,a), (a,b), (a,c), (a,d), (a,e), (b,a), (b,b), (b,c), (b,d), (b,e), (c,a), (c,b), (c,c), (c,d), (c,e), (d,a), (d,b), (d,c), (d,d), (d,e), (e,a), (e,b), (e,c), (e,d), (e,e)}

By definition of SRSWR, the sample space contains equally likely outcomes.

B = event that Janine and Jan are both included in the sample
= {(a,c), (c,a)}

P(B) = 2/25
Relative Frequency
The method of using a posteriori or relative frequency assigns probabilities to
events by repeating the experiment a large number of times and using the following
rule:
If a random experiment is repeated many times under uniform conditions, use the
empirical probability of an event A to assign its probability as follows:

empirical P(A) = (no. of times event A occurred) / (no. of times the experiment was repeated)

The a posteriori definition of the probability of event A is the limiting value of its
empirical probability if we repeat the process endlessly.
REMARKS
o For any event A, the a posteriori approach defines the P(A) as the limiting value of the
relative frequency of occurrence of event A if we repeat the process endlessly.
o The advantage of using a posteriori probabilities instead of a priori probabilities is that its
use is not restricted to random experiments that generate a sample space containing
equiprobable outcomes.
o The advantage, on the other hand, of using a priori probabilities instead of a posteriori probabilities is that their use does not require us to perform the actual experiment; they can be determined prior to it.
Example
Consider the experiment of tossing a coin. Let A = event that a head comes up. The first few outcomes of one possible sequence of trials are as follows:
Trial No.   Outcome   Relative Frequency
1           H         1/1 = 1.00
2           H         2/2 = 1.00
3           T         2/3 ≈ 0.67
4           H         3/4 = 0.75
5           T         3/5 = 0.60
6           H         4/6 ≈ 0.67
7           T         4/7 ≈ 0.57
8           T         4/8 = 0.50
9           H         5/9 ≈ 0.56
10          H         6/10 = 0.60
For the first few trials, the values of the relative frequencies will fluctuate. However, the relative frequencies will tend to stabilize toward a limit as n becomes large.
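The same stabilizing behavior can be reproduced by simulation. The sketch below (illustrative; it assumes a fair coin, so the limiting value is 0.5) tosses a virtual coin 10,000 times and prints the running relative frequency of heads:

```python
import random

random.seed(115)                      # fixed seed so the illustration is reproducible
heads = 0
for n in range(1, 10_001):
    if random.random() < 0.5:         # a "head" occurs with probability 0.5
        heads += 1
    if n in (10, 100, 1_000, 10_000):
        print(n, heads / n)           # relative frequency of heads after n tosses
```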
Subjective Probability
Subjective probability assigns probabilities to events by using
intuition, personal beliefs, and other indirect information.
This method is more personal in its approach to assigning probabilities. The assignments
may vary from one person to another, depending on their personal assessment of the
available information on the situation at hand.
But whatever the assignments may be, these measures must still conform to Kolmogorov’s
definition of a probability.
Ideally, we want to use objective methods in assigning probabilities. However, it is
sometimes impractical or not at all possible to use a priori or a posteriori probabilities.
Exercise 04: Consider the experiment of tossing a biased coin so that the chance of observing a tail is twice the chance of observing a head. Find the probability of observing a head.
Ω = {H, T}. Since the coin is biased, the classical approach cannot be used.
Rules of Counting
Generalized Basic Principle of Counting
Suppose an experiment can be performed in k stages, where the 1st stage has n₁ distinct possible outcomes, the 2nd stage has n₂ distinct possible outcomes, the 3rd stage has n₃ distinct possible outcomes, ..., and the kth stage has nₖ distinct possible outcomes. Then there are n₁ × n₂ × n₃ × ... × nₖ possible outcomes of the experiment.
Example
How many different 7-place license plates are possible if the first 3 places are to be
occupied by letters and the last 4 by numbers?
Assume that the letters and numbers CAN be repeated.

26 × 26 × 26 × 10 × 10 × 10 × 10 = 175,760,000
There are 175,760,000 possible license plates.


Example
How many different 7-place license plates are possible if the first 3 places are to be
occupied by letters and the last 4 by numbers?
Assume that the letters and numbers CANNOT be repeated.

26 × 25 × 24 × 10 × 9 × 8 × 7 = 78,624,000
There are 78,624,000 possible license plates.


Factorial
The factorial notation is a compact representation for the product of
the first n consecutive positive integers.
It is denoted by n!, read as “n factorial”, and
n! = n x (n-1) x (n-2) x ... x (2) x (1)
where n is a positive integer.
We also define 0! = 1.
Permutation and Combination
◉ An r-permutation of set Z is an ordered arrangement of r distinct elements selected from set Z. It can be represented by an ordered r-tuple with distinct coordinates. If set Z contains n distinct elements, then the number of r-permutations of set Z is denoted by P(n,r) or ₙPᵣ, read as "permutation n taken r".
◉ An r-combination of set Z is a subset of set Z that contains r distinct elements. If set Z contains n distinct elements, then the number of r-combinations of set Z is denoted by C(n,r) or (n choose r), read as "n taken r".
ILLUSTRATION
Suppose we have n = 3 elements, {a, b, c}, and we are to take r = 2.
Permutation, P(3,2) or ₃P₂: ab, ba, bc, cb, ac, ca (6 ordered arrangements)
Combination, C(3,2): {a, b}, {b, c}, {a, c} (3 subsets)
Permutation and Combination
The number of distinct r-permutations that we can form from the n distinct elements of set Z is
P(n, r) = n(n − 1)(n − 2) ⋯ (n − r + 1) = n! / (n − r)!
The number of distinct r-combinations that can be formed from the n distinct elements of set Z is
C(n, r) = P(n, r) / r! = n! / [(n − r)! r!]
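Both counts are available directly in Python's standard library; the short sketch below (illustrative only) evaluates P(n, r) and C(n, r) for the n = 3, r = 2 illustration above:

```python
from math import comb, factorial, perm

n, r = 3, 2
print(perm(n, r))                          # P(3, 2) = 6 ordered arrangements
print(comb(n, r))                          # C(3, 2) = 3 subsets
print(factorial(n) // factorial(n - r))    # n!/(n - r)!  -> same as P(n, r)
print(perm(n, r) // factorial(r))          # P(n, r)/r!   -> same as C(n, r)
```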
EXERCISES
Exercise 05: Answer the following:
1. A man has four shirts. One is red; the others are yellow, white, and green. He has three pairs of pants. One is red; the others are white and blue. How many ways can he match his shirts with his pants?
2. In a Science exam, a student has a choice of 8 questions out of 10. How many ways can a student choose a set of 8 questions if he chooses arbitrarily?
3. A class consists of 10 boys and 15 girls. An examination is given, and the students are ranked according to their performance. Assume that no two students obtain the same score.
a) How many different rankings are possible?
b) If the boys were ranked just among themselves and the girls among themselves, how many different rankings are possible?
Example
A nongovernment organization is awarding 5 scholarships to children of
poor families. Fifty children are qualified for the scholarship. Among
these 50 children, only 10 are boys while the other 40 are girls.
a) How many ways can the nongovernment agency select the 5 children
who will be awarded the scholarship?

There are a total of n(Ω) = C(50, 5) = 2,118,760 ways that the nongovernment agency can select the 5 children who will be awarded the scholarship.
Example
A nongovernment organization is awarding 5 scholarships to children of
poor families. Fifty children are qualified for the scholarship. Among
these 50 children, only 10 are boys while the other 40 are girls.
b) Let A = event that the agency selects 4 boys and only 1 girl. How many
sample points are in A?

The experiment can be divided into two stages: (i) selection of boys and (ii) selection of
girls. The first stage has C(10,4) = 210 possible outcomes while the second stage has
C(40,1) = 40 outcomes.

Thus, the number of sample points in A is (210)(40) = 8400.


Example
A nongovernment organization is awarding 5 scholarships to children of
poor families. Fifty children are qualified for the scholarship. Among
these 50 children, only 10 are boys while the other 40 are girls.
c) Assuming that the organization selected the children at random using
SRS, find the P(A).

P(A) = (no. of elements in A) / (no. of elements in Ω) = 8,400 / 2,118,760 ≈ 0.00396
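A minimal sketch of the same computation using Python's math.comb (the values match the counts above):

```python
from math import comb

n_omega = comb(50, 5)              # all possible samples of 5 children: 2,118,760
n_A = comb(10, 4) * comb(40, 1)    # 4 of the 10 boys and 1 of the 40 girls: 8,400
print(n_A / n_omega)               # P(A) ≈ 0.00396
```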
Exercise 06: If a multiple choice test consists of 5 questions, each with 4 possible answers of which only 1 is correct,
a) How many different ways can a student answer the 5 questions?
b) How many ways can a student answer all the 5 questions incorrectly?
c) If the student is choosing the answers at random, what is the probability of getting a score of 0?
Properties of Probability Function
1. If A is an event, then P(Aᶜ) = 1 − P(A).
2. If A and B are events, then P(A ∩ Bᶜ) = P(A) − P(A ∩ B).
3. If A and B are events, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
4. If A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B).
5. If A and B are events, then
   P[(A ∪ B)ᶜ] = P(Aᶜ ∩ Bᶜ)
   P[(A ∩ B)ᶜ] = P(Aᶜ ∪ Bᶜ)
Exercise 07: Suppose A and B are events for which it is known that P(A) = 0.6, P(B) = 0.7, and P(A ∩ B) = 0.4. Compute the probabilities of the following events:
a. P(A ∪ B)
b. P(A ∩ Bᶜ)
c. P(B ∩ Aᶜ)
d. P[(A ∩ B)ᶜ]
e. P(Aᶜ ∩ Bᶜ)
f. P[(A ∩ Bᶜ) ∪ (B ∩ Aᶜ)]
Event Composition Method
Here, the probabilities are computed by expressing the event of interest as a
composition of other events.
Step 1: Define the basic events. The basic events are those events in the
problem that cannot be expressed as a composition of other events.
Step 2: List the known probabilities of events as stated in the problem.
Step 3: Express the event of interest as a composition of the basic events using
the set operations.
Step 4: Use theorems or formulas for the computation of the probabilities.
Example
The probability that a randomly selected student sleeps in Stat 101 is
0.60 and the probability that he sleeps in Math 20 is 0.85. If the
probability that he sleeps in at least one of the two courses is 0.95,
a) What is the probability that the selected student sleeps in both
courses?
b) What is the probability that the selected student is awake in both
courses?
Example
First, define the basic events:
Let S = event that the selected student sleeps in Stat 101
M = event that the selected student sleeps in Math 20
Given: P(S) = 0.60, P(M) = 0.85, P(S ∪ M) = 0.95
a) What is the probability that the selected student sleeps in both courses?
P(S ∩ M) = P(S) + P(M) − P(S ∪ M) = 0.60 + 0.85 − 0.95 = 0.50
b) What is the probability that the selected student is awake in both courses?
P(Sᶜ ∩ Mᶜ) = P[(S ∪ M)ᶜ] = 1 − P(S ∪ M) = 1 − 0.95 = 0.05
Exercise 08: Danielle, a health worker, is studying the prevalence of certain diseases in a particular community. Based on her previous studies, she came up with the following figures: 10% of the people in the community will contract disease A sometime during their lifetime; 25% will contract disease B; and 5% will contract both diseases. Find the probability that a randomly selected person from this community will contract:
a. At least one of the 2 diseases
b. Disease B but not disease A
c. Exactly one of the 2 diseases
Independent Events
Two events A and B are said to be independent events if and only if

P(A ∩ B) = P(A) × P(B)

Otherwise, the events are said to be dependent.


Mutually Exclusive vs. Independent Events
o Mutually exclusive: P(A ∩ B) = 0 (the two events cannot happen at the same time)
o Independent: P(A ∩ B) = P(A)P(B) (the two events have nothing to do with each other)
Example
The probability that a Japanese industry will put up a plant in Cebu is 0.7. The
probability that it will put up a plant in Bataan is 0.3, and the probability that it will put
up a plant in at least one of the two provinces is 0.79.
Define the following events:
C = event that a Japanese industry will locate in Cebu
B = event that a Japanese industry will locate in Bataan
Are C and B independent events?
Solution:
Given: P(C) = 0.7, P(B) = 0.3, and P(C ∪ B) = 0.79. We compute for P(C ∩ B):
P(C ∩ B) = P(C) + P(B) − P(C ∪ B) = 0.7 + 0.3 − 0.79 = 0.21
Now, P(C)P(B) = (0.7)(0.3) = 0.21.
Since P(C ∩ B) = 0.21 = P(C)P(B), then C and B are independent events.
Exercise 09: At the annual Idol Star Athletics Championships, three idols compete in the Team Archery. Each idol fires one shot at the target in an attempt to hit the bullseye. Let Aⱼ denote the event that person j hits the bullseye, j = 1, 2, 3. If A₁, A₂, and A₃ are independent with 4P(A₁) = P(A₂) = 2P(A₃) = 0.8, obtain the probability that
a) all of them hit the bullseye
b) exactly one of them hits the bullseye
02 Random Variable and
its Distribution
Random Variable
Definition
A function whose value is a real number that is determined by each sample point in
the sample space is called a random variable.
An uppercase letter, say X, will be used to denote a random variable and its
corresponding lowercase letter, x in this case, will be used to denote one of its
values.

X(•): 𝛀 → ℝ
REMARKS
o The use of the term variable is consistent with the way we use this word in
mathematics and the way we defined it in Stat 114 as the characteristic of
interest whose value varies.

o The addition of the term “random” emphasizes the requirement that the realized
or actual value of the random variable depends on the outcome of a random
experiment.

o Consequently, it is impossible to predict with certainty what the realized value


of the random variable X will be.
Random Variable as a function
The random variable is a function or mapping.
And therefore, each outcome in the sample space must be mapped/translated to
exactly one real number.
Example
Filipinos are so fascinated with elections and the polls conducted to predict the
outcomes of these elections. For illustration purposes, let us imagine a very small
barangay consisting of 6 qualified voters. Let’s label these voters as A1, A2, A3, A4,
A5, and A6.
There are two candidates vying for the position, say Renzo and Sandro. What we do
not know is that voters A1, A2, A3 and A4 have already decided to elect Renzo while
voters A5 and A6 will elect Sandro.
We only have enough resources to get a sample of size 3. We will then use the
information from this sample to predict the outcome of the election.
Example
Suppose we use SRSWOR to select our sample of size 3. Our sample space
will contain all the 20 possible subsets of size 3. The sample points in our
sample space are:

{A1,A2,A3} {A1,A2,A4} {A1,A2,A5} {A1,A2,A6} {A1,A3,A4}


{A1,A3,A5} {A1,A3,A6} {A1,A4,A5} {A1,A4,A6} {A1,A5,A6}
{A2,A3,A4} {A2,A3,A5} {A2,A3,A6} {A2,A4,A5} {A2,A4,A6}
{A2,A5,A6} {A3,A4,A5} {A3,A4,A6} {A3,A5,A6} {A4,A5,A6}
Example
Define X = number of voters who will elect Renzo.
X is a random variable. Its realized value depends on the outcome of the
random experiment (selection of a sample of size 3).
{A1,A2,A3} {A1,A2,A4} {A1,A2,A5} {A1,A2,A6} {A1,A3,A4}
3 3 2 2 3
{A1,A3,A5} {A1,A3,A6} {A1,A4,A5} {A1,A4,A6} {A1,A5,A6}
2 2 2 2 1
{A2,A3,A4} {A2,A3,A5} {A2,A3,A6} {A2,A4,A5} {A2,A4,A6}
3 2 2 2 2
{A2,A5,A6} {A3,A4,A5} {A3,A4,A6} {A3,A5,A6} {A4,A5,A6}
1 2 2 1 1
Expressing the Event of Interest
o We will use the notation, X ≤ x, to express the event containing all sample points
whose associated value for the random variable X is less than or equal to x (X is at
most x), where x is a specified real number.
o We will use the notation, X > x, to express the event containing all sample points
whose associated value for X is greater than x.
o We will use the notation, a<X<b, to express the event containing all sample points
whose associated value for X is in between a and b, where a and b are specified
real numbers.
o and so on.
Example
Define X = number of voters who will elect Renzo.
{A1,A2,A3} {A1,A2,A4} {A1,A2,A5} {A1,A2,A6} {A1,A3,A4}
3 3 2 2 3
{A1,A3,A5} {A1,A3,A6} {A1,A4,A5} {A1,A4,A6} {A1,A5,A6}
2 2 2 2 1
{A2,A3,A4} {A2,A3,A5} {A2,A3,A6} {A2,A4,A5} {A2,A4,A6}
3 2 2 2 2
{A2,A5,A6} {A3,A4,A5} {A3,A4,A6} {A3,A5,A6} {A4,A5,A6}
1 2 2 1 1

A = event of selecting a sample with 1 voter electing Renzo


= {{A1,A5,A6}, {A2,A5,A6},{A3,A5,A6}, {A4,A5,A6}}
Event A can be expressed as X=1.
Example
B = event of selecting a sample with more than 2 voters electing Renzo
= {{A1,A2,A3}, {A1,A2,A4}, {A1,A3,A4}, {A2,A3,A4}}
Event B can be expressed as X>2.
C = event of selecting a sample with at least 1 voter electing Renzo
= Ω = sure event
Event C can be expressed as X ≥ 1
D = event of selecting a sample with 5 voters electing Renzo
= ∅ = impossible event.
Event D can be expressed as X=5.
Cumulative Distribution Function
The cumulative distribution function (CDF) of a random
variable X, denoted by F(.), is a function defined for any
real number x as
𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥)
We can use this CDF to compute for the probability of an event.
Note that the values of the CDF are from 0 to 1, since each value is a probability.
REMARKS
o The CDF of the random variable X is also referred to as its distribution.
o Just like the PMF of a discrete random variable and the PDF of a continuous
random variable, the CDF provides us with complete information about the
behavior of the random variable. We can use it to compute for the probability
of any event expressed in terms of the random variable X.
Two Major Types of Random Variables

1. Discrete random variable
2. (Absolutely) continuous random variable
Discrete Random Variable
If a sample space contains a finite number of sample points or has as
many sample points as there are counting/natural numbers, then it is
called a discrete sample space.
A random variable defined over a discrete sample space is called a
discrete random variable.
Ω = {1, 2, …, n}        Ω = {1, 2, 3, …}
Discrete Random Variable
Here are some examples of discrete random variables:
1) number of people vaccinated in a random day
2) number of heads in 5 tosses of a coin
3) number of fake news shared in a random day
4) number of COVID-19 active cases in a random week
5) number of Bondee friends of a random person
Probability Mass Function
The probability mass function (PMF) of a discrete random variable,
denoted by f(.), is a function defined for any real number x as
𝑓(𝑥) = 𝑃(𝑋 = 𝑥)
The values of the discrete random variable X for which f(𝑥) > 0 are
called its mass points.
We can use the probability mass function to compute for probabilities expressed in
terms of X. Also, we can use it to compute for important summary measures like the
mean and standard deviation.
Example
Define X = number of voters who will elect Renzo.
{A1,A2,A3} {A1,A2,A4} {A1,A2,A5} {A1,A2,A6} {A1,A3,A4}
3 3 2 2 3
{A1,A3,A5} {A1,A3,A6} {A1,A4,A5} {A1,A4,A6} {A1,A5,A6}
2 2 2 2 1
{A2,A3,A4} {A2,A3,A5} {A2,A3,A6} {A2,A4,A5} {A2,A4,A6}
3 2 2 2 2
{A2,A5,A6} {A3,A4,A5} {A3,A4,A6} {A3,A5,A6} {A4,A5,A6}
1 2 2 1 1

o X is a discrete random variable. The range of X = {1,2,3}.


o The elements in the range of X are the mass points of the discrete random variable X.
o To derive the PMF of X, we need to compute P(X=x) for all x that are mass points of X.
Steps in Constructing a PMF
We construct the probability mass function as follows:
Step 1: Identify the mass points of X. The mass points of X are actually the
possible values that X could take on because these are the points where
P(X=x) will be nonzero. In other words, the set of mass points of X is the
range of the function X.
Step 2: Determine the event associated with the expression, X = x.
Step 3: Compute for the probability of this event.
Example
o Since Ω contains equally likely outcomes, then we can use the classical definition to
compute for these probabilities, that is,
P(A) = (# of sample points in A) / (# of sample points in Ω)
o The PMF of X can be presented in tabular form as:

x Event Associated with X=x P(X=x)


1 {{A1,A5,A6}, {A2,A5,A6}, {A3,A5,A6}, {A4,A5,A6}} 4/20 = 1/5
{{A1,A2,A5}, {A1,A2,A6}, {A1,A3,A5}, {A1,A3,A6}, {A1,A4,A5},
2 {A1,A4,A6}, {A2,A3,A5}, {A2,A3,A6},{A2,A4,A5}, {A2,A4,A6}, 12/20 = 3/5
{A3,A4,A5}, {A3,A4,A6}}
3 {{A1,A2,A3}, {A1,A2,A4}, {A1,A3,A4}, {A2,A3,A4}} 4/20 = 1/5

x 1 2 3
f(x) 1/5 3/5 1/5
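The same PMF can be obtained mechanically by enumerating the 20 equally likely SRSWOR samples and counting the Renzo voters in each one. A short illustrative Python sketch (voters A1–A4 elect Renzo, as assumed in the example):

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

voters = ["A1", "A2", "A3", "A4", "A5", "A6"]
renzo = {"A1", "A2", "A3", "A4"}                      # voters who will elect Renzo

samples = list(combinations(voters, 3))               # 20 equally likely samples
counts = Counter(sum(v in renzo for v in s) for s in samples)

for x in sorted(counts):                              # PMF: f(x) = P(X = x)
    print(x, Fraction(counts[x], len(samples)))       # 1 -> 1/5, 2 -> 3/5, 3 -> 1/5
```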
Valid PMF
We must always check if a probability mass function (PMF) is
valid or not.
A valid PMF must satisfy the following properties:
✓ It is greater than 0 when it is evaluated at the mass points.
✓ The sum of its values evaluated at all mass points is equal to 1.
Using the PMF to compute probabilities

We can use the PMF of X to determine the probability of an


event expressed in terms of it by following these steps:
Step 1: Identify the mass points, x, that are included in the
interval of interest.
Step 2: Use the PMF to determine the value of P(X=x) for
each one of the mass points identified in Step 1.
Step 3: Get the sum of all the values derived in Step 2.
Example
The PMF of the discrete random variable X is as follows:
x 2 4 6 8 10 12
f(x)=P(X=x) 1/25 3/25 8/25 6/25 5/25 2/25
1. P(X > 8) = f(10) + f(12) = 5/25 + 2/25 = 7/25
2. P(X ≥ 8) = f(8) + f(10) + f(12) = 6/25 + 5/25 + 2/25 = 13/25
3. P(X < 3.5) = f(2) = 1/25
4. P(4.5 < X < 10) = f(6) + f(8) = 8/25 + 6/25 = 14/25
Example
It is useful for fire departments to have a model for X = number of direct injuries
occurring in a fire incident. They can use this to determine the level of medical services
to make available each time. Suppose the PMF of X is as follows:

x 0 1 2 3 4 5 6 7 8 9
f(x) 0.08 0.09 0.09 0.14 0.18 0.16 0.09 0.06 0.06 0.05

Use this PMF to evaluate the probabilities of the following events:


a. event that there will be at least 1 direct injury
b. event that there will be less than 3 direct injuries
c. event that there will be more than 2 but less than 8 direct injuries
Exercise 10: An experiment consists of tossing a coin three times and observing the result. Let X be the number of heads. Given below is its CDF:
F(x) = 0,    x < 0
     = 1/8,  0 ≤ x < 1
     = 4/8,  1 ≤ x < 2
     = 7/8,  2 ≤ x < 3
     = 1,    x ≥ 3
1. What is the probability that more than one toss will result in heads?
2. What is the probability that the three tosses will result in at least one head?
3. What is the probability that all three tosses will result in heads?
Continuous Random Variable
If a sample space contains an infinite number of sample points and
cannot be put into a one-to-one correspondence with the set of
counting numbers, then it is called a continuous sample space.
A random variable defined over a continuous sample space is called a
continuous random variable.
Ω = [0, 1]        Ω = [3, 5]        Ω = [0, ∞)
Continuous Random Variable
Here are some examples of continuous random variables:
1) Weight (in kilos) of a random newborn baby
2) Shelf life of a particular drug in a sample
3) Height of water (in meters) in a dam
4) Vaccine efficacy (in percentage)
5) COVID transmission rate
Probability Density Function
The probability density function (PDF) of a continuous random variable X, denoted by f(.), is
a function defined for any real number x and satisfy the following properties:
a) f(x) ≥ 0 for all x;
b) the area below the whole curve, f(x), and above the x-axis is always equal to 1; and
c) P(a ≤ X ≤ b) is the area bounded by the curve f(x), the x-axis, and the lines x = a and x =
b.

Unlike a discrete random variable, a continuous random variable has uncountably many possible values, so its PDF is not defined as P(X = x). However, we still consider the PDF as the counterpart of the PMF for the continuous case because we will also use it to compute probabilities and summary measures.
Graph of the PDF
o The graph of the PDF is always above the x-axis because the function cannot take on negative values.
o If we remove the lines x = a and x = b and measure the whole area below f(x) and above the x-axis, this area is always exactly equal to 1.
o The shaded area bounded by the curve f(x), the x-axis, and the lines x = a and x = b represents P(a ≤ X ≤ b).
Using the PDF to compute probabilities

When dealing with a continuous random variable X, here are


some useful and remarkable properties:
1. P(X=c)=0
2. P(X ≤ c) = P(X < c)
3. P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b)

Note that the discrete random variable doesn’t share this property.
Graph of the PDF
o P(X = a) is just the same as P(a ≤ X ≤ a). In this case, we let b = a. Then, the area representing P(X = a) will be 0 because we will only be left with a single line.
Example
The PDF of a continuous random variable, X, is given by:
f(x) = 0.25,  3 < x < 7
     = 0,     elsewhere
Find the following probabilities:
a. P(4 < X < 6.5) = (length)(width) = (0.25)(2.5) = 0.625
b. P(X ≥ 5) = (0.25)(2) = 0.5
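Because this PDF is constant on (3, 7), X is a uniform random variable, and the rectangle areas above can be checked with SciPy (a sketch assuming SciPy is installed; uniform(loc=3, scale=4) places the distribution on the interval (3, 7)):

```python
from scipy.stats import uniform

X = uniform(loc=3, scale=4)        # uniform on (3, 7), with constant density 1/4 = 0.25

print(X.cdf(6.5) - X.cdf(4))       # P(4 < X < 6.5) = 0.625
print(1 - X.cdf(5))                # P(X >= 5) = 0.5
```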
Notes about the PMF and PDF

o The PMF of the discrete random variable and the PDF of the continuous random
variable are what we refer to as the distribution of the random variable.
o The distribution of the random variable X provides us with complete information
about the behavior of the random variable X.
o Although we cannot predict with certainty what the realized value of the random
variable X will be, we can use its distribution to compute for the probability of any
event expressed in terms of the random variable X. We will learn how to do this in
Stat 121. In Stat 115, we will use the CDF of a continuous random variable to
compute for probabilities.
Notes about the CDF

When X is a continuous random variable, we can express the probability of an event in terms of the CDF as follows:
o P(X ≤ a) = P(X < a) = F(a)
o P(X > a) = P(X ≥ a) = 1 − F(a)
o P(a < X < b) = P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = F(b) − F(a)
Example
The CDF of a continuous random variable, X, is given by:
F(x) = 0,              x < 3
     = 0.25x − 0.75,   3 ≤ x < 7
     = 1,              x ≥ 7
Find the following probabilities using this CDF.
a) P(X ≥ 5) = 1 − F(5) = 1 − [0.25(5) − 0.75] = 0.5
b) P(4 < X < 6.5) = F(6.5) − F(4) = [0.25(6.5) − 0.75] − [0.25(4) − 0.75] = 0.625
Exercise 11: The CDF of a continuous random variable X is as follows:
F(x) = 0,    when x < 0
     = x^I,  when 0 ≤ x < 1
     = 1,    when x ≥ 1
Find the following probabilities using the CDF above:
a) P(X > 0.25)
b) P(0.3 < X < 0.7)
c) P(0.4 ≤ X < 1.25)
d) P(X ≤ 0.5)
Expected Value of X
Let X be a discrete random variable with probability mass function (PMF):
x x1 x2 … xn
f(x)=P(X=x) f(x1) f(x2) … f(xn)

The expected value of X, also referred to as the mean of X, is
E(X) = μ_X = x₁f(x₁) + x₂f(x₂) + ⋯ + xₙf(xₙ) = Σᵢ₌₁ⁿ xᵢ f(xᵢ)
E(X) as a Weighted Mean
The expected value is a weighted mean of the mass points of X, where the weight attached to each mass point xᵢ is its probability f(xᵢ). As a result, mass points with larger chances of occurrence receive heavier weights and have a greater contribution in locating the center of the distribution.
Example
In a game of chance, a man is paid Php 50.00 if he gets all heads or all tails when 3 fair coins are tossed, and he pays out Php 30.00 if either 1 or 2 heads show. What is his expected gain?
Let X = gain of the gambler.
We see that X is a discrete random variable with only 2 possible values: 50 and −30.
X = 50 → {HHH, TTT}
X = −30 → {HHT, HTH, HTT, THH, THT, TTH}
We then derive the PMF to get the value of E(X):
x      50    −30
f(x)   2/8   6/8
Thus, E(X) = Σᵢ xᵢ f(xᵢ) = 50(2/8) + (−30)(6/8) = −10.
Expected Value of g(X)
Let X be a discrete random variable with probability mass function (PMF):
x x1 x2 … xn
f(x)=P(X=x) f(x1) f(x2) … f(xn)

Suppose Y = g(X) is a discrete random variable; then the expected value of g(X) is
E[g(X)] = g(x₁)f(x₁) + g(x₂)f(x₂) + ⋯ + g(xₙ)f(xₙ) = Σᵢ₌₁ⁿ g(xᵢ) f(xᵢ)
Example
A used-car dealer finds that in any day, the probability of selling exactly no car is 0.4, one car is
0.2, two cars is 0.15, 3 cars is 0.10, 4 cars is 0.08, five cars is 0.06, and six cars is 0.01.
Let X = number of cars sold in a day and
Let Y = 500 + 1500X represent the salesman’s daily earnings.
Find the salesman’s expected daily earnings.
We can see that the PMF of X is:
x 0 1 2 3 4 5 6
f(x)=P(X=x) 0.40 0.20 0.15 0.10 0.08 0.06 0.01

And the values for Y=g(X)=500+1500X are as follow:


x 0 1 2 3 4 5 6
g(X) 500 2000 3500 5000 6500 8000 9500
Thus,
E(Y) = E[g(X)] = Σᵢ g(xᵢ) f(xᵢ) = 500(0.40) + 2000(0.20) + ... + 9500(0.01) = 2,720
Variance of X
Let X be a random variable with mean, μ. The variance of X,
denoted by σ2 or Var(X), is defined as:
σ² = Var(X) = E[(X − μ)²] = E(X²) − [E(X)]²
The standard deviation of X is the positive square root of the
variance.
The variance is a measure of dispersion: it is the average squared difference between the realized value of X and μ. And because the variance is itself an average, it is also expressed as an expectation.
Example
The PMF of the discrete random variable X is as follows:

x 2 4 6 8 10 12
f(x)=P(X=x) 1/25 3/25 8/25 6/25 5/25 2/25

Use this PMF to determine the mean and variance of X.

μ_X = E(X) = (2)(1/25) + (4)(3/25) + (6)(8/25) + (8)(6/25) + (10)(5/25) + (12)(2/25) = 7.36
σ²_X = E[(X − 7.36)²] = (2 − 7.36)²(1/25) + (4 − 7.36)²(3/25) + (6 − 7.36)²(8/25) + (8 − 7.36)²(6/25) + (10 − 7.36)²(5/25) + (12 − 7.36)²(2/25) = 6.3104
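A short illustrative sketch that recomputes E(X) and Var(X) from this PMF, using both the definition E[(X − μ)²] and the shortcut E(X²) − [E(X)]²:

```python
from fractions import Fraction

pmf = {2: Fraction(1, 25), 4: Fraction(3, 25), 6: Fraction(8, 25),
       8: Fraction(6, 25), 10: Fraction(5, 25), 12: Fraction(2, 25)}

mean = sum(x * p for x, p in pmf.items())                        # E(X)
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())       # E[(X - mu)^2]
var_short = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2  # E(X^2) - [E(X)]^2

print(float(mean), float(var_def), float(var_short))             # 7.36 6.3104 6.3104
```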
Exercise 12: The PMF of a discrete random variable X is as follows:
x            −1     0     1     2
f(x)=P(X=x)  1/10   2/10  5/10  2/10
Use this PMF to evaluate the following:
a) Mean of X
b) Variance of X
c) Standard deviation of X
d) E(X³)
Reading Assignment: Properties of the Mean and Variance (pp. 345-346)
Properties of the Mean
o 𝐸(𝑋 − 𝜇) = 0
o 𝐸(𝑎𝑋) = 𝑎𝐸(𝑋)
o 𝐸(𝑏) = 𝑏
o 𝐸(𝑋 + 𝑌) = 𝐸(𝑋) + 𝐸(𝑌)
o 𝐸(𝑋 – 𝑌) = 𝐸(𝑋) – 𝐸(𝑌)
o If X and Y are independent random variables: 𝐸(𝑋𝑌) = 𝐸(𝑋)𝐸(𝑌)
Properties of the Variance
Let a and b be constants.
o Var(aX) = a²Var(X)
o Var(b) = 0
o Var(aX + b) = a²Var(X), since the variance is not affected by the addition/subtraction of a constant.
o Var(X + Y) = Var(X) + Var(Y) if X and Y are independent random variables
o Var(X − Y) = Var(X) + Var(Y) if X and Y are independent random variables
o Var(X) = E(X²) − [E(X)]²
Example
If X and Y are independent random variables with E(X) = 2, E(Y) = 4, Var(X) = 3, and Var(Y) = 1, then solve for the following expectations:
a. E(4X − 5) = 4E(X) − 5 = 4(2) − 5 = 8 − 5 = 3
b. Var(4X − 5) = 4²Var(X) = 16(3) = 48
c. E(XY) = E(X)E(Y) = (2)(4) = 8
d. Var(3X − 2Y) = 3²Var(X) + 2²Var(Y) = 9(3) + 4(1) = 27 + 4 = 31
Skewness and Kurtosis
Sk = μ₃ / σ³, where μ₃ = E[(X − μ)³]
Kurt = μ₄ / σ⁴, where μ₄ = E[(X − μ)⁴]
03 Binomial Distribution
Binomial Experiment
A binomial experiment is a random experiment that satisfies the following
properties:
a) The experiment consists of 𝑛 identical trials
b) Each trial results in one of two outcomes: a “success” or a “failure” (sometimes called a
Bernoulli trial)
c) The probability of success is p, constant throughout
d) The trials are done independently of each other

The random variable in a binomial experiment is the number of times a success has
occurred in a total of 𝒏 trials.
Example X = number of items correctly answered out of 10 items
Y = number of games won out of 8 games
Example
Tossing a coin 20 times to see how many tails occur.
a) It consists of observing the outcomes of a sequence of 20 tosses
(trials).
b) Each toss (trial) can result in one of only 2 possible outcomes:
heads or tails
c) The probability of success is the same for each trial, which is 0.5.
d) The trials are independent. Regardless of the outcomes of the previous tosses, the probability of a success (observing tails) on the next toss is still 0.5.
Binomial Distribution
A discrete random variable X is said to follow a Binomial distribution if its probability mass
function (PMF) is given by

f(x) = C(n, x) pˣ (1 − p)ⁿ⁻ˣ,   x = 0, 1, 2, …, n
     = 0,                       elsewhere
where n and p are the parameters of the distribution, p is the probability of success (any
value between "0" and "1"), and n is the number of trials (any positive integer).

We consider using the Binomial distribution to model


X = number of “successes” out of n trials.
Binomial Distribution
If X ~ Binomial(n, p), where n is the number of trials and p is the probability of success, then
E(X) = np and Var(X) = np(1 − p)
Example
A multiple-choice quiz has 15 questions, each with 4 possible
answers of which only 1 is correct. Suppose a student has
been absent for the past meetings and has no idea what the
quiz is all about. The student simply uses a randomization
mechanism in answering each item.
a) What is the probability that the student will get a
perfect score?
b) What is the probability that the student will get at
least 3 correct answers?
c) What is the student’s expected number of correct
answers?
Example
Let X = number of correct answers (successes) out of the 15 items (trials).
X ~ Binomial(n = 15, p = 1/4) and its PMF is
p(x) = C(15, x)(0.25)ˣ(1 − 0.25)¹⁵⁻ˣ,  x = 0, 1, 2, …, 15
     = 0,                              elsewhere
a) What is the probability that the student will get a perfect score?
P(X = 15) = C(15, 15)(0.25)¹⁵(1 − 0.25)⁰ = 0.00000000093
b) What is the probability that the student will get at least 3 correct answers?
P(X ≥ 3) = 1 − P(X < 3) = 1 − [P(X = 0) + P(X = 1) + P(X = 2)]
         = 1 − [p(0) + p(1) + p(2)] = 1 − (0.01336 + 0.0668 + 0.1559) = 0.7639
c) What is the student's expected number of correct answers?
The expected number of correct answers is np = (15)(0.25) = 3.75, or about 4.
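The same binomial answers can be verified with scipy.stats.binom (a sketch assuming SciPy is available):

```python
from scipy.stats import binom

X = binom(n=15, p=0.25)            # number of correct answers out of 15 items

print(X.pmf(15))                   # P(X = 15)  ≈ 9.3e-10
print(1 - X.cdf(2))                # P(X >= 3)  ≈ 0.7639
print(X.mean(), X.var())           # E(X) = 3.75, Var(X) = 2.8125
```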
Exercise 13: A fair die is rolled 6 times and the number of dots on the face that comes up is observed. What is the probability that the number of dots is a perfect square (1 or 4) in 3 out of the 6 rolls?
Exercise 14: Westex Oil Corporation spends an extensive amount of time, effort, and money in searching for oil. On the basis of past experience, the firm has a 15% rate of success of striking oil in each drilling operation, and the results of the drilling operations are independent of each other. In the year 2013, the corporation plans to drill 20 new wells. What is the probability that they will strike oil in at least 3 of the 20 drilling operations in 2013?
04 Normal Distribution
Normal Distribution
A continuous random variable X is said to be normally distributed if its probability
density function (PDF) is given by

f(x) = [1 / (σ√(2π))] e^(−(1/2)[(x − μ)/σ]²)
for any real number x. The constants μ and σ² are such that μ is any real number and σ² > 0. The values e and π are mathematical constants, where e ≈ 2.71828 and π ≈ 3.14159.
Many consider the normal distribution as the most important distribution in Statistics. And we will see why...
Graph of the Normal Distribution

In Stat 114, if X ~ Normal(μ, σ²), then
a) P(μ − 1σ < X < μ + 1σ) ≈ 0.68
b) P(μ − 2σ < X < μ + 2σ) ≈ 0.95
c) P(μ − 3σ < X < μ + 3σ) > 0.99

o Bell-shaped curve that is symmetric about μ.


o The area bounded by the curve and the x-axis is 1.
o The curve will approach the x-axis as we proceed in either direction away from μ, but will
never touch the x-axis.
Graph of the Normal Distribution
Here, we can see how different values of μ and σ² affect the distribution.
Standard Normal Distribution
Definition
If the normal random variable has mean 0 and
variance 1, it is called a standard normal
random variable and is denoted by Z.

A normal distribution with mean 0 and


variance 1 is called standard normal
distribution.

We will use the notation Φ(•) or F_Z(•) to denote the CDF of a standard normal random variable.
Standard Normal Distribution
Table B.1 presents the values of the CDF of a standard normal random
variable or 𝑃(𝑍 ≤ 𝑧).
We can compute for the probability of any event expressed in terms of the
standard normal random variable using these formulas:
o P(Z ≤ a) = P(Z < a) = Φ(a)
o P(Z > a) = P(Z ≥ a) = 1 − Φ(a)
o P(a < Z < b) = P(a ≤ Z ≤ b) = P(a ≤ Z < b) = P(a < Z ≤ b) = Φ(b) − Φ(a)
where Z is the standard normal random variable.
See Examples 10.49 and 10.50 (page 352)
Linear Transformation of X
If X ~ Normal(μ, σ²) and Y = aX + b, then Y will still be normally distributed, but with E(Y) = aμ + b and Var(Y) = a²σ².
X ~ N(μ, σ²)  →  Y ~ N(aμ + b, a²σ²)
Classic example: Z = (X − μ)/σ ~ N(0, 1)
Property of the Normal Distribution

o Any random variable X that follows a normal distribution with mean μ and variance σ² can be transformed into a standard normal random variable Z with mean 0 and variance 1.
o The transformation is the familiar formula that we use to compute for the z-score in Stat 114:
Z = (X − μ)/σ
o If X ~ N(μ, σ²), then
P(X ≤ a) = P[(X − μ)/σ ≤ (a − μ)/σ] = P[Z ≤ (a − μ)/σ]
where Z is a standard normal random variable.
Example
Suppose X ~ Normal(μ = 5, σ² = 4).
a) P(X ≤ 6) = P[(X − 5)/2 ≤ (6 − 5)/2] = P(Z ≤ 0.5) = Φ(0.5) = 0.6915
b) P(4.5 ≤ X ≤ 6) = P[(4.5 − 5)/2 ≤ (X − 5)/2 ≤ (6 − 5)/2] = P(−0.25 < Z < 0.5)
   = Φ(0.5) − Φ(−0.25) = 0.6915 − 0.4013 = 0.2902
c) P(X > 4.5) = P[(X − 5)/2 > (4.5 − 5)/2] = P(Z > −0.25) = 1 − Φ(−0.25) = 1 − 0.4013 = 0.5987
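Instead of standardizing by hand and reading Table B.1, the same probabilities can be computed with scipy.stats.norm (a sketch assuming SciPy; note that scale is the standard deviation σ = 2, not the variance σ² = 4):

```python
from scipy.stats import norm

X = norm(loc=5, scale=2)           # X ~ Normal(mu = 5, sigma^2 = 4)

print(X.cdf(6))                    # P(X <= 6)        ≈ 0.6915
print(X.cdf(6) - X.cdf(4.5))       # P(4.5 <= X <= 6) ≈ 0.2902
print(1 - X.cdf(4.5))              # P(X > 4.5)       ≈ 0.5987
```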
Checking the 68-95-99 Rule
Suppose X ~ Normal(μ, σ²).
P(μ − 1σ < X < μ + 1σ) = P(−1σ < X − μ < 1σ) = P(−1 < (X − μ)/σ < 1) = P(−1 < Z < 1)
   = Φ(1) − Φ(−1) = 0.8413 − 0.1587 = 0.6826
P(μ − 2σ < X < μ + 2σ) = P(−2σ < X − μ < 2σ) = P(−2 < (X − μ)/σ < 2) = P(−2 < Z < 2)
   = Φ(2) − Φ(−2) = 0.9772 − 0.0228 = 0.9544
P(μ − 3σ < X < μ + 3σ) = P(−3σ < X − μ < 3σ) = P(−3 < (X − μ)/σ < 3) = P(−3 < Z < 3)
   = Φ(3) − Φ(−3) = 0.9987 − 0.0013 = 0.9974
zα Value
The value zα (read as "z sub alpha") satisfies the condition that P(Z > zα) = α. This is equivalent to saying that P(Z ≤ zα) = 1 − α. By symmetry, P(−zα < Z < zα) = 1 − 2α.
zα Values: Bottom of Table B.1
Example: z0.05 = 1.645
P(Z > 1.645) = 0.05,  P(Z < 1.645) = 0.95
P(Z < −1.645) = 0.05,  P(−1.645 < Z < 1.645) = 0.90
Importance of the Normal Distribution

o The normal distributions or at least approximately normal distributions occur in


many situations. Many physical and mental traits tend to be at least
approximately normally distributed.
o If it is not X that is normally distributed, it is some transformation of X that is
normal.
o Furthermore, as a consequence of the Central Limit Theorem, the normal
distribution is also used to model characteristics of interest that are believed to
be the result of summing up a large number of small effects that are
independently generated by a process.
Example
Examples 10.53 and 10.54 (page 355)
Exercise 1 (page 372) A wine’s distinctive taste is a result of ageing it in
wooden casks. Some of the wine evaporates while it is aging in the porous
wooden casks. Define X=percentage of wine in the cask that is lost due to
evaporation. Suppose X is normally distributed with mean 5% and a
standard deviation of 1%. What is the probability of losing more than 7.5%
of the wine due to evaporation?
o Always define the random variable: X = percentage of wine in the cask that is lost due to evaporation
o Identify the distribution of X: Given: X ~ Normal(μ = 5, σ² = 1²)
o Express the problem in terms of the defined random variable: Find P(X > 7.5).
Example
Examples 10.53 and 10.54 (page 355)
Exercise 1 (page 372) A wine’s distinctive taste is a result of ageing it in
wooden casks. Some of the wine evaporates while it is aging in the porous
wooden casks. Define X=percentage of wine in the cask that is lost due to
evaporation. Suppose X is normally distributed with mean 5% and a
standard deviation of 1%. What is the probability of losing more than 7.5%
of the wine due to evaporation?

P(X > 7.5) = P[(X − 5)/1 > (7.5 − 5)/1] = P(Z > 2.5)
           = 1 − Φ(2.5) = 1 − 0.9938 = 0.0062
Example
Exercise 3 (page 373)
Suppose that the IQ’s of applicants of a certain science high school follow a
normal distribution with mean of 120 and a standard deviation of 9.
a. One of the requirements of the school in accepting a student is that
the student’s IQ must be at least 115. What proportion of the
applicants will be rejected on the basis of their IQ?
Let X be the IQ of a selected applicant.
Given: X ~ Normal(μ = 120, σ² = 9²)
Example
Find P(X < 115).
P(X < 115) = P[(X − 120)/9 < (115 − 120)/9]
           = P(Z < −5/9) ≈ P(Z < −0.56)
           = Φ(−0.56)
           = 0.2877
Example
Suppose X ~ Normal(μ = 10, σ² = 25). Find the 99th percentile.
P(X ≤ c) = 99/100 = 0.99
P[Z ≤ (c − 10)/5] = 0.99
But P(Z ≤ z) = 0.99 when z = 2.326, so we have
(c − 10)/5 = 2.326
c = 21.63
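Finding a percentile is the inverse problem (given a probability, recover the cut-off c), which is what the inverse CDF norm.ppf computes; a sketch assuming SciPy is available:

```python
from scipy.stats import norm

X = norm(loc=10, scale=5)          # X ~ Normal(mu = 10, sigma^2 = 25)

print(X.ppf(0.99))                 # 99th percentile: c ≈ 21.63
print(norm.ppf(0.99))              # the z-value 2.326 used in the hand computation
```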
Exercise 15: An automatic soda dispenser is regulated so that it dispenses an average of 200 ml per cup. Suppose the amount of drink dispensed is normally distributed with a standard deviation equal to 15 ml.
a. What is the probability that a cup contains between 191 and 209 ml?
b. What is the probability that a cup will overflow if 230 ml cups are used?
c. What is the 99th percentile?
Exercise 16: A student commutes daily from his suburban home to his midtown school. The average time for a one-way trip is 24 minutes with a standard deviation of 3.8 minutes. Assume the distribution of trip times to be normally distributed.
a) If the student leaves the house at 8:30 am and the review part of the professor's lecture is from 8:50 am until 9:00 am, what is the probability that the student misses the review?
b) Find the probability that 2 of the 3 next trips will take at least 30 minutes.
Next Chapter: Sampling Distributions
