
Probability Theory

Data Representation
Graphical representation of Data
Stem-and-Leaf Plot. Histogram (Fig. 508)
Stem-and-Leaf Plot
We divide the data values (a sample of n = 30 values, not reproduced here) into 5 groups: 75–79, 80–84, 85–89, 90–94, 95–99.
The integers in the tens position of the groups are 7, 8, 8, 9, 9. These form the stem.
The first leaf is 789, representing 77, 78, 79. The second leaf is 1123344,
representing 81, 81, 82, 83, 83, 84, 84.
The number of times a value occurs is called its absolute frequency. Thus 78
has absolute frequency 1, and the value 89 has absolute frequency 5.
Histogram
Histograms display the distribution of data better than stem-and-leaf plots.
The bases of the rectangles in Fig. 508 are the x-intervals (known as class intervals) 74.5–79.5, 79.5–84.5, 84.5–89.5, 89.5–94.5, 94.5–99.5, whose midpoints (known as class marks) are x = 77, 82, 87, 92, 97, respectively. The height of a rectangle with class mark x is the relative class frequency f_rel(x), defined as the number of data values in that class interval divided by n (= 30 in our case). Hence the areas of the rectangles are proportional to these relative frequencies, 0.10, 0.23, 0.43, 0.17, 0.07, so that histograms give a good impression of the distribution.
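As a quick aside (not part of the original slides), here is a minimal Python sketch of this class-interval bookkeeping. The data below are a hypothetical reconstruction, chosen only to be consistent with the frequencies quoted above; the original data set is not shown in the deck.

```python
from collections import Counter

# Hypothetical sample of n = 30 scores, consistent with the quoted frequencies
data = [77, 78, 79,
        81, 81, 82, 83, 83, 84, 84,
        85, 86, 86, 87, 87, 88, 88, 88, 89, 89, 89, 89, 89,
        90, 91, 92, 93, 94,
        96, 99]

edges = [74.5, 79.5, 84.5, 89.5, 94.5, 99.5]   # class intervals
marks = [77, 82, 87, 92, 97]                   # class marks (interval midpoints)

counts = Counter()
for x in data:
    for lo, hi, mark in zip(edges, edges[1:], marks):
        if lo < x <= hi:
            counts[mark] += 1

# Relative class frequency f_rel(x) = count / n
n = len(data)
for mark in marks:
    print(mark, counts[mark], round(counts[mark] / n, 2))
# 77 3 0.1 / 82 7 0.23 / 87 13 0.43 / 92 5 0.17 / 97 2 0.07
```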
Mean. Standard Deviation. Variance. Empirical Rule
Medians and quartiles are easily obtained by ordering and counting, practically without calculation. But they do not give full information on data: you can change data values to some extent without changing the median. Similarly for the quartiles.
The average size of the data values can be measured in a more refined way by the mean, x̄ = (1/n) Σ xj.
Similarly, the spread (variability) of the data values can be measured in a more refined way by the standard deviation s or by its square, the variance, s² = (1/(n–1)) Σ (xj – x̄)².

Empirical Rule. For data whose distribution is approximately bell-shaped (normal), roughly 68% of the values lie within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
Empirical Rule and Outliers. z-Score. A value far outside this range (e.g. more than three standard deviations from the mean) is a candidate outlier. The z-score z = (x – x̄)/s measures how many standard deviations a value x lies from the mean.
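A small Python sketch (added for illustration, reusing a hypothetical subsample) of the mean, sample variance, standard deviation, and z-score computations described above:

```python
import statistics

data = [77, 78, 79, 81, 81, 82, 83, 83, 84, 84]   # hypothetical subsample

x_bar = statistics.mean(data)        # mean: (1/n) * sum of the values
s2 = statistics.variance(data)       # sample variance, divisor n - 1
s = statistics.stdev(data)           # standard deviation: sqrt of the variance

# z-score: how many standard deviations a value lies from the mean
z_scores = [(x - x_bar) / s for x in data]

print(round(x_bar, 2), round(s, 2), round(s2, 2))
print([round(z, 2) for z in z_scores])
```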
Experiments, Outcomes, Events

An experiment is a process of measurement or observation, in a laboratory, in a factory, on the street, in nature, or wherever; so “experiment” is used in a rather general sense.
Our interest is in experiments that involve randomness, chance effects, so that we cannot predict a result exactly. A trial is a single performance of an experiment. Its result is called an outcome or a sample point. n trials then give a sample of size n consisting of n sample points. The sample space S of an experiment is the set of all possible outcomes.
Probability
Random phenomena
– Unable to predict the outcomes, but in the long run, the outcomes exhibit statistical regularity.

Examples
1. Tossing a coin – outcomes S = {Head, Tail}
Unable to predict on each toss whether it is Head or Tail. In the long run one can predict that 50% of the time heads will occur and 50% of the time tails will occur.
2. Rolling a die – outcomes S = {1, 2, 3, 4, 5, 6}
Unable to predict the outcome, but in the long run one can determine that each outcome will occur 1/6 of the time.
Use symmetry. Each side is the same: one side should not occur more frequently than another side in the long run. If the die is not balanced this may not be true.
The Sample Space, S
The sample space, S, for a random phenomenon is the set of all possible outcomes.
Examples
1. Tossing a coin – outcomes S = {Head, Tail}
2. Rolling a die – outcomes S = {1, 2, 3, 4, 5, 6}
An Event, E
The event, E, is any subset of the sample space, S, i.e. any set of outcomes (not necessarily all outcomes) of the random phenomenon.
[Venn diagram: E drawn as a region inside S]
The event, E, is said to have occurred if, after the outcome has been observed, the outcome lies in E.
Examples
1. Rolling a die – outcomes S = {1, 2, 3, 4, 5, 6}
E = the event that an even number is rolled = {2, 4, 6}
Special Events
The Null Event, the empty event: ∅
∅ = { } = the event that contains no outcomes
The Entire Event, the Sample Space: S
S = the event that contains all outcomes
The empty event, ∅, never occurs.
The entire event, S, always occurs.
Set operations on Events
Union
Let A and B be two events; then the union of A and B is the event (denoted by A ∪ B) defined by:
A ∪ B = {e | e belongs to A or e belongs to B}
[Venn diagram: A ∪ B shades all of A and B]
The event A ∪ B occurs if the event A occurs or the event B occurs (or both).
Intersection
Let A and B be two events; then the intersection of A and B is the event (denoted by A ∩ B) defined by:
A ∩ B = {e | e belongs to A and e belongs to B}
[Venn diagram: A ∩ B shades the overlap of A and B]
The event A ∩ B occurs if the event A occurs and the event B occurs.
Complement
Let A be any event; then the complement of A (denoted by Ā) is defined by:
Ā = {e | e does not belong to A}
[Venn diagram: Ā shades everything in S outside A]
The event Ā occurs if the event A does not occur.
In problems you will recognize that you are
working with:

1. Union if you see the word or,


2. Intersection if you see the word and,
3. Complement if you see the word not.
Definition: mutually exclusive
Two events A and B are called mutually exclusive if:
A ∩ B = ∅
[Venn diagram: A and B as disjoint regions of S]
If two events A and B are mutually exclusive then:
1. They have no outcomes in common.
2. They can’t occur at the same time: the outcome of the random experiment cannot belong to both A and B.
Probability
Definition: probability of an event E.
Suppose that the sample space S = {o1, o2, o3, …, oN} has a finite number, N, of outcomes.
Also each of the outcomes is equally likely (because of symmetry).
Then for any event E:
P[E] = n(E) / n(S) = n(E) / N = (no. of outcomes in E) / (total no. of outcomes)
Note: the symbol n(A) = no. of elements of A.
Thus this definition of P[E] applies only to the special case when
1. The sample space has a finite no. of outcomes, and
2. Each outcome is equi-probable.
If this is not true, a more general definition of probability is required.
Examples
• Consider an experiment involving a single coin toss. There are two possible outcomes, heads (H) and tails (T). If the coin is fair, the probability P(H) = P(T) = 0.5.
• Consider another experiment involving three coin tosses. The outcome will now be a 3-long string of heads or tails. The sample space is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
• A = {exactly 2 heads occur} = {HHT, HTH, THH}.
• P[{HHT, HTH, THH}] = P[{HHT}] + P[{HTH}] + P[{THH}] = 1/8 + 1/8 + 1/8 = 3/8
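To make the counting concrete, here is a small Python sketch (not from the original slides) that enumerates the sample space for three coin tosses and applies the classical definition to the event “exactly 2 heads”:

```python
from itertools import product

# Enumerate all 3-long strings of H/T: the sample space S
S = [''.join(t) for t in product('HT', repeat=3)]

# Event A: exactly 2 heads occur
A = [s for s in S if s.count('H') == 2]

# Classical definition: P[A] = n(A) / n(S), valid since outcomes are equally likely
print(S)                   # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(A, len(A) / len(S))  # ['HHT', 'HTH', 'THH'] 0.375
```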
Rules of Probability
Rule: The additive rule (mutually exclusive events)
P[A ∪ B] = P[A] + P[B]
i.e.
P[A or B] = P[A] + P[B]
if A ∩ B = ∅ (A and B mutually exclusive).
Recall: if two events A and B are mutually exclusive, they have no outcomes in common and cannot occur at the same time; the outcome of the random experiment cannot belong to both A and B.
[Venn diagram: disjoint A and B, with P[A ∪ B] = P[A] + P[B]]
Rule: The additive rule (in general)
P[A ∪ B] = P[A] + P[B] – P[A ∩ B]
or
P[A or B] = P[A] + P[B] – P[A and B]
Logic: when P[A] is added to P[B], the outcomes in A ∩ B are counted twice; hence
P[A ∪ B] = P[A] + P[B] – P[A ∩ B]

Example:
Saskatoon and Moncton are two of the cities competing for the World University Games (there are also many others). The organizers are narrowing the competition to the final 5 cities.
There is a 20% chance that Saskatoon will be amongst the final 5, a 35% chance that Moncton will be amongst the final 5, and an 8% chance that both Saskatoon and Moncton will be amongst the final 5.
What is the probability that Saskatoon or Moncton will be amongst the final 5?
Solution:
Let A = the event that Saskatoon is amongst the final 5.
Let B = the event that Moncton is amongst the final 5.
Given P[A] = 0.20, P[B] = 0.35, and P[A ∩ B] = 0.08.
What is P[A ∪ B]?
Note: “and” ≡ ∩, “or” ≡ ∪.
P[A ∪ B] = P[A] + P[B] – P[A ∩ B] = 0.20 + 0.35 – 0.08 = 0.47
Rule for complements
P[Ā] = 1 – P[A]
or
P[not A] = 1 – P[A]
Logic:
A and Ā are mutually exclusive, and S = A ∪ Ā.
Thus 1 = P[S] = P[A] + P[Ā],
and P[Ā] = 1 – P[A].
Independent events
Sampling With and Without Replacement
• A box contains 10 screws, three of which are
defective. Two screws are drawn at random.
Find the probability that neither of the two
screws is defective.
• Solution?
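The slide leaves “Solution?” open. A minimal Python sketch (not part of the original deck) of one way to work it out: drawing without replacement gives P = (7/10)(6/9) = 7/15 ≈ 0.467, while with replacement, shown for comparison, it would be (7/10)² = 0.49.

```python
from fractions import Fraction

good, total = 7, 10  # 10 screws, 3 defective, so 7 are good

# Without replacement: the second draw sees one fewer good screw, one fewer screw
p_without = Fraction(good, total) * Fraction(good - 1, total - 1)

# With replacement: the two draws are independent, so the probabilities multiply
p_with = Fraction(good, total) ** 2

print(p_without, float(p_without))  # 7/15 ≈ 0.4667
print(p_with, float(p_with))        # 49/100 = 0.49
```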
Independent events
Exercise
• A batch of 200 iron rods consists of 50
oversized rods, 50 undersized rods, and 100
rods of the desired length. If two rods are
drawn at random without replacement,
what is the probability of obtaining (a) two
rods of the desired length, (b) exactly one of
the desired length, (c) none of the desired
length?
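Not part of the original exercise, but here is a hedged sketch of how one might verify the answers with Python’s math.comb (hypergeometric counting: 100 desired rods, 100 off-size rods, 2 drawn without replacement):

```python
from math import comb

total = comb(200, 2)   # ways to draw 2 rods from 200

p_two = comb(100, 2) / total                   # (a) both of the desired length
p_one = comb(100, 1) * comb(100, 1) / total    # (b) exactly one of the desired length
p_none = comb(100, 2) / total                  # (c) both from the 100 off-size rods

print(round(p_two, 4))    # ≈ 0.2487
print(round(p_one, 4))    # ≈ 0.5025
print(round(p_none, 4))   # ≈ 0.2487
```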
Conditional Probability
• Frequently, before observing the outcome of a random experiment, you are given information regarding the outcome.
• How should this information be used in prediction of the outcome?
• Namely, how should probabilities be adjusted to take this information into account?
• Usually the information is given in the following form: you are told that the outcome belongs to a given event (i.e. you are told that a certain event has occurred).
Definition
Suppose that we are interested in computing the probability of event A and we have been told that event B has occurred.
Then the conditional probability of A given B is defined to be:
P[A|B] = P[A ∩ B] / P[B], if P[B] ≠ 0
Rationale:
If we’re told that event B has occurred, then the sample space is restricted to B.
The probability within B has to be normalized; this is achieved by dividing by P[B].
The event A can now only occur if the outcome is in A ∩ B. Hence the new probability of A is:
P[A|B] = P[A ∩ B] / P[B]
[Venn diagram: B as the restricted sample space, with A ∩ B inside it]
Examples
An Example
The Academy Awards show is soon to be broadcast.
For a specific married couple, the probability that the husband watches the show is 80%, the probability that his wife watches the show is 65%, and the probability that they both watch the show is 60%.
If the husband is watching the show, what is the probability that his wife is also watching the show?
Solution:
Let B = the event that the husband watches the show: P[B] = 0.80.
Let A = the event that his wife watches the show: P[A] = 0.65, and P[A ∩ B] = 0.60.
P[A|B] = P[A ∩ B] / P[B] = 0.60 / 0.80 = 0.75
Independence
Definition
Two events A and B are called independent if
P[A ∩ B] = P[A] P[B]
Note: if P[B] ≠ 0 and P[A] ≠ 0, then
P[A|B] = P[A ∩ B] / P[B] = P[A] P[B] / P[B] = P[A]
and P[B|A] = P[A ∩ B] / P[A] = P[A] P[B] / P[A] = P[B].
Thus in the case of independence the conditional probability of an event is not affected by the knowledge of the other event.
Difference between independence and mutually exclusive
Mutually exclusive:
Two mutually exclusive events are independent only in the special case where P[A] = 0 or P[B] = 0 (since A ∩ B = ∅ forces P[A ∩ B] = 0, independence P[A ∩ B] = P[A] P[B] can hold only if P[A] P[B] = 0).
Mutually exclusive events are highly dependent otherwise: A and B cannot occur simultaneously, so if one event occurs the other event does not occur.
[Venn diagram: disjoint A and B]
Independent events
P[A ∩ B] = P[A] P[B]
or P[A ∩ B] / P[B] = P[A] = P[A] / P[S]
[Venn diagram: A and B overlapping inside S]
The ratio of the probability of the set A within B is the same as the ratio of the probability of the set A within the entire sample space S.
The multiplicative rule of probability
P[A ∩ B] = P[A] P[B|A] if P[A] ≠ 0
         = P[B] P[A|B] if P[B] ≠ 0
and
P[A ∩ B] = P[A] P[B]
if A and B are independent.
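To connect these rules back to the classical definition, here is a small Python sketch (added for illustration) that checks both the multiplicative rule and independence by enumerating two fair dice; the events A and B are arbitrary choices for the demo:

```python
from itertools import product
from fractions import Fraction

# Sample space: all 36 equally likely outcomes of rolling two fair dice
S = list(product(range(1, 7), repeat=2))

def P(event):
    # Classical probability: no. of outcomes in the event / total no. of outcomes
    return Fraction(sum(1 for o in S if event(o)), len(S))

A = lambda o: o[0] % 2 == 0    # event A: first die is even
B = lambda o: o[1] == 6        # event B: second die shows 6

# Conditional probability P[B|A], computed by restricting the sample space to A
P_B_given_A = Fraction(sum(1 for o in S if A(o) and B(o)),
                       sum(1 for o in S if A(o)))

print(P(lambda o: A(o) and B(o)) == P(A) * P_B_given_A)  # multiplicative rule: True
print(P(lambda o: A(o) and B(o)) == P(A) * P(B))         # independence holds: True
```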


Bayes’ Rule
• Given two events A and B, suppose that Pr(A) > 0. Then
Pr(B|A) = Pr(AB) / Pr(A) = Pr(A|B) Pr(B) / Pr(A)
• Example:
R: It is a rainy day; W: The grass is wet. Pr(R) = 0.8.

Pr(W|R)   R     R̄
W         0.7   0.4
W̄         0.3   0.6

Pr(R|W) = ?
Bayes’ Rule
R: It rains; W: The grass is wet.

Pr(W|R)   R     R̄
W         0.7   0.4
W̄         0.3   0.6

Information: Pr(W|R) (from the cause R to the observed effect W).
Inference: Pr(R|W) (from the observed effect W back to the cause R).
In general, with hypothesis H and evidence E, the information is the likelihood Pr(E|H), and the inference is the posterior Pr(H|E), obtained from the likelihood and the prior Pr(H):
Pr(H|E) = Pr(E|H) Pr(H) / Pr(E)
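A short Python sketch (added for illustration) answering the Pr(R|W) question from the rain/wet-grass table, using the law of total probability for Pr(W) and then Bayes’ rule:

```python
pr_R = 0.8              # prior: Pr(R)
pr_W_given_R = 0.7      # likelihood: Pr(W | R)
pr_W_given_notR = 0.4   # Pr(W | ¬R)

# Total probability: Pr(W) = Pr(W|R) Pr(R) + Pr(W|¬R) Pr(¬R)
pr_W = pr_W_given_R * pr_R + pr_W_given_notR * (1 - pr_R)

# Bayes' rule: Pr(R|W) = Pr(W|R) Pr(R) / Pr(W)
pr_R_given_W = pr_W_given_R * pr_R / pr_W

print(pr_W, pr_R_given_W)   # 0.64 0.875
```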
Bayes’ Rule: More Complicated
• Suppose that B1, B2, …, Bk form a partition of S:
Bi ∩ Bj = ∅ for i ≠ j, and ∪i Bi = S.
Suppose that Pr(Bi) > 0 and Pr(A) > 0. Then
Pr(Bi|A) = Pr(A|Bi) Pr(Bi) / Pr(A)
         = Pr(A|Bi) Pr(Bi) / Σ_{j=1..k} Pr(A Bj)
         = Pr(A|Bi) Pr(Bi) / Σ_{j=1..k} Pr(Bj) Pr(A|Bj)
A More Complicated Example
R: It rains; W: The grass is wet; U: People bring umbrellas.
[Diagram: R points to both W and U]
W and U are conditionally independent given R (and given R̄):
Pr(UW|R) = Pr(U|R) Pr(W|R)
Pr(UW|R̄) = Pr(U|R̄) Pr(W|R̄)
Pr(R) = 0.8

Pr(W|R)   R     R̄        Pr(U|R)   R     R̄
W         0.7   0.4       U         0.9   0.2
W̄         0.3   0.6       Ū         0.1   0.8

Pr(U|W) = ?
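A Python sketch (not from the original slides) resolving Pr(U|W) under the stated conditional-independence assumption:

```python
pr_R = 0.8
pr_W = {True: 0.7, False: 0.4}   # Pr(W | R), Pr(W | ¬R)
pr_U = {True: 0.9, False: 0.2}   # Pr(U | R), Pr(U | ¬R)

# Conditional independence given R: Pr(UW|R) = Pr(U|R) Pr(W|R)
def joint_UW(r):
    return pr_U[r] * pr_W[r]

# Marginalize out R (law of total probability)
pr_UW = joint_UW(True) * pr_R + joint_UW(False) * (1 - pr_R)
pr_W_marginal = pr_W[True] * pr_R + pr_W[False] * (1 - pr_R)

print(pr_UW / pr_W_marginal)   # Pr(U|W) = 0.52 / 0.64 = 0.8125
```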
Conditioning
• If A and B are events with Pr(A) > 0, the conditional probability of B given A is
Pr(B|A) = Pr(AB) / Pr(A)
• Example: Drug test
A = {Patient is a woman}; B = {Drug fails}.

          Women   Men
Success   200     1800
Failure   1800    200

Pr(B|A) = ? Pr(A|B) = ?
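A quick Python sketch (added for illustration) reading both conditional probabilities off the 2×2 table:

```python
# Counts from the drug-test table: rows = outcome, columns = sex
success = {'women': 200, 'men': 1800}
failure = {'women': 1800, 'men': 200}

# Pr(B|A): probability the drug fails given the patient is a woman
pr_B_given_A = failure['women'] / (success['women'] + failure['women'])

# Pr(A|B): probability the patient is a woman given the drug failed
pr_A_given_B = failure['women'] / (failure['women'] + failure['men'])

print(pr_B_given_A, pr_A_given_B)   # 0.9 0.9
```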
Which Drug is Better?
Simpson’s Paradox: View I
Drug II appears better than Drug I.
A = {Using Drug I}; B = {Using Drug II}; C = {Drug succeeds}.

          Drug I   Drug II
Success   219      1010
Failure   1801     1190

Pr(C|A) ≈ 10%
Pr(C|B) ≈ 50%
Random variable and probability distribution
Random variables
Probability distribution
Poisson distribution
Normal distribution
Homework
