Module 4 Probability Basics - Filled
Random sampling helps eliminate bias, but estimates based on samples can still be wrong because of
sampling variability: two samples from the same population will generally yield different results.
If the sampling variability is too large, then we cannot trust the results of any one sample.
We need probability to help us understand and express how samples behave.
Probability Terminology
• Random Phenomenon: situation involving chance leading to results called
outcomes
• Examples: getting heads or tails on a coin toss; whether school is cancelled on a particular
day
• Unpredictable in the short run but regular and predictable over the long run
• Probability: proportion of times an outcome occurs in infinitely repeated trials
• Example: snow chance of 80% implies it will snow on 80% of all days with similar weather
conditions
• In practice, we have only a finite number of repeated trials
• Law of Large Numbers: the larger the sample size (number of trials), the closer
the observed sample probability will be to the unknown theoretical probability
• Sample Space, $S$: the set of all possible outcomes
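The Law of Large Numbers can be illustrated with a short simulation. The sketch below assumes a fair coin (theoretical probability of heads 0.5) and an arbitrary fixed seed so the run is repeatable; the function name is made up for this illustration.

```python
import random

def running_proportion_heads(n_tosses, seed=0):
    """Simulate fair coin tosses; return the proportion of heads after each toss."""
    rng = random.Random(seed)  # fixed seed (arbitrary choice) makes the run repeatable
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1, False as 0
        proportions.append(heads / toss)
    return proportions

props = running_proportion_heads(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    # The observed proportion tends to settle near 0.5 as n grows
    print(f"after {n:>6} tosses: proportion of heads = {props[n - 1]:.4f}")
```

Early proportions can be far from 0.5, while the proportion after many tosses is typically very close to it.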
Basic Probability Definitions
• Union of two events $A$ and $B$
• Denoted $A \cup B$
• The event that either $A$ or $B$ (or both) occurs
• Intersection of two events $A$ and $B$
• Denoted $A \cap B$
• The event that both $A$ and $B$ happen at the same time
• Mutually exclusive / disjoint events
• Two events that cannot happen at the same time
• They have no intersection
• Complement of event $A$
• Denoted $A^c$
• The event that $A$ does not occur
Basic Probability Rules
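• For any event $A$, $0 \le P(A) \le 1$
• $P(A) = 0$ for a 0% chance of event $A$
• $P(A) = 1$ for a 100% chance of event $A$
• If $S$ is the sample space, then $P(S) = 1$
• Complement Rule:
• The complement of event $A$ is its opposite, denoted $A^c$
• For any event $A$, $P(A^c) = 1 - P(A)$
• Addition Rule:
• In general, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• If $A$ and $B$ are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$
• Why is this true?
• Because in this case $P(A \cap B) = 0$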
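The rules above can be checked on a small sample space. This is a minimal sketch that assumes one roll of a fair six-sided die, an example not taken from the slides; the events and the `prob` helper are illustrative names.

```python
from fractions import Fraction

# Assumed example (not from the slides): one roll of a fair six-sided die
S = {1, 2, 3, 4, 5, 6}   # sample space
A = {2, 4, 6}            # event: the roll is even
B = {4, 5, 6}            # event: the roll is greater than 3
C = {1}                  # event: the roll is 1 (disjoint from A)

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

print("P(S) =", prob(S))
print("Complement rule:", prob(S - A), "=", 1 - prob(A))
print("Addition rule:", prob(A | B), "=", prob(A) + prob(B) - prob(A & B))
print("Disjoint case:", prob(A | C), "=", prob(A) + prob(C))
```

Using `Fraction` keeps the arithmetic exact, so both sides of each rule compare equal without rounding error.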
Example 2
Recall the results of the diabetes study:
Treatment      Complications   No Complications   Total
Treatment 1    11              77                 88
Treatment 2    9               103                112
Total          20              180                200
• $A$ = patient used Treatment 1
• $B$ = patient has experienced complications
• $C$ = patient used Treatment 2 and has not experienced complications
Estimate the following probabilities:
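As a rough sketch, probabilities for this example can be estimated as relative frequencies from the table counts. The particular probabilities computed below are illustrative choices based on the events described above, and the variable names are made up for the example.

```python
# Counts from the diabetes study table: (complications, no complications)
counts = {
    "Treatment 1": (11, 77),
    "Treatment 2": (9, 103),
}
total = sum(c + n for c, n in counts.values())  # all patients in the study

# P(patient used Treatment 1)
p_treatment_1 = sum(counts["Treatment 1"]) / total
# P(patient has experienced complications)
p_complications = sum(c for c, _ in counts.values()) / total
# P(patient used Treatment 2 and has not experienced complications)
p_t2_no_complications = counts["Treatment 2"][1] / total
# P(Treatment 1 or complications), via the general addition rule
p_t1_and_complications = counts["Treatment 1"][0] / total
p_t1_or_complications = p_treatment_1 + p_complications - p_t1_and_complications

print(p_treatment_1, p_complications, p_t2_no_complications, p_t1_or_complications)
```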
Conditional Probability
• Multiplication Rule:
• In general, $P(A \cap B) = P(A) \cdot P(B \mid A)$
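A short worked example may help here. It reuses the counts from the diabetes study in Example 2; the labels $T_1$ (patient used Treatment 1) and $C$ (patient has experienced complications) are introduced only for this illustration.

```latex
\begin{align*}
P(C \mid T_1) &= \frac{P(T_1 \cap C)}{P(T_1)} = \frac{11/200}{88/200} = \frac{11}{88} = 0.125 \\
P(T_1 \cap C) &= P(T_1) \cdot P(C \mid T_1) = 0.44 \times 0.125 = 0.055 = \frac{11}{200}
\end{align*}
```

The second line recovers the joint probability that can also be read directly from the table, which is a quick check on the multiplication rule.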
Example 3
The table below displays information regarding the 80.2 million long-form federal returns
received by the IRS one year. It cross-tabulates the taxpayer’s income level and whether
they were audited. For simplicity, frequencies are reported in thousands and rounded.
Income Level          Audited   Not Audited   Total
Under $25,000         90        14010         14100
$25,000 to $49,999    71        30629         30700
$50,000 to $99,999    69        24631         24700
$100,000 or more      80        10620         10700
Total                 310       79890         80200
• Find the following conditional probabilities by reading them directly from the table:
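Conditional probabilities can be read off the table by restricting attention to one row or one column. The sketch below shows both directions of conditioning; the specific probabilities chosen and the function names are illustrative.

```python
# Frequencies in thousands, copied from the table: (audited, not audited)
returns = {
    "Under $25,000": (90, 14010),
    "$25,000 to $49,999": (71, 30629),
    "$50,000 to $99,999": (69, 24631),
    "$100,000 or more": (80, 10620),
}

def p_audited_given_income(level):
    """P(audited | income level): condition on a row of the table."""
    audited, not_audited = returns[level]
    return audited / (audited + not_audited)

def p_income_given_audited(level):
    """P(income level | audited): condition on the Audited column."""
    total_audited = sum(a for a, _ in returns.values())
    return returns[level][0] / total_audited

for level in returns:
    print(f"P(audited | {level}) = {p_audited_given_income(level):.4f}")
print(f"P($100,000 or more | audited) = {p_income_given_audited('$100,000 or more'):.4f}")
```

The two directions give very different answers for the same cell, because the denominator changes from a row total to a column total.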
• Given a positive test, what is the probability the baby will have Down syndrome?
• Given a negative test, what is the probability the baby will have Down syndrome?
Independence
• Suppose we have a population of 100 subjects, of which 5 are unusual. If
we randomly sample 3 subjects without replacement, what is the
probability all 3 will be unusual?
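One way to work this out is to apply the multiplication rule draw by draw, since each draw without replacement changes the makeup of the remaining population. The sketch below does that exactly with fractions; the with-replacement figure is added only for contrast, and the function name is made up for the example.

```python
from fractions import Fraction

def prob_all_unusual(population, unusual, draws):
    """P(every sampled subject is unusual) when sampling without replacement."""
    p = Fraction(1)
    for k in range(draws):
        # After k unusual subjects are drawn, unusual - k remain among population - k
        p *= Fraction(unusual - k, population - k)
    return p

without_replacement = prob_all_unusual(100, 5, 3)   # (5/100) * (4/99) * (3/98)
with_replacement = Fraction(5, 100) ** 3            # draws would be independent here

print(without_replacement, float(without_replacement))
print(with_replacement, float(with_replacement))
```

Without replacement the draws are not independent, so the conditional probabilities 4/99 and 3/98 replace the constant 5/100.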
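[Tree diagram: the first-level branches are $A$, with probability $P(A)$, and $A^c$, with probability $P(A^c)$. Each then splits into $B$ and $B^c$, giving four final outcomes.]
• $A$ then $B$: $P(A \cap B) = P(A) \cdot P(B \mid A)$
• $A$ then $B^c$: $P(A \cap B^c) = P(A) \cdot P(B^c \mid A)$
• $A^c$ then $B$: $P(A^c \cap B) = P(A^c) \cdot P(B \mid A^c)$
• $A^c$ then $B^c$: $P(A^c \cap B^c) = P(A^c) \cdot P(B^c \mid A^c)$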
• The probabilities on the branches emanating from any one node sum to 1
• The final outcomes are disjoint
• To find the probability for a final outcome, multiply across that branch
Example 6
The following tree diagram shows the probabilities of skin cancer among men and women
by body location:
Individuals with skin cancer
    0.15 Head
        0.44 Man
        0.56 Woman
    0.41 Trunk
        0.63 Man
        0.37 Woman
    0.44 Limbs
        0.20 Man
        0.80 Woman
• Calculate
• Calculate
• Calculate
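The tree can be worked through in a few lines. This sketch reads the first-level branches as Head 0.15, Trunk 0.41, and Limbs 0.44 (the three sum to 1) and the second-level branches as the probability of each sex given the location. The quantities computed are illustrative choices, not necessarily the ones the example asks for.

```python
# Branch probabilities read from the tree: P(location) and P(Man | location)
p_location = {"Head": 0.15, "Trunk": 0.41, "Limbs": 0.44}
p_man_given = {"Head": 0.44, "Trunk": 0.63, "Limbs": 0.20}

# Multiply across each branch to get the probability of each final outcome
p_joint = {}
for loc, p_loc in p_location.items():
    p_joint[(loc, "Man")] = p_loc * p_man_given[loc]
    p_joint[(loc, "Woman")] = p_loc * (1 - p_man_given[loc])

# Final outcomes are disjoint, so add the ones that make up the event "Man"
p_man = sum(p for (loc, sex), p in p_joint.items() if sex == "Man")

# Reverse the conditioning: P(Head | Man) = P(Head and Man) / P(Man)
p_head_given_man = p_joint[("Head", "Man")] / p_man

print(f"P(Head and Man) = {p_joint[('Head', 'Man')]:.4f}")
print(f"P(Man)          = {p_man:.4f}")
print(f"P(Head | Man)   = {p_head_given_man:.4f}")
```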
Probability Rules for Random Variables
• A random variable has values that represent numerical outcomes of a random phenomenon
• Capital letters such as $X$ refer to the variable itself
• Lowercase letters such as $x$ refer to possible values of the variable
• Two types of random variables:
• Discrete
• Continuous
• The probability distribution of a random variable tells us what values are possible and their
associated probabilities
Let $a$ and $b$ be numbers with $a < b$
• $P(X \le a) = P(X < a) + P(X = a)$ by the addition rule for disjoint events
• $P(a \le X \le b) = P(X = a) + P(a < X \le b)$ by the addition rule for disjoint events
• $P(X > a) = 1 - P(X \le a)$ by the complement rule
• $P(X \le a \text{ or } X > b) = P(X \le a) + 1 - P(X \le b)$ by the complement & addition rules
Probability Distributions for Discrete Random Variables
• Discrete random variables have a countable (finite) list of possible outcomes
• A probability model lists all possible outcomes along with their associated probabilities
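Value of $X$     $x_1$   $x_2$   …   $x_k$
Probability      $p_1$   $p_2$   …   $p_k$
• Where $0 \le p_i \le 1$ and $p_1 + p_2 + \dots + p_k = 1$
• Find the probability for any event by adding probabilities of the individual outcomes that
make up the event
• Mean (aka expected value) of $X$
• Multiply each possible value by its probability, then add all the products:
$\mu_X = x_1 p_1 + x_2 p_2 + \dots + x_k p_k = \sum x_i p_i$
• Variance of $X$
• Subtract the mean from each possible value, square the result, multiply by the corresponding
probability, then add all the products: $\sigma_X^2 = \sum (x_i - \mu_X)^2 p_i$
• Standard deviation of $X$: $\sigma_X = \sqrt{\sigma_X^2}$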
Example 7
Suppose we have a population of 15 people with the following ages. What is the
probability distribution for $X$, the age of a randomly chosen person?
13, 14, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18
Value of $X$     13   14   15   16   17   18
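Because each person is equally likely to be chosen, the probability of each age is its count divided by 15. A minimal sketch of that tally, with illustrative variable names:

```python
from collections import Counter
from fractions import Fraction

ages = [13, 14, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18]

counts = Counter(ages)
# Each person is equally likely, so P(X = age) = count / population size
distribution = {age: Fraction(n, len(ages)) for age, n in sorted(counts.items())}

for age in distribution:
    print(f"P(X = {age}) = {counts[age]}/{len(ages)}")

# A valid probability model must have probabilities that sum to 1
print("sum of probabilities =", sum(distribution.values()))
```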
Example 8
A study examined hearing impairment in 5333 Dalmatians, since pure dog breeds are often
inbred with high numbers of congenital defects. Let $X$ = the number of ears impaired in a
randomly chosen Dalmatian.
Value of $X$     0      1      2
Probability      0.70   0.22   0.08
• What is the mean of $X$?
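The mean follows from multiplying each value by its probability and adding the products. The sketch below also computes the variance and standard deviation with the same recipe, as an extra step beyond what the question asks.

```python
import math

values = [0, 1, 2]           # number of ears impaired
probs = [0.70, 0.22, 0.08]   # probabilities from the table

# Mean: multiply each value by its probability, then add the products
mean = sum(x * p for x, p in zip(values, probs))
# Variance: probability-weighted squared deviations from the mean
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
sd = math.sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.4f}, sd = {sd:.4f}")
```

The mean works out to 0(0.70) + 1(0.22) + 2(0.08) = 0.38 impaired ears per dog, a long-run average rather than a value any single dog can have.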
Probability Distributions for Continuous Random Variables
• Continuous random variables can take on any value in an interval
• Since possible outcomes are infinite, we calculate probabilities by finding areas
under a density curve
• A density curve:
• Is fitted to the irregular bars of a histogram
• Describes the overall pattern of a distribution
• Is always on or above the horizontal axis
• Has total area exactly 1 underneath it
• Probabilities are assigned to intervals of values
rather than individual values
• In fact, if $X$ is continuous and $c$ is any constant, $P(X = c) = 0$
Example 9
Suppose we have a uniform distribution over the interval from 0 to 5
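For a uniform distribution the density curve is a flat line, so probabilities are rectangle areas. The sketch below assumes the density has height 1/5 on the interval from 0 to 5 (so the total area is 1); the intervals evaluated are illustrative, and `uniform_prob` is a made-up helper name.

```python
LOW, HIGH = 0.0, 5.0
HEIGHT = 1 / (HIGH - LOW)   # density height chosen so that the total area is 1

def uniform_prob(a, b):
    """P(a <= X <= b) as a rectangle area, clipping the interval to [LOW, HIGH]."""
    left, right = max(a, LOW), min(b, HIGH)
    return max(right - left, 0.0) * HEIGHT

print(uniform_prob(0, 5))   # the whole interval
print(uniform_prob(1, 3))   # width 2 times height 1/5
print(uniform_prob(2, 2))   # a single point has zero width, hence zero area
print(uniform_prob(4, 10))  # only the part inside [0, 5] counts
```

The single-point case shows why probabilities for continuous variables are assigned to intervals rather than to individual values.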
and
and
• Find the mean and standard deviation for the total of these two bills
• Mean
• Variance
• Standard deviation
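The three steps above (mean, variance, standard deviation) can be sketched as follows. The bill amounts here are hypothetical placeholders, not the figures from the example, and the calculation assumes the two bills are independent so that their variances add.

```python
import math

# Hypothetical inputs (NOT from the slides): two independent monthly bills
mean_x, sd_x = 100.0, 30.0
mean_y, sd_y = 50.0, 40.0

# Mean of the total: the means add
mean_total = mean_x + mean_y
# Variance of the total: variances add (this step requires independence)
var_total = sd_x ** 2 + sd_y ** 2
# Standard deviation: square root of the combined variance
sd_total = math.sqrt(var_total)

print(mean_total, var_total, sd_total)
```

Note that the standard deviations themselves do not add: 30 + 40 = 70, but the standard deviation of the total here is 50.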
Important Points
• Law of Large Numbers: the larger the sample size, the closer
the sample probability will be to the theoretical probability
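• Basic probability rules for events $A$ and $B$, and sample space $S$
• $0 \le P(A) \le 1$ and $P(S) = 1$
• Complement rule: $P(A^c) = 1 - P(A)$
• Addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• If $A$ and $B$ are mutually exclusive, $P(A \cup B) = P(A) + P(B)$
• Multiplication rule: $P(A \cap B) = P(A) \cdot P(B \mid A)$
• The conditional probability of $B$ given $A$ is $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
• If $A$ and $B$ are independent, $P(A \cap B) = P(A) \cdot P(B)$
• Tree diagrams simplify complex probability calculations
• Two types of random variables:
• Discrete: find probabilities by adding
• Continuous: find probabilities by computing areas under density curves
• When combining two independent random variables, add their variances (not their standard deviations)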