When x and y are standardized, the intercept is b0 = 0 and the regression equation becomes ẑ_y = r·z_x
7.3 Regression to the Mean
The equation below shows that if x is 2 SDs above its mean, the
predicted y will never be more than 2 SDs away from its mean,
since r can never exceed 1 in magnitude:
ẑ_y = r·z_x
So, each predicted y tends to be closer to its mean than its
corresponding x
This property of the linear model is called regression to the
mean
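A quick numerical sketch of this property, using hypothetical values (r = 0.6, a point 2 SDs above the mean; neither number comes from the text):

```python
# Regression to the mean: predictions in standardized units.
# Hypothetical values: correlation r = 0.6, x two SDs above its mean.
r = 0.6
z_x = 2.0
z_y_hat = r * z_x  # predicted y, in SDs from its own mean

# Since |r| <= 1, |z_y_hat| <= |z_x|: the prediction is closer to the mean.
print(z_y_hat)  # 1.2
```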
7.4 Checking the Model
Models are useful only when specific assumptions are
reasonable. We check conditions that provide information about
the assumptions.
1) Quantitative Data Condition – linear models only make sense for
quantitative data, so don’t be fooled by categorical data recorded as
numbers
2) Linearity Condition – two variables must have a linear association, or a
linear model won’t mean a thing (look at the scatter plot).
3) Outlier Condition – outliers can dramatically change a regression model.
Investigate any outlier, and fit the model both with and without it.
7.5 Learning More from the Residuals
Residuals help us see whether the model makes
sense
s_e = √( Σe² / (n − 2) )
The standard deviation around the line should be the same wherever
we apply the model – this is called the Equal Spread Condition
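A minimal sketch of computing s_e from a fitted line's residuals (the data and the fitted line ŷ = 2x are made up for illustration; they are not from the text):

```python
import math

# Hypothetical data with an assumed fitted line y_hat = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
residuals = [yi - 2.0 * xi for xi, yi in zip(x, y)]

n = len(x)
# Residual standard deviation: divide by n - 2 (two parameters estimated).
s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print(round(s_e, 4))  # 0.1915
```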
[Figures: residual plots – the expected patternless plot, and plots showing issues]
7.6 Variation in the Model and R2
R² = r², the squared correlation, which gives the fraction of the variation in y accounted for by the model.
Figure 7.5 The scatterplot of number of Cell Phones (000s) vs. HDI for countries shows a bent relationship not suitable for correlation or regression.
7.8 Nonlinear Relationships
To use regression models:
Transform or re-express one or both variables by a function such as:
• Logarithm
• Square root
• Reciprocal
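A sketch of re-expressing a variable before fitting, using the logarithm option above (the exponential-growth data are hypothetical):

```python
import math

# Hypothetical y that grows exponentially with x: y = 3 * 2**x.
x = [0, 1, 2, 3, 4]
y = [3.0, 6.0, 12.0, 24.0, 48.0]

# Re-express y with a logarithm; log(y) is now exactly linear in x.
log_y = [math.log(v) for v in y]
slopes = [log_y[i + 1] - log_y[i] for i in range(len(x) - 1)]

# All first differences are equal -> linear after the transformation.
print(all(abs(s - math.log(2)) < 1e-9 for s in slopes))  # True
```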
We can write:
P(A) = (number of outcomes in A) / (total number of outcomes)
whenever the outcomes are equally likely, and
call this the theoretical probability of the event.
8.3 Two More Types of Probability (2 of 2)
A subjective, or personal probability is a type of probability derived from an individual’s personal judgment,
intuition, or belief about how likely an event is to occur. It is not based on mathematical calculations,
statistical data, or formal experimentation, but rather on personal experience, knowledge, or perception.
P(S) = 1
P(A) = 1 − P(A^C)
where the set of outcomes that are not in event A is called the “complement”
of A, and is denoted A^C.
Rule 4: The Multiplication Rule
P(A and B) = P(A) × P(B)
where A and B are independent.
8.4 Probability Rules (5 of 6)
Rule 5: The Addition Rule
Two events are disjoint (or mutually exclusive) if they have no
outcomes in common.
The Addition Rule allows us to add the probabilities of disjoint
events to get the probability that either event occurs.
P(A or B) = P(A) + P(B)
where A and B are disjoint.
8.4 Probability Rules (6 of 6)
Rule 6: The General Addition Rule
The General Addition Rule calculates the probability
that either of two events occurs. It does not require
that the events be disjoint.
P(A or B) = P(A) + P(B) − P(A and B)
8.5 Joint Probability and Contingency
Events may be placed in a contingency table such as the one in the example below.
Example: As part of a Pick Your Prize Promotion, a store invited customers to choose
which of three prizes they’d like to win. The responses could be placed in the
following contingency table:
Prize Preference
Gender   Skis   Camera   Bike   Total
Man       117       50     60     227
Woman     130       91     30     251
Total     247      141     90     478
P(woman) = 251/478 = 0.525.
P(woman and camera) = 91/478 = 0.190.
P(bike|woman) = 30/251 = 0.120.
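The three probabilities can be checked directly from the table counts (a minimal sketch):

```python
# Counts from the contingency table: rows are Man/Woman, columns Skis/Camera/Bike.
counts = {
    "Man": {"Skis": 117, "Camera": 50, "Bike": 60},
    "Woman": {"Skis": 130, "Camera": 91, "Bike": 30},
}
total = sum(sum(row.values()) for row in counts.values())  # 478

p_woman = sum(counts["Woman"].values()) / total             # marginal probability
p_woman_and_camera = counts["Woman"]["Camera"] / total      # joint probability
p_bike_given_woman = counts["Woman"]["Bike"] / sum(counts["Woman"].values())  # conditional

print(round(p_woman, 3), round(p_woman_and_camera, 3), round(p_bike_given_woman, 3))
```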
8.6 Conditional Probability and Independence
In general, when we want the probability of an event from a
conditional distribution, we write P(B | A) and pronounce it
“the probability of B given A.”
A probability that takes into account a given condition is
called a conditional probability.
P(B | A) = P(A and B) / P(A)
P(A_i | B) = P(B | A_i) P(A_i) / Σ_{j=1}^{n} P(B | A_j) P(A_j)
8.9 Reversing the Conditioning: Bayes’s Rule
Example: Suppose the engineer’s decision in the previous example
fixed the problem that was occurring. What is the probability that it
was the case adjustment that corrected the problem?
To solve this problem, we return to our definition of conditional
probability.
P(Case | Fixed) = P(Case and Fixed) / P(Fixed)
= 0.48 / (0.48 + 0.15 + 0.01)
= 0.75
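The arithmetic in this example can be sketched as follows (the joint probabilities 0.48, 0.15, and 0.01 come from the text; the labels for the two other possible causes are hypothetical):

```python
# Joint probabilities P(cause and Fixed) for each possible cause.
# 0.48 is the case adjustment; the other two causes (generic labels,
# not named in the text) have joint probabilities 0.15 and 0.01.
joint = {"case_adjustment": 0.48, "cause_2": 0.15, "cause_3": 0.01}

p_fixed = sum(joint.values())  # total probability the problem was fixed
p_case_given_fixed = joint["case_adjustment"] / p_fixed
print(round(p_case_given_fixed, 2))  # 0.75
```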
9.1 Random Variable
A random variable is a function that assigns a numerical value to each
outcome of a random experiment.
Var(X) = 99,800² (1/1000) + 49,800² (2/1000) + (−200)² (997/1000)
= 14,960,000
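As a check, the variance can be recomputed from the underlying payout model. The model itself is assumed here (it is consistent with the deviations shown: payouts of $100,000, $50,000, and $0 with chances 1, 2, and 997 out of 1000, so E(X) = $200):

```python
# Assumed payout model consistent with the deviations shown in the text:
# $100,000 w.p. 1/1000, $50,000 w.p. 2/1000, $0 w.p. 997/1000.
outcomes = [(100_000, 1), (50_000, 2), (0, 997)]  # (payout, chances out of 1000)

mean = sum(x * w for x, w in outcomes) / 1000               # E(X) = 200.0
var = sum(w * (x - mean) ** 2 for x, w in outcomes) / 1000  # weighted squared deviations
print(mean, var)  # 200.0 14960000.0
```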
Adding a constant c to X:
E(X ± c) = E(X) ± c,
Var(X ± c) = Var(X), and
SD(X ± c) = SD(X).
Multiplying X by a constant a:
E(aX) = a E(X),
Var(aX) = a² Var(X), and
SD(aX) = |a| SD(X).
Adding and Subtracting Random Variables
Expected Value when Adding or Subtracting Random Variables
E(X ± Y) = E(X) ± E(Y).
Variances when Adding or Subtracting (independent) Random Variables
Var(X ± Y) = Var(X) + Var(Y)
if X and Y are independent .
Note: we always add the Variances (even when subtracting the Random Variables)
Compare this to the expected value and variance on two independent policies at the
original payout amount.
E(X + Y) = E(X) + E(Y) = 2 × 200 = $400
Var(X + Y) = Var(X) + Var(Y) = 2 × 14,960,000 = 29,920,000
Note: The expected values are the same but the variances are
different.
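The two-policy numbers can be verified by brute force over all outcome pairs, assuming the single-policy payout model behind these figures ($100,000, $50,000, or $0 with chances 1, 2, and 997 out of 1000; the model is not stated on this slide):

```python
from itertools import product

# Assumed single-policy payout model (not stated on this slide):
# $100,000 w.p. 1/1000, $50,000 w.p. 2/1000, $0 w.p. 997/1000.
outcomes = [(100_000, 1), (50_000, 2), (0, 997)]  # (payout, chances out of 1000)

# Independence: P(x, y) = w1 * w2 / 1000^2, so enumerate all pairs.
pairs = [(x + y, w1 * w2) for (x, w1), (y, w2) in product(outcomes, outcomes)]
total = 1000 * 1000
mean = sum(s * w for s, w in pairs) / total
var = sum(w * (s - mean) ** 2 for s, w in pairs) / total
print(mean, var)  # 400.0 29920000.0
```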
Standard Deviation and Variance of a RV
Example: The probability model for a particular life insurance policy is
shown. Find the variance and standard deviation of the annual payout.
[Table columns: Outcome, Payout X, X + 1000, 10X, P(X = x), (X + 1000)·P(X = x), (10X)·P(X = x)]
Example:
Tossing a fair die is described by the Uniform model U[1, 2, 3, 4, 5, 6],
with P(X = i) = 1/6.
A Bernoulli Trial and Process
A Bernoulli Trial is a trial with the following characteristics:
1) There are only two possible outcomes (success
and failure) for each trial.
2) The probability of success, denoted p, is the
same for each trial. The probability of failure is q =
1 − p.
3) The trials are independent.
A Bernoulli process is a series of n independent and
identical Bernoulli trials.
The Geometric Distribution
The Geometric distribution predicts the number of Bernoulli trials
required to achieve the first success.
The 10% Condition: For finite samples, the sample should be no more than 10% of the population.
Examples
1. A company wants to survey their customers to see if they received a
faulty product and what their feelings about their experiences are. If the
probability of getting a faulty product is 0.01, what is the probability that
they have to contact 100 customers before finding someone who
received a faulty product? (0.0037)
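The answer in Example 1 can be checked with the Geometric pmf, P(X = k) = (1 − p)^(k−1) p (a minimal sketch):

```python
p = 0.01  # probability a customer received a faulty product

# Geometric pmf: first success on trial k requires k - 1 failures, then a success.
k = 100
prob = (1 - p) ** (k - 1) * p
print(round(prob, 4))  # 0.0037
```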
Solution
Let X denote the number of Facebook users in a sample of n = 100
American adults, with p = 0.68.
a. P(X = 70) = 0.0791
Excel: BINOM.DIST(70, 100, 0.68, FALSE)
R: dbinom(70, 100, 0.68)
b. P(X ≤ 70) = 0.7007
Excel: BINOM.DIST(70, 100, 0.68, TRUE)
R: pbinom(70, 100, 0.68)
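The same two answers can be reproduced in Python, mirroring the Excel/R calls above (a standard-library sketch):

```python
from math import comb

n, p = 100, 0.68

def binom_pmf(k: int) -> float:
    """P(X = k) for a Binomial(n, p) random variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

pmf_70 = binom_pmf(70)                         # like dbinom(70, 100, 0.68)
cdf_70 = sum(binom_pmf(k) for k in range(71))  # like pbinom(70, 100, 0.68)
print(round(pmf_70, 4), round(cdf_70, 4))  # 0.0791 0.7007
```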
Poisson Distributions
• The Poisson model predicts the number of events that occur over
a given interval of time or space.
The Normal Distribution
• The normal distribution is one of the most
widely used continuous distributions.
• Bell-shaped and symmetric around its mean
• Also known as the Gaussian distribution
• One reason for its extensive use is that it closely
approximates the probability distribution of a wide
range of random variables (naturally occurring
events).
• Another important function of the normal
distribution is that it serves as the
cornerstone of statistical inference.
The pdf and cdf of the Normal distribution are
f(x) = (1 / (σ√(2π))) e^(−(1/2)((x − μ)/σ)²)
F(x) = Φ(x) = ∫ from −∞ to x of (1 / (σ√(2π))) e^(−(1/2)((s − μ)/σ)²) ds
where μ is the population mean and σ is the population standard deviation.
It can be transformed into the standard Normal model (or the standard Normal distribution),
which is used with standardized z-scores. The standard Normal has mean 0 and standard deviation 1.
z = (X − μ) / σ
Example: The Normal Distribution
Scores on a management aptitude exam are normally distributed with a mean of
72 and a standard deviation of 8.
a. What is the probability that a randomly selected manager will score above 60?
b. What is the probability that a randomly selected manager will score between
68 and 84?
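Both parts can be worked with z-scores (a standard-library sketch; Φ is built from math.erf):

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard Normal cdf via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 72, 8

# a. P(X > 60) = P(Z > (60 - 72)/8) = P(Z > -1.5)
p_a = 1 - phi((60 - mu) / sigma)
# b. P(68 < X < 84) = P(-0.5 < Z < 1.5)
p_b = phi((84 - mu) / sigma) - phi((68 - mu) / sigma)
print(round(p_a, 4), round(p_b, 4))  # 0.9332 0.6247
```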
The Exponential distribution has density
f(x) = λ e^(−λx) for x ≥ 0 and λ > 0
P(s ≤ X ≤ t) = e^(−λs) − e^(−λt)
P(X ≤ t) = P(0 ≤ X ≤ t) = e^(−λ·0) − e^(−λt) = 1 − e^(−λt)
Figure 9.19 The Exponential density function (with λ = 1).
We can expect to wait 20 seconds or less between hits about 75% of the time.
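That 75% figure is consistent with hits arriving at about 4 per minute, i.e. λ = 4/60 per second (an assumed rate; it is not stated on this slide):

```python
from math import exp

lam = 4 / 60  # assumed arrival rate: 4 hits per minute, in per-second units
t = 20        # seconds

# P(X <= t) = 1 - e^(-lambda * t) for the Exponential distribution.
p_wait_20s_or_less = 1 - exp(-lam * t)
print(round(p_wait_20s_or_less, 3))  # 0.736, i.e. roughly 75% of the time
```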
Ch10 Sampling Distribution
Learning Objectives
1) Understand how variations among multiple samples can be
represented in a sampling distribution
2) Calculate the sampling distribution (mean and variance) of a
proportion
3) Calculate the sampling distribution (mean and variance) of a
mean
Definitions
Sample Statistics are used as estimators of Population Parameters.
• Example continued
• p = 0.55 and n = 100, so np = 55 and n(1 − p) = 45
• E(P̄) = 0.55
• se(P̄) = √(p(1 − p)/n) = √(0.55(1 − 0.55)/100) = 0.0497
• P(P̄ ≥ 0.57) = P(Z ≥ (0.57 − 0.55)/0.0497) = P(Z ≥ 0.40) ≈ 0.34
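The tail probability P(P̄ ≥ 0.57) in this example can be finished numerically (a standard-library sketch; Φ is built from math.erf):

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard Normal cdf via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p, n = 0.55, 100
se = sqrt(p * (1 - p) / n)  # standard error of the sample proportion
z = (0.57 - p) / se         # about 0.40
p_tail = 1 - phi(z)         # P(P-bar >= 0.57)
print(round(se, 4), round(z, 2), round(p_tail, 3))  # 0.0497 0.4 0.344
```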
SD(ȳ) = σ/√n = 4/√10 = 1.26 and z = (ȳ − 15)/SD(ȳ) = (12 − 15)/1.26 = −2.38.
So the chance that the shipper will reject a palette is only 0.0087, less
than 1%. That’s probably good enough for the company.
Standard Error
When we don’t know σ, we estimate it with the standard deviation
of the one real sample. That gives us the standard error,
SE(ȳ) = s/√n
More about the t distribution
Gosset’s t
William S. Gosset discovered that when he used the standard error s/√n,
the shape of the curve was no longer Normal.