Ch8 10

The document discusses statistical concepts related to correlation, regression, and probability. It covers the least squares line for standardized variables, regression to the mean, model checking conditions, and the importance of residuals in assessing model fit. Additionally, it explains various types of probability, including empirical, theoretical, and subjective probabilities, along with rules for calculating probabilities and the use of probability trees and Bayes's Rule.

7.2 Correlation and the Line

Working in Standard Deviations
If we find the least squares line for the standardized variables zx and zy, the formula for the slope simplifies to

b1 = r

and the intercept formula becomes

b0 = 0

so the model is

ẑy = r zx
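This identity is easy to verify numerically; a minimal sketch with made-up data (any linearly related x and y will do):

```python
import numpy as np

# Sketch: for standardized variables, the least squares slope equals r
# and the intercept is 0 (illustrative data, not from the text).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

r = np.corrcoef(x, y)[0, 1]
b1, b0 = np.polyfit(zx, zy, 1)   # slope b1 and intercept b0 for zy on zx

print(abs(b1 - r) < 1e-9, abs(b0) < 1e-9)   # True True
```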
7.3 Regression to the Mean
The equation ẑy = r zx shows that if x is 2 SDs above its mean, the predicted y can be at most 2 SDs above its mean, since r can't be bigger than 1 in magnitude.
So each predicted y tends to be closer to its mean (in standard deviations) than its corresponding x.
This property of the linear model is called regression to the mean.
7.4 Checking the Model
Models are useful only when specific assumptions are reasonable. We check conditions that provide information about these assumptions.
1) Quantitative Data Condition – linear models only make sense for quantitative data, so don't be fooled by categorical data recorded as numbers.
2) Linearity Condition – the two variables must have a linear association, or a linear model won't mean a thing (look at the scatterplot).
3) Outlier Condition – outliers can dramatically change a regression model. Investigate any outlier, and fit the model both with and without it.
7.5 Learning More from the Residuals
Residuals help us see whether the model makes sense.
A scatterplot of residuals against predicted values should show nothing interesting – no patterns, no direction, no shape.
If nonlinearities, outliers, or clusters are seen in the residuals, we must try to determine what the regression model missed.
7.5 Learning More from the Residuals
The standard deviation of the residuals, se, gives us a measure of how much the points spread around the regression line.
We estimate the standard deviation of the residuals as

se = √( Σe² / (n − 2) )

The standard deviation around the line should be the same wherever we apply the model – this is called the Equal Spread Condition.
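As a sketch with made-up data, se can be computed directly from the fitted residuals:

```python
import numpy as np

# Sketch: estimate se = sqrt(sum(e^2) / (n - 2)) from a fitted line
# (hypothetical data, not from the text).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)                      # residuals
se = np.sqrt(np.sum(e**2) / (len(x) - 2))
print(round(float(se), 3))
```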
7.5 Learning More from the Residuals
[Figure slides: residual plots illustrating the expected pattern and common issues]
7.6 Variation in the Model and R2
To assess the explanatory power of the model you just created, we use R squared, written as R2:

R2 = r2

It is a value between 0 and 1.
R2 gives the fraction of the data's variation accounted for by the model, and 1 − R2 is the fraction of the original variation left in the residuals.
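For simple regression the two descriptions agree exactly; a small sketch with made-up data:

```python
import numpy as np

# Sketch: R^2 = r^2, and also 1 - (residual variation) / (total variation).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

r = np.corrcoef(x, y)[0, 1]
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)

print(abs(r**2 - r2) < 1e-9)   # True
```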
7.6 Variation in the Model and R2

How Big Should R2 Be?


There is no value of R2 that automatically determines that a
regression is “good”
Data from scientific experiments often have R2 in the 80% to 90%
range
Data from observational studies may have an acceptable R2 in the
30% to 50% range
7.7 Reality Check: Is the Regression
Reasonable?
• The results of a statistical analysis should reinforce common
sense
• Is the slope reasonable?
• Does the direction of the slope seem right?
• Always be skeptical and ask yourself if the answer is reasonable
7.8 Nonlinear Relationships
A regression model works well if the relationship between the two
variables is linear. What should be done if the relationship is nonlinear?

Figure 7.5 The scatterplot of number of Cell Phones (000s) vs. HDI for countries shows a bent relationship not suitable for correlation or regression.
7.8 Nonlinear Relationships
To use regression models:
Transform or re-express one or both variables using a function such as:
• logarithm
• square root
• reciprocal

Figure 7.6 Taking the logarithm of cell phones results in a more nearly linear relationship.
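A sketch of the idea with hypothetical exponential-growth data (not the textbook's cell phone data): on a log scale, the bent relationship straightens out.

```python
import numpy as np

# Hypothetical bent relationship: y grows exponentially with x.
x = np.linspace(0.3, 0.95, 30)
y = np.exp(6 * x)

r_raw = np.corrcoef(x, y)[0, 1]          # correlation on the raw scale
r_log = np.corrcoef(x, np.log(y))[0, 1]  # correlation after log re-expression

print(r_raw < r_log)   # True: the log scale is far more linear
```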
BSMM8320
Quantitative Studies

Dr. Yawo Kobara


Ch. 8: Randomness and Probability
Ch. 9: Random Variables and Probability
Distributions
Learning Objectives
1) Estimate probability using empirical, theoretical, and subjective methods
2) Combine probabilities of one event and/or another
3) Determine whether events are independent and/or disjoint
4) Represent probabilities of multiple events using a probability tree
5) Update estimates of probability using additional information
6) Calculate the expected value and variance of a discrete random variable
7) Analyze the effect of adding and subtracting random variables
8) Model discrete random variables
9) Model continuous random variables
8.1 Random Phenomena
and Empirical Probability
• A phenomenon is random when we can't predict its individual outcomes, but we can hope to understand characteristics of its long-run behavior.
• For any random phenomenon, each attempt, or trial, generates an outcome.
• A collection of one or more outcomes is called an event.
• The sample space is the special event that is the collection of all possible outcomes. We denote the sample space S or sometimes Ω.
8.2 Random Phenomena and Empirical Probability
Empirical probability
• The probability of an event is its long-run relative frequency.
• Empirical probability is based on repeatedly observing the
event’s outcome.
• The Law of Large Numbers (LLN) states that if the trials are
independent, then as the number of trials increases, the long-
run relative frequency of an event gets closer and closer to a
single value.
• Independence means that the outcome of one trial doesn’t
influence or change the outcome of another.
8.3 Two More Types of Probability (1 of 2)
Model-Based (Theoretical) Probability of event A can be
computed with the following equation:

We can write:

P(A) = (No. of outcomes in A) / (Total no. of outcomes)

whenever the outcomes are equally likely, and call this the theoretical probability of the event.
8.3 Two More Types of Probability (2 of 2)
A subjective, or personal probability is a type of probability derived from an individual’s personal judgment,
intuition, or belief about how likely an event is to occur. It is not based on mathematical calculations,
statistical data, or formal experimentation, but rather on personal experience, knowledge, or perception.

Characteristics of Subjective Probability


•Based on personal opinions or estimates.
•May vary widely between individuals.
•Lacks a rigorous or repeatable method for calculation.
•Often used when empirical data or experiments are not available; it is prone to bias.

Bias            Key Driver                    Effect                                            Common in
Overconfidence  Overestimation of self        Leads to excessive risk-taking.                   Leadership, investments
Sunk Cost       Emotional attachment          Results in wasting resources on failing efforts.  Projects, personal finance
Recency         Overweighting recent events   Focuses on short-term trends, ignoring history.   Performance reviews, trading
8.4 Probability Rules (1 of 6)
Rule 1
Probability is a number between 0 and 1.
For any event A,
0  P ( A )  1.
If the probability of an event occurring is 0, the event can’t occur.
If the probability is 1, the event always occurs.
8.4 Probability Rules (2 of 6)
Rule 2: The Probability Assignment Rule
The probability of the set of all possible outcomes must be 1.

P(S) = 1

where S represents the set of all possible outcomes and is called the sample space.
8.4 Probability Rules (3 of 6)
Rule 3: The Complement Rule
The probability of an event occurring is 1 minus the
probability that it doesn’t occur.

P(A) = 1 − P(AC)

where the set of outcomes that are not in event A is called the "complement" of A, and is denoted AC.
8.4 Probability Rules (4 of 6)
Rule 4: The Multiplication Rule
For two independent events A and B, the probability that both A
and B occur is the product of the probabilities of the two events.

P(A and B) = P(A) × P(B)
8.4 Probability Rules (5 of 6)
Rule 5: The Addition Rule
Two events are disjoint (or mutually exclusive) if they have no
outcomes in common.
The Addition Rule allows us to add the probabilities of disjoint
events to get the probability that either event occurs.

P(A or B) = P(A) + P(B)
where A and B are disjoint.
8.4 Probability Rules (6 of 6)
Rule 6: The General Addition Rule
The General Addition Rule calculates the probability
that either of two events occurs. It does not require
that the events be disjoint.

P(A or B) = P(A) + P(B) − P(A and B)
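A minimal check of the rule on a single die roll (A = even, B = greater than 3):

```python
import math

# Sketch: General Addition Rule on equally likely die outcomes.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}        # even
B = {4, 5, 6}        # greater than 3

def p(event):
    return len(event) / len(S)

lhs = p(A | B)                   # P(A or B) directly
rhs = p(A) + p(B) - p(A & B)     # via the General Addition Rule
print(math.isclose(lhs, rhs))    # True
```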
8.5 Joint Probability and Contingency
Events may be placed in a contingency table such as the one in the example below.
Example: As part of a Pick Your Prize Promotion, a store invited customers to choose
which of three prizes they’d like to win. The responses could be placed in the
following contingency table:
           Prize Preference
Gender     Skis    Camera    Bike    Total
Man        117     50        60      227
Woman      130     91        30      251
Total      247     141       90      478
P(woman) = 251/478 = 0.525.
P(woman and camera) = 91/478 = 0.190.
P(bike|woman) = 30/251 = 0.120.
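The same three probabilities can be reproduced from the table in a few lines (a sketch, representing the table as a dict):

```python
# Sketch: the Pick Your Prize contingency table as counts.
counts = {
    ("Man", "Skis"): 117, ("Man", "Camera"): 50, ("Man", "Bike"): 60,
    ("Woman", "Skis"): 130, ("Woman", "Camera"): 91, ("Woman", "Bike"): 30,
}
total = sum(counts.values())                                    # 478
woman = sum(v for (g, _), v in counts.items() if g == "Woman")  # 251

print(round(woman / total, 3))                        # P(woman)            = 0.525
print(round(counts[("Woman", "Camera")] / total, 3))  # P(woman and camera) = 0.19
print(round(counts[("Woman", "Bike")] / woman, 3))    # P(bike | woman)     = 0.12
```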
8.6 Conditional Probability and Independence
In general, when we want the probability of an event from a conditional distribution, we write P(B|A) and pronounce it "the probability of B given A."
A probability that takes a given condition into account is called a conditional probability.

P(B | A) = P(A and B) / P(A)

P(bike|woman) = P(bike and woman) / P(woman)
             = (30/478) / (251/478)
             = 30/251 = 0.12
8.6 Conditional Probability and Independence
Rule 7: The General Multiplication Rule
The General Multiplication Rule calculates the probability that both of two events occur. It does not require that the events be independent.

P(A and B) = P(A) × P(B | A)

Events A and B are independent whenever P(B|A) = P(B).

Independent vs. Disjoint (disjoint = mutually exclusive)
For all practical purposes, disjoint events cannot be independent.
Don't make the mistake of treating disjoint events as if they were independent and applying the Multiplication Rule for independent events.
If you're given probabilities without a contingency table, you can often construct a simple table to correspond to the probabilities and use this table to find other probabilities.
Example: A survey classified homes into two price categories (Low and
High). It also noted whether the houses had at least 2 bathrooms or not
(True or False). 56% of the houses had at least 2 bathrooms, 62% of the
houses were Low priced, and 22% of the houses were both.
                At Least Two Bathrooms
Price           True      False     Total
Low             0.22      ____      0.62
High            ____      ____      ____
Total           0.56      ____      1.00
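The empty cells follow from the three given values; a sketch of the arithmetic:

```python
# Sketch: completing the contingency table from the three given values.
p_two_baths = 0.56   # P(at least two bathrooms)
p_low = 0.62         # P(low priced)
p_both = 0.22        # P(low and at least two bathrooms)

p_low_false = p_low - p_both          # Low, fewer than two baths
p_high_true = p_two_baths - p_both    # High, at least two baths
p_high = 1.00 - p_low                 # High total
p_high_false = p_high - p_high_true   # High, fewer than two baths

print(round(p_low_false, 2), round(p_high_true, 2),
      round(p_high, 2), round(p_high_false, 2))   # 0.4 0.34 0.38 0.04
```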


8.7 Probability Trees
Example: Suppose a company manufactures components for electronic devices. In the manufacturing process, if an unacceptable level of defects occurs, an engineer must decide how to correct the problem. The engineer can order three minor adjustments to try to fix the problem. Each is listed with the probability that it is the cause of the defects:
i. motherboard adjustment (10%)
ii. memory adjustment (30%)
iii. case adjustment (60%)
It is observed that 80% of case adjustments fix the problem, 50% of memory adjustments fix the problem, and 10% of motherboard adjustments fix the problem.
What is the probability that a minor adjustment will correct the problem?
8.7 Probability Trees
Example (continued): All the joint events listed in the probability tree are
disjoint. We may therefore simply add the relevant probabilities to answer the
question that was posed: What is the probability that a minor adjustment will
correct the problem?
P(Fixed)
= P((Case and Fixed) or (Memory and Fixed) or (Motherboard and Fixed))
= P(Case and Fixed) + P(Memory and Fixed) + P(Motherboard and Fixed)
= 0.48 + 0.15 + 0.01
= 0.64
The likelihood that a minor adjustment will correct the problem is about 64%.
8.8 Reversing the Conditioning: Bayes’s Rule
We can generalize the result on the previous slide for n
mutually exclusive and exhaustive (union is the whole space)
events Ai.
P(B) = P(B | A1)·P(A1) + P(B | A2)·P(A2) + ⋯ + P(B | An)·P(An)

From this we obtain Bayes's Rule.

P(Ai | B) = P(B | Ai)·P(Ai) / Σj=1..n P(B | Aj)·P(Aj)
8.9 Reversing the Conditioning: Bayes’s Rule
Example: Suppose the engineer’s decision in the previous example
fixed the problem that was occurring. What is the probability that it
was the case adjustment that corrected the problem?
To solve this problem, we return to our definition of conditional
probability.
P(Case | Fixed) = P(Case and Fixed) / P(Fixed)
               = 0.48 / (0.48 + 0.15 + 0.01)
               = 0.75
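The whole calculation — the tree total and the reversal — fits in a few lines (a sketch of the adjustment example):

```python
# Sketch: total probability and Bayes's Rule for the adjustment example.
p_cause = {"motherboard": 0.10, "memory": 0.30, "case": 0.60}
p_fix = {"motherboard": 0.10, "memory": 0.50, "case": 0.80}

p_fixed = sum(p_cause[a] * p_fix[a] for a in p_cause)   # total probability
p_case_given_fixed = p_cause["case"] * p_fix["case"] / p_fixed

print(round(p_fixed, 2), round(p_case_given_fixed, 2))  # 0.64 0.75
```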
9.1 Random Variable
A random variable is a function that assigns numerical values to the
outcomes of an experiment (event).

Examples of random events and random variables:
• the outcome of tossing a coin once, twice, or tossing 2 coins together, etc.
• the outcome of rolling a die, rolling 2 dice, etc.
9.1 Discrete and Continuous Random Variables
• If the possible outcomes take on a countable number of distinct values, the random variable is called a discrete random variable.
Example: anything expressed as a "number of"
• If a random variable can take on any value between two values in an interval, it is called a continuous random variable. It takes on an uncountable (infinite) number of possible values, usually over an interval.
Examples: any value within a range (e.g., height, weight, temperature)
Examples of Random Variable
Discrete random variables
- the number of children in a family
- the Friday night attendance at a cinema
- the number of patients in a doctor's surgery
- the number of customers
- the number of defective light bulbs in a box of ten.

Continuous random variables
- the height of students in a class
- the amount of iced tea in a glass
- the change in temperature throughout a day
Probability models for RVs
For both discrete and continuous random variables, the set of probability values associated with all the outcomes is called the probability model.
• Models for discrete random variables are called probability mass functions (pmf).
– P(X = x) is the probability that the random variable assumes a particular value. Each probability is between 0 and 1: 0 ≤ P(X = x) ≤ 1.
– The sum of the probabilities equals 1: Σ P(X = x) = 1.
• A discrete random variable can also be defined in terms of its cumulative distribution function (cdf), P(X ≤ x).
Probability model
PMFs can be tabulated, formulated, or described using sentences.

Policyholder Outcome    Payout (cost) x    PMF P(X = x)    CDF P(X ≤ x)
Neither                 0                  997/1000        997/1000
Disability              50,000             2/1000          999/1000
Death                   100,000            1/1000          1
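A sketch of the same model in code, checking that it is a valid pmf and building the cdf:

```python
from fractions import Fraction as F

# Sketch: the insurance policy pmf (payout in dollars -> probability).
pmf = {0: F(997, 1000), 50_000: F(2, 1000), 100_000: F(1, 1000)}

assert sum(pmf.values()) == 1   # probabilities must sum to 1

cdf = {}
running = F(0)
for payout in sorted(pmf):      # accumulate P(X <= x)
    running += pmf[payout]
    cdf[payout] = running

print(cdf[0], cdf[50_000], cdf[100_000])   # 997/1000 999/1000 1
```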
Standard Deviation and Variance of a RV
• The expected value is also referred to as the mean.
• It is a weighted average of all possible values of X.
• Denoted as E(X) or μ.
• It is calculated as E(X) = μ = Σ xi P(X = xi).
• The variance and standard deviation are both measures of variability.
• The variance is denoted Var(X) or σ².
• The variance is calculated as Var(X) = σ² = Σ (xi − μ)² P(X = xi).
• The standard deviation is denoted by SD(X) or σ.
Standard Deviation and Variance of a RV
Example: The probability model for a particular life insurance policy is shown. Find the variance and standard deviation of the annual payout.

Outcome      Payout x    P(X = x)    x·P(X = x)    Deviation (x − E(X))
Neither      0           997/1000    0             0 − 200 = −200
Disability   50,000      2/1000      100           50,000 − 200 = 49,800
Death        100,000     1/1000      100           100,000 − 200 = 99,800

E(X) = μ = Σ xi P(X = xi) = 0 + 100 + 100 = 200

Var(X) = 99,800²(1/1000) + 49,800²(2/1000) + (−200)²(997/1000) = 14,960,000

SD(X) = √14,960,000 ≈ $3,867.82
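A sketch reproducing the table's arithmetic (using fractions to avoid rounding):

```python
from fractions import Fraction as F

# Sketch: mean, variance, and SD of the insurance payout.
pmf = {0: F(997, 1000), 50_000: F(2, 1000), 100_000: F(1, 1000)}

mu = sum(x * p for x, p in pmf.items())                # expected value
var = sum((x - mu) ** 2 * p for x, p in pmf.items())   # variance
sd = float(var) ** 0.5                                 # standard deviation

print(mu, var, round(sd, 2))   # 200 14960000 3867.82
```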


Example 2 (try this on your own)
Brad Williams is the owner of a large car dealership. Brad decides to construct
an incentive compensation program that equitably and consistently
compensates employees on the basis of their performance.
He offers an annual bonus of $10,000 for superior performance, $6,000 for
good performance, $3,000 for fair performance, and $0 for poor performance.
Based on prior records, he expects an employee to perform at superior, good,
fair, and poor performance levels with probabilities 0.15, 0.25, 0.40, and 0.20,
respectively.
a. Calculate the expected value of the annual bonus amount.
b. Calculate the variance and the standard deviation of the annual bonus
amount.
c. What is the total annual amount that Brad can expect to pay in bonuses if he
has 25 employees?
Example 2 (try this on your own)
Let the random variable X denote the bonus amount (in $1,000s).

a. The expected value is E(X) = μ = Σ xi P(X = xi) = 4.2, or $4,200.

b. The variance is Var(X) = σ² = Σ (xi − μ)² P(X = xi) = 9.96 (in ($1,000s)²). The standard deviation is SD(X) = σ ≈ 3.156, or $3,156.
c. If Brad has 25 employees, he can expect to pay $4,200 × 25 = $105,000 in bonuses.
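A short check of the solution (a sketch; bonus amounts in $1,000s):

```python
# Sketch: checking Example 2 (bonus amounts in $1,000s).
pmf = {10: 0.15, 6: 0.25, 3: 0.40, 0: 0.20}

mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(round(mu, 2), round(var, 2), round(var ** 0.5, 3))   # 4.2 9.96 3.156
```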
Adding and Subtracting Random Variables
Adding a constant c to X:

E(X ± c) = E(X) ± c
Var(X ± c) = Var(X)
SD(X ± c) = SD(X)

Multiplying X by a constant a:

E(aX) = aE(X)
Var(aX) = a²Var(X)
SD(aX) = |a| SD(X)
Adding and Subtracting Random Variables
Expected Value when Adding or Subtracting Random Variables

E(X ± Y) = E(X) ± E(Y)

Variances when Adding or Subtracting (independent) Random Variables

Var(X ± Y) = Var(X) + Var(Y) if X and Y are independent.

Note: we always add the variances (even when subtracting the random variables).
In general, Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab·Cov(X, Y).

Adding and Subtracting Random Variables
Illustration: The expected annual payout per insurance policy is $200 and the variance is 14,960,000. If the payout amounts are doubled, what are the new expected value and variance?

E(2X) = 2E(X) = 2 × 200 = $400
Var(2X) = 2²Var(X) = 4 × 14,960,000 = 59,840,000

Compare this to the expected value and variance of two independent policies at the original payout amount.

E(X + Y) = E(X) + E(Y) = 2 × 200 = $400
Var(X + Y) = Var(X) + Var(Y) = 2 × 14,960,000 = 29,920,000

Note: The expected values are the same, but the variances are different.
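The contrast can be stated in a couple of lines (a sketch applying the rules above):

```python
# Sketch: doubling one policy vs. two independent policies.
e_x, var_x = 200, 14_960_000

e_2x, var_2x = 2 * e_x, 2 ** 2 * var_x   # one policy with doubled payout
e_sum, var_sum = 2 * e_x, 2 * var_x      # two independent policies

print(e_2x == e_sum, var_2x, var_sum)    # True 59840000 29920000
```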
Standard Deviation and Variance of a RV
Example: Using the same life insurance policy model, compare E(X + 10,000) and E(10X) with E(X) = 200.

Outcome      Payout x    x + 10,000    10x          P(X = x)    (x + 10,000)·P    (10x)·P
Neither      0           10,000        0            997/1000    9,970             0
Disability   50,000      60,000        500,000      2/1000      120               1,000
Death        100,000     110,000       1,000,000    1/1000      110               1,000

E(X + 10,000) = Σ (xi + 10,000) P(X = xi) = 9,970 + 120 + 110 = 10,200 = E(X) + 10,000

E(10X) = Σ 10·xi P(X = xi) = 0 + 1,000 + 1,000 = 2,000 = 10 × E(X)
Some formulated probability models
The Uniform Distribution
If X is a random variable with possible outcomes 1, 2, …, n and

P(X = i) = 1/n

for each i, then we say X has a discrete Uniform distribution U[1, …, n].

Example:
Tossing a fair die is described by the Uniform model U[1, 2, 3, 4, 5, 6], with P(X = i) = 1/6.
A Bernoulli Trial and Process
A Bernoulli Trial is a trial with the following characteristics:
1) There are only two possible outcomes (success
and failure) for each trial.
2) The probability of success, denoted p, is the
same for each trial. The probability of failure is q =
1 − p.
3) The trials are independent.
A Bernoulli process is a series of n independent and
identical Bernoulli trials.
The Geometric Distribution
The Geometric Distribution: predicting the number of Bernoulli trials required to achieve the first success.

Geometric Probability Model for Bernoulli Trials
p = probability of success (and q = 1 − p = probability of failure)
X = number of trials until the first success occurs

P(X = x) = q^(x−1) p

Expected value: μ = 1/p
Standard deviation: σ = √(q/p²)

The 10% Condition: For finite samples, the sample should be no more than 10% of the population.
Examples
1. A company wants to survey its customers to see if they received a faulty product and what their feelings about their experiences are. If the probability of getting a faulty product is 0.01, what is the probability that they have to contact 100 customers before finding someone who received a faulty product? (0.0037)

2. Suppose it's known that 5% of all widgets on an assembly line are defective. What is the probability of inspecting 0, 1, 2 widgets, etc. before an inspector comes across a defective widget?
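A sketch answering Example 1 with the geometric pmf P(X = x) = q^(x−1) p:

```python
# Sketch: geometric probability that the 100th contact is the first
# customer with a faulty product (p = 0.01).
p = 0.01
prob = (1 - p) ** 99 * p
print(round(prob, 4))   # 0.0037
```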
Binomial model
• The Binomial Distribution: predicting the number of successes in a series of n Bernoulli trials.
• For a binomial random variable X, the binomial distribution attaches the probabilities

P(X = x) = (n choose x) p^x (1 − p)^(n−x), where (n choose x) = n! / (x!(n − x)!)

• For x = 0, 1, 2, ⋯, n
• Note 0! = 1
The Binomial model
There are two parts of the formula.
(n choose x) = n! / (x!(n − x)!) is how many sequences have x successes and n − x failures.
p^x (1 − p)^(n−x) is the probability of any particular sequence with x successes and n − x failures.

The mean is E(X) = μ = np.
The variance is Var(X) = σ² = np(1 − p), so σ = √(npq).
()
Example evaluating n For 2 successes in 5 trials,
x
5
 
=
5!
=
(5  4  3  2  1) (5  4)
 2  2!(5 − 2)! (2  1 3  2  1) = (2  1) = 10.
The Binomial: Example (try this at home)
A firm has estimated that 30% of customers react positively to its new web features.
Suppose five (5) customers are randomly selected.
a. What is the probability that none of the customers react positively to the firm's new web features?
b. Calculate the expected number of customers who react positively to the firm's new web features.

A customer reacts positively with probability p = 0.30, or does not react positively with probability 1 − p = 0.70.
a. P(X = 0) = [5! / (0!(5 − 0)!)] × 0.30⁰ × 0.70⁵ = 0.1681.
b. The mean is E(X) = μ = np = 5 × 0.30 = 1.5 customers.
Binomial Example (try this at home)
People turn to social media to stay in touch with friends and family members, connect with old friends, catch the news, look for employment, and be entertained. According to a recent survey, 68% of all U.S. adults are Facebook users. Consider a sample of 100 randomly selected American adults.
a. What is the probability that exactly 70 American adults are Facebook users?
b. What is the probability that no more than 70 American adults are Facebook users?

Solution
Let X denote the number of American adults who are Facebook users. We know that p = 0.68 and n = 100.
a. P(X = 70) = 0.0791
   Excel: BINOM.DIST(70, 100, 0.68, FALSE)
   R: dbinom(70, 100, 0.68)
b. P(X ≤ 70) = 0.7007
   Excel: BINOM.DIST(70, 100, 0.68, TRUE)
   R: pbinom(70, 100, 0.68)
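The same numbers can be reproduced directly from the binomial formula (a sketch; `binom_pmf` is just a helper name):

```python
from math import comb

# Sketch: binomial pmf and cdf for n = 100, p = 0.68.
n, p = 100, 0.68

def binom_pmf(x):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(round(binom_pmf(70), 4))                          # P(X = 70)   ~ 0.0791
print(round(sum(binom_pmf(x) for x in range(71)), 4))   # P(X <= 70)  ~ 0.7007
```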
Poisson Distributions
• The Poisson Model: predicting the number of events that occur over a given interval of time or space.
• For a Poisson random variable X, the probability of x successes over a given interval of time or space is

P(X = x) = e^(−μ) μ^x / x!

• This is for x = 0, 1, 2, ⋯.
• μ is the mean number of successes.
• e ≈ 2.718 is the base of the natural logarithm.
• The mean is E(X) = μ.
• The variance is Var(X) = σ² = E(X) = μ.
Example
1. A website averages 4 hits per minute. Find the probability that
there will be no hits in the next minute.
P(X = 0) = e^(−4) · 4⁰ / 0! = e^(−4) ≈ 0.0183 (recall that e ≈ 2.71828).

2. At a call centre the average number of calls received per minute is 7.


Calculate the probability that, in the next minute, there are:
(A) four calls.
(B) eight calls.
(C) zero calls.
(D) at most three calls.
(E) What is the expected number of calls in a minute?
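A sketch working through Example 2 (μ = 7 calls per minute):

```python
from math import exp, factorial

# Sketch: Poisson probabilities for the call centre example.
mu = 7

def pois(x):
    return exp(-mu) * mu ** x / factorial(x)

print(round(pois(4), 4))                         # (A) four calls
print(round(pois(8), 4))                         # (B) eight calls
print(round(pois(0), 4))                         # (C) zero calls
print(round(sum(pois(x) for x in range(4)), 4))  # (D) at most three calls
# (E) The expected number of calls is E(X) = mu = 7.
```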
Models for continuous random variables
• A continuous random variable is a random variable that may take on any value in some interval [a, b].
• The distribution of the probabilities can be shown with a curve.
• The function of the curve is denoted f(x) and is called a probability density function (pdf).
• A continuous random variable is completely described by its pdf, f(x).
• The graph of f(x) approximates the relative frequency polygon for the population.
• f(x) must be non-negative for every possible value.
• The area under f(x) over all values of X must equal one: ∫ from −∞ to +∞ of f(x) dx = 1.
• The probability the variable assumes a value within an interval, P(a ≤ X ≤ b), is defined as the area under f(x) between points a and b.
Models for continuous random variables
Unlike a discrete random variable, P(X = x) = 0 for any single value x. There is zero area under f(x) at a single point: x is a single value and there is an uncountable number of possible values.

P(a ≤ X ≤ b) = P(a < X < b) = P(a < X ≤ b) = P(a ≤ X < b) = ∫ from a to b of f(x) dx.

A continuous random variable can also be defined in terms of its cumulative distribution function (cdf), F(x) = P(X ≤ x).
Mean and Variance of a continuous random variable
• The expected value is also referred to as the mean.
• It is calculated as E(X) = μ = ∫ from −∞ to ∞ of x f(x) dx.
• The variance and standard deviation are both measures of variability.
• The variance is denoted Var(X) or σ².
• The variance is calculated as Var(X) = σ² = ∫ from −∞ to ∞ of (x − μ)² f(x) dx.
• The standard deviation is denoted by SD(X) or σ.
The Uniform Distribution
For values c and d (c < d) both within the interval [a, b]:

f(x) = 1/(b − a) if a ≤ x ≤ b, and 0 otherwise

P(c ≤ X ≤ d) = (d − c)/(b − a)

Expected Value and Variance:

E(X) = (a + b)/2
Var(X) = (b − a)²/12
SD(X) = √((b − a)²/12)

Figure 9.9 The density function of a continuous uniform random variable on the interval from a to b.
Example
The amount of time, in minutes, that a person must wait for a bus is uniformly distributed between zero and 20 minutes.
a. What is the probability that a person waits fewer than 12.5 minutes?
b. On average, how long must a person wait? Find the mean, μ, and the standard deviation, σ.
c. Ninety percent of the time, the time a person must wait falls below what value?
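A sketch answering all three parts for Uniform(0, 20):

```python
# Sketch: Uniform(0, 20) bus-wait answers.
a, b = 0, 20

p_under = (12.5 - a) / (b - a)          # a. P(X < 12.5)
mu = (a + b) / 2                        # b. mean
sd = ((b - a) ** 2 / 12) ** 0.5         #    standard deviation
x90 = a + 0.90 * (b - a)                # c. 90th percentile

print(p_under, mu, round(sd, 2), x90)   # 0.625 10.0 5.77 18.0
```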
The Normal Distribution
• The normal distribution is one of the most widely used continuous distributions.
• It is bell-shaped and symmetric around its mean.
• It is also called the Gaussian distribution.
• One reason for its extensive use is that it closely approximates the probability distribution for a wide range of random variables (naturally occurring events).
• Another important function of the normal distribution is that it serves as the cornerstone of statistical inference.
The Normal Distribution

f(x) = (1 / (σ√(2π))) e^(−(1/2)((x − μ)/σ)²)

F(x) = Φ(x) = ∫ from −∞ to x of (1 / (σ√(2π))) e^(−(1/2)((s − μ)/σ)²) ds

μ is the population mean and σ is the population standard deviation.
The Normal Distribution
It can be transformed into the standard Normal model (or the standard Normal distribution). This model is used with standardized z-scores. The standard Normal has mean 0 and standard deviation 1.
The Normal Distribution
To read normal probabilities, we use a table. The table uses the standard Normal model, so we'll have to standardize our value to a z-score before using the table:

z = (X − μ) / σ
Example: The Normal Distribution
Scores on a management aptitude exam are normally distributed with a mean of 72 and a standard deviation of 8.
a. What is the probability that a randomly selected manager will score above 60?
b. What is the probability that a randomly selected manager will score between 68 and 84?

Let X represent scores with μ = 72 and σ = 8.
a. P(X > 60) = P(Z > (60 − 72)/8) = P(Z > −1.5) = 0.9332
b. P(68 ≤ X ≤ 84) = P((68 − 72)/8 ≤ Z ≤ (84 − 72)/8) = P(−0.5 ≤ Z ≤ 1.5) ≈ 0.62
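Instead of a table, the standard library's NormalDist can evaluate the same probabilities (a sketch of the example above):

```python
from statistics import NormalDist

# Sketch: exam scores ~ Normal(mu=72, sigma=8).
X = NormalDist(mu=72, sigma=8)

print(round(1 - X.cdf(60), 4))           # a. P(X > 60)        -> 0.9332
print(round(X.cdf(84) - X.cdf(68), 4))   # b. P(68 <= X <= 84) -> 0.6247
```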
The Normal Approximation to the Binomial
A discrete Binomial model is approximately Normal if we expect at least 10 successes and 10 failures:

np ≥ 10 and nq ≥ 10

Example: Suppose the Canadian Blood Services anticipates the need for at least 1850 units of O-negative blood this year. It also estimates it will collect a total of n = 32,000 units this year. The probability of a donor having O-negative blood is p = 0.06.

mean = np = 32,000 × 0.06 = 1920
std deviation = √(npq) ≈ 42.48

P(X ≥ 1850) = P(z ≥ (1850 − 0.5 − 1920) / 42.48) ≈ P(z ≥ −1.6596) ≈ 0.952
The Exponential Distribution
The Exponential Distribution: for modelling the time between events.

f(x) = λe^(−λx) for x ≥ 0 and λ > 0

P(s ≤ X ≤ t) = e^(−λs) − e^(−λt)

P(X ≤ t) = P(0 ≤ X ≤ t) = e^(−λ·0) − e^(−λt) = 1 − e^(−λt)   (the cumulative distribution function, cdf)

Figure 9.19 The Exponential density function (with λ = 1).
The Exponential Distribution
The Exponential Model: For modeling the time between events.
Example: If a website experiences 4 hits per minute, what is the
probability that we will have to wait less than 20 seconds (1/3 minute)
between two hits?

F(1/3) = P(0 ≤ X ≤ 1/3) = 1 − e^(−4/3) ≈ 0.736.

We can expect to wait 20 seconds or less between hits about 75% of the time.
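The calculation as a one-liner (a sketch with λ = 4 hits per minute):

```python
from math import exp

# Sketch: P(wait < 1/3 minute) for an Exponential with lam = 4 per minute.
lam = 4
p_wait = 1 - exp(-lam * (1 / 3))
print(round(p_wait, 3))   # 0.736
```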
Ch10 Sampling Distribution
Learning Objectives
1) Understand how variations among multiple samples can be
represented in a sampling distribution
2) Calculate the sampling distribution (mean and variance) of a
proportion
3) Calculate the sampling distribution (mean and variance) of a
mean
Definitions
Sample statistics are used as estimators of population parameters.
Statistical inference is about predicting, testing hypotheses, and making decisions about the values of population parameters based on the obtained statistics.
Generally, statistical inference is done based on sample statistics derived from limited and incomplete sample information.
We make generalizations about the characteristics of a population on the basis of observations of a sample, a part of the population.
Sample Statistics as Estimators of Population Parameters
• A sample statistic is a numerical measure of a summary
characteristic of a sample.
• A population parameter is a numerical measure of a summary
characteristic of a population.
• An estimator of a population parameter is a sample statistic
used to estimate or predict the population parameter.
• An estimate of a parameter is a particular numerical value of a
sample statistic obtained through sampling.
• A point estimate is a single value used to estimate a population parameter.
Examples of estimators
• The sample mean, x̄, is the most common estimator of the population mean, μ.
• The sample variance, s², is the most common estimator of the population variance, σ².
• The sample standard deviation, s, is the most common estimator of the population standard deviation, σ.
• The sample proportion, p̂, is the most common estimator of the population proportion, p.
Sampling Distributions
• In many applications, we are interested in the characteristics of a
population (population parameters).
• It is difficult if not impossible to analyze the entire
population.
• We make inferences about the characteristics of the
population based on a random sample.
• There is only one population, but many possible samples of
a given size.
Sampling Distributions of the Mean
• A population parameter is a constant, though its value may be
unknown.
• For example, the population mean 𝜇 is a parameter.
• A statistic is a variable whose value depends on the sample.
• For example, the sample mean X̄ is a statistic.
• The value of X̄ will change if you choose a different random
sample.
• An estimator is a statistic used to estimate a parameter.
• An estimate is a particular value of an estimator.
A Population Distribution, a Sample from a Population, and the
Population and Sample Mean

RV  PMF   Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Sample 6  Sample 7  Sample 8
1 0.13 3 4 6 3 4 7 3 5
2 0.13 5 7 3 5 6 3 3 3
3 0.13 5 5 4 4 2 4 4 3
4 0.13 5 4 5 3 5 4 6 5
5 0.13 4 4 3 4 4 6 2 4
6 0.13 6 3 4 6 5 2 5 6
7 0.13 2 5 6 4 6 5 5 5
8 0.13 5 3 4 6 3 4 7 4
Mean 4.50  4.51      4.17      4.49      4.83      4.51      4.29      5.03      4.66

Population mean = Σ x·p(x) = 4.50; each sample mean = (Σ x)/n
The statistic obtained varies from one sample to another. This variation is what we call
sampling variability.
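Sampling variability can be reproduced with a small simulation. This is a sketch assuming the population is the discrete uniform on 1 to 8 with p(x) = 1/8 (the table's 0.13 is a rounded 1/8); the simulated sample means will not match the table's, only their behavior:

```python
import random
import statistics

random.seed(1)
population = list(range(1, 9))   # values 1..8, each with p(x) = 1/8

# Population mean = sum of x * p(x)
pop_mean = sum(x * (1 / 8) for x in population)   # 4.5

# Eight random samples of size 8; each yields a different sample mean
sample_means = [statistics.mean(random.choices(population, k=8))
                for _ in range(8)]
print(pop_mean, sample_means)
```

The sample means scatter around the fixed population mean of 4.5, which is exactly the sampling variability the table illustrates.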
Sampling Distributions of the Mean
• X̄ is a variable (statistic) that depends on the resulting sample.
• The sampling distribution of the sample mean X̄ is the probability
distribution derived from all the means that come from all
possible samples of a given size.
• Consider a sample mean derived from n observations
• Another sample mean can be derived from a different sample of n
observations
• Repeat the process a large number of times, the frequency distribution
of the sample means is the sampling distribution
Sampling Distributions of the Mean
• Let X represent a certain characteristic of a population.
• Population mean E(X) = μ, variance Var(X) = σ²
• Let the sample mean X̄ be based on a random sample of n observations.
• The expected value of X̄ is the same as the expected value of X:
E(X̄) = E(X) = μ
  • The average of the sample means is the average of the population
  • Unbiased: the expected value of an estimator equals the population parameter
• The variance of X̄ is Var(X̄) = σ²/n.
  • Var(X̄) is less than Var(X) = σ²
  • Each sample will contain both high and low values that cancel one another
• The standard deviation of X̄ is the standard error: se(X̄) = σ/√n
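A simulation can confirm these two results. This is a sketch with illustrative parameters (μ = 10, σ = 3, n = 25 are arbitrary choices, not values from the text):

```python
import math
import random
import statistics

random.seed(0)
mu, sigma, n = 10, 3, 25   # illustrative values

# Draw many samples of size n and record each sample mean
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20000)]

# The average of the sample means is close to mu (unbiasedness),
# and their spread is close to sigma / sqrt(n) = 0.6 (the standard error)
print(statistics.mean(means))
print(statistics.stdev(means), sigma / math.sqrt(n))
```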
Example
The chefs at a local pizza chain in Cambria, California, strive to maintain the
suggested size of their 16-inch pizzas. Despite their best efforts, they are unable
to make every pizza exactly 16 inches in diameter.
The manager has determined that the size of the pizzas is normally distributed
with a mean of 16 inches and a standard deviation of 0.8 inch.
a. What are the expected value and the standard error of the sample mean
derived from a random sample of 2 pizzas?
b. What are the expected value and the standard error of the sample mean
derived from a random sample of 4 pizzas?
Sampling Distributions of the Mean: Example
The population mean is μ = 16 and the standard deviation is σ = 0.8.
a. With sample size n = 2, E(X̄) = 16 and se(X̄) = 0.8/√2 ≈ 0.57
b. With sample size n = 4, E(X̄) = 16 and se(X̄) = 0.8/√4 = 0.40
• The expected values are the same. The standard error is lower
when n = 4 than when n = 2.
Sampling Distributions of the mean: Normal model
Let X be normally distributed with expected value μ and standard
deviation σ. For any sample size n, X̄ is also normally distributed with
expected value μ and standard error σ/√n.
Example: the size of pizzas is normally distributed with a mean of 16
inches and a standard deviation of 0.8 inch.
a. What is the probability that a randomly selected pizza is less than
15.5 inches?
b. What is the probability that the average of two randomly selected
pizzas is less than 15.5 inches?
Sampling Distributions of the mean: Normal
The population mean is μ = 16 and the standard deviation is σ = 0.8.
a. P(X < 15.5) = P(Z < (15.5 − 16)/0.8) = P(Z < −0.63) = 0.2653
b. With sample size n = 2, E(X̄) = 16 and se(X̄) = 0.8/√2 ≈ 0.57.
   P(X̄ < 15.5) = P(Z < (15.5 − 16)/(0.8/√2)) = P(Z < −0.88) = 0.1894
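These probabilities can be checked with Python's standard library (`statistics.NormalDist`). The results differ slightly from the slide's because the slide rounds z to two decimals before using the table:

```python
import math
from statistics import NormalDist

mu, sigma = 16, 0.8

# a. A single pizza: X ~ N(16, 0.8)
p_one = NormalDist(mu, sigma).cdf(15.5)

# b. The mean of two pizzas: X-bar ~ N(16, 0.8 / sqrt(2))
p_mean = NormalDist(mu, sigma / math.sqrt(2)).cdf(15.5)

print(round(p_one, 4), round(p_mean, 4))  # ~0.266 and ~0.188
```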
Sampling Distributions of the proportion
• In many business applications, we are concerned with the
population proportion p.
• Recall that the binomial distribution describes the number of
successes X in n trials of a Bernoulli process, where p is the
probability of success.
• The distribution of proportions over many independent
samples from the same population is called the sampling
distribution of the proportions
Sampling Distributions of the proportion
• The relevant statistic (estimator) is the sample proportion P̄ = X/n.
• E(P̄) = p, so the sample proportion is unbiased.
• se(P̄) = √(p(1 − p)/n)
• By the CLT, the sampling distribution of P̄ is approximately normal
when np ≥ 5 and n(1 − p) ≥ 5.
Sampling Distributions of the proportion
Example: A study found that 55% of British firms experienced a
cyber-attack in the past year.
a. What are the expected value and the standard error of the
sample proportion derived from a random sample of 100
firms?
b. In a random sample of 100 firms, what is the probability that
the sample proportion is greater than 0.57?
Sampling Distributions
• Example continued
• p = 0.55 and n = 100, so np = 55 and n(1 − p) = 45
• E(P̄) = 0.55
• se(P̄) = √(p(1 − p)/n) = √(0.55(1 − 0.55)/100) = 0.0497
• P(P̄ ≥ 0.57) = P(Z ≥ (0.57 − 0.55)/0.0497) = P(Z ≥ 0.40) = 1 − 0.6554 = 0.3446
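The same answer can be reached with the standard library; small differences from the slide come from rounding z to two decimals:

```python
import math
from statistics import NormalDist

p, n = 0.55, 100

# Normal approximation applies: np = 55 >= 5 and n(1 - p) = 45 >= 5
se = math.sqrt(p * (1 - p) / n)          # ~0.0497
prob = 1 - NormalDist(p, se).cdf(0.57)   # P(P-bar >= 0.57)
print(round(se, 4), round(prob, 4))
```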
The Central Limit Theorem—
The Fundamental Theorem of Statistics
The Central Limit Theorem
Central Limit Theorem (CLT): The sampling distribution of any mean
becomes Normal as the sample size grows
This is true regardless of the shape of the population distribution!
However, if the population distribution is very skewed, it may take a
sample size of dozens or even hundreds of observations for the
Normal model to work well
Sampling Distributions of the mean
For making statistical inferences, it is essential that the sampling
distribution of X̄ is normally distributed.
• What if the underlying population is not normally distributed?
The central limit theorem (CLT) states that the sum or the average of a
large number of independent observations from the same underlying
distribution has an approximate normal distribution.
• The approximation steadily improves as the number of observations
increases.
• Practitioners often use the normal distribution approximation when 𝑛 ≥
30.
• Note E(X̄) = E(X) = μ and se(X̄) = σ/√n
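A quick simulation illustrates the CLT with a skewed population. The exponential distribution here (mean 1, sd 1) is chosen for illustration and is not from the text:

```python
import math
import random
import statistics

random.seed(42)
n = 30   # the common rule-of-thumb sample size

# Right-skewed population: exponential with mean 1 and sd 1
means = [statistics.mean(random.expovariate(1) for _ in range(n))
         for _ in range(10000)]

# Despite the skewed population, the sample means center on mu = 1
# with spread near sigma / sqrt(n)
print(statistics.mean(means), statistics.stdev(means), 1 / math.sqrt(n))
```

A histogram of `means` would look close to a Normal curve, even though the underlying population is strongly skewed.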
Example
The mean weight of boxes shipped by a company is 12 kg, with a standard
deviation of 4 kg. Boxes are shipped in pallets of 10 boxes. The shipper has a
limit of 150 kg for such shipments. What's the probability that a pallet will
exceed that limit?
Asking the probability that the total weight of a sample of 10 boxes exceeds 150
kg is the same as asking the probability that the mean weight exceeds 15 kg.
Under these conditions, the CLT says that the sampling distribution of ȳ has a Normal model with
mean 12 and standard deviation

SD(ȳ) = σ/√n = 4/√10 = 1.26 and z = (ȳ − μ)/SD(ȳ) = (15 − 12)/1.26 = 2.38

P(ȳ > 15) = P(z > 2.38) = 0.0087
So the chance that the shipper will reject a pallet is only 0.0087, less
than 1%. That's probably good enough for the company.
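The calculation can be verified directly; the result differs from 0.0087 only because the slide rounds z to 2.38:

```python
import math
from statistics import NormalDist

mu, sigma, n, limit = 12, 4, 10, 150

sd_mean = sigma / math.sqrt(n)   # ~1.26
# Total weight > 150 kg is the same event as mean weight > 15 kg
p_over = 1 - NormalDist(mu, sd_mean).cdf(limit / n)
print(round(p_over, 4))  # ~0.0089
```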
Standard Error
When we don't know σ, we estimate it with the standard deviation
of the one real sample. That gives us the standard error,

SE(ȳ) = s/√n
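In code, the standard error comes from the one observed sample. The diameters below are made up purely for illustration:

```python
import math
import statistics

# Hypothetical sample of measurements (values made up for illustration)
sample = [15.2, 16.1, 15.8, 16.4, 15.9, 16.3]

s = statistics.stdev(sample)      # sample standard deviation
se = s / math.sqrt(len(sample))   # standard error of the mean
print(round(se, 3))
```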
More about the t distribution
Gosset’s t
William S. Gosset discovered that when he used the standard error
s/√n, the shape of the curve was no longer Normal.
He called the new model the Student's t, a model that is
always bell-shaped, but whose details change with the sample size.
The Student’s t-models form a family of related distributions
depending on a parameter known as degrees of freedom.
Sampling Distributions of the mean: Student’s t distribution
If the population standard deviation, σ, is unknown, replace
it with the sample standard deviation, s.
Then t = (X̄ − μ)/(s/√n) has a Student's t distribution with (n − 1)
degrees of freedom.
•The t is a family of bell-shaped and symmetric distributions, one for each number
of degrees of freedom.
•The expected value of t is 0.
•The t distribution approaches a standard normal as the number of degrees of
freedom increases.
Standard and t distributions
If X̄ is (approximately) normally distributed, we transform/standardize
values with the z-score such that

Z = (X̄ − μ)/(σ/√n) is standard normal distributed.

• The problem is that σ is usually unknown. In that case, the sample
standard deviation, s, is used in place of σ.
• Using s in place of σ gives another statistic, T = (X̄ − μ)/(s/√n).
• T follows the Student's t distribution with n − 1 degrees of freedom, or
the t_{df=n−1} distribution. As n grows large, the Student's t
distribution approaches the standard normal.
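The heavier tails of T can be seen by simulating it from small normal samples. This is a sketch; the sample size n = 7 and the N(0, 1) population are arbitrary illustrative choices:

```python
import math
import random
import statistics

random.seed(7)
mu, sigma, n = 0, 1, 7   # arbitrary small-sample setting

def t_stat():
    # One T value: standardize the sample mean using s instead of sigma
    x = [random.gauss(mu, sigma) for _ in range(n)]
    return (statistics.mean(x) - mu) / (statistics.stdev(x) / math.sqrt(n))

ts = [t_stat() for _ in range(20000)]

# With df = n - 1 = 6, Var(T) = df / (df - 2) = 1.5,
# broader than the z distribution's variance of 1
print(statistics.mean(ts), statistics.variance(ts))
```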
Standard and t distributions
• The t distribution is a family of distributions similar to z.
• Bell-shaped and symmetric around zero
• They have slightly broader tails than the z distribution
• Identified by the degrees of freedom, 𝑑𝑓, that determine the broadness
of the tails
• Let t_{α,df} denote the value such that the upper-tail area equals α for a
given df: P(T_df ≥ t_{α,df}) = α.
• Use a table or software to get these values.
• Excel: T.INV
• R: qt