ERM 4b Final
Binomial Distribution

P(X) = \frac{n!}{X!\,(n-X)!}\, p^{X} (1-p)^{n-X}

P(X): probability of X successes given n and p
X: number of "successes" in sample (X = 0, 1, ..., n)
p: the probability of each "success"
n: sample size

Tails in 2 Tosses of Coin
X    P(X)
0    1/4 = .25
1    2/4 = .50
2    1/4 = .25
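A minimal Python sketch of this formula (standard library only); it reproduces the coin-toss table above:

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X) = n! / (X! (n-X)!) * p^X * (1-p)^(n-X)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Tails in 2 tosses of a fair coin (n = 2, p = 0.5)
for x in range(3):
    print(x, binom_pmf(x, 2, 0.5))   # 0.25, 0.50, 0.25
```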
Poisson Distribution
(named for Siméon Poisson)
Discrete events (“successes”) occurring in a given
area of opportunity (“interval”)
“Interval” can be time, length, surface area, etc.
The probability of a “success” in a given “interval” is
the same for all the “intervals”
The number of “successes” in one “interval” is
independent of the number of “successes” in other
“intervals”
The probability of two or more “successes” occurring
in an “interval” approaches zero as the “interval”
becomes smaller
E.g., # customers arriving in 15 minutes
E.g., # defects per case of light bulbs
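A short Python sketch of the Poisson probability mass function, P(X) = e^(-λ) λ^X / X!. The arrival rate λ = 3 per 15-minute interval is an illustrative assumption, not a value from the slides:

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X) = e^(-lambda) * lambda^X / X!"""
    return exp(-lam) * lam**x / factorial(x)

lam = 3.0  # assumed average of 3 customer arrivals per 15-minute interval
print(round(poisson_pmf(0, lam), 4))                         # P(no arrivals)
print(round(sum(poisson_pmf(x, lam) for x in range(5)), 4))  # P(at most 4 arrivals)
```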
The Normal Distribution
f(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}

f(X): density of random variable X
π ≈ 3.14159; e ≈ 2.71828
μ: population mean
σ: population standard deviation
X: value of random variable X
Many Normal Distributions
There are an Infinite Number of Normal Distributions
[Figure: any normal distribution of X (mean μ, standard deviation σ) can be transformed to the standardized normal Z, with μ_Z = 0 and σ_Z = 1.]
Finding Probabilities
Probability is the area under the curve:

P(c ≤ X ≤ d) = ?

[Figure: density f(X) with the area between X = c and X = d shaded.]
Standardizing Example
Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = 0.12

[Figure: the normal distribution with μ = 5, σ = 10 at X = 6.2 corresponds to the standardized normal distribution with μ_Z = 0, σ_Z = 1 at Z = 0.12.]
Example:
P(2.9 ≤ X ≤ 7.1) = .1664

Z = \frac{X - \mu}{\sigma} = \frac{2.9 - 5}{10} = -.21 \qquad Z = \frac{X - \mu}{\sigma} = \frac{7.1 - 5}{10} = .21

[Figure: the normal distribution with μ = 5, σ = 10 has area .0832 on each side of the mean between X = 2.9 and X = 7.1, matching the standardized normal distribution (μ_Z = 0, σ_Z = 1) between Z = -.21 and Z = .21.]

Recovering an X value for a known probability (Z = 0.30):

X = \mu + Z\sigma = 5 + (0.30)(10) = 8
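A small Python sketch (standard library only) that reproduces these calculations by standardizing and using the standard normal CDF:

```python
from math import erf, sqrt

def norm_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normal CDF evaluated by standardizing: Phi((x - mu) / sigma)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

mu, sigma = 5.0, 10.0
z = (6.2 - mu) / sigma                                  # standardizing: Z = 0.12
p = norm_cdf(7.1, mu, sigma) - norm_cdf(2.9, mu, sigma)
x = mu + 0.30 * sigma                                   # recovering X from Z = 0.30
print(z, round(p, 4), x)                                # 0.12, about .166 (slide: .1664), 8.0
```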
More Examples of Normal
Distribution Using PHStat
A set of final exam grades was found to be normally
distributed with a mean of 73 and a standard deviation of 8.
What is the probability of getting a grade no higher than 91
on this exam?
X ~ N(73, 8^2)

P(X ≤ 91) = ?

PHStat inputs: Mean = 73, Standard Deviation = 8
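Instead of PHStat, the same probability can be sketched with the normal CDF in Python; the expected result is Z = 2.25 and P(X ≤ 91) ≈ .9878:

```python
from math import erf, sqrt

def norm_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Exam grades: X ~ N(73, 8^2); probability of a grade no higher than 91.
mu, sigma = 73, 8
print((91 - mu) / sigma)                  # Z = 2.25
print(round(norm_cdf(91, mu, sigma), 4))  # about 0.9878
```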
[Figure: normal probability plots (X from 30 to 90 against Z from -2 to 2) for a rectangular distribution and a U-shaped distribution.]
Unbiasedness of X̄

[Figure: density f(X̄) for an unbiased estimator (centered on μ) versus a biased estimator (centered away from μ).]
Effect of Large Sample
For sampling with replacement:
As n increases, σ_X̄ decreases.

[Figure: the density f(X̄) is narrower for the larger sample size than for the smaller sample size.]
When the Population is Normal
Population distribution: μ = 50, σ = 10

Central tendency: μ_X̄ = μ
Variation: σ_X̄ = σ/√n

Sampling distributions:
n = 4: σ_X̄ = 5        n = 16: σ_X̄ = 2.5

[Figure: the population distribution of X and the sampling distributions of X̄ for n = 4 and n = 16, all centered at 50, with the sampling distributions narrower as n grows.]
When the Population is
Not Normal
Population distribution: μ = 50, σ = 10

Central tendency: μ_X̄ = μ
Variation: σ_X̄ = σ/√n

Sampling distributions:
n = 4: σ_X̄ = 5        n = 30: σ_X̄ = 1.8

[Figure: a non-normal population distribution of X and the sampling distributions of X̄ for n = 4 and n = 30; the sampling distribution becomes approximately normal as n grows.]
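A simulation sketch of these ideas in Python: the population below is an assumed uniform (non-normal) distribution chosen to have μ = 50 and σ = 10, not the slides' actual population, and the standard error of the sample mean should shrink roughly as σ/√n:

```python
import random
from statistics import stdev

random.seed(1)

# Uniform(32.68, 67.32) has mean 50 and standard deviation of about 10.
def sample_mean(n: int) -> float:
    return sum(random.uniform(32.68, 67.32) for _ in range(n)) / n

for n in (4, 30):
    means = [sample_mean(n) for _ in range(5000)]
    # Observed standard error should be near 10 / sqrt(n): about 5 and 1.8.
    print(n, round(stdev(means), 2))
```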
Level of Significance and the
Rejection Region
Rejection regions:

H0: μ ≤ 3.5   H1: μ > 3.5   — rejection region of size α in the upper tail
H0: μ = 3.5   H1: μ ≠ 3.5   — rejection regions of size α/2 in each tail

[Figure: sampling distributions centered at 0 with the shaded rejection regions described above.]
One-Tail Z Test for Mean
(σ Known)
Assumptions
Population is normally distributed
If not normal, requires large samples
Null hypothesis has ≤ or ≥ sign only
σ is known
Z Test Statistic
Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}
Rejection Region
H0: μ ≥ μ0, H1: μ < μ0 — reject H0 in the lower tail (area α); Z must be significantly below 0 to reject H0.
H0: μ ≤ μ0, H1: μ > μ0 — reject H0 in the upper tail (area α); small values of Z don't contradict H0, so don't reject H0.

[Figure: standard normal curves with shaded lower-tail and upper-tail rejection regions of area α.]
Reject and Do Not Reject
Regions
H0: μ ≤ 368   H1: μ > 368

α = .05, critical value Z = 1.645

The sample mean X̄ = 372.5 gives Z = 1.5, which falls in the do-not-reject region.

[Figure: rejection region to the right of Z = 1.645 (equivalently, X̄ above the corresponding critical value); Z = 1.5 lies in the do-not-reject region.]
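A minimal Python sketch of this one-tail Z test, using the values shown on the slides (μ0 = 368, X̄ = 372.5, σ = 15, n = 25):

```python
from math import sqrt

# One-tail Z test: H0: mu <= 368 vs H1: mu > 368 (sigma known).
mu0, xbar, sigma, n = 368.0, 372.5, 15.0, 25
z = (xbar - mu0) / (sigma / sqrt(n))   # 1.5
z_crit = 1.645                         # upper-tail critical value at alpha = .05
print(z, "reject H0" if z > z_crit else "do not reject H0")
```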
Connection to Confidence
Intervals
For X̄ = 372.5, σ = 15 and n = 25, the 95% confidence interval is:

372.5 - 1.96 \cdot \frac{15}{\sqrt{25}} \le \mu \le 372.5 + 1.96 \cdot \frac{15}{\sqrt{25}}

or

366.62 ≤ μ ≤ 378.38
We are 95% confident that the population mean is
between 366.62 and 378.38.
If this interval contains the hypothesized mean (368),
we do not reject the null hypothesis.
It does. Do not reject.
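The same interval can be sketched directly in Python:

```python
from math import sqrt

xbar, sigma, n, z = 372.5, 15.0, 25, 1.96   # 95% confidence
half_width = z * sigma / sqrt(n)
print(round(xbar - half_width, 2), round(xbar + half_width, 2))   # 366.62 378.38
```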
t Test: σ Unknown
Assumption
Population is normally distributed
If not normal, requires a large sample
σ is unknown
t Test Statistic with n-1 Degrees of Freedom
t = \frac{\bar{X} - \mu}{S/\sqrt{n}}
Example Solution: One-Tail
H0: μ ≤ 368   H1: μ > 368
α = 0.01, n = 36, df = 35

Test statistic:

t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{372.5 - 368}{15/\sqrt{36}} = 1.80

Critical value: t_{35} = 2.4377

Decision: Do not reject at α = .01.

Conclusion: Insufficient evidence that the true mean is more than 368.

[Figure: t_{35} distribution with the rejection region beyond 2.4377; the test statistic 1.80 falls in the do-not-reject region.]
p-Value Solution
The p-value is between .025 and .05, which is greater than α = 0.01, so do not reject H0.

[Figure: t_{35} distribution with the rejection region beyond the critical value 2.4377 (α = 0.01); the test statistic 1.80 is in the do-not-reject region.]
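A Python sketch of this one-tail t test and its p-value; it assumes SciPy is available for the t distribution:

```python
from math import sqrt
from scipy import stats   # assumed available

# One-tail t test: H0: mu <= 368 vs H1: mu > 368 (sigma unknown).
mu0, xbar, s, n, alpha = 368.0, 372.5, 15.0, 36, 0.01
t_stat = (xbar - mu0) / (s / sqrt(n))       # 1.80
t_crit = stats.t.ppf(1 - alpha, df=n - 1)   # about 2.4377
p_value = stats.t.sf(t_stat, df=n - 1)      # about .04, between .025 and .05
print(round(t_stat, 2), round(t_crit, 4), round(p_value, 4))
print("reject H0" if t_stat > t_crit else "do not reject H0")
```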
Potential Pitfalls and Ethical Issues
Data Collection Method is Not Randomized to Reduce
Selection Biases
Human Subjects Are Manipulated Without Informed Consent
Data Snooping is Used to Choose between One-Tail
and Two-Tail Tests, and to Determine the Level of
Significance
Potential Pitfalls and Ethical Issues (continued)
Assumptions
Samples are randomly and independently drawn
This condition must be met
40% of the time you will reject the null hypothesis of equal means
μ1 = μ2 = μ3
One-Way ANOVA
(Treatment Effect Present)
H0: μ1 = μ2 = ... = μc
H1: Not all μi are the same

The null hypothesis is NOT true.

[Figure: the group distributions do not all have the same mean (μ1, μ2, μ3 differ).]
One-Way ANOVA
(Partition of Total Variation)
Total variation:

SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2 = (X_{11} - \bar{X})^2 + (X_{21} - \bar{X})^2 + \cdots + (X_{n_c c} - \bar{X})^2

where \bar{X} is the grand mean of all n observations.

[Figure: response X plotted for each group, with deviations measured from the grand mean.]
One-Way ANOVA F Test Statistic

F = \frac{MSA}{MSW}

MSA is the mean squares among groups
MSW is the mean squares within groups

Degrees of freedom: df1 = c - 1, df2 = n - c
One-Way ANOVA
Summary Table
Source of        Degrees of   Sum of            Mean Squares        F
Variation        Freedom      Squares           (Variance)          Statistic

Among (Factor)   c - 1        SSA               MSA = SSA/(c - 1)   MSA/MSW
Within (Error)   n - c        SSW               MSW = SSW/(n - c)
Total            n - 1        SST = SSA + SSW
Features of One-Way ANOVA F Statistic
The F Statistic is the Ratio of the Among Estimate
of Variance and the Within Estimate of Variance
The ratio must always be positive
df1 = c -1 will typically be small
df2 = n - c will typically be large
The Ratio Should be Close to 1 if the Null is True
If the Null Hypothesis is False
The numerator should be greater than the denominator
The ratio should be larger than 1
One-Way ANOVA F Test
Example
As production manager, you want to see if 3 filling machines have different mean filling times. You assign 15 similarly trained & experienced workers, 5 per machine, to the machines. At the .05 significance level, is there a difference in mean filling times?

Machine1   Machine2   Machine3
25.40      23.40      20.00
26.31      21.80      22.20
24.10      23.50      19.75
23.74      22.75      20.60
25.10      21.60      20.40
One-Way ANOVA
Example: Scatter Diagram
[Figure: scatter diagram of the filling times for the three machines.]
One-Way ANOVA Example
Computations
Machine1   Machine2   Machine3
25.40      23.40      20.00          X̄1 = 24.93     nj = 5
26.31      21.80      22.20          X̄2 = 22.61     c = 3
24.10      23.50      19.75          X̄3 = 20.59     n = 15
23.74      22.75      20.60
25.10      21.60      20.40          X̄ (grand mean) = 22.71
Critical value: F = 3.89 (df1 = 2, df2 = 12, α = 0.05)

Decision: Reject at α = 0.05.

Conclusion: There is evidence that at least one μi differs from the rest.

[Figure: F distribution with the rejection region beyond 3.89.]
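A Python sketch of the one-way ANOVA computations for the filling-machine data; the resulting F statistic (about 25.6) is compared with the critical value 3.89 from the slide:

```python
machines = {
    "Machine1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Machine2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Machine3": [20.00, 22.20, 19.75, 20.60, 20.40],
}

n = sum(len(v) for v in machines.values())                     # 15
c = len(machines)                                              # 3
grand_mean = sum(x for v in machines.values() for x in v) / n  # 22.71

ssa = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in machines.values())
ssw = sum((x - sum(v) / len(v)) ** 2 for v in machines.values() for x in v)
msa, msw = ssa / (c - 1), ssw / (n - c)
f_stat = msa / msw                                             # about 25.6
print(round(f_stat, 2), "reject H0" if f_stat > 3.89 else "do not reject H0")
```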
Two-Way ANOVA
Assumptions
Normality
Populations are normally distributed
Homogeneity of Variance
Populations have equal variances
Independence of Errors
Independent random samples are drawn
Two-Way ANOVA
Total Variation Partitioning
[Figure: partition of the total variation; among the components is SSB, the variation due to factor B, with d.f. = c - 1.]

Simple Linear Regression Model

Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i

[Figure: the population regression line (the conditional mean μ_{Y|X}) relating the dependent (response) variable Y to the independent (explanatory) variable X.]
Simple Linear Regression Model
(continued)
Observed value of Y:  Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i

ε_i = random error

Conditional mean:  \mu_{Y|X} = \beta_0 + \beta_1 X_i

[Figure: an observed value of Y at X_i, with the random error ε_i measured from the population regression line.]
Linear Regression Equation
Sample regression line provides an estimate of
the population regression line as well as a
predicted value of Y
Sample regression equation:

Y_i = b_0 + b_1 X_i + e_i

b_0 = sample Y intercept, b_1 = sample slope coefficient, e_i = residual

b_0 and b_1 are chosen to minimize the sum of squared residuals:

\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

b_0 provides an estimate of \beta_0
b_1 provides an estimate of \beta_1
Linear Regression Equation
(continued)
Sample:  Y_i = b_0 + b_1 X_i + e_i        Population:  Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i

Sample regression line:  \hat{Y}_i = b_0 + b_1 X_i

Population regression line (conditional mean):  \mu_{Y|X} = \beta_0 + \beta_1 X_i

[Figure: observed value of Y, residual e_i, intercept b_0, and slope b_1 plotted against X.]
Interpretation of the Slope
and Intercept
β1 measures the change in E(Y|X) per unit change in X: the change in the average value of Y as a result of a one-unit change in X.
Interpretation of the Slope
and Intercept (continued)
b1 is the estimated change in Ê(Y|X) per unit change in X: the estimated change in the average value of Y as a result of a one-unit change in X.
Simple Linear Regression:
Example
You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Scatter Diagram: Example
[Figure: scatter diagram of Annual Sales ($000), 0 to 12,000, versus Square Feet, 0 to 6,000, for the 7 stores.]
Excel Output
Simple Linear Regression
Equation: Example
\hat{Y}_i = b_0 + b_1 X_i = 1636.415 + 1.487 X_i
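A Python sketch of the least-squares computation for the 7-store data; it should reproduce approximately the intercept and slope above:

```python
# Produce-store data: square feet (X) and annual sales in $000 (Y).
X = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
Y = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b0 = y_bar - b1 * x_bar
print(round(b0, 3), round(b1, 3))   # roughly 1636.4 and 1.487
```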
Residual Analysis

Purposes
Examine linearity
Evaluate violations of assumptions
Graphical Analysis of Residuals
Plot residuals vs. X and time
Residual Analysis for Linearity
[Figure: plots of Y vs. X and residuals e vs. X for a not-linear relationship (curved residual pattern) and a linear relationship (randomly scattered residuals).]
Residual Analysis for
Homoscedasticity
[Figure: plots of Y vs. X and standardized residuals (SR) vs. X under heteroscedasticity (spread changes with X) and homoscedasticity (constant spread).]
Residual Analysis: Excel Output
for Produce Stores Example
Excel output:

Observation   Predicted Y    Residuals
1             4202.344417    -521.3444173
2             3928.803824    -533.8038245
3             5822.775103     830.2248971
4             9894.664688    -351.6646882
5             3557.14541     -239.1454103
6             4918.90184      644.0981603
7             3588.364717     171.6352829

[Figure: residual plot against Square Feet.]
Residual Analysis for
Independence
The Durbin-Watson Statistic
Used when data is collected over time to detect
autocorrelation (residuals in one time period are
related to residuals in another period)
Measures violation of independence assumption
D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}

D should be close to 2. If not, examine the model for autocorrelation.
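A direct Python implementation sketch of this formula; the residual series below simply reuses the produce-store residuals for illustration, even though Durbin-Watson is meaningful only for time-ordered data:

```python
def durbin_watson(residuals):
    """D = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

e = [-521.3, -533.8, 830.2, -351.7, -239.1, 644.1, 171.6]
print(round(durbin_watson(e), 2))   # values far from 2 suggest autocorrelation
```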
Sample Observations from
Various r Values
[Figure: five scatter plots of Y vs. X illustrating r = -1, r = -.6, r = 0, r = .6, and r = 1.]
Features of ρ and r
Unit Free
Range between -1 and 1
The Closer to -1, the Stronger the Negative
Linear Relationship
The Closer to 1, the Stronger the Positive
Linear Relationship
The Closer to 0, the Weaker the Linear
Relationship
t Test for Correlation
Hypotheses
H0: ρ = 0 (no correlation)
H1: ρ ≠ 0 (correlation)

Test Statistic

t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}

where

r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
Example: Produce Stores
Is there any evidence of a linear relationship between the annual sales of a store and its square footage at the .05 level of significance?

From the Excel printout (Regression Statistics):

Multiple R          0.9705572
R Square            0.94198129
Adjusted R Square   0.93037754
Standard Error      611.751517
Observations        7

H0: ρ = 0 (no association)
H1: ρ ≠ 0 (association)
α = .05
df = 7 - 2 = 5
Example: Produce Stores
Solution
t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.9706 - 0}{\sqrt{\dfrac{1 - .9420}{5}}} = 9.0099

Critical values: ±2.5706 (α/2 = .025 in each tail, df = 5)

Decision: Reject H0.

Conclusion: There is evidence of a linear relationship at the 5% level of significance.

The value of the t statistic is exactly the same as the t statistic value for the test on the slope coefficient.

[Figure: t distribution with rejection regions beyond ±2.5706.]
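A Python sketch that recomputes r and the t statistic from the 7-store data; the results should match the .9706 and 9.0099 shown above:

```python
from math import sqrt

X = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
Y = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sxx = sum((x - x_bar) ** 2 for x in X)
syy = sum((y - y_bar) ** 2 for y in Y)

r = sxy / sqrt(sxx * syy)                   # about .9706
t = (r - 0) / sqrt((1 - r ** 2) / (n - 2))  # about 9.01; compare with +/- 2.5706
print(round(r, 4), round(t, 2))
```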
Estimation of Mean Values
Confidence interval estimate for μ_{Y|X=Xi}, the mean of Y given a particular Xi:

\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}

t_{n-2} is the t value from the table with df = n - 2, and S_{YX} is the standard error of the estimate. The size of the interval varies according to the distance of Xi from the mean X̄.
Prediction of Individual Values
Prediction Interval for Individual Response
Yi at a Particular Xi
\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}
Interval Estimates for Different
Values of X
[Figure: fitted line with the confidence interval for the mean of Y (narrower) and the prediction interval for an individual Yi (wider); both widen as X moves away from X̄, shown at a given X.]
Example: Produce Stores
Data for 7 stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Consider a store with 2,000 square feet.

Regression model obtained:  \hat{Y}_i = 1636.415 + 1.487 X_i
Estimation of Mean Values:
Example
Confidence interval estimate for μ_{Y|X=Xi}:

\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 612.66

3997.02 ≤ μ_{Y|X=Xi} ≤ 5222.34
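A Python sketch of this interval, using the regression equation, the standard error of the estimate S_YX = 611.75 from the Excel output, and t5 = 2.5706; small differences from the slide come from rounding the coefficients:

```python
from math import sqrt

X = [1726, 1542, 2816, 5555, 1292, 2208, 1313]   # square feet for the 7 stores
n = len(X)
x_bar = sum(X) / n
ssx = sum((x - x_bar) ** 2 for x in X)

x_i = 2000
y_hat = 1636.415 + 1.487 * x_i      # about 4610.4
s_yx, t_crit = 611.7515, 2.5706     # standard error of estimate, t with df = 5
half_width = t_crit * s_yx * sqrt(1 / n + (x_i - x_bar) ** 2 / ssx)
print(round(y_hat - half_width, 2), round(y_hat + half_width, 2))
# close to the slide's 3997.02 to 5222.34
```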
What is a Time Series?
Time-series components: Trend, Cyclical, Seasonal, Random
Trend Component
[Figure: sales plotted over time showing a long-run trend.]

Cyclical Component
[Figure: sales plotted over time showing multi-year cyclical swings.]

Seasonal Component
[Figure: sales plotted by season (Winter, Spring, Summer, Fall) over time, monthly or quarterly.]
Random or Irregular Component
[Figure: sales plotted against time, showing random (irregular) fluctuations.]
Example: Quarterly Retail Sales with
Seasonal Components Removed
[Figure: quarterly retail sales Y(t) plotted against time with the seasonal components removed.]
Multiplicative Time-Series Model
Y_i = T_i \times S_i \times C_i \times I_i

Ti = Trend, Si = Seasonal, Ci = Cyclical, Ii = Irregular
Moving Averages
[Figure: annual sales data for 1994-1999 with a smoothed moving-average series overlaid.]
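A minimal Python sketch of a centered moving average; the sales series below is illustrative only, since the chart's underlying values are not given:

```python
def moving_average(series, window=3):
    """Centered moving average; None where the window is incomplete."""
    half = window // 2
    out = []
    for i in range(len(series)):
        if i < half or i >= len(series) - half:
            out.append(None)
        else:
            out.append(sum(series[i - half:i + half + 1]) / window)
    return out

sales = [2, 5, 2, 2, 7, 6]              # illustrative annual sales
print(moving_average(sales, window=3))  # smoothed series
```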
Linear Trend Model
Use the method of least squares to obtain the
linear trend forecasting equation:
\hat{Y}_i = b_0 + b_1 X_i

[Figure: fitted linear trend over coded years X = 0 to 5, projected to year 2001.]
The Quadratic Trend Model
Use the method of least squares to obtain
the quadratic trend forecasting equation:
\hat{Y}_i = b_0 + b_1 X_i + b_2 X_i^2

Year   Coded X   Sales (Y)
95     0         2
96     1         5
97     2         2
98     3         2
99     4         7
00     5         6
The Quadratic Trend Model
(continued)
\hat{Y}_i = b_0 + b_1 X_i + b_2 X_i^2 = 2.857 - .33 X_i + .214 X_i^2

Excel output (coefficients):

Intercept   2.85714286
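A Python sketch (assuming NumPy is available) that refits the quadratic trend to the coded data above; it should reproduce roughly 2.857, -.33, and .214:

```python
import numpy as np   # assumed available

# Coded year X and sales Y from the quadratic-trend example.
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 5, 2, 2, 7, 6], dtype=float)

b2, b1, b0 = np.polyfit(x, y, deg=2)              # highest power first
print(round(b0, 3), round(b1, 3), round(b2, 3))   # roughly 2.857, -0.329, 0.214

# Forecast for the next coded year (X = 6):
print(round(b0 + b1 * 6 + b2 * 6 ** 2, 2))
```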