Chapter 5
(MA 324)
Class Notes
January – May, 2021
Instructor
Ayon Ganguly
Department of Mathematics
IIT Guwahati
Contents
1 Review 3
1.1 Transformation Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Technique 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Technique 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1.3 Technique 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2 Bivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3 Some Results on Independent and Identically Distributed Normal RVs . . . . 18
1.4 Modes of Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5 Limit Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2 Point Estimation 31
2.1 Introduction to Statistical Inference . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 Parametric Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3 Sufficient Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4 Minimal Sufficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.5 Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.6 Ancillary Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.7 Completeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.8 Complete Sufficient Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.9 Families of Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.9.1 Location Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.9.2 Scale Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.9.3 Location-Scale Family . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.9.4 Exponential Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.10 Basu’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.11 Method of Finding Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.11.1 Method of Moment Estimator . . . . . . . . . . . . . . . . . . . . . . 52
2.11.2 Maximum Likelihood Estimator . . . . . . . . . . . . . . . . . . . . . 53
2.12 Criteria to Compare Estimators . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.12.1 Unbiasedness, Variance, and Mean Squared Error . . . . . . . . . . . 58
2.12.2 Best Unbiased Estimator . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.12.3 Rao-Blackwell Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.12.4 Uniformly Minimum Variance Unbiased Estimator . . . . . . . . . . . 64
2.12.5 Large Sample Properties . . . . . . . . . . . . . . . . . . . . . . . . . 68
3 Tests of Hypotheses 71
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.2 Errors and Error Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.3 Best Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.4 Simple Null Vs. Simple Alternative . . . . . . . . . . . . . . . . . . . . . . . 77
3.5 One-sided Composite Alternative . . . . . . . . . . . . . . . . . . . . . . . . 80
3.5.1 UMP Test via Neyman-Pearson Lemma . . . . . . . . . . . . . . . . 80
3.5.2 UMP Test via Monotone Likelihood Ratio Property . . . . . . . . . . 81
3.6 Simple Null Vs. Two-sided Alternative . . . . . . . . . . . . . . . . . . . . . 82
3.7 Likelihood Ratio Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.8 p-value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4 Interval Estimation 90
4.1 Confidence Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.1.1 Interpretation of Confidence Interval . . . . . . . . . . . . . . . . . . 91
4.2 Method of Finding CI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.2.1 One-sample Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.2.2 Two-sample Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.3 Asymptotic CI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.3.1 Distribution Free Population Mean . . . . . . . . . . . . . . . . . . . 95
4.3.2 Using MLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5 Regression Analysis 97
5.1 Regression and Model Building . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2 Simple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2.1 Least Squares Estimation of the Parameters . . . . . . . . . . . . . . 100
5.2.2 Properties of Least Squares Estimators . . . . . . . . . . . . . . . . . 101
5.2.3 Estimation of Error Variance . . . . . . . . . . . . . . . . . . . . . . 103
5.2.4 Hypothesis Testing on the Slope and Intercept . . . . . . . . . . . . . 103
5.2.5 Interval Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2.6 Prediction of New Observation . . . . . . . . . . . . . . . . . . . . . . 106
5.2.7 Coefficient of Determination . . . . . . . . . . . . . . . . . . . . . . . 106
Chapter 5
Regression Analysis
Most of the content of this chapter is taken from Montgomery, Peck, and Vining, Introduction to Linear Regression Analysis, Wiley, 2003.
Example 5.1 (The Rocket Propellant Data). A rocket motor is manufactured by bonding
an igniter propellant and a sustainer propellant together inside a metal housing. The shear
strength of the bond between the two types of propellant is an important quality character-
istic. It has been suspected that the shear strength depends on the age of the batch of the
sustainer propellant. Therefore, 20 observations on shear strength and the age (in weeks) of the corresponding batch of sustainer propellant were collected and are given in the following table.
Let us denote the shear strengths by yi's and the ages by xi's. In any regression analysis, plotting the data is very important; we will come back to this point with an example. Note that in this case the sample correlation coefficient is −0.948. The scatter plot of shear strength versus propellant age is provided in Figure 5.1. This figure suggests that there is a relationship between shear strength and propellant age. The impression is that the data points generally, but not exactly, fall along a straight line with negative slope.
Figure 5.1: Scatter diagram of shear strength versus propellant age
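A plot like Figure 5.1 can be produced with a few lines of Python. Here is a minimal sketch; the arrays below are placeholders, not the actual 20 observations from the data table:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data; substitute the 20 (age, strength) pairs from the table.
x = np.array([15.5, 23.8, 8.0, 17.0, 5.5, 19.0, 24.0, 2.5, 7.5, 11.0])
y = np.array([2159., 1678., 2316., 2061., 2208., 1708., 1785., 2638., 2350., 2304.])

r = np.corrcoef(x, y)[0, 1]          # sample correlation coefficient
print(f"sample correlation: {r:.3f}")

plt.scatter(x, y)
plt.xlabel("Age of propellant (weeks)")
plt.ylabel("Shear strength (psi)")
plt.title("Scatter diagram of shear strength versus propellant age")
plt.show()
```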
Denoting shear strength by y and propellant age by x, the equation of a straight line
relating these two variables may be represented by
y = β0 + β1 x,
where β0 is the intercept and β1 is the slope. Note that the data points do not fall exactly on the straight line. Therefore, we should modify the equation so that it takes this fact into account. Thus, a more plausible model for shear strength is

y = β0 + β1 x + ε,     (5.1)

where ε = y − (β0 + β1 x) is the difference between the observed value y and the straight line (β0 + β1 x). The quantity ε is called an error. ||
Definition 5.1 (Simple Linear Regression). Equation (5.1) is called a linear regression model. Also, as (5.1) includes only one predictor, it is called a simple linear regression model.

The model is based on the following assumptions.

1. The regressor x is controlled by the analyst (thus it is not a RV) and is measured with negligible error.
2. The random errors are assumed to be uncorrelated, with mean zero and variance σ². Note that on average we do not want to commit any error; hence, mean zero is a meaningful assumption.

Under these assumptions, the mean and variance of the response are

E(y) = E(β0 + β1 x + ε) = β0 + β1 x

and

Var(y) = Var(ε) = σ²,

respectively. Thus, the mean of y is a linear function of x, and the variance of y does not depend on x. Moreover, as the errors are assumed to be uncorrelated, the responses are uncorrelated.
The parameters β0 and β1 are called regression coefficients, and they have useful practical interpretations in many cases. For example, the slope β1 is the amount by which the mean of the response variable changes with a unit change in the regressor variable. If the range of x includes zero, then the intercept β0 is the mean of y when x = 0. Of course, β0 does not have any practical interpretation when the range of x does not include zero.
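The model and its assumptions are easy to simulate. The following minimal Python sketch, with hypothetical parameter values and assuming normally distributed errors (a stronger condition than the zero-mean, constant-variance assumptions above), generates responses from (5.1):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

beta0, beta1, sigma = 2600.0, -37.0, 95.0  # hypothetical true parameters
x = rng.uniform(2, 25, size=20)            # fixed design: 20 ages in weeks
eps = rng.normal(0.0, sigma, size=20)      # uncorrelated errors, mean 0, variance sigma^2
y = beta0 + beta1 * x + eps                # responses from model (5.1)
```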
5.2.1 Least Squares Estimation of the Parameters

Suppose that we have n pairs of observations (x1, y1), (x2, y2), . . . , (xn, yn). The least squares criterion is
\[
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 .
\]
Then, β̂0 and β̂1 are least squares estimators of β0 and β1, respectively, if
\[
S\big(\hat{\beta}_0, \hat{\beta}_1\big) = \min_{\beta_0,\, \beta_1} S(\beta_0, \beta_1).
\]
The minimizers satisfy
\[
\frac{\partial S}{\partial \beta_0}\bigg|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 \;\Longrightarrow\; n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \tag{5.2}
\]
and
\[
\frac{\partial S}{\partial \beta_1}\bigg|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 \;\Longrightarrow\; \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i. \tag{5.3}
\]
Equations (5.2) and (5.3) are called the least squares normal equations (or simply normal equations). The solutions to the normal equations are
\[
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \quad \text{and} \quad \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \tag{5.4}
\]
where
\[
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \Big( \sum_{i=1}^{n} x_i \Big)^2
\quad \text{and} \quad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})\, y_i = \sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}.
\]
The difference between the observed value yi and its fitted value ŷi = β̂0 + β̂1 xi is called a residual. Thus, the i-th residual is
\[
e_i = y_i - \hat{y}_i = y_i - \big( \hat{\beta}_0 + \hat{\beta}_1 x_i \big), \quad i = 1, 2, \ldots, n.
\]
Example 5.2 (The Rocket Propellant Data). From Figure 5.1, it seems reasonable to fit a linear regression model. Therefore, we want to fit the model

y = β0 + β1 x + ε.

It can be easily seen that Sxx = 1106.56 and Sxy = −41112.65. Thus, using (5.4), β̂1 = −37.15 and β̂0 = 2627.82. Table 5.2 provides the fitted values ŷi and residuals ei. ||
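These computations are easy to script. Here is a minimal Python sketch of the closed-form fit (5.4); the arrays are the placeholders from before, not the actual data:

```python
import numpy as np

# Placeholder data; substitute the 20 (age, strength) pairs from the table.
x = np.array([15.5, 23.8, 8.0, 17.0, 5.5, 19.0, 24.0, 2.5, 7.5, 11.0])
y = np.array([2159., 1678., 2316., 2061., 2208., 1708., 1785., 2638., 2350., 2304.])

Sxx = np.sum((x - x.mean()) ** 2)               # S_xx
Sxy = np.sum((x - x.mean()) * (y - y.mean()))   # S_xy

beta1_hat = Sxy / Sxx                           # slope estimate, equation (5.4)
beta0_hat = y.mean() - beta1_hat * x.mean()     # intercept estimate, equation (5.4)

y_hat = beta0_hat + beta1_hat * x               # fitted values
e = y - y_hat                                   # residuals
print(beta0_hat, beta1_hat)
```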
5.2.2 Properties of Least Squares Estimators

Theorem 5.2. β̂0 and β̂1 are UE of the parameters β0 and β1, respectively.

Proof: Note that β̂1 = Sxy/Sxx can be written as a linear combination of the yi's, namely β̂1 = Σ ci yi with ci = (xi − x̄)/Sxx. Therefore,
\[
E\big(\hat{\beta}_1\big) = E\Big( \sum_{i=1}^{n} c_i y_i \Big) = \sum_{i=1}^{n} c_i E(y_i) = \sum_{i=1}^{n} c_i (\beta_0 + \beta_1 x_i) = \beta_0 \sum_{i=1}^{n} c_i + \beta_1 \sum_{i=1}^{n} c_i x_i = \beta_1,
\]
as Σ ci = 0 and Σ ci xi = 1. Also,
\[
E\big(\hat{\beta}_0\big) = E\big( \bar{y} - \hat{\beta}_1 \bar{x} \big) = E(\bar{y}) - \bar{x}\, E\big(\hat{\beta}_1\big) = \frac{1}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_i) - \beta_1 \bar{x} = \beta_0.
\]
Theorem 5.3. The variances of β̂0 and β̂1 are σ²(1/n + x̄²/Sxx) and σ²/Sxx, respectively.

Proof:
\[
Var\big(\hat{\beta}_1\big) = Var\Big( \sum_{i=1}^{n} c_i y_i \Big) = \sum_{i=1}^{n} c_i^2\, Var(y_i) = \sigma^2 \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}.
\]
The second equality above is true as the yi's are uncorrelated. The third equality holds true as Var(yi) = σ² for all i = 1, 2, . . . , n. Next,
\[
Var\big(\hat{\beta}_0\big) = Var\big( \bar{y} - \hat{\beta}_1 \bar{x} \big) = Var(\bar{y}) + \bar{x}^2\, Var\big(\hat{\beta}_1\big) - 2\bar{x}\, Cov\big(\bar{y}, \hat{\beta}_1\big).
\]
Now,
\[
Var(\bar{y}) = \frac{\sigma^2}{n}
\]
and
\[
Cov\big(\bar{y}, \hat{\beta}_1\big) = Cov\Big( \frac{1}{n} \sum_{i=1}^{n} y_i,\; \sum_{i=1}^{n} c_i y_i \Big) = \sum_{i=1}^{n} \frac{c_i}{n}\, Var(y_i) = \frac{\sigma^2}{n} \sum_{i=1}^{n} c_i = 0.
\]
Therefore,
\[
Var\big(\hat{\beta}_0\big) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right).
\]
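Theorems 5.2 and 5.3 can be checked numerically. The following minimal Python sketch, a Monte Carlo simulation under hypothetical parameter values, a hypothetical fixed design, and assumed normal errors, estimates the mean and variance of β̂1 over repeated samples:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
beta0, beta1, sigma = 2600.0, -37.0, 95.0   # hypothetical true parameters
x = np.linspace(2, 25, 20)                  # hypothetical fixed design points
Sxx = np.sum((x - x.mean()) ** 2)

reps = 20000
slopes = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    slopes[r] = np.sum((x - x.mean()) * (y - y.mean())) / Sxx

print(slopes.mean(), beta1)                 # close to beta1 (Theorem 5.2)
print(slopes.var(), sigma**2 / Sxx)         # close to sigma^2 / S_xx (Theorem 5.3)
```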
5.2.3 Estimation of Error Variance

In the previous couple of subsections, we have discussed estimation of the two regression parameters. For many purposes, it is also important to estimate the error variance σ², which can be estimated unbiasedly as follows. The estimator of σ² can be obtained from the residual (or error) sum of squares
\[
SS_{Res} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 .
\]
It can be shown that
\[
E(SS_{Res}) = (n - 2)\, \sigma^2 .
\]
Therefore, MSRes = SSRes/(n − 2) is an unbiased estimator of σ².
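As a quick numeric illustration (again a sketch with the placeholder arrays, not the actual data):

```python
import numpy as np

# Placeholder data; substitute the actual observations.
x = np.array([15.5, 23.8, 8.0, 17.0, 5.5, 19.0, 24.0, 2.5, 7.5, 11.0])
y = np.array([2159., 1678., 2316., 2061., 2208., 1708., 1785., 2638., 2350., 2304.])

n = len(x)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

e = y - (beta0_hat + beta1_hat * x)   # residuals
SS_res = np.sum(e ** 2)               # residual sum of squares
MS_res = SS_res / (n - 2)             # unbiased estimator of sigma^2
print(MS_res)
```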
5.2.4 Hypothesis Testing on the Slope and Intercept

To test hypotheses on the regression coefficients, we assume in addition that the errors are independent and identically distributed N(0, σ²) RVs.
Suppose that we want to test whether the slope equals a constant, say β10. The appropriate hypotheses are H0 : β1 = β10 against H1 : β1 ≠ β10. Based on the assumption on the errors, we can see that yi ∼ N(β0 + β1 xi, σ²) and the yi's are independent. Therefore, β̂1, being a linear combination of the yi's, follows a normal distribution with mean β1 and variance σ²/Sxx. Thus, the statistic
\[
Z = \frac{\hat{\beta}_1 - \beta_{10}}{\sqrt{\sigma^2 / S_{xx}}}
\]
follows the standard normal distribution under H0. As σ² is typically unknown, we replace it by the unbiased estimator MSRes; the resulting statistic
\[
t = \frac{\hat{\beta}_1 - \beta_{10}}{\sqrt{MS_{Res} / S_{xx}}} \sim t_{n-2}
\]
under the null hypothesis H0 : β1 = β10. The null hypothesis is rejected at the level α if |t| > t_{n−2, α/2}.
Similarly, we may want to test H0 : β0 = β00 against H1 : β0 ≠ β00. We can use the test statistic
\[
t = \frac{\hat{\beta}_0 - \beta_{00}}{\sqrt{MS_{Res} \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}} .
\]
Under the null hypothesis H0 : β0 = β00, t follows a t-distribution with n − 2 degrees of freedom. Therefore, the null hypothesis may be rejected at the level α if |t| > t_{n−2, α/2}.
Example 5.4 (The Rocket Propellant Data). We will test for the significance of the regression in the rocket propellant data. That means we want to test H0 : β1 = 0 against H1 : β1 ≠ 0. With MSRes = 9244.59 and Sxx = 1106.56, the observed value of the test statistic is
\[
t = \frac{-37.15}{\sqrt{9244.59 / 1106.56}} = -12.85.
\]
If we choose α = 0.05, then t18, 0.025 = 2.101. Thus, the null hypothesis H0 : β1 = 0 is rejected, and we conclude that there is a linear relationship between shear strength and the age of the propellant. ||
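The test is easy to reproduce; here is a minimal sketch using only the summary numbers quoted above:

```python
from scipy import stats

n = 20
beta1_hat, MS_res, Sxx = -37.15, 9244.59, 1106.56   # values from the example

t_stat = beta1_hat / (MS_res / Sxx) ** 0.5          # observed test statistic
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)        # t_{18, 0.025}

print(round(t_stat, 2), round(t_crit, 3))           # -12.85, 2.101
print(abs(t_stat) > t_crit)                         # True: reject H0
```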
5.2.5 Interval Estimation

Under the normality assumption on the errors, confidence intervals (CIs) for β0 and β1 can be constructed using the pivots
\[
\frac{\hat{\beta}_0 - \beta_0}{se(\hat{\beta}_0)} \quad \text{and} \quad \frac{\hat{\beta}_1 - \beta_1}{se(\hat{\beta}_1)},
\]
where \( se(\hat{\beta}_0) = \sqrt{MS_{Res} \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)} \) and \( se(\hat{\beta}_1) = \sqrt{MS_{Res} / S_{xx}} \). Note that both the pivots follow the t_{n−2} distribution. Therefore, a 100(1 − α)% CI for β0 is
\[
\left[ \hat{\beta}_0 - t_{n-2,\, \alpha/2} \sqrt{MS_{Res} \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)},\;\; \hat{\beta}_0 + t_{n-2,\, \alpha/2} \sqrt{MS_{Res} \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)} \right],
\]
and a 100(1 − α)% CI for β1 is
\[
\left[ \hat{\beta}_1 - t_{n-2,\, \alpha/2} \sqrt{\frac{MS_{Res}}{S_{xx}}},\;\; \hat{\beta}_1 + t_{n-2,\, \alpha/2} \sqrt{\frac{MS_{Res}}{S_{xx}}} \right].
\]
Now, we will discuss the CI for the mean response at a particular value of the regressor, which is meaningful in the case of forecasting. For example, we may want an estimate of the mean shear strength of a propellant that is 10 weeks old. In general, let x0 be the level of the regressor variable for which we wish to estimate the mean response, and write μ̂y|x0 = ŷ0 = β̂0 + β̂1 x0. Note that μ̂y|x0 follows a normal distribution as it is a linear combination of the yi's. The mean of μ̂y|x0 is E(y|x0) = β0 + β1 x0, and the variance of μ̂y|x0 is
\[
Var\big(\hat{\mu}_{y|x_0}\big) = Var\big( \hat{\beta}_0 + \hat{\beta}_1 x_0 \big) = Var\big( \bar{y} + \hat{\beta}_1 (x_0 - \bar{x}) \big) = \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right).
\]
Here, the second equality can be found by replacing β̂0 by ȳ − β̂1 x̄, and the third equality holds true as Cov(ȳ, β̂1) = 0. Thus, the distribution of
\[
\frac{\hat{\mu}_{y|x_0} - E(y|x_0)}{\sqrt{MS_{Res} \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)}}
\]
is t_{n−2}, and hence a 100(1 − α)% CI for the mean response at x = x0 is
\[
\left[ \hat{\mu}_{y|x_0} \mp t_{n-2,\, \alpha/2} \sqrt{MS_{Res} \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)} \right].
\]
Example 5.5 (The Rocket Propellant Data). We will construct a 95% CI on β1. The standard error of β̂1 is se(β̂1) = 2.89 and t18, 0.025 = 2.101. Therefore, a 95% CI for β1 is [−43.22, −31.08]. Similarly, a 95% CI for the mean response at x = 13.3625 becomes [2086.230, 2176.571]. ||
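Both intervals can be reproduced from the summary numbers quoted earlier; a minimal sketch:

```python
from scipy import stats

n = 20
beta0_hat, beta1_hat = 2627.82, -37.15
MS_res, Sxx, x_bar = 9244.59, 1106.56, 13.3625      # x_bar is the sample mean age
t_crit = stats.t.ppf(0.975, df=n - 2)               # t_{18, 0.025} = 2.101

se_b1 = (MS_res / Sxx) ** 0.5                       # se(beta1_hat) = 2.89
print(beta1_hat - t_crit * se_b1, beta1_hat + t_crit * se_b1)   # [-43.22, -31.08]

x0 = x_bar                                          # mean response at x = 13.3625
mu_hat = beta0_hat + beta1_hat * x0
se_mu = (MS_res * (1 / n + (x0 - x_bar) ** 2 / Sxx)) ** 0.5
print(mu_hat - t_crit * se_mu, mu_hat + t_crit * se_mu)         # [2086.23, 2176.57]
```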
5.2.6 Prediction of New Observation

An important application of the regression model is prediction of a new observation y corresponding to a specified level of the regressor variable x. If x0 is the value of the regressor variable of interest, then ŷ0 = β̂0 + β̂1 x0 is the point predictor of the new observation, and a 100(1 − α)% prediction interval for it is
\[
\left[ \hat{y}_0 \mp t_{n-2,\, \alpha/2} \sqrt{MS_{Res} \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)} \right].
\]
Compared with the CI for the mean response, the extra 1 inside the square root accounts for the variance of the new error, which is independent of the data used to fit the model.
Example 5.6 (The Rocket Propellant Data). We will find a 95% prediction interval on a
future value of propellant shear strength in a motor made from a batch of sustainer propellant
that is 10 weeks old. Using the previous formula, the prediction interval becomes [2048.32,
2464.32]. ||
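The interval is straightforward to reproduce from the same summary numbers; a minimal sketch:

```python
from scipy import stats

n = 20
beta0_hat, beta1_hat = 2627.82, -37.15
MS_res, Sxx, x_bar = 9244.59, 1106.56, 13.3625

x0 = 10.0                                 # a batch that is 10 weeks old
y0_hat = beta0_hat + beta1_hat * x0       # point prediction
t_crit = stats.t.ppf(0.975, df=n - 2)
half = t_crit * (MS_res * (1 + 1 / n + (x0 - x_bar) ** 2 / Sxx)) ** 0.5
print(y0_hat - half, y0_hat + half)       # about [2048.32, 2464.32]
```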
5.2.7 Coefficient of Determination

The coefficient of determination is defined as
\[
R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{Res}}{SS_T},
\]
where \( SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 \) is the total sum of squares and \( SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \) is the regression sum of squares; the decomposition SST = SSR + SSRes can be established using the normal equations. Since SST is a measure of the variability in y without considering the effect of the regressor variable x, and SSRes is a measure of the variability in y remaining after x has been considered, R² is often called the proportion of variation explained by the regressor x. It is clear that 0 ≤ R² ≤ 1. Values of R² that are close to 1 imply that most of the variability in y is explained by the regression model.
Example 5.7 (The Rocket Propellant Data). For the regression model for the rocket propellant data, we have R² = 0.9018. That means that 90.18% of the variability in strength is accounted for by the regression model. ||
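A minimal sketch with the placeholder arrays used earlier; in simple linear regression, R² also equals the square of the sample correlation coefficient:

```python
import numpy as np

# Placeholder data; substitute the actual observations.
x = np.array([15.5, 23.8, 8.0, 17.0, 5.5, 19.0, 24.0, 2.5, 7.5, 11.0])
y = np.array([2159., 1678., 2316., 2061., 2208., 1708., 1785., 2638., 2350., 2304.])

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x

SS_T = np.sum((y - y.mean()) ** 2)        # total sum of squares
SS_res = np.sum((y - y_hat) ** 2)         # residual sum of squares
print(1 - SS_res / SS_T)                  # R^2
print(np.corrcoef(x, y)[0, 1] ** 2)       # same value, as r^2 = R^2 here
```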
However, the statistic R² should be used with caution, since it is always possible to make R² large by adding new regressors to the model. But it may happen that adding a new regressor does not improve the quality of the regression significantly. As a matter of fact, adding a new regressor may even damage the quality of the regression.