Chapter 5
(MA 324)
Class Notes
January – May, 2021
Instructor
Ayon Ganguly
Department of Mathematics
IIT Guwahati
Contents
1 Review 3
1.1 Transformation Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Technique 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Technique 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1.3 Technique 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2 Bivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3 Some Results on Independent and Identically Distributed Normal RVs . . . . 18
1.4 Modes of Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5 Limit Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2 Point Estimation 31
2.1 Introduction to Statistical Inference . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 Parametric Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3 Sufficient Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4 Minimal Sufficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.5 Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.6 Ancillary Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.7 Completeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.8 Complete Sufficient Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.9 Families of Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.9.1 Location Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.9.2 Scale Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.9.3 Location-Scale Family . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.9.4 Exponential Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.10 Basu’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.11 Method of Finding Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.11.1 Method of Moment Estimator . . . . . . . . . . . . . . . . . . . . . . 52
2.11.2 Maximum Likelihood Estimator . . . . . . . . . . . . . . . . . . . . . 53
2.12 Criteria to Compare Estimators . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.12.1 Unbiasedness, Variance, and Mean Squared Error . . . . . . . . . . . 58
2.12.2 Best Unbiased Estimator . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.12.3 Rao-Blackwell Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.12.4 Uniformly Minimum Variance Unbiased Estimator . . . . . . . . . . . 64
2.12.5 Large Sample Properties . . . . . . . . . . . . . . . . . . . . . . . . . 68
3 Tests of Hypotheses 71
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.2 Errors and Error Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.3 Best Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.4 Simple Null Vs. Simple Alternative . . . . . . . . . . . . . . . . . . . . . . . 77
3.5 One-sided Composite Alternative . . . . . . . . . . . . . . . . . . . . . . . . 80
3.5.1 UMP Test via Neyman-Pearson Lemma . . . . . . . . . . . . . . . . 80
3.5.2 UMP Test via Monotone Likelihood Ratio Property . . . . . . . . . . 81
3.6 Simple Null Vs. Two-sided Alternative . . . . . . . . . . . . . . . . . . . . . 82
3.7 Likelihood Ratio Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.8 p-value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4 Interval Estimation 90
4.1 Confidence Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.1.1 Interpretation of Confidence Interval . . . . . . . . . . . . . . . . . . 91
4.2 Method of Finding CI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.2.1 One-sample Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.2.2 Two-sample Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.3 Asymptotic CI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.3.1 Distribution Free Population Mean . . . . . . . . . . . . . . . . . . . 95
4.3.2 Using MLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5 Regression Analysis 97
5.1 Regression and Model Building . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2 Simple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2.1 Least Squares Estimation of the Parameters . . . . . . . . . . . . . . 100
5.2.2 Properties of Least Squares Estimators . . . . . . . . . . . . . . . . . 101
5.2.3 Estimation of Error Variance . . . . . . . . . . . . . . . . . . . . . . 103
5.2.4 Hypothesis Testing on the Slope and Intercept . . . . . . . . . . . . . 103
5.2.5 Interval Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2.6 Prediction of New Observation . . . . . . . . . . . . . . . . . . . . . . 106
5.2.7 Coefficient of Determination . . . . . . . . . . . . . . . . . . . . . . . 106
Chapter 5
Regression Analysis
Most of the content of this chapter is taken from Montgomery, Peck, and Vining, Introduction to Linear Regression Analysis, Wiley, 2003.
Example 5.1 (The Rocket Propellant Data). A rocket motor is manufactured by bonding
an igniter propellant and a sustainer propellant together inside a metal housing. The shear
strength of the bond between the two types of propellant is an important quality character-
istic. It has been suspected that the shear strength depends on the age of the batch of the
sustainer propellant. Therefore, 20 observations on shear strength and the age (in weeks) of the corresponding batch of sustainer propellant were collected and are given in the following table.
Let us denote the shear strengths by yi's and the ages by xi's. In any regression analysis, plotting the data is very important; we will come back to this point with an example. Note that in this case the sample correlation coefficient is −0.948. The scatter plot of shear strength versus propellant age is provided in Figure 5.1. This figure suggests that there is a relationship between shear strength and propellant age. The impression is that the data points generally, but not exactly, fall along a straight line with negative slope.
Figure 5.1: Scatter diagram of shear strength versus propellant age
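A plot like Figure 5.1 can be produced with a few lines of Python. Here is a minimal sketch; the arrays below are placeholders, not the actual 20 observations from the data table:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data; substitute the 20 (age, strength) pairs from the table.
x = np.array([15.5, 23.8, 8.0, 17.0, 5.5, 19.0, 24.0, 2.5, 7.5, 11.0])
y = np.array([2159., 1678., 2316., 2061., 2208., 1708., 1785., 2638., 2350., 2304.])

r = np.corrcoef(x, y)[0, 1]          # sample correlation coefficient
print(f"sample correlation: {r:.3f}")

plt.scatter(x, y)
plt.xlabel("Age of propellant (weeks)")
plt.ylabel("Shear strength (psi)")
plt.title("Scatter diagram of shear strength versus propellant age")
plt.show()
```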
Denoting shear strength by y and propellant age by x, the equation of a straight line
relating these two variables may be represented by
y = β0 + β1 x,
where β0 is the intercept and β1 is the slope. Note that the data points do not fall exactly on the straight line. Therefore, we should modify the equation so that it takes this fact into account. Thus, a more plausible model for shear strength is

y = β0 + β1 x + ε,     (5.1)

where ε = y − (β0 + β1 x) is the difference between the observed value y and the straight line (β0 + β1 x). The quantity ε is called an error. ||
Definition 5.1 (Simple Linear Regression). Equation (5.1) is called a linear regression model. Also, as (5.1) includes only one predictor, it is called a simple linear regression model.

The model is based on the following assumptions.

1. The regressor x is controlled by the analyst (thus it is not a RV) and is measured with negligible error.
2. The random errors are assumed to be uncorrelated, with mean zero and variance σ². Note that on average we do not want to commit any error; hence, mean zero is a meaningful assumption.

Under these assumptions, the mean and variance of the response are

E(y) = E(β0 + β1 x + ε) = β0 + β1 x

and

Var(y) = Var(ε) = σ²,

respectively. Thus, the mean of y is a linear function of x, and the variance of y does not depend on x. Moreover, as the errors are assumed to be uncorrelated, the responses are uncorrelated.
The parameters β0 and β1 are called regression coefficients, and they have useful practical interpretations in many cases. For example, the slope β1 is the amount by which the mean of the response variable changes with a unit change in the regressor variable. If the range of x includes zero, then the intercept β0 is the mean of y when x = 0. Of course, β0 does not have any practical interpretation when the range of x does not include zero.
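The model and its assumptions are easy to simulate. The following minimal Python sketch, with hypothetical parameter values and assuming normally distributed errors (a stronger condition than the zero-mean, constant-variance assumptions above), generates responses from (5.1):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

beta0, beta1, sigma = 2600.0, -37.0, 95.0  # hypothetical true parameters
x = rng.uniform(2, 25, size=20)            # fixed design: 20 ages in weeks
eps = rng.normal(0.0, sigma, size=20)      # uncorrelated errors, mean 0, variance sigma^2
y = beta0 + beta1 * x + eps                # responses from model (5.1)
```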
5.2.1 Least Squares Estimation of the Parameters

Suppose that we have n pairs of observations (x1, y1), (x2, y2), . . . , (xn, yn). The least squares criterion is
\[
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 .
\]
Then, β̂0 and β̂1 are least squares estimators of β0 and β1, respectively, if
\[
S\big(\hat{\beta}_0, \hat{\beta}_1\big) = \min_{\beta_0,\, \beta_1} S(\beta_0, \beta_1).
\]
The minimizers satisfy
\[
\frac{\partial S}{\partial \beta_0}\bigg|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 \;\Longrightarrow\; n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \tag{5.2}
\]
and
\[
\frac{\partial S}{\partial \beta_1}\bigg|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 \;\Longrightarrow\; \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i. \tag{5.3}
\]
Equations (5.2) and (5.3) are called the least squares normal equations (or simply normal equations). The solutions to the normal equations are
\[
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \quad \text{and} \quad \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \tag{5.4}
\]
where
\[
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \Big( \sum_{i=1}^{n} x_i \Big)^2
\quad \text{and} \quad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})\, y_i = \sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}.
\]
The difference between the observed value yi and its fitted value ŷi = β̂0 + β̂1 xi is called a residual. Thus, the i-th residual is
\[
e_i = y_i - \hat{y}_i = y_i - \big( \hat{\beta}_0 + \hat{\beta}_1 x_i \big), \quad i = 1, 2, \ldots, n.
\]
Example 5.2 (The Rocket Propellant Data). From Figure 5.1, it seems reasonable to fit a linear regression model. Therefore, we want to fit the model

y = β0 + β1 x + ε.

It can be easily seen that Sxx = 1106.56 and Sxy = −41112.65. Thus, using (5.4), β̂1 = −37.15 and β̂0 = 2627.82. Table 5.2 provides the fitted values ŷi and residuals ei. ||
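These computations are easy to script. Here is a minimal Python sketch of the closed-form fit (5.4); the arrays are the placeholders from before, not the actual data:

```python
import numpy as np

# Placeholder data; substitute the 20 (age, strength) pairs from the table.
x = np.array([15.5, 23.8, 8.0, 17.0, 5.5, 19.0, 24.0, 2.5, 7.5, 11.0])
y = np.array([2159., 1678., 2316., 2061., 2208., 1708., 1785., 2638., 2350., 2304.])

Sxx = np.sum((x - x.mean()) ** 2)               # S_xx
Sxy = np.sum((x - x.mean()) * (y - y.mean()))   # S_xy

beta1_hat = Sxy / Sxx                           # slope estimate, equation (5.4)
beta0_hat = y.mean() - beta1_hat * x.mean()     # intercept estimate, equation (5.4)

y_hat = beta0_hat + beta1_hat * x               # fitted values
e = y - y_hat                                   # residuals
print(beta0_hat, beta1_hat)
```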
5.2.2 Properties of Least Squares Estimators

Theorem 5.2. β̂0 and β̂1 are UE of the parameters β0 and β1, respectively.

Proof: Note that β̂1 = Sxy/Sxx can be written as a linear combination of the yi's, namely β̂1 = Σ ci yi with ci = (xi − x̄)/Sxx. Therefore,
\[
E\big(\hat{\beta}_1\big) = E\Big( \sum_{i=1}^{n} c_i y_i \Big) = \sum_{i=1}^{n} c_i E(y_i) = \sum_{i=1}^{n} c_i (\beta_0 + \beta_1 x_i) = \beta_0 \sum_{i=1}^{n} c_i + \beta_1 \sum_{i=1}^{n} c_i x_i = \beta_1,
\]
as Σ ci = 0 and Σ ci xi = 1. Also,
\[
E\big(\hat{\beta}_0\big) = E\big( \bar{y} - \hat{\beta}_1 \bar{x} \big) = E(\bar{y}) - \bar{x}\, E\big(\hat{\beta}_1\big) = \frac{1}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_i) - \beta_1 \bar{x} = \beta_0.
\]
Theorem 5.3. The variances of β̂0 and β̂1 are σ²(1/n + x̄²/Sxx) and σ²/Sxx, respectively.

Proof:
\[
Var\big(\hat{\beta}_1\big) = Var\Big( \sum_{i=1}^{n} c_i y_i \Big) = \sum_{i=1}^{n} c_i^2\, Var(y_i) = \sigma^2 \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}.
\]
The second equality above is true as the yi's are uncorrelated. The third equality holds true as Var(yi) = σ² for all i = 1, 2, . . . , n. Next,
\[
Var\big(\hat{\beta}_0\big) = Var\big( \bar{y} - \hat{\beta}_1 \bar{x} \big) = Var(\bar{y}) + \bar{x}^2\, Var\big(\hat{\beta}_1\big) - 2\bar{x}\, Cov\big(\bar{y}, \hat{\beta}_1\big).
\]
Now,
\[
Var(\bar{y}) = \frac{\sigma^2}{n}
\]
and
\[
Cov\big(\bar{y}, \hat{\beta}_1\big) = Cov\Big( \frac{1}{n} \sum_{i=1}^{n} y_i,\; \sum_{i=1}^{n} c_i y_i \Big) = \sum_{i=1}^{n} \frac{c_i}{n}\, Var(y_i) = \frac{\sigma^2}{n} \sum_{i=1}^{n} c_i = 0.
\]
Therefore,
\[
Var\big(\hat{\beta}_0\big) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right).
\]
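Theorems 5.2 and 5.3 can be checked numerically. The following minimal Python sketch, a Monte Carlo simulation under hypothetical parameter values, a hypothetical fixed design, and assumed normal errors, estimates the mean and variance of β̂1 over repeated samples:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
beta0, beta1, sigma = 2600.0, -37.0, 95.0   # hypothetical true parameters
x = np.linspace(2, 25, 20)                  # hypothetical fixed design points
Sxx = np.sum((x - x.mean()) ** 2)

reps = 20000
slopes = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    slopes[r] = np.sum((x - x.mean()) * (y - y.mean())) / Sxx

print(slopes.mean(), beta1)                 # close to beta1 (Theorem 5.2)
print(slopes.var(), sigma**2 / Sxx)         # close to sigma^2 / S_xx (Theorem 5.3)
```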
5.2.3 Estimation of Error Variance

In the previous couple of subsections, we have discussed estimation of the two regression parameters. For many purposes, it is also important to estimate the error variance σ², which can be estimated unbiasedly as follows. The estimator of σ² can be obtained from the residual (or error) sum of squares
\[
SS_{Res} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 .
\]
It can be shown that
\[
E(SS_{Res}) = (n - 2)\, \sigma^2 .
\]
Therefore, MSRes = SSRes/(n − 2) is an unbiased estimator of σ².
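As a quick numeric illustration (again a sketch with the placeholder arrays, not the actual data):

```python
import numpy as np

# Placeholder data; substitute the actual observations.
x = np.array([15.5, 23.8, 8.0, 17.0, 5.5, 19.0, 24.0, 2.5, 7.5, 11.0])
y = np.array([2159., 1678., 2316., 2061., 2208., 1708., 1785., 2638., 2350., 2304.])

n = len(x)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

e = y - (beta0_hat + beta1_hat * x)   # residuals
SS_res = np.sum(e ** 2)               # residual sum of squares
MS_res = SS_res / (n - 2)             # unbiased estimator of sigma^2
print(MS_res)
```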
5.2.4 Hypothesis Testing on the Slope and Intercept

To test hypotheses on the regression coefficients, we assume in addition that the errors are independent and identically distributed N(0, σ²) RVs.
Suppose that we want to test whether the slope equals a constant, say β10. The appropriate hypotheses are H0 : β1 = β10 against H1 : β1 ≠ β10. Based on the assumption on the errors, we can see that yi ∼ N(β0 + β1 xi, σ²) and the yi's are independent. Therefore, β̂1, being a linear combination of the yi's, follows a normal distribution with mean β1 and variance σ²/Sxx. Thus, the statistic
\[
Z = \frac{\hat{\beta}_1 - \beta_{10}}{\sqrt{\sigma^2 / S_{xx}}}
\]
follows the standard normal distribution under H0. As σ² is typically unknown, we replace it by the unbiased estimator MSRes; the resulting statistic
\[
t = \frac{\hat{\beta}_1 - \beta_{10}}{\sqrt{MS_{Res} / S_{xx}}} \sim t_{n-2}
\]
under the null hypothesis H0 : β1 = β10. The null hypothesis is rejected at the level α if |t| > t_{n−2, α/2}.
Similarly, we may want to test H0 : β0 = β00 against H1 : β0 ≠ β00. We can use the test statistic
\[
t = \frac{\hat{\beta}_0 - \beta_{00}}{\sqrt{MS_{Res} \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}} .
\]
Under the null hypothesis H0 : β0 = β00, t follows a t-distribution with n − 2 degrees of freedom. Therefore, the null hypothesis may be rejected at the level α if |t| > t_{n−2, α/2}.
Example 5.4 (The Rocket Propellant Data). We will test for the significance of the regression in the rocket propellant data. That means we want to test H0 : β1 = 0 against H1 : β1 ≠ 0. With MSRes = 9244.59 and Sxx = 1106.56, the observed value of the test statistic is
\[
t = \frac{-37.15}{\sqrt{9244.59 / 1106.56}} = -12.85.
\]
If we choose α = 0.05, then t18, 0.025 = 2.101. Thus, the null hypothesis H0 : β1 = 0 is rejected, and we conclude that there is a linear relationship between shear strength and the age of the propellant. ||
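The test is easy to reproduce; here is a minimal sketch using only the summary numbers quoted above:

```python
from scipy import stats

n = 20
beta1_hat, MS_res, Sxx = -37.15, 9244.59, 1106.56   # values from the example

t_stat = beta1_hat / (MS_res / Sxx) ** 0.5          # observed test statistic
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)        # t_{18, 0.025}

print(round(t_stat, 2), round(t_crit, 3))           # -12.85, 2.101
print(abs(t_stat) > t_crit)                         # True: reject H0
```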
5.2.5 Interval Estimation

Under the normality assumption on the errors, confidence intervals (CIs) for β0 and β1 can be constructed using the pivots
\[
\frac{\hat{\beta}_0 - \beta_0}{se(\hat{\beta}_0)} \quad \text{and} \quad \frac{\hat{\beta}_1 - \beta_1}{se(\hat{\beta}_1)},
\]
where \( se(\hat{\beta}_0) = \sqrt{MS_{Res} \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)} \) and \( se(\hat{\beta}_1) = \sqrt{MS_{Res} / S_{xx}} \). Note that both the pivots follow the t_{n−2} distribution. Therefore, a 100(1 − α)% CI for β0 is
\[
\left[ \hat{\beta}_0 - t_{n-2,\, \alpha/2} \sqrt{MS_{Res} \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)},\;\; \hat{\beta}_0 + t_{n-2,\, \alpha/2} \sqrt{MS_{Res} \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)} \right],
\]
and a 100(1 − α)% CI for β1 is
\[
\left[ \hat{\beta}_1 - t_{n-2,\, \alpha/2} \sqrt{\frac{MS_{Res}}{S_{xx}}},\;\; \hat{\beta}_1 + t_{n-2,\, \alpha/2} \sqrt{\frac{MS_{Res}}{S_{xx}}} \right].
\]
Now, we will discuss the CI for the mean response at a particular value of the regressor, which is meaningful in the case of forecasting. For example, we may want an estimate of the mean shear strength of a propellant that is 10 weeks old. In general, let x0 be the level of the regressor variable for which we wish to estimate the mean response, and write μ̂y|x0 = ŷ0 = β̂0 + β̂1 x0. Note that μ̂y|x0 follows a normal distribution as it is a linear combination of the yi's. The mean of μ̂y|x0 is E(y|x0) = β0 + β1 x0, and the variance of μ̂y|x0 is
\[
Var\big(\hat{\mu}_{y|x_0}\big) = Var\big( \hat{\beta}_0 + \hat{\beta}_1 x_0 \big) = Var\big( \bar{y} + \hat{\beta}_1 (x_0 - \bar{x}) \big) = \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right).
\]
Here, the second equality can be found by replacing β̂0 by ȳ − β̂1 x̄, and the third equality holds true as Cov(ȳ, β̂1) = 0. Thus, the distribution of
\[
\frac{\hat{\mu}_{y|x_0} - E(y|x_0)}{\sqrt{MS_{Res} \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)}}
\]
is t_{n−2}, and hence a 100(1 − α)% CI for the mean response at x = x0 is
\[
\left[ \hat{\mu}_{y|x_0} \mp t_{n-2,\, \alpha/2} \sqrt{MS_{Res} \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)} \right].
\]
Example 5.5 (The Rocket Propellant Data). We will construct a 95% CI on β1. The standard error of β̂1 is se(β̂1) = 2.89 and t18, 0.025 = 2.101. Therefore, a 95% CI for β1 is [−43.22, −31.08]. Similarly, a 95% CI for the mean response at x = 13.3625 becomes [2086.230, 2176.571]. ||
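Both intervals can be reproduced from the summary numbers quoted earlier; a minimal sketch:

```python
from scipy import stats

n = 20
beta0_hat, beta1_hat = 2627.82, -37.15
MS_res, Sxx, x_bar = 9244.59, 1106.56, 13.3625      # x_bar is the sample mean age
t_crit = stats.t.ppf(0.975, df=n - 2)               # t_{18, 0.025} = 2.101

se_b1 = (MS_res / Sxx) ** 0.5                       # se(beta1_hat) = 2.89
print(beta1_hat - t_crit * se_b1, beta1_hat + t_crit * se_b1)   # [-43.22, -31.08]

x0 = x_bar                                          # mean response at x = 13.3625
mu_hat = beta0_hat + beta1_hat * x0
se_mu = (MS_res * (1 / n + (x0 - x_bar) ** 2 / Sxx)) ** 0.5
print(mu_hat - t_crit * se_mu, mu_hat + t_crit * se_mu)         # [2086.23, 2176.57]
```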
5.2.6 Prediction of New Observation

An important application of the regression model is prediction of a new observation y corresponding to a specified level of the regressor variable x. If x0 is the value of the regressor variable of interest, then ŷ0 = β̂0 + β̂1 x0 is the point predictor of the new observation, and a 100(1 − α)% prediction interval for it is
\[
\left[ \hat{y}_0 \mp t_{n-2,\, \alpha/2} \sqrt{MS_{Res} \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)} \right].
\]
Compared with the CI for the mean response, the extra 1 inside the square root accounts for the variance of the new error, which is independent of the data used to fit the model.
Example 5.6 (The Rocket Propellant Data). We will find a 95% prediction interval on a
future value of propellant shear strength in a motor made from a batch of sustainer propellant
that is 10 weeks old. Using the previous formula, the prediction interval becomes [2048.32,
2464.32]. ||
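The interval is straightforward to reproduce from the same summary numbers; a minimal sketch:

```python
from scipy import stats

n = 20
beta0_hat, beta1_hat = 2627.82, -37.15
MS_res, Sxx, x_bar = 9244.59, 1106.56, 13.3625

x0 = 10.0                                 # a batch that is 10 weeks old
y0_hat = beta0_hat + beta1_hat * x0       # point prediction
t_crit = stats.t.ppf(0.975, df=n - 2)
half = t_crit * (MS_res * (1 + 1 / n + (x0 - x_bar) ** 2 / Sxx)) ** 0.5
print(y0_hat - half, y0_hat + half)       # about [2048.32, 2464.32]
```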
5.2.7 Coefficient of Determination

The coefficient of determination is defined as
\[
R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{Res}}{SS_T},
\]
where \( SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 \) is the total sum of squares and \( SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \) is the regression sum of squares; the decomposition SST = SSR + SSRes can be established using the normal equations. Since SST is a measure of the variability in y without considering the effect of the regressor variable x, and SSRes is a measure of the variability in y remaining after x has been considered, R² is often called the proportion of variation explained by the regressor x. It is clear that 0 ≤ R² ≤ 1. Values of R² that are close to 1 imply that most of the variability in y is explained by the regression model.
Example 5.7 (The Rocket Propellant Data). For the regression model for the rocket propellant data, we have R² = 0.9018. That means that 90.18% of the variability in strength is accounted for by the regression model. ||
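A minimal sketch with the placeholder arrays used earlier; in simple linear regression, R² also equals the square of the sample correlation coefficient:

```python
import numpy as np

# Placeholder data; substitute the actual observations.
x = np.array([15.5, 23.8, 8.0, 17.0, 5.5, 19.0, 24.0, 2.5, 7.5, 11.0])
y = np.array([2159., 1678., 2316., 2061., 2208., 1708., 1785., 2638., 2350., 2304.])

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x

SS_T = np.sum((y - y.mean()) ** 2)        # total sum of squares
SS_res = np.sum((y - y_hat) ** 2)         # residual sum of squares
print(1 - SS_res / SS_T)                  # R^2
print(np.corrcoef(x, y)[0, 1] ** 2)       # same value, as r^2 = R^2 here
```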
However, the statistic R² should be used with caution, since it is always possible to make R² large by adding new regressors to the model. But it may happen that adding a new regressor does not improve the quality of the regression significantly. As a matter of fact, adding a new regressor may even damage the quality of the regression.