Simple Linear Regression
2.1 INTRODUCTION
We start with the simple case of studying the relationship between a response vari-
able Y and a predictor variable $X_1$. Since we have only one predictor variable,
we shall drop the subscript in $X_1$ and use X for simplicity. We discuss the covariance
and the correlation coefficient as measures of the direction and strength of the linear
relationship between the two variables. The simple linear regression model is then
formulated and the key theoretical results are given without mathematical deriva-
tions, but illustrated by numerical examples. Readers interested in mathematical
derivations are referred to the bibliographic notes at the end of the chapter, where
books that contain a formal development of regression analysis are listed.
Table 2.1 Notation for the Data Used in Simple Regression and Correlation

Observation Number     Response Y     Predictor X
1                      y1             x1
2                      y2             x2
...                    ...            ...
n                      yn             xn
On the scatter plot of Y versus X, let us draw a vertical line at $\bar{x}$ and a horizontal
line at $\bar{y}$, as shown in Figure 2.1, where
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad \text{and} \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (2.1)$$
are the sample means of Y and X, respectively. The two lines divide the graph into
four quadrants. For each point i in the graph, compute the following quantities:
• $y_i - \bar{y}$, the deviation of each observation $y_i$ from the mean of the response
variable,
• $x_i - \bar{x}$, the deviation of each observation $x_i$ from the mean of the predictor
variable, and
• the product of the above two quantities, $(y_i - \bar{y})(x_i - \bar{x})$.
It is clear from the graph that the quantity $(y_i - \bar{y})$ is positive for every point in the
first and second quadrants, and is negative for every point in the third and fourth
quadrants. Similarly, the quantity $(x_i - \bar{x})$ is positive for every point in the first and
fourth quadrants, and is negative for every point in the second and third quadrants.
These facts are summarized in Table 2.2.
Table 2.2 Algebraic Signs of $(y_i - \bar{y})$, $(x_i - \bar{x})$, and Their Product in Each of the Four Quadrants

Quadrant     $y_i - \bar{y}$     $x_i - \bar{x}$     $(y_i - \bar{y})(x_i - \bar{x})$
1            +                   +                   +
2            +                   -                   -
3            -                   -                   +
4            -                   +                   -
If the relationship between Y and X is positive, most of the points lie in the first and
third quadrants, so most of the products $(y_i - \bar{y})(x_i - \bar{x})$ are positive; if the
relationship is negative, most of the points lie in the second and fourth quadrants and
most of the products are negative. Averaging these products gives the covariance
between Y and X,
$$\mathrm{Cov}(Y, X) = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{n-1}. \qquad (2.2)$$
The sign of the covariance indicates the direction of the linear relationship, but its
magnitude depends on the units in which Y and X are measured, so it does not by
itself measure the strength of the relationship. A unit-free measure is obtained by
first standardizing the variables. The standardized version of Y is
$$z_i = \frac{y_i - \bar{y}}{s_y}, \quad i = 1, 2, \ldots, n, \qquad (2.3)$$
where
$$s_y = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}} \qquad (2.4)$$
is the sample standard deviation of Y. It can be shown that the standardized
variable $z$ in (2.3) has mean zero and standard deviation one. We standardize X in
a similar way by subtracting the mean $\bar{x}$ from each observation $x_i$ and then dividing
by the standard deviation $s_x$. The covariance between the standardized X and Y data
is known as the correlation coefficient between Y and X and is given by
$$\mathrm{Cor}(Y, X) = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{y_i - \bar{y}}{s_y}\right)\left(\frac{x_i - \bar{x}}{s_x}\right). \qquad (2.5)$$
Equivalently,
$$\mathrm{Cor}(Y, X) = \frac{\mathrm{Cov}(Y, X)}{s_y\, s_x}, \qquad (2.6)$$
or, substituting (2.2) and (2.4),
$$\mathrm{Cor}(Y, X) = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2 \sum_{i=1}^{n}(x_i - \bar{x})^2}}. \qquad (2.7)$$
Thus, Cor(Y, X) can be interpreted either as the covariance between the standard-
ized variables or as the ratio of the covariance to the standard deviations of the two
variables. From (2.5), it can be seen that the correlation coefficient is symmetric,
that is, Cor(Y, X) = Cor(X, Y).
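As a quick numerical check of the standardization step, the following short Python sketch (a hypothetical illustration, not part of the original text; the data values are made up) standardizes a variable using (2.3) and (2.4) and verifies that the result has mean zero and standard deviation one.

```python
# Standardizing a variable: subtract the mean, divide by the standard deviation.
from math import sqrt

y = [23, 29, 49, 64, 74]          # a small hypothetical sample

n = len(y)
y_bar = sum(y) / n
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))   # formula (2.4)

z = [(yi - y_bar) / s_y for yi in y]                       # formula (2.3)

z_bar = sum(z) / n
s_z = sqrt(sum((zi - z_bar) ** 2 for zi in z) / (n - 1))
print(round(z_bar, 10), round(s_z, 10))                    # 0.0 and 1.0
```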
Unlike Cov(Y,X), Cor(Y, X) is scale invariant, that is, it does not change if we
change the units of measurements. Furthermore, Cor(Y, X) satisfies
$$-1 \le \mathrm{Cor}(Y, X) \le 1. \qquad (2.8)$$
These properties make Cor(Y, X) a useful quantity for measuring both the
direction and the strength of the relationship between Y and X . The magnitude of
Cor(Y, X) measures the strength of the linear relationship between Y and X . The
closer Cor(Y, X) is to 1 or -1, the stronger is the relationship between Y and X .
The sign of Cor(Y, X ) indicates the direction of the relationship between Y and X .
That is, Cor(Y, X) > 0 implies that Y and X are positively related. Conversely,
Cor(Y, X) < 0, implies that Y and X are negatively related.
Note, however, that Cor(Y, X) = 0 does not necessarily mean that Y and X are
not related. It only implies that they are not linearly related because the correlation
coefficient measures only linear relationships. In other words, the Cor(Y, X ) can
still be zero when Y and X are nonlinearly related. For example, Y and X in Table
2.3 have the perfect nonlinear relationship Y = 50 - X2 (graphed in Figure 2.2),
yet Cor(Y, X ) = 0.
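This point can be verified numerically. The short Python sketch below (a hypothetical illustration, not part of the original text) generates the data of Table 2.3 from the relation Y = 50 − X² and evaluates the correlation coefficient using formula (2.7); the sum of cross products, and hence the correlation, is zero even though the relationship is perfect.

```python
# Correlation for the data of Table 2.3, where Y = 50 - X^2.
from math import sqrt

x = list(range(-7, 8))              # -7, -6, ..., 7
y = [50 - xi**2 for xi in x]        # perfect nonlinear relationship

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

# Essentially 0 (up to floating-point rounding): no *linear* relationship.
print(sxy / sqrt(sxx * syy))
```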
Furthermore, like many other summary statistics, Cor(Y, X) can be sub-
stantially influenced by one or a few outliers in the data. To emphasize this point,
Anscombe (1973) has constructed four data sets, known as Anscombe’s quartet,
each with a distinct pattern, but each having the same set of summary statistics (e.g.,
the same value of the correlation coefficient). The data and graphs are reproduced
in Table 2.4 and Figure 2.3. The data can be found on the book’s Web site.¹ An
analysis based exclusively on an examination of summary statistics, such as the
correlation coefficient, would have been unable to detect the differences in patterns.
¹ http://www.ilr.cornell.edu/~hadi/RABE4
Table 2.3 A Set of Data Where Y and X Are Related by Y = 50 − X²

 Y    X      Y    X      Y    X
 1   -7     46   -2     41    3
14   -6     49   -1     34    4
25   -5     50    0     25    5
34   -4     49    1     14    6
41   -3     46    2      1    7

Figure 2.2 Scatter plot of Y versus X for the data in Table 2.3.
Table 2.4 Anscombe’s Quartet: Four Data Sets Having the Same Values of Summary
Statistics

Y1     X1     Y2     X2     Y3     X3     Y4     X4
8.04 10 9.14 10 7.46 10 6.58 8
6.95 8 8.14 8 6.77 8 5.76 8
7.58 13 8.74 13 12.74 13 7.71 8
8.81 9 8.77 9 7.11 9 8.84 8
8.33 11 9.26 11 7.81 11 8.47 8
9.96 14 8.10 14 8.84 14 7.04 8
7.24 6 6.13 6 6.08 6 5.25 8
4.26 4 3.10 4 5.39 4 12.50 19
10.84 12 9.13 12 8.15 12 5.56 8
4.82 7 7.26 7 6.42 7 7.91 8
5.68 5 4.74 5 5.73 5 6.89 8
Source: Anscombe (1973).
Figure 2.3 Scatter plots of the data in Table 2.4 with the fitted lines.
An examination of Figure 2.3 shows that only the first set, whose plot is given
in (a), can be described by a linear model. The plot in (b) shows the second
data set is distinctly nonlinear and would be better fitted by a quadratic function.
The plot in (c) shows that the third data set has one point that distorts the slope
and the intercept of the fitted line. The plot in (d) shows that the fourth data set
is unsuitable for linear fitting, the fitted line being determined essentially by one
extreme observation. Therefore, it is important to examine the scatter plot of Y
versus X before interpreting the numerical value of Cor(Y, X ) .
Figure 2.4 Computer Repair data: Scatter plot of Minutes versus Units.
n
c (Yi - 9(.i - 5)
- 1768 - 136,
Cov(Y,X) = Z=l - --
n-1 13
and
Cor(Y, X ) =
C(Yi - i.()y - 2) -
1768
= 0.996.
JC(Yi - Y)2 C(.i - $ 2 J27768.36 x 114
Before drawing conclusions from this value of Cor(Y, X ) , we should examine the
corresponding scatter plot of Y versus X . This plot is given in Figure 2.4. The
high value of Cor(Y, X ) = 0.996 is consistent with the strong linear relationship
between Y and X exhibited in Figure 2.4. We therefore conclude that there is a
strong positive relationship between repair time and units repaired.
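The computation above is easy to reproduce. The Python sketch below is a hypothetical illustration (not part of the original text); it uses the 14 Computer Repair observations listed in Table 2.7 and evaluates Cov(Y, X) from (2.2) and Cor(Y, X) from (2.7), giving a covariance of 136 and a correlation of about 0.99.

```python
# Covariance and correlation for the Computer Repair data (Table 2.7).
from math import sqrt

units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]      # X
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119,
           149, 145, 154, 166]                               # Y

n = len(units)
x_bar = sum(units) / n                 # 6.0
y_bar = sum(minutes) / n               # 97.21...

sxy = sum((y - y_bar) * (x - x_bar) for x, y in zip(units, minutes))  # 1768
sxx = sum((x - x_bar) ** 2 for x in units)                            # 114
syy = sum((y - y_bar) ** 2 for y in minutes)                          # 27768.36

cov = sxy / (n - 1)                    # 136.0
cor = sxy / sqrt(sxx * syy)            # about 0.99
print(cov, cor)
```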
Although Cor(Y, X) is a useful quantity for measuring the direction and the
strength of linear relationships, it cannot be used for prediction purposes, that is,
we cannot use Cor(Y, X) to predict the value of one variable given the value of the
other. Furthermore, Cor(Y, X) measures only pairwise relationships. Regression
analysis, however, can be used to relate one or more response variables to one or
more predictor variables, and it can also be used for prediction. In simple linear
regression, the relationship between the response Y and the predictor X is
formulated as the linear model
Table 2.6 Quantities Needed for the Computation of the Correlation Coefficient
Between the Length of Service Calls, Y , and Number of Units Repaired, X
$$Y = \beta_0 + \beta_1 X + \varepsilon, \qquad (2.9)$$
where $\beta_0$ and $\beta_1$ are constants called the model regression coefficients or param-
eters, and $\varepsilon$ is a random disturbance or error. It is assumed that in the range of
the observations studied, the linear equation (2.9) provides an acceptable approxi-
mation to the true relation between Y and X. In other words, Y is approximately
a linear function of X, and $\varepsilon$ measures the discrepancy in that approximation.
² The adjective linear has a dual role here. It may be taken to describe the fact that the relationship
between Y and X is linear. More generally, the word linear refers to the fact that the regression
parameters, $\beta_0$ and $\beta_1$, enter (2.9) in a linear fashion. Thus, for example, $Y = \beta_0 + \beta_1 X^2 + \varepsilon$ is
also a linear model even though the relationship between Y and X is quadratic.
Based on the available data, we wish to estimate the parameters $\beta_0$ and $\beta_1$. This is
equivalent to finding the straight line that gives the best fit (representation) of the
points in the scatter plot of the response versus the predictor variable (see Figure
2.4). In terms of the observed data, the model (2.9) can be written as
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n. \qquad (2.10)$$
We estimate the parameters using the popular least squares method, which
gives the line that minimizes the sum of squares of the vertical distances³ from
each point to the line. The vertical distances represent the errors in the response
variable. These errors can be obtained by rewriting (2.10) as
$$\varepsilon_i = y_i - \beta_0 - \beta_1 x_i, \quad i = 1, 2, \ldots, n. \qquad (2.12)$$
The sum of squares of these distances can then be written as
$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2. \qquad (2.13)$$
³ An alternative to the vertical distance is the perpendicular (shortest) distance from each point to the
line. The resultant line is called the orthogonal regression line.
The values of $\hat\beta_0$ and $\hat\beta_1$ that minimize (2.13) are given by
$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad (2.14)$$
and
$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}. \qquad (2.15)$$
Note that we give the formula for $\hat\beta_1$ before the formula for $\hat\beta_0$ because $\hat\beta_0$ uses $\hat\beta_1$.
The estimates $\hat\beta_0$ and $\hat\beta_1$ are called the least squares estimates of $\beta_0$ and $\beta_1$ because
they are the solution to the least squares method, the intercept and the slope of the
line that has the smallest possible sum of squares of the vertical distances from each
point to the line. For this reason, the line is called the least squares regression line.
The least squares regression line is given by
$$\hat{Y} = \hat\beta_0 + \hat\beta_1 X. \qquad (2.16)$$
Note that a least squares line always exists because we can always find a line that
gives the minimum sum of squares of the vertical distances. In fact, as we shall
see later, in some cases a least squares line may not be unique. These cases are not
common in practice.
For each observation in our data we can compute the fitted value
$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i, \quad i = 1, 2, \ldots, n. \qquad (2.17)$$
Thus, the ith fitted value, $\hat{y}_i$, is the point on
the least squares regression line (2.16) corresponding to $x_i$. The vertical distance
corresponding to the ith observation is the residual
$$e_i = y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n. \qquad (2.18)$$
These vertical distances are called the ordinary⁴ least squares residuals. One
property of the residuals in (2.18) is that their sum is zero (see Exercise 2.5(a)).
This means that the sum of the distances above the line is equal to the sum of the
distances below the line.
Using the Computer Repair data and the quantities in Table 2.6, we have
$$\hat\beta_1 = \frac{\sum(y_i - \bar{y})(x_i - \bar{x})}{\sum(x_i - \bar{x})^2} = \frac{1768}{114} = 15.509,$$
and
$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 97.21 - 15.509 \times 6 = 4.162.$$
Hence, the equation of the least squares regression line is
$$\widehat{\mathrm{Minutes}} = 4.162 + 15.509 \times \mathrm{Units}. \qquad (2.19)$$
Figure 2.5 Plot of Minutes versus Units with the fitted least squares regression line.
This least squares line is shown together with the scatter plot of Minutes versus
Units in Figure 2.5. The fitted values in (2.17) and the residuals in (2.18) are shown
in Table 2.7.
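For readers who wish to reproduce Table 2.7, the following Python sketch (a hypothetical illustration, not part of the original text) computes the least squares estimates in (2.14) and (2.15) for the Computer Repair data and then the fitted values (2.17) and residuals (2.18).

```python
# Least squares fit for the Computer Repair data (Minutes on Units).
units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar = sum(units) / n
y_bar = sum(minutes) / n

# Slope (2.14) and intercept (2.15)
b1 = (sum((y - y_bar) * (x - x_bar) for x, y in zip(units, minutes))
      / sum((x - x_bar) ** 2 for x in units))          # 15.509
b0 = y_bar - b1 * x_bar                                 # 4.162

# Fitted values (2.17) and residuals (2.18); the residuals sum to zero.
fitted    = [b0 + b1 * x for x in units]
residuals = [y - f for y, f in zip(minutes, fitted)]

print(round(b0, 3), round(b1, 3))
print(round(sum(residuals), 10))                        # 0.0 (up to rounding)
```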
The coefficients in (2.19) can be interpreted in physical terms. The constant
term represents the setup or startup time for each repair and is approximately 4
minutes. The coefficient of Units represents the increase in the length of a service
call for each additional component that has to be repaired. From the data given,
we estimate that it takes about 16 minutes (15.509) for each additional component
that has to be repaired. For example, the length of a service call in which four
components had to be repaired is obtained by substituting Units = 4 in the equation
of the regression line (2.19) and obtaining $\hat{y} = 4.162 + 15.509 \times 4 = 66.20$. Since
Units = 4 corresponds to two observations in our data set (observations 4 and 5),
the value 66.198 is the fitted value for both observations 4 and 5, as can be seen
from Table 2.7. Note, however, that since observations 4 and 5 have different values
for the response variable Minutes, they have different residuals.
We should note here that by comparing (2.2), (2.7), and (2.14), an alternative
formula for $\hat\beta_1$ can be expressed as
$$\hat\beta_1 = \frac{\mathrm{Cov}(Y, X)}{s_x^2} = \mathrm{Cor}(Y, X)\,\frac{s_y}{s_x}, \qquad (2.20)$$
from which it can be seen that $\hat\beta_1$, Cov(Y, X), and Cor(Y, X) have the same
sign. This makes intuitive sense because positive (negative) slope means positive
(negative) correlation.
So far in our analysis we have made only one assumption, namely, that Y and
X are linearly related. This assumption is referred to as the linearity assumption.
This is merely an assumption or a hypothesis about the relationship between the
response and predictor variables. An early step in the analysis should always be
the validation of this assumption. We wish to determine if the data at hand support
Table 2.7 The Fitted Values, $\hat{y}_i$, and the Ordinary Least Squares Residuals, $e_i$, for
the Computer Repair Data
i    xi    yi    $\hat{y}_i$    ei         i    xi    yi    $\hat{y}_i$    ei
1 1 23 19.67 3.33 8 6 97 97.21 -0.21
2 2 29 35.18 -6.18 9 7 109 112.72 -3.72
3 3 49 50.69 -1.69 10 8 119 128.23 -9.23
4 4 64 66.20 -2.20 11 9 149 143.74 5.26
5 4 74 66.20 7.80 12 9 145 143.74 1.26
6 5 87 81.71 5.29 13 10 154 159.25 -5.25
7 6 96 97.21 -1.21 14 10 166 159.25 6.75
the assumption that Y and X are linearly related. An informal way to check
this assumption is to examine the scatter plot of the response versus the predictor
variable, preferably drawn with the least squares line superimposed on the graph
(see Figure 2.5). If we observe a nonlinear pattern, we will have to take corrective
action. For example, we may re-express or transform the data before we continue
the analysis. Data transformation is discussed in Chapter 6.
If the scatter of points resembles a straight line, then we conclude that the linearity
assumption is reasonable and continue with our analysis. The least squares estima-
tors have several desirable properties when some additional assumptions hold. The
required assumptions are stated in Chapter 4. The validity of these assumptions
must be checked before meaningful conclusions can be reached from the analysis.
Chapter 4 also presents methods for the validation of these assumptions. Using the
properties of least squares estimators, one can develop statistical inference proce-
dures (e.g., confidence interval estimation, tests of hypothesis, and goodness-of-fit
tests). These are presented in Sections 2.6 to 2.9.
$\hat\beta_0$ and $\hat\beta_1$ are unbiased⁵ estimates of $\beta_0$ and $\beta_1$, respectively. Their variances are
$$\mathrm{Var}(\hat\beta_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum(x_i - \bar{x})^2}\right] \qquad (2.21)$$
and
$$\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum(x_i - \bar{x})^2}. \qquad (2.22)$$
Furthermore, the sampling distributions of the least squares estimates $\hat\beta_0$ and $\hat\beta_1$
are normal with means $\beta_0$ and $\beta_1$ and variances as given in (2.21) and (2.22),
respectively.
The variances of $\hat\beta_0$ and $\hat\beta_1$ depend on the unknown parameter $\sigma^2$, so we need
to estimate $\sigma^2$ from the data. An unbiased estimate of $\sigma^2$ is given by
$$\hat\sigma^2 = \frac{\mathrm{SSE}}{n-2} = \frac{\sum_{i=1}^{n} e_i^2}{n-2}, \qquad (2.23)$$
where SSE is the sum of squares of the residuals (errors). The number $n - 2$ in
the denominator of (2.23) is called the degrees of freedom (df). It is equal to the
number of observations minus the number of estimated regression coefficients.
Replacing $\sigma^2$ in (2.21) and (2.22) by $\hat\sigma^2$ from (2.23), we get unbiased estimates
of the variances of $\hat\beta_0$ and $\hat\beta_1$. An estimate of the standard deviation is called the
standard error (s.e.) of the estimate. Thus, the standard errors of $\hat\beta_0$ and $\hat\beta_1$ are
$$\mathrm{s.e.}(\hat\beta_0) = \hat\sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum(x_i - \bar{x})^2}} \qquad (2.24)$$
and
$$\mathrm{s.e.}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{\sum(x_i - \bar{x})^2}}, \qquad (2.25)$$
respectively, where $\hat\sigma$ is the square root of $\hat\sigma^2$ in (2.23). The standard error of $\hat\beta_1$ is
a measure of how precisely the slope has been estimated. The smaller the standard
error, the more precise the estimator.
With the sampling distributions of $\hat\beta_0$ and $\hat\beta_1$, we are now in a position to perform
statistical analysis concerning the usefulness of X as a predictor of Y. Under the
normality assumption, an appropriate test statistic for testing the null hypothesis
$H_0: \beta_1 = 0$ against the alternative $H_1: \beta_1 \neq 0$ is the t-test,
$$t_1 = \frac{\hat\beta_1}{\mathrm{s.e.}(\hat\beta_1)}. \qquad (2.26)$$
⁵ An estimate $\hat\theta$ is said to be an unbiased estimate of a parameter $\theta$ if the expected value of $\hat\theta$ is equal
to $\theta$.
Figure 2.6 A graph of the probability density function of a t-distribution. The p-value
for the t-test is the shaded areas under the curve.
To carry out the test, the observed value of $t_1$ is compared with the critical
value obtained from the t-table given in the Appendix to this book (see Table A.2),
which is $t_{(n-2,\,\alpha/2)}$, where $\alpha$ is a specified significance level. Note that we divide
$\alpha$ by 2 because we have a two-sided alternative hypothesis. Accordingly, $H_0$ is
rejected at the significance level $\alpha$ if $|t_1| \ge t_{(n-2,\,\alpha/2)}$.
Testing $H_0: \beta_1 = \beta_1^0$
The above t-test can be generalized to test the more general hypothesis $H_0:
\beta_1 = \beta_1^0$, where $\beta_1^0$ is a constant chosen by the investigator, against the two-sided
alternative $H_1: \beta_1 \neq \beta_1^0$. The appropriate test statistic in this case is the t-test,
$$t_1 = \frac{\hat\beta_1 - \beta_1^0}{\mathrm{s.e.}(\hat\beta_1)}. \qquad (2.29)$$
Note that when $\beta_1^0 = 0$, the t-test in (2.29) reduces to the t-test in (2.26). The
statistic $t_1$ in (2.29) is also distributed as a Student's t with $n - 2$ degrees of
freedom. As an illustration, suppose that the management of the computer repair
company expects the service time to increase by 12 minutes for each additional unit
to be repaired, that is, $\beta_1^0 = 12$. The corresponding test statistic is
$$t_1 = \frac{\hat\beta_1 - 12}{\mathrm{s.e.}(\hat\beta_1)} = \frac{15.509 - 12}{0.505} = 6.948,$$
with 12 degrees of freedom. The critical value for this test is $t_{(n-2,\,\alpha/2)} =
t_{(12,\,0.025)} = 2.18$. Since $t_1 = 6.948 > 2.18$, the result is highly significant,
leading to the rejection of the null hypothesis. The management’s estimate of the
increase in time for each additional component to be repaired is not supported by
the data. Their estimate is too low.
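The two t-tests above can be reproduced with a short computation. The sketch below is a hypothetical illustration (not part of the original text); the use of scipy for the t-distribution tail areas is an assumption about the available software, and the rest follows the formulas (2.23), (2.25), (2.26), and (2.29).

```python
# t-tests for the slope in the Computer Repair regression.
from math import sqrt
from scipy import stats          # assumed available for t-distribution tail areas

units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
sxx = sum((x - x_bar) ** 2 for x in units)
b1 = sum((y - y_bar) * (x - x_bar) for x, y in zip(units, minutes)) / sxx
b0 = y_bar - b1 * x_bar

# Unbiased estimate of sigma^2 from (2.23) and the standard error (2.25)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(units, minutes))
sigma_hat = sqrt(sse / (n - 2))
se_b1 = sigma_hat / sqrt(sxx)                    # about 0.505

# H0: beta1 = 0, the t-test in (2.26)
t0 = b1 / se_b1                                  # about 30.7
p0 = 2 * stats.t.sf(abs(t0), df=n - 2)

# H0: beta1 = 12, the t-test in (2.29) with beta1^0 = 12
t12 = (b1 - 12) / se_b1                          # about 6.95
p12 = 2 * stats.t.sf(abs(t12), df=n - 2)

print(round(t0, 2), p0, round(t12, 2), p12)
```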
The need for testing hypotheses regarding the regression parameter $\beta_0$ may also
arise in practice. More specifically, suppose we wish to test $H_0: \beta_0 = \beta_0^0$ against
the alternative $H_1: \beta_0 \neq \beta_0^0$, where $\beta_0^0$ is a constant chosen by the investigator.
The appropriate test in this case is given by
$$t_0 = \frac{\hat\beta_0 - \beta_0^0}{\mathrm{s.e.}(\hat\beta_0)}, \qquad (2.30)$$
which, for the special case of testing $H_0: \beta_0 = 0$, reduces to
$$t_0 = \frac{\hat\beta_0}{\mathrm{s.e.}(\hat\beta_0)}. \qquad (2.31)$$
For the Computer Repair data, the test of $H_0: \beta_1 = 0$ gives $t_1 = 15.509/0.505 =
30.71$, which far exceeds the critical value $t_{(12,\,0.025)} = 2.18$. The null hypothesis is
therefore rejected, which means that the predictor variable Units is a statistically significant
predictor of the response variable Minutes. This conclusion can also be reached
using (2.28) by observing that the p-value ($p_1 < 0.0001$) is much less than $\alpha = 0.05$,
indicating very high significance.
The standard errors can also be used to construct confidence intervals for the
regression parameters. The limits of the $(1 - \alpha) \times 100\%$ confidence interval for $\beta_0$ are
$$\hat\beta_0 \pm t_{(n-2,\,\alpha/2)} \times \mathrm{s.e.}(\hat\beta_0), \qquad (2.33)$$
where $t_{(n-2,\,\alpha/2)}$ is the $(1 - \alpha/2)$ percentile of a t-distribution with $n - 2$ degrees
of freedom. Similarly, the limits of the $(1 - \alpha) \times 100\%$ confidence interval for $\beta_1$ are
given by
$$\hat\beta_1 \pm t_{(n-2,\,\alpha/2)} \times \mathrm{s.e.}(\hat\beta_1). \qquad (2.34)$$
The confidence interval in (2.34) has the usual interpretation, namely, if we were
to take repeated samples of the same size at the same values of X and construct for
example 95% confidence intervals for the slope parameter for each sample, then
95% of these intervals would be expected to contain the true value of the slope.
From Table 2.9 we see that a 95% confidence interval for $\beta_1$ is
$$15.509 \pm 2.18 \times 0.505 = (14.41,\ 16.61).$$
That is, the incremental time required for each broken unit is between 14 and 17
minutes. The calculation of the confidence interval for $\beta_0$ in this example is left as an
exercise for the reader.
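A small sketch of the interval computation in (2.34) (hypothetical, not part of the original text; the slope estimate and standard error are taken from the Computer Repair fit, and scipy is assumed to be available for the t percentile):

```python
# 95% confidence interval for the slope, following (2.34).
from scipy import stats       # assumed available for the t percentile

b1, se_b1, df = 15.509, 0.505, 12           # estimates from the Computer Repair fit
t_crit = stats.t.ppf(1 - 0.05 / 2, df)      # about 2.18

lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1
print(round(lower, 2), round(upper, 2))     # roughly 14.41 and 16.61
```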
Note that the confidence limits in (2.33) and (2.34) are constructed for each of
the parameters $\beta_0$ and $\beta_1$ separately. This does not mean that a simultaneous (joint)
confidence region for the two parameters is rectangular. Actually, the simultaneous
confidence region is elliptical. This region is given for the general case of multiple
regression in the Appendix to Chapter 3 in (A.15), of which the simultaneous
confidence region for $\beta_0$ and $\beta_1$ is a special case.
2.8 PREDICTIONS
The fitted regression equation can be used for prediction. We distinguish between
two types of predictions:
1. The prediction of the value of the response variable Y which corresponds to any
chosen value, $x_0$, of the predictor variable; and
2. The estimation of the mean response $\mu_0$ when $X = x_0$.
For the first case, the predicted value $\hat{y}_0$ is
$$\hat{y}_0 = \hat\beta_0 + \hat\beta_1 x_0. \qquad (2.36)$$
Its standard error is
$$\mathrm{s.e.}(\hat{y}_0) = \hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}}. \qquad (2.37)$$
Hence, the confidence limits for the predicted value with confidence coefficient
$(1 - \alpha)$ are given by
$$\hat{y}_0 \pm t_{(n-2,\,\alpha/2)}\,\mathrm{s.e.}(\hat{y}_0). \qquad (2.38)$$
For the second case, the mean response $\mu_0$ is estimated by
$$\hat\mu_0 = \hat\beta_0 + \hat\beta_1 x_0. \qquad (2.39)$$
The standard error of this estimate is
$$\mathrm{s.e.}(\hat\mu_0) = \hat\sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}}, \qquad (2.40)$$
from which it follows that the confidence limits for $\mu_0$ with confidence coefficient
$(1 - \alpha)$ are given by
$$\hat\mu_0 \pm t_{(n-2,\,\alpha/2)}\,\mathrm{s.e.}(\hat\mu_0). \qquad (2.41)$$
Note that the point estimate of $\mu_0$ is identical to the predicted response $\hat{y}_0$. This
can be seen by comparing (2.36) with (2.39). The standard error of $\hat\mu_0$ is, however,
smaller than the standard error of $\hat{y}_0$, as can be seen by comparing (2.37) with
(2.40). Intuitively, this makes sense. There is greater uncertainty (variability)
in predicting one observation (the next observation) than in estimating the mean
response when $X = x_0$. The averaging that is implied in the mean response reduces
the variability and uncertainty associated with the estimate.
To distinguish between the limits in (2.38) and (2.41), the limits in (2.38) are
sometimes referred to as the prediction or forecast limits, whereas the limits given
in (2.41) are called the confidence limits.
Suppose that we wish to predict the length of a service call in which four
components had to be repaired. If $\hat{y}_4$ denotes the predicted value, then from (2.36)
we get
$$\hat{y}_4 = 4.162 + 15.509 \times 4 = 66.20,$$
with a standard error that is obtained from (2.37) as
$$\mathrm{s.e.}(\hat{y}_4) = \hat\sigma\sqrt{1 + \frac{1}{14} + \frac{(4 - 6)^2}{114}} \approx 5.67,$$
where $\hat\sigma \approx 5.39$ is the estimate of $\sigma$ obtained from (2.23).
On the other hand, if the service department wishes to estimate the expected (mean)
service time for a call that needed four components repaired, we would use (2.39)
and (2.40), respectively. Denoting by $\hat\mu_4$ the expected service time for a call that
needed four components to be repaired, we have
$$\hat\mu_4 = 4.162 + 15.509 \times 4 = 66.20, \qquad
\mathrm{s.e.}(\hat\mu_4) = \hat\sigma\sqrt{\frac{1}{14} + \frac{(4 - 6)^2}{114}} \approx 1.76.$$
With these standard errors we can construct confidence intervals using (2.38) and
(2.41), as appropriate.
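The predicted value and the two standard errors above can be computed as follows. This is a hypothetical sketch (not part of the original text) that applies (2.36), (2.37), and (2.40) to the Computer Repair data at Units = 4.

```python
# Prediction at Units = 4: predicted value and the two standard errors
# in (2.37) and (2.40).
from math import sqrt

units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
sxx = sum((x - x_bar) ** 2 for x in units)
b1 = sum((y - y_bar) * (x - x_bar) for x, y in zip(units, minutes)) / sxx
b0 = y_bar - b1 * x_bar
sigma_hat = sqrt(sum((y - (b0 + b1 * x)) ** 2
                     for x, y in zip(units, minutes)) / (n - 2))

x0 = 4
y0_hat = b0 + b1 * x0                                   # about 66.20

# Standard error for predicting a single new observation, as in (2.37)
se_pred = sigma_hat * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
# Standard error for the mean response at x0, as in (2.40)
se_mean = sigma_hat * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)

print(round(y0_hat, 2), round(se_pred, 2), round(se_mean, 2))
```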
As can be seen from (2.37), the standard error of prediction increases the farther
the value of the predictor variable is from the center of the actual observations.
Care should be taken when predicting the value of Minutes corresponding to a
value for Units that does not lie close to the observed data. There are two dangers
in such predictions. First, there is substantial uncertainty due to the large standard
error. More important, the linear relationship that has been estimated may not hold
outside the range of observations. Therefore, care should be taken in employing
fitted regression lines for prediction far outside the range of observations. In our
example we would not use the fitted equation to predict the service time for a service
call which requires that 25 components be replaced or repaired. This value lies too
far outside the existing range of observations.
After fitting a linear model relating Y to X , we are interested not only in knowing
whether a linear relationship exists, but also in measuring the quality of the fit of the
model to the data. The quality of the fit can be assessed by one of the following
highly related (hence, somewhat redundant) ways:
1. When using the tests in (2.26) or (2.32), if $H_0$ is rejected, the magnitude of
the values of the test (or the corresponding p-values) gives us information
about the strength (not just the existence) of the linear relationship between
Y and X . Basically, the larger the t (in absolute value) or the smaller the
corresponding p-value, the stronger the linear relationship between Y and X .
These tests are objective but they require all the assumptions stated earlier,
especially the assumption of normality of the $\varepsilon$'s.
2. The strength of the linear relationship between Y and X can also be assessed
directly from the examination of the scatter plot of Y versus X together with
the corresponding value of the correlation coefficient Cor(Y, X ) in (2.6).
The closer the set of points to a straight line (the closer Cor(Y, X) is to 1 or
−1), the stronger the linear relationship between Y and X. This approach is
informal and subjective but it requires only the linearity assumption.
3. Examine the scatter plot of $\hat{Y}$ versus Y. The closer the set of points to a
straight line, the stronger the linear relationship between Y and X. One can
measure the strength of the linear relationship in this graph by computing the
correlation coefficient between the fitted and observed values, $\mathrm{Cor}(Y, \hat{Y})$.
Note that $\mathrm{Cor}(Y, \hat{Y})$ cannot be negative (why?), but Cor(Y, X) can be positive
or negative ($-1 \le \mathrm{Cor}(Y, X) \le 1$). Therefore, in simple linear regression,
the scatter plot of $\hat{Y}$ versus Y is redundant. However, in multiple regression,
the scatter plot of $\hat{Y}$ versus Y is not redundant. The graph is very useful
because, as we shall see in Chapter 3, it is used to assess the strength of the
relationship between Y and the set of predictor variables $X_1, X_2, \ldots, X_p$.
4. Although scatter plots of $\hat{Y}$ versus Y and $\mathrm{Cor}(Y, \hat{Y})$ are redundant in simple
linear regression, they give us an indication of the quality of the fit in both
simple and multiple regression. Furthermore, in both simple and multiple
regression, $\mathrm{Cor}(Y, \hat{Y})$ is related to another useful measure of the quality of
fit of the linear model to the observed data. This measure is developed as
follows. After we compute the least squares estimates of the parameters of a
linear model, let us compute the following quantities:
$$\mathrm{SST} = \sum_{i=1}^{n}(y_i - \bar{y})^2, \qquad \mathrm{SSR} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2, \qquad \mathrm{SSE} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2,$$
where SST stands for the total sum of squared deviations in Y from its mean
$\bar{y}$, SSR denotes the sum of squares due to regression, and SSE represents
the sum of squared residuals (errors). The quantities $(y_i - \bar{y})$, $(\hat{y}_i - \bar{y})$, and
$(y_i - \hat{y}_i)$ are depicted in Figure 2.7 for a typical point $(x_i, y_i)$. The line
$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$ is the fitted regression line based on all data points (not
shown on the graph) and the horizontal line is drawn at $Y = \bar{y}$. Note that
for every point $(x_i, y_i)$, there are two points, $(x_i, \hat{y}_i)$, which lies on the fitted
line, and $(x_i, \bar{y})$, which lies on the line $Y = \bar{y}$.
A fundamental equality, in both simple and multiple regression, is given by
$$y_i = \hat{y}_i + (y_i - \hat{y}_i),$$
Observed = Fit + Deviation from fit,
or, subtracting $\bar{y}$ from both sides,
$$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i), \qquad (2.44)$$
Deviation from mean = Deviation due to fit + Residual.
Squaring both sides of (2.44) and summing over all $n$ observations (the cross-
product term sums to zero), we obtain the fundamental identity
$$\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}. \qquad (2.45)$$
A natural measure of the quality of fit is then the proportion of the total variation
accounted for by the regression,
$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad (2.46)$$
which in simple linear regression satisfies
$$[\mathrm{Cor}(Y, X)]^2 = [\mathrm{Cor}(Y, \hat{Y})]^2 = R^2. \qquad (2.47)$$
These quantities are computed for the Computer Repair data in the sketch following
this list.
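The following hypothetical Python sketch (not part of the original text) computes SST, SSR, SSE, and R² for the Computer Repair data via (2.46); it verifies numerically that SST = SSR + SSE and gives an R² of about 0.987.

```python
# Decomposition SST = SSR + SSE and the coefficient of determination R^2.
units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
sxx = sum((x - x_bar) ** 2 for x in units)
b1 = sum((y - y_bar) * (x - x_bar) for x, y in zip(units, minutes)) / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in units]

sst = sum((y - y_bar) ** 2 for y in minutes)               # total
ssr = sum((f - y_bar) ** 2 for f in fitted)                # due to regression
sse = sum((y - f) ** 2 for y, f in zip(minutes, fitted))   # residual

r_squared = ssr / sst            # equivalently 1 - sse / sst; about 0.987
print(round(sst, 2), round(ssr + sse, 2), round(r_squared, 3))
```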
In some applications, subject matter theory or other physical considerations imply
that the regression line must pass through the origin. In that case, instead of the
model with an intercept,
$$Y = \beta_0 + \beta_1 X + \varepsilon, \qquad (2.48)$$
we fit the no-intercept model (the regression line through the origin),
$$Y = \beta_1 X + \varepsilon. \qquad (2.49)$$
The least squares estimate of $\beta_1$ in (2.49) is
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2}. \qquad (2.50)$$
The corresponding residuals are
$$e_i = y_i - \hat\beta_1 x_i, \quad i = 1, 2, \ldots, n, \qquad (2.52)$$
and the standard error of $\hat\beta_1$ is
$$\mathrm{s.e.}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{\sum x_i^2}}, \qquad (2.53)$$
where
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 1}. \qquad (2.54)$$
Note that the degrees of freedom for SSE is n - 1, not n - 2, as is the case for a
model with an intercept.
Note that the residuals in (2.52) do not necessarily add up to zero as is the case
for a model with an intercept (see Exercise 2.11(c)). Also, the fundamental identity
in (2.45) is no longer true in general. For this reason, some quality measures for
models with an intercept, such as $R^2$ in (2.46), are no longer appropriate for models
with no intercept. The appropriate identity for the case of models with no intercept
is obtained by replacing $\bar{y}$ in (2.44) by zero. Hence, the fundamental identity
becomes
$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \hat{y}_i^2 + \sum_{i=1}^{n} e_i^2, \qquad (2.55)$$
and $R^2$ is correspondingly computed as
$$R^2 = \frac{\sum_{i=1}^{n} \hat{y}_i^2}{\sum_{i=1}^{n} y_i^2} = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} y_i^2}. \qquad (2.56)$$
This is the appropriate form of R2 for models with no intercept. Note, however,
that the interpretations for the two formulas of R2 are different. In the case of
models with an intercept, R2 can be interpreted as the proportion of the variation in
Y that is accounted for by the predictor variable X after adjusting Y by its mean.
For models without an intercept, no adjustment of Y is made. For example, if we
fit (2.49) but use the formula for R2 in (2.46), it is possible for R2 to be negative in
some cases (see Exercise 2.11(d)). Therefore, the correct formula and the correct
interpretation should be used.
The formula for the t-test in (2.29) for testing $H_0: \beta_1 = \beta_1^0$ against the two-
sided alternative $H_1: \beta_1 \neq \beta_1^0$ continues to hold, but with the new definitions of
$\hat\beta_1$ and $\mathrm{s.e.}(\hat\beta_1)$ in (2.50) and (2.53), respectively.
As we mentioned earlier, models with no intercept should be used whenever
they are consistent with the subject matter (domain) theory or other physical and
material considerations. In some applications, however, one may not be certain as
to which model should be used. In these cases, the choice between the models given
in (2.48) and (2.49) has to be made with care. First, the goodness of fit should be
judged by comparing the residual mean squares ($\hat\sigma^2$) produced by the two models
because it measures the closeness of the observed and predicted values for the two
models. Second, one can fit model (2.48) to the data and use the t-test in (2.31)
to test the significance of the intercept. If the test is significant, then use (2.48),
otherwise use (2.49).
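The comparison just described can be sketched as follows. The Python code below (a hypothetical illustration, not part of the original text) fits both the intercept model (2.48) and the no-intercept model (2.49) to the Computer Repair data and reports the two residual mean squares; which model is preferred would rest on this comparison together with the subject matter considerations discussed above.

```python
# Comparing the intercept model (2.48) and the no-intercept model (2.49)
# by their residual mean squares.
units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]
n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n

# Model with intercept: estimates from (2.14) and (2.15), df = n - 2
b1 = (sum((y - y_bar) * (x - x_bar) for x, y in zip(units, minutes))
      / sum((x - x_bar) ** 2 for x in units))
b0 = y_bar - b1 * x_bar
ms_intercept = sum((y - (b0 + b1 * x)) ** 2
                   for x, y in zip(units, minutes)) / (n - 2)

# Model through the origin: estimate from (2.50), df = n - 1
b1_0 = sum(x * y for x, y in zip(units, minutes)) / sum(x * x for x in units)
ms_origin = sum((y - b1_0 * x) ** 2 for x, y in zip(units, minutes)) / (n - 1)

print(round(ms_intercept, 2), round(ms_origin, 2))
```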
An excellent exposition of regression models through the origin is provided by
Eisenhauer (2003) who also alerts the users of regression models through the origin
to be careful when fitting these models using computer software programs because
some of them give incorrect and confusing results for the case of regression models
through the origin.
In this section we give two examples of trivial regression models, that is, regression
equations that have no regression coefficients. The first example arises when we
wish to test for the mean $\mu$ of a single variable Y based on a random sample of
n observations $y_1, y_2, \ldots, y_n$. Here we have $H_0: \mu = 0$ against $H_1: \mu \neq 0$.
Assuming that Y is normally distributed with mean $\mu$ and variance $\sigma^2$, the well-
known one-sample t-test
$$t = \frac{\bar{y} - 0}{\mathrm{s.e.}(\bar{y})} = \frac{\bar{y}}{s_y/\sqrt{n}} \qquad (2.57)$$
can be used to test $H_0$, where $s_y$ is the sample standard deviation of Y. Alternatively,
the above hypotheses can be formulated as
$$H_0\ (\text{Model 1}): Y = \varepsilon \quad \text{against} \quad H_1\ (\text{Model 2}): Y = \beta_0 + \varepsilon, \qquad (2.58)$$
where $\beta_0 = \mu$. Thus, Model 1 indicates that $\mu = 0$ and Model 2 indicates that
$\mu \neq 0$. The least squares estimate of $\beta_0$ in Model 2 is $\bar{y}$, the ith fitted value is
$\hat{y}_i = \bar{y}$, and the ith residual is $e_i = y_i - \bar{y}$. It follows then that an estimate of $\sigma^2$ is
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1} = s_y^2, \qquad (2.59)$$
and the t-test for $H_0: \beta_0 = 0$ in Model 2,
$$t_0 = \frac{\hat\beta_0}{\mathrm{s.e.}(\hat\beta_0)} = \frac{\bar{y}}{s_y/\sqrt{n}}, \qquad (2.60)$$
is identical to the one-sample t-test in (2.57).
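As a hypothetical illustration (not part of the original text; the data values are made up), the sketch below computes the one-sample t-statistic in (2.57), which by the argument above is the same quantity obtained from fitting Model 2 and testing whether $\beta_0 = 0$.

```python
# One-sample t-test for H0: mu = 0, equivalently a test of beta0 = 0
# in the trivial regression model Y = beta0 + epsilon.
from math import sqrt

y = [2.1, -0.4, 1.3, 0.8, 1.7, -0.2, 0.9, 1.1]   # hypothetical sample

n = len(y)
y_bar = sum(y) / n                                # least squares estimate of beta0
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

t = y_bar / (s_y / sqrt(n))                       # statistic (2.57) = (2.60)
print(round(t, 3), "with", n - 1, "degrees of freedom")
```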
EXERCISES
2.1 Using the data in Table 2.6:
(a) Compute Var(Y) and Var(X).
(b) Prove or verify that $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$.
(c) Prove or verify that any standardized variable has a mean of 0 and a
standard deviation of 1.
(d) Prove or verify that the three formulas for Cor(Y, X) in (2.5), (2.6), and
(2.7) are identical.
(e) Prove or verify that the two formulas for $\hat\beta_1$ in (2.14) and (2.20) are
identical.