Simple Linear Regression

This document provides an overview of simple linear regression analysis. It defines key concepts such as the dependent and independent variables, the regression line, and regression coefficients. It explains how to estimate the regression line using the least squares method and how to calculate predictions and residuals. The assumptions of the linear regression model are stated. Key steps in simple linear regression, including developing the model, interpreting coefficients, checking assumptions, and using the model for prediction, description, and control, are summarized.

Applied Business Forecasting

and Planning
Simple Linear Regression

Simple Regression
Simple regression analysis is a statistical tool that gives us the ability to estimate the mathematical relationship between a dependent variable (usually called y) and an independent variable (usually called x).
The dependent variable is the variable for which we want to make a prediction.
While various non-linear forms may be used, simple linear regression models are the most common.
Introduction
The primary goal of
quantitative analysis is to use
current information about a
phenomenon to predict its
future behavior.
Current information is usually
in the form of a set of data.
In a simple case, when the data
form a set of pairs of numbers,
we may interpret them as
representing the observed
values of an independent (or
predictor ) variable X and a
dependent ( or response)
variable Y.

lot size Man-hours
30 73
20 50
60 128
80 170
40 87
50 108
60 135
30 69
70 148
60 132
Introduction
The goal of the analyst who studies the data is to find a functional relation
$y = f(x)$
between the response variable y and the predictor variable x.
[Scatter plot: statistical relation between lot size and man-hours; x-axis: Lot size (0-90), y-axis: Man-Hours (0-180)]
Regression Function
The statement that the relation between X and Y is statistical should be interpreted as providing the following guidelines:
1. Regard Y as a random variable.
2. For each X, take f(x) to be the expected value (i.e., mean value) of Y.
3. Given that E(Y) denotes the expected value of Y, call the equation
$E(Y) = f(x)$
the regression function.
Pictorial Presentation of Linear Regression
Model
Historical Origin of Regression
Regression Analysis was
first developed by Sir
Francis Galton, who
studied the relation
between heights of sons
and fathers.
Heights of sons of both
tall and short fathers
appeared to revert or
regress to the mean of
the group.

Construction of Regression Models
Selection of independent variables
Since reality must be reduced to manageable proportions
whenever we construct models, only a limited number of
independent or predictor variables can or should be included in a
regression model. Therefore a central problem is that of
choosing the most important predictor variables.
Functional form of regression relation
Sometimes, relevant theory may indicate the appropriate
functional form. More frequently, however, the functional form
is not known in advance and must be decided once the data have
been collected and analyzed.
Scope of model
In formulating a regression model, we usually need to restrict
the coverage of model to some interval or region of values of the
independent variables.
Uses of Regression Analysis
Regression analysis serves three major purposes:
1. Description
2. Control
3. Prediction
The several purposes of regression analysis frequently overlap in practice.

Formal Statement of the Model
The general regression model is
$Y = \beta_0 + \beta_1 X + \varepsilon$
where:
1. β0 and β1 are parameters,
2. X is a known constant,
3. the deviations ε are independent N(0, σ²).
Meaning of Regression Coefficients
The values of the regression parameters β0 and β1 are not known. We estimate them from data.
β1 indicates the change in the mean response per unit increase in X.
Regression Line
If the scatter plot of our sample data suggests a linear relationship between the two variables, i.e.
$y = \beta_0 + \beta_1 x$
we can summarize the relationship by drawing a straight line on the plot.
The least squares method gives us the best estimated line for our set of sample data.
Regression Line
We will write the estimated regression line based on sample data as
$\hat{y} = b_0 + b_1 x$
The method of least squares chooses the values of b0 and b1 that minimize the sum of squared errors
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
Regression Line
Using calculus, we obtain the estimating formulas:
$b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$
or, equivalently,
$b_1 = r\,\frac{s_y}{s_x}$
and
$b_0 = \bar{y} - b_1 \bar{x}$
Estimation of Mean Response
Fitted regression line can be used to estimate the
mean value of y for a given value of x.
Example
The weekly advertising expenditure (x) and weekly
sales (y) are presented in the following table.


y x
1250 41
1380 54
1425 63
1425 54
1450 48
1300 46
1400 62
1510 61
1575 64
1650 71
Point Estimation of Mean Response
From the previous table we have:
n = 10, Σx = 564, Σx² = 32604, Σy = 14365, Σxy = 818755
The least squares estimates of the regression coefficients are:
$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10(818755) - (564)(14365)}{10(32604) - (564)^2} = 10.8$
$b_0 = \bar{y} - b_1\bar{x} = 1436.5 - 10.8(56.4) = 828$
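As a rough check (not part of the original slides), here is a short Python sketch of this computation from the summary sums given above; the exact values round to the 10.8 and 828 used in the slides.

    # Least-squares coefficients for the advertising example, from the summary sums.
    n = 10
    sum_x, sum_y = 564, 14365
    sum_x2, sum_xy = 32604, 818755

    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * (sum_x / n)
    print(b1, b0)  # about 10.79 and 828.1; the slides round these to 10.8 and 828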
Point Estimation of Mean Response
The estimated regression function is:
$\hat{y} = 828 + 10.8x$
Sales = 828 + 10.8 (Expenditure)
This means that if the weekly advertising expenditure is increased by $1, we would expect weekly sales to increase by $10.80.
Point Estimation of Mean Response
Fitted values for the sample data are obtained by substituting the x value into the estimated regression function.
For example, if the advertising expenditure is $50, then the estimated sales is:
$\widehat{\text{Sales}} = 828 + 10.8(50) = 1368$
This is called the point estimate (forecast) of the mean response (sales).
Example: Retail sales and floor space
It is customary in retail operations to assess the performance of stores partly in terms of their annual sales relative to their floor area (square feet). We might expect sales to increase linearly as stores get larger, with of course individual variation among stores of the same size. The regression model for a population of stores says that
SALES = β0 + β1 (AREA) + ε
Example: Retail sales and floor space
The slope β1 is as usual a rate of change: it is the expected increase in annual sales associated with each additional square foot of floor space.
The intercept β0 is needed to describe the line but has no statistical importance, because no stores have area close to zero.
Floor space does not completely determine sales. The term ε in the model accounts for differences among individual stores with the same floor space. A store's location, for example, is important.
Residual
The residual is the difference between the observed value yi and the corresponding fitted value ŷi:
$e_i = y_i - \hat{y}_i$
Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand.

Example: weekly advertising expenditure


y x y-hat Residual (e)
1250 41 1270.8 -20.8
1380 54 1411.2 -31.2
1425 63 1508.4 -83.4
1425 54 1411.2 13.8
1450 48 1346.4 103.6
1300 46 1324.8 -24.8
1400 62 1497.6 -97.6
1510 61 1486.8 23.2
1575 64 1519.2 55.8
1650 71 1594.8 55.2
Estimation of the variance of the error terms, σ²
The variance σ² of the error terms εi in the regression model needs to be estimated for a variety of purposes.
It gives an indication of the variability of the probability distributions of y.
It is needed for making inferences concerning the regression function and the prediction of y.

Regression Standard Error
To estimate σ, we work with the variance and take the square root to obtain the standard deviation.
For simple linear regression, the estimate of σ² is the sum of the squared residuals divided by n − 2:
$s_{y.x}^2 = \frac{\sum e_i^2}{n-2} = \frac{\sum (y_i - \hat{y}_i)^2}{n-2}$
To estimate σ, use
$s_{y.x} = \sqrt{s_{y.x}^2}$
$s_{y.x}$ estimates the standard deviation σ of the error term ε in the statistical model for simple linear regression.
Regression Standard Error
y x y-hat Residual (e) square(e)
1250 41 1270.8 -20.8 432.64
1380 54 1411.2 -31.2 973.44
1425 63 1508.4 -83.4 6955.56
1425 54 1411.2 13.8 190.44
1450 48 1346.4 103.6 10732.96
1300 46 1324.8 -24.8 615.04
1400 62 1497.6 -97.6 9525.76
1510 61 1486.8 23.2 538.24
1575 64 1519.2 55.8 3113.64
1650 71 1594.8 55.2 3047.04
y-hat = 828+10.8X total 36124.76
$s_{y.x} = 67.198$
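A small Python sketch (an illustration, not from the slides) that reproduces the residuals, SSE, and the regression standard error from the raw advertising data:

    # Residuals and the regression standard error s_{y.x} for the advertising data.
    x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
    y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]

    b0, b1 = 828.0, 10.8                    # rounded slide values
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    sse = sum(e ** 2 for e in residuals)    # about 36124.76
    n = len(x)
    s_yx = (sse / (n - 2)) ** 0.5           # about 67.2
    print(round(sse, 2), round(s_yx, 3))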
Basic Assumptions of a Regression Model
A regression model is based on the following
assumptions:
1. There is a probability distribution of Y for each
level of X.
2. Given that µy is the mean value of Y, the standard form of the model is
$Y = f(x) + \varepsilon$
where ε is a random variable with a Normal distribution with mean 0 and standard deviation σ.
Conditions for Regression Inference
You can fit a least-squares line to any set of
explanatory-response data when both variables are
quantitative.
If the scatter plot doesn't show an approximately linear pattern, the fitted line may be almost useless.
Conditions for Regression Inference
The simple linear regression model, which
is the basis for inference, imposes several
conditions.
We should verify these conditions before
proceeding with inference.
The conditions concern the population, but
we can observe only our sample.

Conditions for Regression Inference
In doing inference, we assume:
1. The sample is an SRS from the population.
2. There is a linear relationship in the population. We cannot observe the population, so we check the scatter plot of the sample data.
3. The standard deviation of the responses about the population line is the same for all values of the explanatory variable. The spread of observations above and below the least-squares line should be roughly uniform as x varies.
Conditions for Regression Inference
Plotting the residuals against the
explanatory variable is helpful in checking
these conditions because a residual plot
magnifies patterns.
Analysis of Residual
To examine whether the regression model is
appropriate for the data being analyzed, we can
check the residual plots.
Residual plots are:
Plot a histogram of the residuals
Plot residuals against the fitted values.
Plot residuals against the independent variable.
Plot residuals over time if the data are chronological.
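The sketch below (illustrative only; it assumes matplotlib is installed) draws the first three of these plots for the advertising data:

    # Sketch of the residual plots listed above (assumes matplotlib is installed).
    import matplotlib.pyplot as plt

    x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
    y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]
    fitted = [828 + 10.8 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].hist(residuals)               # check the Normality assumption
    axes[0].set_title("Histogram of residuals")
    axes[1].scatter(fitted, residuals)    # check constant variance / model form
    axes[1].axhline(0)
    axes[1].set_title("Residuals vs fitted values")
    axes[2].scatter(x, residuals)         # check pattern against the explanatory variable
    axes[2].axhline(0)
    axes[2].set_title("Residuals vs x")
    plt.tight_layout()
    plt.show()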
Analysis of Residual
A histogram of the residuals provides a check on
the normality assumption. A Normal quantile plot
of the residuals can also be used to check the
Normality assumptions.
Regression Inference is robust against moderate
lack of Normality. On the other hand, outliers and
influential observations can invalidate the results
of inference for regression
Plot of residuals against fitted values or the
independent variable can be used to check the
assumption of constant variance and the aptness
of the model.
Analysis of Residual
Plot of residuals against time provides a
check on the independence of the error
terms assumption.
Assumption of independence is the most
critical one.

Residual plots
The residuals should have no systematic pattern.
The residual plot at right shows a scatter of points with no individual observations standing out and no systematic change as x increases.
[Residual plot: Degree Days residual plot; x-axis: Degree Days (0-60), y-axis: Residuals (-1 to 1)]
Residual plots
The points in this residual plot follow a curved pattern, so a straight line fits poorly.
Residual plots
The points in this plot
show more spread for
larger values of the
explanatory variable x,
so prediction will be
less accurate when x is
large.
Variable transformations
If the residual plot suggests that the variance is not
constant, a transformation can be used to stabilize
the variance.
If the residual plot suggests a non linear
relationship between x and y, a transformation
may reduce it to one that is approximately linear.
Common linearizing transformations of x are:
$\frac{1}{x}, \quad \log(x)$
Variance-stabilizing transformations of y are:
$\frac{1}{y}, \quad \log(y), \quad \sqrt{y}, \quad y^2$
Inference about the Regression Model
When a scatter plot shows a linear relationship between a quantitative explanatory variable x and a quantitative response variable y, we can use the least-squares line fitted to the data to predict y for a given value of x.
Now we want to do tests and confidence intervals in this setting.
Inference about the Regression Model
We think of the least-squares line we calculated from a sample as an estimate of the regression line for the population, just as the sample mean x̄ is an estimate of the population mean µ.
Inference about the Regression Model
We will write the population regression line as
$\mu_y = \beta_0 + \beta_1 x$
The numbers β0 and β1 are parameters that describe the population.
We will write the least-squares line fitted to sample data as
$\hat{y} = b_0 + b_1 x$
This notation reminds us that the intercept b0 of the fitted line estimates the intercept β0 of the population line, and the slope b1 estimates the slope β1.
Confidence Intervals and Significance
Tests
In our previous lectures we presented confidence intervals and significance tests for means and differences in means. In each case, inference rested on the standard error of the estimates and on t or z distributions.
Inference for the slope and intercept in linear regression is similar in principle, although the recipes are more complicated.
All confidence intervals, for example, have the form
estimate ± t* · SE(estimate)
where t* is a critical value of a t distribution.
Confidence Intervals and Significance
Tests
Confidence intervals and tests for the slope and intercept are based on the sampling distributions of the estimates b1 and b0.
Here are the facts:
If the simple linear regression model is true, each of b0 and b1 has a Normal distribution.
The mean of b0 is β0 and the mean of b1 is β1. That is, the intercept and slope of the fitted line are unbiased estimators of the intercept and slope of the population regression line.


Confidence Intervals and Significance Tests
The standard deviations of b0 and b1 are multiples of the model standard deviation σ. The estimated standard errors are
$SE_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
$SE_{b_0} = s\,\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}}$
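A minimal Python sketch (illustrative) of these two standard errors for the advertising data, using the regression standard error s ≈ 67.2 found earlier:

    # Standard errors of b0 and b1 for the advertising example.
    x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)    # 794.4

    s = 67.2                                    # regression standard error from earlier
    se_b1 = s / sxx ** 0.5                      # about 2.38
    se_b0 = s * (1 / n + x_bar ** 2 / sxx) ** 0.5   # about 136, as in the computer output later
    print(round(se_b1, 2), round(se_b0, 1))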
Confidence Intervals and Significance
Tests
Example: Weekly Advertising Expenditure
Let us return to the weekly advertising expenditure and weekly sales example. Management is interested in testing whether or not there is a linear association between advertising expenditure and weekly sales, using the regression model. Use α = .05.
Example: Weekly Advertising Expenditure
Hypotheses:
$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$
Decision rule: reject H0 if
$t > t_{.025;\,8} = 2.306$ or $t < -t_{.025;\,8} = -2.306$
Example: Weekly Advertising Expenditure
Test statistic:
$t = \frac{b_1}{S(b_1)}$
$S(b_1) = \frac{s_{y.x}}{\sqrt{\sum (x - \bar{x})^2}} = \frac{67.2}{\sqrt{794.4}} = 2.38$
$b_1 = 10.8$
$t = \frac{10.8}{2.38} = 4.5$
Example: Weekly Advertising Expenditure
Conclusion:
Since t = 4.5 > 2.306, we reject H0.
There is a linear association between advertising expenditure and weekly sales.
Confidence interval for β1
Now that our test showed that there is a linear association between advertising expenditure and weekly sales, the management wishes an estimate of β1 with a 95% confidence coefficient. The confidence interval is
$b_1 \pm t_{(\alpha/2;\, n-2)}\, S(b_1)$
Confidence interval for β1
For a 95 percent confidence coefficient, we require t(.025; 8). From Table B in Appendix III, we find t(.025; 8) = 2.306.
The 95% confidence interval is:
$b_1 \pm t_{(\alpha/2;\, n-2)}\, S(b_1) = 10.8 \pm 2.306(2.38) = 10.8 \pm 5.49 = (5.31,\ 16.3)$
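For reference, a short Python sketch (illustrative; it assumes SciPy is available) reproducing the test statistic, a two-sided P-value, and this confidence interval:

    # t test and 95% confidence interval for the slope in the advertising example.
    from scipy import stats

    b1, se_b1, n = 10.8, 2.38, 10
    t = b1 / se_b1                                # about 4.5
    t_crit = stats.t.ppf(0.975, df=n - 2)         # 2.306
    p_value = 2 * stats.t.sf(abs(t), df=n - 2)    # two-sided P-value, about 0.002

    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # roughly (5.3, 16.3)
    print(round(t, 2), round(p_value, 4), ci)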
Example: Do wages rise with experience?
Many factors affect the wages of workers: the industry they work in, their type of job, their education and their experience, and changes in general levels of wages. We will look at a sample of 59 married women who hold customer service jobs in Indiana banks. The following table gives their weekly wages at a specific point in time, along with their length of service with their employer, in months. The size of the place of work is recorded simply as large (100 or more workers) or small. Because industry, job type, and the time of measurement are the same for all 59 subjects, we expect to see a clear relationship between wages and length of service.
[Table of weekly wages and length of service for the 59 workers not reproduced]
Example: Do wages rise with experience?
From the previous table we have:
n = 59, Σx = 4159, Σx² = 451031, Σy = 23069, Σy² = 9460467, Σxy = 1719376
The least squares estimates of the regression coefficients are:
$b_1 = r\,\frac{s_y}{s_x} = 0.5905$
$b_0 = \bar{y} - b_1\bar{x} = 349.4$
Example: Do wages rise with experience?
What is the least-squares regression line for
predicting Wages from Los?
Suppose a woman has been with her bank for 125
months. What do you predict she will earn?
If her actual wages are $433, then what is her
residual?
The sum of squared residuals for the entire sample
is
$\sum_{i=1}^{59} (y_i - \hat{y}_i)^2 = 385453.641$
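A brief Python sketch (illustrative, not from the slides) that computes the least-squares line from the summary sums above and answers the prediction and residual questions:

    # Wages example: coefficients from the summary sums, a prediction, and a residual.
    n = 59
    sum_x, sum_y = 4159, 23069
    sum_x2, sum_xy = 451031, 1719376

    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # about 0.5905
    b0 = sum_y / n - b1 * (sum_x / n)                                # about 349.4

    wage_hat = b0 + b1 * 125     # predicted weekly wage after 125 months, about $423
    residual = 433 - wage_hat    # residual for a worker actually earning $433, about $10
    print(round(b1, 4), round(b0, 1), round(wage_hat, 1), round(residual, 1))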
Example: Do wages rise with experience?
Do wages rise with experience?
The hypotheses are:
$H_0: \beta_1 = 0 \qquad H_a: \beta_1 > 0$
The test statistic:
$t = \frac{b_1}{SE_{b_1}}$
The P-value is:
Conclusion:
Example: Do wages rise with experience?
A 95% confidence interval for the average increase in wages per month of service, for the regression line in the population of all married female customer service workers in Indiana banks, is
$b_1 \pm t^*\, SE_{b_1}$
The t distribution for this problem has n − 2 = 57 degrees of freedom.
Example: Do wages rise with experience?
Regression calculations in practice are always done by software. The computer output for the case study is given in the following slide.
[Computer output for the wages regression not reproduced]
Using the regression Line
One of the most common reasons to fit a
line to data is to predict the response to a
particular value of the explanatory variable.
In our example, the least-squares line for predicting the weekly earnings of female bank customer service workers from their length of service is
$\hat{y} = 349.4 + 0.5905x$
Using the regression Line
For a length of service of 125 months, our least-squares regression equation gives
$\hat{y} = 349.4 + 0.5905(125) = \$423 \text{ per week}$
There are two different uses of this prediction:
We can estimate the mean earnings of all workers in the subpopulation of workers with 125 months on the job.
We can predict the earnings of one individual worker with 125 months of service.
Using the regression Line
For each use, the actual prediction is the same, ŷ = $423, but the margin of error is different for the two cases.
To estimate the mean response, we use a confidence interval.
To estimate an individual response y, we use a prediction interval.
A prediction interval estimates a single random response y rather than a parameter like µy = β0 + β1x*.
Using the regression Line
The main distinction is that it is harder to
predict for an individual than for the mean
of a population of individuals.
Each interval has the usual form
$\hat{y} \pm t^*\, SE$
The margin of error for the prediction interval is wider than the margin of error for the confidence interval.
Using the regression Line
The standard error for estimating the mean response when the explanatory variable x takes the value x* is:
$SE_{\hat{\mu}} = s\,\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
Using the regression Line
The standard error for predicting an individual response when the explanatory variable x takes the value x* is:
$SE_{\hat{y}} = s\,\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
Prediction of a new response (ŷ)
We now consider the prediction of a new observation y corresponding to a given level x of the independent variable.
In our advertising expenditure and weekly sales example, the management wishes to predict the weekly sales corresponding to an advertising expenditure of x = $50.
Interval Estimation of a new response (ŷ)
The following formula gives us the point estimator (forecast) for y:
$\hat{y} = b_0 + b_1 x$
A (1 − α)100% prediction interval for a new observation is:
$\hat{y} \pm t_{(\alpha/2;\, n-2)}\, S_f$
where
$S_f = S_{y.x}\,\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
Example
In our advertising expenditure and weekly sales example, the management wishes to predict the weekly sales if the advertising expenditure is $50, with a 90% prediction interval.
$\hat{y} = 828 + 10.8(50) = 1368$
$S_f = S_{y.x}\,\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 67.2\,\sqrt{1 + \frac{1}{10} + \frac{(50 - 56.4)^2}{794.4}} = 72.11$
We require t(.05; 8) = 1.860.
Example
The 90% prediction interval is:
$\hat{y} \pm t_{(.05;\,8)}\, S_f = 1368 \pm 1.860(72.11) = (1233.9,\ 1502.1)$
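A small Python sketch (illustrative; assumes SciPy) reproducing this prediction interval:

    # 90% prediction interval for weekly sales at x = $50 advertising expenditure.
    from scipy import stats

    x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)    # 794.4

    s_yx = 67.2
    x_new = 50
    y_hat = 828 + 10.8 * x_new                  # 1368

    s_f = s_yx * (1 + 1 / n + (x_new - x_bar) ** 2 / sxx) ** 0.5   # about 72.1
    t_crit = stats.t.ppf(0.95, df=n - 2)                           # 1.860 for a 90% interval
    interval = (y_hat - t_crit * s_f, y_hat + t_crit * s_f)        # about (1234, 1502)
    print(interval)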
Analysis of variance approach to Regression
analysis
Analysis of Variance is the term for statistical
analyses that break down the variation in data into
separate pieces that correspond to different
sources of variation.
It is based on the partitioning of sums of squares
and degrees of freedom associated with the
response variable.
In the regression setting, the observed variation in
the responses (y
i
) comes from two sources.

Analysis of variance approach to Regression
analysis
Consider the weekly advertising expenditure and the weekly sales example. There is variation in the amount ($) of weekly sales, as in all statistical data. The variation of the yi is conventionally measured in terms of the deviations
$y_i - \bar{y}$
Analysis of variance approach to Regression
analysis
The measure of total variation, denoted by SST, is the sum of the squared deviations:
$SST = \sum (y_i - \bar{y})^2$
If SST = 0, all observations are the same (no variability). The greater SST is, the greater the variation among the y values.
When we use the regression model, the measure of variation is that of the y observations' variability around the fitted line:
$y_i - \hat{y}_i$
Analysis of variance approach to Regression
analysis
The measure of variation in the data around the fitted regression line is the sum of squared deviations (error), denoted SSE:
$SSE = \sum (y_i - \hat{y}_i)^2$
For our weekly expenditure example:
SSE = 36124.76
SST = 128552.5
What accounts for the substantial difference between these two sums of squares?
Analysis of variance approach to Regression
analysis
The difference is another sum of squares:
$SSR = \sum (\hat{y}_i - \bar{y})^2$
SSR stands for the regression sum of squares.
SSR is the variation among the predicted responses ŷi. The predicted responses lie on the least-squares line; they show how ŷ moves in response to x.
The larger SSR is relative to SST, the greater the role of the regression line in explaining the total variability in the y observations.
Analysis of variance approach to Regression
analysis
In our example:
SSR = SST − SSE = 128552.5 − 36124.76 = 92427.74
This indicates that most of the variability in weekly sales can be explained by the relation between the weekly advertising expenditure and the weekly sales.
Formal Development of the Partitioning
We can decompose the total variability in the observations yi as follows:
$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$
The total deviation yi − ȳ can be viewed as the sum of two components:
The deviation of the fitted value ŷi around the mean ȳ.
The deviation of yi around the fitted regression line.
Formal Development of the Partitioning
Skipping quite a bit of messy algebra, we just state that this analysis of variance equation always holds:
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$
Breakdown of degrees of freedom:
$n - 1 = 1 + (n - 2)$
Mean squares
A sum of squares divided by its degrees of
freedom is called a mean square (MS)
Regression mean square:
$MSR = \frac{SSR}{1}$
Error mean square:
$MSE = \frac{SSE}{n-2}$
Note: mean squares are not additive.
Mean squares
In our example:





74 . 92427
1
74 . 92427
1
= = =
SSR
MSR
6 . 4515
8
76 . 36124
2
= =

=
n
SSE
MSE
Analysis of Variance Table
The breakdowns of the total sum of squares and associated degrees of freedom are displayed in a table called the analysis of variance (ANOVA) table:

Source of Variation   SS     df     MS                F-Test
Regression            SSR    1      MSR = SSR/1       MSR/MSE
Error                 SSE    n-2    MSE = SSE/(n-2)
Total                 SST    n-1
Analysis of Variance Table
In our weekly advertising expenditure and weekly sales example, the ANOVA table is:

Source of Variation   SS          df   MS
Regression            92427.74    1    92427.74
Error                 36124.76    8    4515.6
Total                 128552.5    9
Analysis of Variance Table
The analysis of variance table reports, in a different way, quantities such as r² and s that are needed in regression analysis.
It also reports in a different way the test for
the overall significance of the regression.
If regression on x has no value for
predicting y, we expect the slope of the
population regression line to be close to 0.
Analysis of Variance Table
That is, the null hypothesis of no linear relationship is:
$H_0: \beta_1 = 0$
We standardize the slope of the least-squares line to get a t statistic.
F-Test for β1 = 0 versus β1 ≠ 0
The analysis of variance approach starts with sums of squares.
If regression on x has no value for predicting y, we expect the SSR to be only a small part of the SST, most of which will be made up of the SSE.
The proper way to standardize this comparison is to use the ratio
$F = \frac{MSR}{MSE}$
F-Test for β1 = 0 versus β1 ≠ 0
In order to be able to construct a statistical decision rule, we need to know the distribution of our test statistic F.
When H0 is true, our test statistic F follows the F-distribution with 1 and n − 2 degrees of freedom.
Table C-5 on page 513 of your text gives the critical values of the F-distribution at α = 0.05 and 0.01.

F-Test for β1 = 0 versus β1 ≠ 0
Construction of the decision rule, at the α = 5% level:
Reject H0 if
$F > F(\alpha;\ 1,\ n-2)$
Large values of F support Ha; values of F near 1 support H0.
F-Test for β1 = 0 versus β1 ≠ 0
Using our example again, let us repeat the earlier test on β1, this time with the F-test. The null and alternative hypotheses are:
$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$
Let α = .05. Since n = 10, we require F(.05; 1, 8). From the F table we find that F(.05; 1, 8) = 5.32. Therefore the decision rule is:
Reject H0 if F > 5.32.
F-Test for β1 = 0 versus β1 ≠ 0
From the ANOVA table we have MSR = 92427.74 and MSE = 4515.6. Our test statistic F is:
$F = \frac{92427.74}{4515.6} = 20.47$
Decision: since 20.47 > 5.32, we reject H0; that is, there is a linear association between weekly advertising expenditure and weekly sales.
F-Test for β1 = 0 versus β1 ≠ 0
Equivalence of the F test and the t test:
For a given α level, the F test of β1 = 0 versus β1 ≠ 0 is algebraically equivalent to the two-sided t-test.
Thus, at a given level, we can use either the t-test or the F-test for testing β1 = 0 versus β1 ≠ 0.
The t-test is more flexible, since it can also be used for a one-sided test.
Analysis of Variance Table
The complete ANOVA table for our
example is:

Source of Variation   SS          df   MS          F-Test
Regression            92427.74    1    92427.74    20.47
Error                 36124.76    8    4515.6
Total                 128552.5    9
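A short Python sketch (illustrative; assumes SciPy) that rebuilds these ANOVA quantities and the F test:

    # ANOVA quantities and the F test for the advertising example.
    from scipy import stats

    sst, sse = 128552.5, 36124.76
    ssr = sst - sse                      # 92427.74
    n = 10

    msr = ssr / 1
    mse = sse / (n - 2)                  # about 4515.6
    f = msr / mse                        # about 20.47

    f_crit = stats.f.ppf(0.95, 1, n - 2)     # about 5.32
    p_value = stats.f.sf(f, 1, n - 2)        # about 0.002, as in the Excel output
    print(round(f, 2), round(f_crit, 2), round(p_value, 4))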
Computer Output
The Excel output for our example is:

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.847950033
R Square 0.719019259
Adjusted R Square 0.683896667
Standard Error 67.19447214
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 92431.72331 92431.72 20.4717 0.0019382
Residual 8 36120.77669 4515.097
Total 9 128552.5
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 828.1268882 136.1285978 6.083416 0.000295 514.2135758 1142.0402
AD-Expen (X) 10.7867573 2.384042146 4.524567 0.001938 5.289142698 16.2843719
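For comparison, a minimal Python sketch (illustrative; it assumes the statsmodels package is installed) that produces essentially the same summary from the raw data:

    # Reproducing the regression output with statsmodels.
    import numpy as np
    import statsmodels.api as sm

    x = np.array([41, 54, 63, 54, 48, 46, 62, 61, 64, 71])
    y = np.array([1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650])

    X = sm.add_constant(x)          # adds the intercept column
    model = sm.OLS(y, X).fit()
    print(model.summary())          # intercept ~828.1, slope ~10.79, R-squared ~0.72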
Coefficient of Determination
Recall that SST measures the total variation in the yi when no account of the independent variable x is taken.
SSE measures the variation in the yi when a regression model with the independent variable x is used.
A natural measure of the effect of x in reducing the variation in y can be defined as:
$R^2 = \frac{SST - SSE}{SST} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
Coefficient of Determination
R² is called the coefficient of determination. Since 0 ≤ SSE ≤ SST, it follows that
$0 \le R^2 \le 1$
We may interpret R² as the proportionate reduction of the total variability in y associated with the use of the independent variable x.
The larger R² is, the more the total variation of y is reduced by including the variable x in the model.
Coefficient of Determination
If all the observations fall on the fitted regression line, SSE = 0 and R² = 1.
If the slope of the fitted regression line is b1 = 0, so that ŷi = ȳ for every i, then SSE = SST and R² = 0.
The closer R² is to 1, the greater is said to be the degree of linear association between x and y.
The square root of R² is called the coefficient of correlation:
$r = \pm\sqrt{R^2}$
Correlation Coefficient
Recall that the algebraic expression for the correlation coefficient is
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$
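A closing Python sketch (illustrative) that evaluates this formula for the advertising data; the results match the Multiple R and R Square values in the Excel output above:

    # Correlation coefficient for the advertising data, using the formula above.
    x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
    y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]
    n = len(x)

    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    num = n * sum_xy - sum_x * sum_y
    den = ((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)) ** 0.5
    r = num / den
    print(round(r, 4), round(r ** 2, 4))   # about 0.848 and 0.719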
