Reg 01
Introduction
Tatek () Math4319 1 / 17
Outline
1 Regression and Model Building
2 Data Collection
3 Uses of Regression
Regression and Model Building
Let the difference between the observed value of y and the straight line (β0 + β1 x) be an error ε.
It is convenient to think of ε as a statistical error; that is, ε is a random variable that accounts for the failure of the model to fit the data exactly.
The error may be made up of the effects of other variables on delivery
time, measurement errors, and so forth.
Thus, a more plausible model for the delivery time data is
$$y = \beta_0 + \beta_1 x + \varepsilon$$
This equation is called a linear regression model.
Customarily, x is called the independent variable and y the dependent variable. Because this terminology is easily confused with statistical independence, we refer to x as the predictor or regressor variable and y as the response variable.
Because the equation involves only one regressor variable, it is called a simple linear regression model.
If we assume that x is fixed, the random component ε on the right-hand side of the model determines the properties of y.
Suppose that the mean and variance of ε are 0 and σ², respectively.
Then the mean response at any value of the regressor variable is
$$E(y \mid x) = \mu_{y \mid x} = E(\beta_0 + \beta_1 x + \varepsilon) = \beta_0 + \beta_1 x.$$
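The short simulation below checks this result numerically by drawing many responses at one fixed x. It is a sketch under stated assumptions. The parameter values are hypothetical, and the normal distribution for ε is an assumption made only for the simulation, since the model requires only that ε have mean 0 and variance σ². The sample mean should land close to β0 + β1 x. Because x is fixed, the sample variance should land close to σ².

```python
import random
import statistics


def simulate_responses(beta0, beta1, x, sigma, n, seed=0):
    """Draw n responses y = beta0 + beta1*x + eps at a fixed x.

    Assumption for illustration only: eps ~ Normal(0, sigma^2).
    The regression model itself only requires E(eps) = 0 and Var(eps) = sigma^2.
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(n)]


# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma = 3.0, 2.0, 1.5
x = 4.0
ys = simulate_responses(beta0, beta1, x, sigma, n=100_000)

print(statistics.fmean(ys))      # should be close to beta0 + beta1*x = 11.0
print(statistics.pvariance(ys))  # should be close to sigma**2 = 2.25
```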
These functional relationships are often based on physical, chemical, or other engineering or scientific theory, that is, on knowledge of the underlying mechanism.
Such models are often called mechanistic models.
Regression models, by contrast, are thought of as empirical models.
Figure 1.3 illustrates a situation where the true relationship between y
and x is relatively complex, yet it may be approximated quite well by
a linear regression equation.
Sometimes the underlying mechanism is more complex, which calls for a more complex approximating function. An example appears in Figure 1.4, where a "piecewise linear" regression function is used to approximate the true relationship between y and x.
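Figure 1.4 is not reproduced here, so the breakpoint and data below are hypothetical. A common way to write a continuous piecewise linear function with a known breakpoint c is y = β0 + β1 x + β2 (x − c)₊ + ε, where (x − c)₊ = max(0, x − c). This form is not necessarily the one behind Figure 1.4. The model is still linear in β0, β1, β2, so ordinary least squares applies. The sketch solves the normal equations with a small hand-written solver. The helper names `hinge`, `solve`, and `fit_piecewise_linear` are illustrative.

```python
def hinge(x, c):
    """(x - c)_+ : zero left of the breakpoint c, x - c to the right of it."""
    return max(0.0, x - c)


def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        s = sum(m[i][k] * beta[k] for k in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


def fit_piecewise_linear(xs, ys, c):
    """Least-squares fit of y = b0 + b1*x + b2*(x - c)_+ via X'X b = X'y."""
    rows = [[1.0, x, hinge(x, c)] for x in xs]
    p = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical noise-free data: slope 1 up to x = 5, slope 1 + 2 = 3 afterwards.
xs = [float(i) for i in range(11)]
ys = [2.0 + 1.0 * x + 2.0 * hinge(x, 5.0) for x in xs]
b0, b1, b2 = fit_piecewise_linear(xs, ys, c=5.0)
print(round(b0, 6), round(b1, 6), round(b2, 6))  # 2.0 1.0 2.0
```

The slope to the right of the breakpoint is b1 + b2. If the breakpoint were unknown, it would also have to be estimated, and the model would no longer be linear in all of its parameters.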
Generally, regression equations are valid only over the region of the regressor variables contained in the observed data.
For example, consider Figure 1.5. Suppose that data on y and x were collected in the interval x1 ≤ x ≤ x2.
Over this interval, the linear regression equation shown in Figure 1.5 is a good approximation of the true relationship.
However, suppose this equation were used to predict values of y for values of the regressor variable in the region x2 ≤ x ≤ x3.
Clearly, the linear regression model is not going to perform well over this range of x because of model error or equation error.
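The sketch below mimics the situation in Figure 1.5 using a hypothetical curved true relationship, E(y | x) = x², which the analyst does not know. A straight line is fitted by least squares to noise-free points on [0, 1], which plays the role of x1 ≤ x ≤ x2. The line is then evaluated on [1, 3], which plays the role of x2 ≤ x ≤ x3. The function names and intervals are illustrative assumptions.

```python
def least_squares_line(xs, ys):
    """Closed-form simple linear regression estimates (b0, b1)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def true_mean(x):
    # Hypothetical curved "true" relationship, unknown to the analyst.
    return x ** 2


# Data collected only on the interval x1 <= x <= x2, here [0, 1] (noise-free for clarity).
xs = [i / 10 for i in range(11)]
ys = [true_mean(x) for x in xs]
b0, b1 = least_squares_line(xs, ys)


def max_abs_error(points):
    """Largest gap between the true mean and the fitted line over the given points."""
    return max(abs(true_mean(x) - (b0 + b1 * x)) for x in points)


inside = xs                                # x1 <= x <= x2
outside = [1 + i / 10 for i in range(21)]  # x2 <= x <= x3, here [1, 3]
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")
print(f"max error inside  [0, 1]: {max_abs_error(inside):.2f}")
print(f"max error outside [1, 3]: {max_abs_error(outside):.2f}")
```

The fitted line is about y = -0.15 + 1.00x. Its largest error on [0, 1] is about 0.15, but at x = 3 the error grows to about 6.15. The line is a good approximation where data were collected and a poor one beyond that range.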
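In general, the response variable y may be related to k regressors, x1, x2, ..., xk, so that
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
This is called a multiple linear regression model because more than one regressor is involved.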
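The adjective linear indicates that the model is linear in the parameters β0, β1, ..., βk, not that y is a linear function of the x's.
For example, y = β0 + β1 x + β2 x² + ε is a linear regression model (with regressors x1 = x and x2 = x²), even though it is quadratic in x.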
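Linearity in the parameters can be checked directly. The sketch below uses the quadratic model y = β0 + β1 x + β2 x² + ε with hypothetical parameter vectors. The mean response is a dot product of β with the regressor row (1, x, x²). As a result, the response for a combination aβ + bγ equals the same combination of the individual responses, even though each response is quadratic in x. The names `regressors` and `mean_response` are illustrative.

```python
def regressors(x):
    """Map a raw input x to the regressor row (1, x1, x2) = (1, x, x**2)."""
    return [1.0, x, x ** 2]


def mean_response(beta, x):
    """E(y | x) = beta0 + beta1*x + beta2*x**2, computed as a dot product."""
    return sum(b * r for b, r in zip(beta, regressors(x)))


# Hypothetical parameter vectors and weights, for illustration only.
beta = [1.0, -2.0, 0.5]
gamma = [0.0, 3.0, 1.0]
a, b = 2.0, -1.5
combo = [a * bi + b * gi for bi, gi in zip(beta, gamma)]

x = 3.0
# Linear in the parameters: both printed values are equal (-28.0),
# although each response is a quadratic function of x.
print(mean_response(combo, x), a * mean_response(beta, x) + b * mean_response(gamma, x))
```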
An important objective of regression analysis is to estimate the
unknown parameters in the regression model.
This process is also called fitting the model to the data.
We study several parameter estimation techniques in this book. One of these techniques is the method of least squares (introduced in Chapter 2).
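Chapter 2 develops least squares properly. As a preview, the sketch below applies the standard closed-form estimates for the simple linear regression model, β̂1 = Sxy / Sxx and β̂0 = ȳ − β̂1 x̄. It then checks that nudging either estimate increases the sum of squared deviations S(β0, β1). The data and function names are hypothetical.

```python
def least_squares_fit(xs, ys):
    """Least-squares estimates for y = b0 + b1*x + eps.

    Standard closed form: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def sse(b0, b1, xs, ys):
    """Sum of squared deviations S(b0, b1), the quantity least squares minimizes."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_fit(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, S = {sse(b0, b1, xs, ys):.4f}")

# Moving either estimate away from the least-squares values increases S.
print(sse(b0 + 0.01, b1, xs, ys) > sse(b0, b1, xs, ys))
print(sse(b0, b1 - 0.01, xs, ys) > sse(b0, b1, xs, ys))
```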
The next phase of a regression analysis is called model adequacy
checking, in which the appropriateness of the model is studied and
the quality of the fit ascertained.
Through such analyses the usefulness of the regression model may be
determined.
The outcome of adequacy checking may indicate either that the
model is reasonable or that the original fit must be modified.
Thus, regression analysis is an iterative procedure, in which data lead
to a model and a fit of the model to the data is produced.
The quality of the fit is then investigated, leading either to
modification of the model or the fit or to adoption of the model.
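A common adequacy check is to examine the residuals e_i = y_i − ŷ_i. If the model is adequate, they should show no systematic pattern. The sketch below is one iteration of the fit, check, and modify cycle on hypothetical data generated from y = x² and deliberately fitted with a straight line. The residuals sum to zero, as least-squares residuals with an intercept always do. However, they are positive at both ends and negative in the middle, which points to curvature that the straight line missed and suggests modifying the model (for example, by adding an x² term). The function names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for y = b0 + b1*x + eps."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def residuals(xs, ys, b0, b1):
    """e_i = y_i - yhat_i: the part of each observation the fitted model misses."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical data from a curved relationship (y = x**2), fitted with a straight line.
xs = [float(x) for x in range(7)]
ys = [x ** 2 for x in xs]
b0, b1 = fit_line(xs, ys)
e = residuals(xs, ys, b0, b1)
print([round(r, 2) for r in e])  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]

# Crude check: positive residuals at both ends and a negative one in the
# middle suggest curvature the straight line has missed.
pattern = e[0] > 0 and e[-1] > 0 and e[len(e) // 2] < 0
print("curvature pattern detected:", pattern)
```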
A regression model does not imply a cause-and-effect relationship between the variables.
Finally, it is important to remember that regression analysis is part of a broader data-analytic approach to problem solving.
That is, the regression equation itself may not be the primary
objective of the study.
It is usually more important to gain insight and understanding
concerning the system generating the data.
Data Collection
Uses of Regression
Regression models are used for several purposes, including the
following:
1. Data description
2. Parameter estimation
3. Prediction and estimation
4. Control
Regression analysis is helpful in developing equations to summarize or describe a set of data.
A fitted regression model is often a much more convenient and useful summary of the data than a table or even a graph.
Sometimes parameter estimation problems can be solved by regression methods.
Many applications of regression involve prediction of the response
variable.
For example, we may wish to predict delivery time for a specified
number of cases of soft drinks to be delivered.
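The sketch below fits a straight line to hypothetical delivery data (number of cases x, delivery time y) and returns a point prediction for a specified number of cases. Following the earlier point that regression equations are valid only over the region of the observed data, the hypothetical `predict` helper issues a warning when asked to extrapolate. The data values, names, and warning policy are illustrative assumptions and are not taken from the actual delivery time data.

```python
import warnings


def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for y = b0 + b1*x + eps."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def predict(b0, b1, x_new, x_min, x_max):
    """Point prediction yhat = b0 + b1*x_new.

    Warns when x_new lies outside the observed range [x_min, x_max],
    where the fitted equation may not be valid.
    """
    if not x_min <= x_new <= x_max:
        warnings.warn(f"x = {x_new} is outside the observed range "
                      f"[{x_min}, {x_max}]; this is extrapolation")
    return b0 + b1 * x_new


# Hypothetical data: number of cases delivered (x) and delivery time (y).
cases = [2, 3, 5, 7, 8, 10, 12]
times = [8.3, 9.8, 14.2, 17.9, 20.1, 24.2, 27.8]
b0, b1 = fit_line(cases, times)

print(f"predicted delivery time for 6 cases: {predict(b0, b1, 6, min(cases), max(cases)):.1f}")
```

A request for 30 cases would still return a number, but it would also trigger the extrapolation warning, because the hypothetical data cover only 2 to 12 cases.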
Regression models may be used for control purposes.
Role of the Computer