Unit 4 Multiple Linear Regression
Contents
4.0 Aims and Objectives
4.1 Introduction
4.2 Two Independent Variables
4.3 Estimating Multiple Regression Model with two Explanatory Variables
4.4 Partitioning the Total Variability in the Dependent Variable Y
4.5 The Matrix Approach to Linear Regression (k = 2)
4.6 The Variance and Covariance of the Regression Coefficients
4.7 Confidence Intervals and Tests of Hypothesis concerning the Regression Coefficients
4.8 Multiple and Partial Correlation
4.9 Summary
4.10 Glossary
4.11 Answers to Check Your Progress Problems
4.12 Model Examination
4.13 Recommended Books
4.1 INTRODUCTION
In the course Introduction to Statistics, we studied regression and correlation when two variables are under study simultaneously. But scientific, social and economic phenomena are not confined to two variables. In such studies we often need to express the actual relationship among several variables, and for this, multiple regression and correlation are strong tools. For instance, the cost of production of a manufactured product depends mainly on the cost of raw material, labor charges and the cost of energy. The cost of producing a crop depends mainly upon the cost of seeds, fertilizer, irrigation, pesticides and many farm operations. In both examples, the cost of the product is a dependent factor, while the others are independent factors. If we want to establish the relationship between the dependent variable and the independent variables, a mathematical equation can be given to do this. This type of equation is known as a mathematical model. The equation expressing such a relationship may be of any type, but we will deal only with a linear relationship, which represents a plane or a hyperplane according to the number of variables involved.
First, we will discuss the mathematical model before fitting it. Fitting a regression equation means estimating the parameters involved in the model. A mathematical model with dependent variable Y and two independent variables X₁ and X₂ is given by:
Y = β₀ + β₁X₁ + β₂X₂ + e --- (4.1)
This type of regression equation is also known as a multiple regression equation or a prediction equation, with Y as the predictand and X₁ and X₂ as the predictors. Here e is the error, which is normally distributed with mean 0 and variance σ², i.e., e ~ N(0, σ²). To fit equation (4.1), we have to estimate the parameters β₀, β₁ and β₂ on the basis of n sample observations, each of which is a (2 + 1)-tuple (yᵢ, x₁ᵢ, x₂ᵢ). The n composite sample observations can be presented in the following format; a small simulated example follows Table 4.1.
Table 4.1: Presentation of sample observations

Composite observation number     Y       X₁       X₂
            1                    y₁      x₁₁      x₂₁
            2                    y₂      x₁₂      x₂₂
            3                    y₃      x₁₃      x₂₃
            .                    .       .        .
            .                    .       .        .
            .                    .       .        .
            n                    yₙ      x₁ₙ      x₂ₙ
          Total                  Σyᵢ     Σx₁ᵢ     Σx₂ᵢ
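To make the setup concrete, here is a minimal Python sketch that generates n composite observations of the form shown in Table 4.1 from model (4.1). The parameter values β₀ = 2, β₁ = 0.5, β₂ = 1.5 and σ = 1, the sample size and the ranges of X₁ and X₂ are illustrative assumptions of ours, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                                   # number of composite observations

# Illustrative parameter values, chosen only for demonstration:
beta0, beta1, beta2 = 2.0, 0.5, 1.5      # intercept and partial regression coefficients
sigma = 1.0                              # standard deviation of the error term e

x1 = rng.uniform(0, 10, size=n)          # explanatory variable X1
x2 = rng.uniform(0, 10, size=n)          # explanatory variable X2
e = rng.normal(0, sigma, size=n)         # e ~ N(0, sigma^2)

y = beta0 + beta1 * x1 + beta2 * x2 + e  # model (4.1)

# Each row (i, y_i, x_1i, x_2i) is one composite observation, as in Table 4.1.
for i in range(n):
    print(f"{i + 1:2d}  {y[i]:7.3f}  {x1[i]:7.3f}  {x2[i]:7.3f}")
print(f"Total  {y.sum():7.3f}  {x1.sum():7.3f}  {x2.sum():7.3f}")
```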
Making use of the data given in Table 4.1, we want to fit the regression equation (4.1). That is, we want to estimate the parameters β₀, β₁ and β₂. Let the estimated values of the parameters be b₀, b₁ and b₂ respectively. These estimates should make the errors collectively as small as possible. Hence, to achieve this objective, we adopt the method of least squares, which chooses the estimates so as to minimize the sum of squared errors, for the estimation of the partial regression coefficients β₁, β₂ and the intercept β₀. An advantage of the least-squares method is that these estimates are unbiased. Before giving the estimates, we define the partial regression coefficient βⱼ, which is the coefficient of Xⱼ (j = 1, 2) in equation (4.1). A sketch of this estimation step in code follows.
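As a sketch of what least-squares estimation looks like in practice, the fragment below continues the simulated data above and obtains b₀, b₁ and b₂ with numpy's least-squares solver. The variable names are ours, and this is one convenient computational route; the text's own development of the estimates comes in Section 4.3.

```python
# Continuing from the simulated data above: prepend a column of ones so
# that the first coefficient plays the role of the intercept b0.
X = np.column_stack([np.ones(n), x1, x2])

# Least squares chooses b = (b0, b1, b2) to minimize ||y - Xb||^2.
b, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

With the illustrative true values (2.0, 0.5, 1.5) used above, the printed estimates should come out close to them, and they tend to get closer as n grows, which serves as a quick sanity check on the fit.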