03a.session Notes On Multiple Linear Regression Analysis
Let us look at the situation considered for the discussion. We have a consortium of US firms
that produce raw materials used in Singapore. They are interested in the following:
1. Predicting the level of exports from the US.
2. Understanding the relationship between US exports to Singapore and certain
variables affecting the economy of that country.
What are the advantages of doing the above?
1. Understanding the relation will allow the consortium members to time their
marketing efforts to coincide with favourable conditions in the Singapore economy.
2. Understanding the relationship would also allow the exporters to determine
whether expansion of exports to Singapore is feasible.
3. It also identifies the significant variables that act as the main drivers of exports to
Singapore.
Variables considered in the study
US exports to Singapore in billions of Singapore Dollars (the dependent variable,
Exports),
money supply figures in billions of Singapore dollars (variable M1),
minimum Singapore bank lending rate in percentages (variable Lend),
an index of local prices where the base year is 1974 (variable Price),
the exchange rate of Singapore dollars per U.S. dollar (variable Exchange)
Why should regression be used to analyse this data? Given the objectives above, regression
is the appropriate method because it gives one an opportunity to
1. Measure how much exports change when the levels of the other drivers change.
2. Test the significance of each driver or variable that contributes to the change in
exports.
3. Help the US consortium identify favourable conditions.
4. Build a model that connects exports with their significant drivers and use it to
make predictions.
Assumptions associated with the linear regression analysis
1. The response and the regressor variables are linearly related.
2. The residuals have a mean of zero.
3. All residuals have constant variance (homoscedasticity).
4. The residuals are uncorrelated with one another.
5. The residuals are normally distributed.
6. The regressors are independent of one another (no multicollinearity).
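For reference, the first five assumptions can be written compactly for the model studied in these notes. This is only a notational sketch: the symbols \beta_0, ..., \beta_4 (coefficients), \varepsilon_i (error term for observation i) and \sigma^2 (error variance) are introduced here for illustration and do not appear elsewhere in the notes.

```latex
\begin{align*}
\text{Exports}_i &= \beta_0 + \beta_1\,\text{M1}_i + \beta_2\,\text{Lend}_i
                  + \beta_3\,\text{Price}_i + \beta_4\,\text{Exchange}_i + \varepsilon_i, \\
E(\varepsilon_i) &= 0, \qquad
\operatorname{Var}(\varepsilon_i) = \sigma^2, \qquad
\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \ (i \neq j), \qquad
\varepsilon_i \sim N(0, \sigma^2).
\end{align*}
```

The first line expresses linearity (assumption 1); the second line expresses assumptions 2 to 5 in order. Assumption 6 concerns the regressors themselves and is therefore not part of the error specification.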
Discussion on R codes
To use R for the regression analysis, we need to install a few packages. These packages
are developed by researchers and come with various built-in functions that are used to run
the analysis. For running regression analysis in R, we install the following packages:
car (Companion to Applied Regression): https://www.rdocumentation.org/packages/car/versions/3.0-8
psych: https://www.rdocumentation.org/packages/psych/versions/1.9.12.31
Hmisc: https://www.rdocumentation.org/packages/Hmisc/versions/4.4-0
lmtest: https://www.rdocumentation.org/packages/lmtest/versions/0.9-37
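Because install.packages() downloads a package every time it is called, one possible convenience is to install only what is missing. The following is a minimal sketch; the helper names missing_packages and ensure_packages are hypothetical and are not part of any package.

```r
# Return the subset of `pkgs` that is not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install whatever is missing, then attach every package in `pkgs`.
ensure_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```

A call such as ensure_packages(c("readxl", "psych", "Hmisc", "lmtest", "lm.beta", "car")) would then cover the packages used in these notes.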
R-codes
setwd("F:/07.PGDM 2020/03.DAR/09.R-Codes") # This is used to set the working directory
getwd() # used for getting the working directory used
install.packages("readxl") # Used to install the package for importing the excel files to R
library(readxl) # Used to call the package readxl
install.packages("psych")
library(psych)
install.packages("Hmisc")
library(Hmisc)
install.packages("lmtest")
library(lmtest)
install.packages("lm.beta")
library(lm.beta)
install.packages("car")
library(car)
exports=read_excel(file.choose()) # Choose the excel file interactively and import it as "exports"
attach(exports) # Attach the data so that its columns can be referred to by name
fix(exports) # Open the data in the R data editor
View(exports) # Open the data in a viewer to inspect it
#Summary Statistics
# Before building the model, it is very important to understand the variables better.
# summary() reports the minimum, quartiles, median, mean and maximum of each variable.
# One has to describe each variable using summary statistics like mean, median, mode,
# quartiles etc. (the mode is not part of the summary() output).
summary(exports)
    Exports            M1             Lend           Price          Exchange
 Min.   :2.600   Min.   :4.900   Min.   : 7.80   Min.   :114.0   Min.   :2.040
 1st Qu.:4.200   1st Qu.:6.000   1st Qu.: 9.00   1st Qu.:146.0   1st Qu.:2.100
 Median :4.800   Median :7.000   Median :10.00   Median :151.0   Median :2.130
 Mean   :4.528   Mean   :6.909   Mean   :10.52   Mean   :147.3   Mean   :2.133
 3rd Qu.:5.100   3rd Qu.:8.100   3rd Qu.:11.60   3rd Qu.:154.0   3rd Qu.:2.160
 Max.   :5.600   Max.   :8.800   Max.   :15.00   Max.   :162.0   Max.   :2.240
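The notes mention the mode among the descriptive statistics, but summary() does not report it. A minimal sketch of how it could be computed alongside the other measures is given here; the vector x holds made-up values, and stat_mode is a hypothetical helper, not a base R function.

```r
# Made-up values, for illustration only (not taken from the exports data)
x <- c(4.2, 4.8, 4.8, 5.1, 5.6, 2.6)

# Most frequent value(s) of a numeric vector
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

x_mode   <- stat_mode(x)                 # most frequent value
x_mean   <- mean(x)                      # arithmetic mean
x_median <- median(x)                    # middle value
x_quart  <- quantile(x, c(0.25, 0.75))   # first and third quartiles
x_sd     <- sd(x)                        # standard deviation
```

With the real data, the same calls could be applied to one column at a time, for example stat_mode(exports$Exports).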
#scatter plots
pairs(~Exports+M1+Lend+Price+Exchange, data=exports)
# This is used to get the scatter plots for all the variables considered in the study
#Building the model
# "lm" means "linear model" and is used to build the model. The symbol ~ links the response
# (dependent variable) to the regressor variables (independent variables). All the regressor
# variables are included in the formula using the "+" sign.
exp_lm=lm(Exports~M1+Lend+Price+Exchange, data=exports)
exp_lm # This gives the coefficient values of the model.
Coefficients:
(Intercept)           M1         Lend        Price     Exchange
  -4.015461     0.368456     0.004702     0.036511     0.267896
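To see how these coefficients turn into a prediction, the fitted equation can be evaluated by hand. The sketch below plugs the sample means from the summary(exports) output into the printed coefficients; the object names coefs, at_means and pred are chosen here for illustration.

```r
# Coefficients as printed for exp_lm
coefs <- c(Intercept = -4.015461, M1 = 0.368456, Lend = 0.004702,
           Price = 0.036511, Exchange = 0.267896)

# Sample means of the regressors, taken from the summary(exports) output
at_means <- c(M1 = 6.909, Lend = 10.52, Price = 147.3, Exchange = 2.133)

# Predicted exports = intercept + sum of (coefficient * regressor value)
pred <- coefs[["Intercept"]] + sum(coefs[names(at_means)] * at_means)
round(pred, 3)
```

The result is close to the reported mean of Exports (4.528), as expected for a least-squares fit with an intercept; the small gap comes from rounding in the printed means. With the fitted object available, predict(exp_lm, newdata = ...) gives the same kind of prediction without manual arithmetic.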
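#Breusch-Pagan test output for the model exp_lm2 (test of constant residual variance)
data: exp_lm2
BP = 3.0888, df = 2, p-value = 0.2134
#Shapiro-Wilk test output for the residuals of exp_lm2 (test of normality)
data: exp_lm2$residuals
W = 0.96227, p-value = 0.03998
#95% confidence intervals for the coefficients of exp_lm2
                  2.5 %      97.5 %
(Intercept) -4.50343606 -2.34247841
M1           0.28301385  0.43982079
Price        0.02885435  0.04521092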
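The calls that produced this output are not shown in the notes. Output of this shape is what bptest() from lmtest, shapiro.test() and confint() print, so the sketch below assumes those three functions and runs them on simulated data. The data frame sim and the model fit are hypothetical stand-ins, not the exports data or exp_lm2.

```r
set.seed(1)

# Simulated data with two regressors (illustration only)
n <- 60
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 0.5 * sim$x1 + 0.3 * sim$x2 + rnorm(n, sd = 0.4)

fit <- lm(y ~ x1 + x2, data = sim)

# Breusch-Pagan test of constant residual variance (needs the lmtest package)
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::bptest(fit))
}

# Shapiro-Wilk test of normality of the residuals
sw <- shapiro.test(residuals(fit))
print(sw)

# 95% confidence intervals for the coefficients
ci <- confint(fit, level = 0.95)
print(ci)
```

Read at the 5% level, the output reported in the notes gives no evidence against constant variance (p-value 0.2134), while the Shapiro-Wilk p-value of 0.03998 points to some departure from normality in the residuals. The intervals for M1 and Price do not contain zero.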