Lecture 6 - Regression Analysis
Business Statistics I
Regression Analysis
Facilitator: Makurunge, S.
Email: [email protected]
Introduction
• Regression analysis is a branch of statistical theory that is
widely used in almost all scientific disciplines.
• In statistics, regression analysis consists of techniques for
modeling the relationship between a dependent variable and
one or more independent variables.
• Regression is a statistical method used in finance, investing and
other disciplines that attempts to determine the strength and
character of the relationship between one dependent variable
(usually denoted by Y) and a series of other variables (known as
independent variables).
• To perform regression analysis, an investigator
collects data on the underlying variables of interest and
employs a regression model to estimate the
quantitative effect of the independent
variables on the response (dependent) variable.
• If the fitted regression model adequately reflects the
true relationship between the dependent variable
and independent variables, this model can be used
for predicting the dependent variable.
Uses of Regression
• Regression analysis is used to:
  o Predict the value of a dependent variable based on the value of at least
    one independent variable. The device used to accomplish this
    estimation is the regression line, which describes the average
    relationship between the X and Y variables; that is, it gives the
    mean value of Y for given values of X. The equation of this line,
    known as the regression equation, provides estimates of the dependent
    variable when values of the independent variable are substituted into it.
  o Explain the impact of changes in an independent variable on the
    dependent variable.
Another goal of regression analysis is to obtain a
measure of the error involved in using the regression
line as a basis for estimation. For this purpose, the
standard error of estimate is calculated. It measures
the scatter of the observed Y values around the
corresponding values estimated from the regression line.
If the line fits the data closely, that is, if there is little
scatter of the observations around the regression line,
good estimates can be made of the Y variable. On the
other hand, if there is a great deal of scatter of the
observations around the fitted regression line, the line
will not produce accurate estimates of the dependent
variable.
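The slide does not give a formula, so the sketch below assumes the common definition $s_e = \sqrt{\sum (y_i - \hat{y}_i)^2 / (n - 2)}$ for a least-squares line (some texts divide by n instead). It uses the motor-registration data from Example 2 in this lecture; the function name `standard_error_of_estimate` is a hypothetical helper, not from the slides.

```python
import math

def standard_error_of_estimate(x, y):
    """Return s_e = sqrt(SSE / (n - 2)) for a least-squares line fitted to (x, y).

    Assumes the n - 2 divisor; some textbooks use n instead.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept
    # Sum of squared residuals: scatter of the observations around the fitted line
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))

registrations = [600, 630, 720, 750, 800]   # X (Example 2)
tyres_sold = [1250, 1100, 1300, 1350, 1500]  # Y (Example 2)
print(round(standard_error_of_estimate(registrations, tyres_sold), 2))  # 87.65
```

A smaller value means the observations lie closer to the line, so estimates of Y from the line are more reliable.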
Dependent & Independent Variables
Dependent variable:
The variable we wish to predict or explain.
It is also known as the regressand, the explained variable, the
predicted variable, or the response variable.
Independent variable:
The variable used to explain the dependent variable.
It is also known as regressor, explanatory variable, or
predictor.
Correlation vs Regression
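• Correlation analysis is used to measure the strength
of the association (linear relationship) between
two variables.
• Regression analysis, by contrast, models how Y depends
on X. The simple linear regression model is
$Y = \beta_0 + \beta_1 X + \varepsilon$
• where Y is the dependent variable, $\beta_0$ is the Y-intercept,
$\beta_1$ is the slope of the simple linear
regression line, X is the independent variable, and
$\varepsilon$ is the random error.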
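One way to see the difference between the two ideas: the correlation coefficient is symmetric in X and Y, while the regression slope depends on which variable is treated as dependent. The sketch below uses the Example 2 data from this lecture; `pearson_r` and `ls_slope` are hypothetical helper names, and the slope is the ordinary least-squares estimate.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient: swapping x and y gives the same value."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def ls_slope(x, y):
    """Least-squares slope for regressing y on x: not symmetric in x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx

registrations = [600, 630, 720, 750, 800]
tyres_sold = [1250, 1100, 1300, 1350, 1500]

print(round(pearson_r(registrations, tyres_sold), 3),
      round(pearson_r(tyres_sold, registrations), 3))   # 0.854 0.854
print(round(ls_slope(registrations, tyres_sold), 4),
      round(ls_slope(tyres_sold, registrations), 4))    # 1.4928 0.4882
```

The two slopes differ, yet their product equals $r^2$, which ties the two analyses together.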
Simple Linear Regression Model
• In a typical simple linear regression experiment,
we observe n pairs of data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
from a scientific experiment, and the model for
these n pairs can be written as
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, for i = 1, 2, …, n
• To perform regression analysis, we determine
the equation of the regression line that best
fits the data.
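"Best fit" needs a criterion. The slides do not name one at this point; the usual choice, assumed here, is ordinary least squares, which picks estimates $b_0$ and $b_1$ of $\beta_0$ and $\beta_1$ that minimize the sum of squared residuals:

```latex
\min_{b_0,\, b_1} \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2
\quad\Longrightarrow\quad
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x},
\qquad
\hat{y} = b_0 + b_1 x
```

With these estimates the fitted line always passes through the point $(\bar{x}, \bar{y})$.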
Estimation of Parameters
Here is how to find good estimates of the
parameters:
.......................
b) $317,850.
Example 2
• The following table shows the number of motor
registrations in a certain territory over a period of 5
years and the sales of motor tyres by a firm in that
territory over the same period.
Year   Motor Registrations   Number of Tyres Sold
1      600                   1,250
2      630                   1,100
3      720                   1,300
4      750                   1,350
5      800                   1,500
• Find the regression equation for estimating tyre sales
when the number of motor registrations is known.
Then estimate tyre sales when registrations are 850.
• Answers:
a) The regression equation (X = motor registrations,
Ŷ = estimated number of tyres sold) is
$\hat{Y} = 255.04 + 1.4928X$
b) 1,524 tyres.
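The answers above can be reproduced with a short least-squares fit. This is a sketch assuming the ordinary least-squares formulas; `fit_line` is a hypothetical helper name, and the data are the five years from the table.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope: 41500 / 27800
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

registrations = [600, 630, 720, 750, 800]    # X
tyres_sold = [1250, 1100, 1300, 1350, 1500]  # Y

b0, b1 = fit_line(registrations, tyres_sold)
print(f"Y-hat = {b0:.2f} + {b1:.4f} X")  # Y-hat = 255.04 + 1.4928 X
print(round(b0 + b1 * 850))              # 1524
```

Note that X = 850 lies outside the observed range of 600 to 800 registrations, so part (b) is an extrapolation and should be treated with some caution.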
Interpreting the Regression Line Equation
• Let’s combine all these parts of a linear
regression equation and see how to interpret
them.
• Coefficient sign: indicates whether the
dependent variable (DV) increases (+) or decreases (-)
as the independent variable (IV) increases.
• Coefficient value: represents the average
change in the DV for a one-unit increase in the
IV.
• Constant (intercept): the predicted value of the DV
when the IV equals zero.
Example
$\text{Weight (kg)} = -114.3 + 106.5 \times \text{Height (m)}$
• The coefficient sign is positive, meaning that
weight tends to increase as height increases.
The coefficient is 106.5, which indicates that
if height increases by 1 m, weight increases by
an average of 106.5 kg. However, our data span a
range of only 0.4 m, so a full-meter increase is not
meaningful; we use a fraction of a meter instead.
For example, an additional 0.1 m of height
corresponds to an expected increase of 10.65 kg.
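The 10.65 kg figure is just the slope times the change in height. The sketch below checks this with the fitted equation from the example; `predicted_weight_kg` is a hypothetical helper, and the heights 1.7 m and 1.8 m are arbitrary illustrative values, not data from the slides.

```python
def predicted_weight_kg(height_m):
    # Fitted equation quoted in the example: Weight = -114.3 + 106.5 * Height
    return -114.3 + 106.5 * height_m

# For a straight line, the change in the prediction equals slope * change in X,
# whatever starting height is chosen (1.7 m here is an arbitrary choice).
change = predicted_weight_kg(1.8) - predicted_weight_kg(1.7)
print(round(change, 2))  # 10.65
```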
Coefficient of Determination
• The coefficient of determination, R², is the proportion
(often expressed as a percentage) of the variation in the
response variable that is explained by variation in the
predictor variable.
[Figure: R², Coefficient of Determination. Labels: Variation due to Error (Sum of Squared Residuals); Variation Explained by the X variable (SS due to Regression).]
• The coefficient of determination is the ratio of
the explained variation to the total variation.
• The symbol for the coefficient of determination
is R².
$R^2 = \dfrac{\text{Explained variation}}{\text{Total variation}}$
• Another way to arrive at the value of R² is to
square the correlation coefficient:
$R^2 = (\text{coefficient of correlation})^2$
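The two routes to R² can be checked against each other on the Example 2 data. This sketch assumes an ordinary least-squares fit, takes "explained variation" to be the regression sum of squares (SSR) and "total variation" to be SST; `r_squared_two_ways` is a hypothetical helper name.

```python
import math

def r_squared_two_ways(x, y):
    """Return (SSR / SST, r ** 2) for a simple least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)   # total variation (SST)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Explained variation (SSR): spread of the fitted values around the mean of Y
    ssr = sum((b0 + b1 * a - y_bar) ** 2 for a in x)
    r = s_xy / math.sqrt(s_xx * s_yy)
    return ssr / s_yy, r ** 2

ratio, r_sq = r_squared_two_ways([600, 630, 720, 750, 800],
                                 [1250, 1100, 1300, 1350, 1500])
print(round(ratio, 4), round(r_sq, 4))  # 0.7288 0.7288
```

So about 72.9% of the variation in tyre sales is explained by motor registrations in that example.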
• To determine R² for a simple linear regression
model, simply square the value of the linear
correlation coefficient.
NOTE: This method does not work for regression
equations that have more than one predictor variable.
R² Example
A market analyst for Chinese Toys on a certain
island collected the following data.
Advertisement ($)   Sales (units)
1                   1
2                   1
3                   2
4                   2
5                   4
Calculate and interpret the coefficient of
determination.
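$r^2 = (\text{coefficient of correlation})^2$
$r^2 = (0.904)^2$
$r^2 \approx 0.817$
Interpretation: about 81.7% of the variation in sales is
explained by the variation in advertisement spending.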
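The value r = 0.904 used above can be verified from the table. A minimal sketch using the Pearson formula directly on the five data points:

```python
import math

advertising = [1, 2, 3, 4, 5]   # X, dollars
sales = [1, 1, 2, 2, 4]         # Y, units

n = len(advertising)
x_bar = sum(advertising) / n
y_bar = sum(sales) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, sales))  # 7
s_xx = sum((x - x_bar) ** 2 for x in advertising)                          # 10
s_yy = sum((y - y_bar) ** 2 for y in sales)                                # 6

r = s_xy / math.sqrt(s_xx * s_yy)     # 7 / sqrt(60)
print(round(r, 3), round(r ** 2, 3))  # 0.904 0.817
```

Squaring the unrounded r gives exactly 49/60 ≈ 0.8167, which agrees with the slide's 0.817 to three decimals.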