Lecture 6 - Regression Analysis

Uploaded by kyxqfnpytc
Copyright © All Rights Reserved

ACU 07210

Business Statistics I

Regression Analysis

Facilitator: Makurunge, S.
Email: [email protected]
Introduction
• Regression analysis is a branch of statistical theory that is
widely used in almost all scientific disciplines.
• In statistics, regression analysis consists of techniques for
modeling the relationship between a dependent variable and
one or more independent variables.
• Regression is a statistical method used in finance, investing and
other disciplines that attempts to determine the strength and
character of the relationship between one dependent variable
(usually denoted by Y) and a series of other variables (known as
independent variables).
Introduction
• To perform regression analysis, an investigator
collects data on the underlying variables of interest and
employs a regression model to estimate the
quantitative causal effect of the independent
variables on the response (dependent) variable.
• If the fitted regression model adequately reflects the
true relationship between the dependent variable
and independent variables, this model can be used
for predicting the dependent variable.
Uses of Regression
• Regression analysis is used to:
  - Predict the value of a dependent variable based on the value of at least
    one independent variable. The device used to accomplish this
    estimation procedure is the regression line. The regression line describes
    the average relationship between the X and Y variables; that is, it
    displays the mean values of Y for given values of X. The equation of this
    line, known as the regression equation, provides estimates of the dependent
    variable when values of the independent variable are inserted into the
    equation.
  - Explain the impact of changes in an independent variable on the
    dependent variable.
Uses of Regression (cont.)
  - Another goal of regression analysis is to obtain a
    measure of the error involved in using the regression
    line as a basis for estimation. For this purpose the
    standard error of estimate is calculated. It measures how
    widely the observed Y values are scattered around the
    corresponding values estimated from the regression line.
    If the line fits the data closely, that is, if there is little
    scatter of the observations around the regression line,
    good estimates can be made of the Y variable. On the
    other hand, if there is a great deal of scatter of the
    observations around the fitted regression line, the line
    will not produce accurate estimates of the dependent
    variable.
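The slide names the standard error of estimate without giving a formula. A common definition, assumed here, is s_e = √(SSE / (n − 2)), where SSE is the sum of squared differences between the observed Y values and the values on the fitted line. The Python sketch below (the helper name `standard_error_of_estimate` is hypothetical) fits the least-squares line and computes s_e for the motor-registration data of Example 2 later in this lecture.

```python
import math


def standard_error_of_estimate(xs, ys):
    """Return (a, b, s_e) for the least-squares line y = a + b*x.

    s_e = sqrt(SSE / (n - 2)), where SSE is the sum of squared residuals.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    # Residual = observed y minus the value estimated from the line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(sse / (n - 2))


registrations = [600, 630, 720, 750, 800]
tyres_sold = [1250, 1100, 1300, 1350, 1500]
a, b, s_e = standard_error_of_estimate(registrations, tyres_sold)
print(f"s_e = {s_e:.2f}")  # s_e = 87.65 (tyres)
```

A smaller s_e means the observations sit closer to the line, which is the "little scatter" case described above.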
Dependent & Independent Variables
• Dependent variable:
  - The variable we wish to predict or explain.
  - It is also known as the regressand, the explained variable, the
    predicted variable or the response variable.
• Independent variable:
  - The variable used to explain the dependent variable.
  - It is also known as the regressor, explanatory variable, or
    predictor.
Correlation vs Regression
• Correlation analysis is used to measure strength
of the association (linear relationship) between
two variables.

• Correlation is only concerned with the strength of the relationship.

• No causal effect is implied with correlation.
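To make the contrast concrete, here is a minimal Python sketch of the sample (Pearson) correlation coefficient, r = S_xy / √(S_xx · S_yy), assumed here to be the correlation measure meant on this slide. The helper name `pearson_r` is hypothetical, and the data are the registration and tyre figures from Example 2. Swapping the two variables leaves r unchanged, reflecting that correlation does not single out a dependent or an independent variable.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


registrations = [600, 630, 720, 750, 800]
tyres_sold = [1250, 1100, 1300, 1350, 1500]
r = pearson_r(registrations, tyres_sold)
print(f"r = {r:.3f}")  # r = 0.854
# Correlation is symmetric: swapping X and Y gives the same value.
print(math.isclose(r, pearson_r(tyres_sold, registrations)))  # True
```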


Types of Regression Models
Linear & Nonlinear Regression
Linear Regression Models:
• Relationship between Y and X is a linear function.
• Linear regression is used to capture linear
relationships.
Nonlinear Regression Models:
• The relationship between Y and X is nonlinear.
• Nonlinear regression can be used to capture
nonlinear relationships.
Simple & Multiple Regression
• Simple regression is the most basic type of regression.
• There is only one independent variable and one dependent
variable in simple regression.
• Simple regression aims to find the line that best fits the
data.
• Equation for simple regression:
$Y = a + bX + u$
• Where,
Y = Dependent variable
X = Independent (explanatory) variable
a = Intercept,
b = Slope,
u = The random error (residual) term.
Simple & Multiple Regression
• Multiple regression is a type of regression analysis
that uses more than one predictor variable to predict
the dependent variable.
• In multiple regression, the model simultaneously fits
the data using all of the predictor variables.
• The equation for multiple regression:
$Y = a + bX_1 + cX_2 + dX_3 + \cdots + tX_k + u$
Where:
Y = Dependent variable
X1, X2, X3, …, Xk = Independent (explanatory) variables
a = Intercept,
b, c, d, …, t = slopes,
u = the random error (residual) term
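Although this module covers only simple regression, a short sketch may help show what fitting all predictors simultaneously means. The example below assumes NumPy is available and uses `numpy.linalg.lstsq` on hypothetical, noise-free data generated from Y = 2 + 0.5X1 − 1.5X2 + 3X3 (three predictors, so k = 3); least squares should therefore recover exactly those coefficients.

```python
import numpy as np

# Hypothetical, noise-free data generated from Y = 2 + 0.5*X1 - 1.5*X2 + 3*X3.
rng = np.random.default_rng(seed=1)
X = rng.uniform(0, 10, size=(20, 3))          # columns: X1, X2, X3
y = 2 + 0.5 * X[:, 0] - 1.5 * X[:, 1] + 3 * X[:, 2]

# Prepend a column of ones so the first coefficient is the intercept a.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
a, b, c, d = coef
# Expect values very close to 2, 0.5, -1.5 and 3.
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}, d={d:.3f}")
```

With real data there would be a random error term u, and the estimates would only approximate the underlying coefficients.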
Simple & Multiple Regression
Note:

• In this module we discuss simple linear regression only.
Simple Linear Regression Model
• Simple linear regression is a regression model that
estimates the relationship between one independent
variable and one dependent variable using a straight
line. You can use simple linear regression when you
want to know:
1. How strong the relationship is between two
variables (e.g., the relationship between rainfall and
soil erosion).
2. The value of the dependent variable at a certain
value of the independent variable (e.g., the amount
of soil erosion at a certain level of rainfall).
• Relationship between X and Y is described by a
linear function.
• Changes in Y are assumed to be caused by
changes in X.
• Regression models describe the relationship
between variables by fitting a line to the
observed data. Linear regression models use a
straight line, while logistic and nonlinear
regression models use a curved line. Regression
allows you to estimate how a dependent variable
changes as the independent variable(s) change.
Assumptions of simple linear regression
Simple linear regression is a parametric test,
meaning that it makes certain assumptions
about the data. These assumptions are:
• Homogeneity of variance (homoscedasticity):
the size of the error in our prediction doesn’t
change significantly across the values of the
independent variable.
• Independence of observations: the observations
in the dataset were collected using statistically
valid sampling methods, and there are no
hidden relationships among observations.
Assumptions (cont.)
• Normality: the errors (residuals) follow a normal
distribution.
• Linearity: the relationship between the independent
and dependent variable is linear; the line of
best fit through the data points is a straight
line (rather than a curve or some sort of
grouping factor).
Simple Linear Regression Model
• The simple linear regression model is typically
stated in the form:

$Y = \beta_0 + \beta_1 X + \varepsilon$
• where Y is the dependent variable, β0 is the Y-
intercept, β1 is the slope of the simple linear
regression line, X is the independent variable, and
ε is the random error.
Simple Linear Regression Model
• In a typical simple linear regression study, we
observe n pairs of data (x1, y1), (x2, y2), …, (xn, yn)
from a scientific experiment, and the model in terms
of these n pairs of data can be written as

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, for i = 1, 2, …, n
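One way to see what this model and its assumptions say is to simulate data from it. The Python sketch below uses hypothetical values β0 = 3, β1 = 2 and σ = 1 (none of them from the lecture) and draws independent normal errors with constant variance, so the assumptions listed earlier hold by construction; the least-squares estimates should then land close to the true values.

```python
import random

# Hypothetical "true" parameters, chosen only for illustration.
BETA0, BETA1, SIGMA = 3.0, 2.0, 1.0

random.seed(42)
xs = [random.uniform(0, 10) for _ in range(10_000)]
# Each Y_i = beta0 + beta1 * X_i + eps_i, with independent N(0, SIGMA^2) errors:
# constant variance, independence and normality hold by construction.
ys = [BETA0 + BETA1 * x + random.gauss(0, SIGMA) for x in xs]

# Least-squares estimates of beta1 (slope) and beta0 (intercept).
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # close to 3 and 2
```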
Simple Linear Regression Model
• To perform regression analysis, we determine
the equation of the regression line which is the
best fit to the data.
Estimation of Parameters
Here is how to find good estimates of the
parameters
.......................
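The derivation on the original slide is not reproduced in this copy. Assuming the intended method is ordinary least squares (the standard approach for this model), the estimates that minimise the sum of squared errors are:

$$
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}
= \frac{n\sum_{i=1}^{n} X_iY_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2},
\qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}
$$

The fitted regression line is then $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$.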

Here is an example of how to find the simple linear
regression model (simple linear regression equation).
Example -1
• A real estate agent wishes to examine the
relationship between the selling price of a house
and its size (in square feet). A random sample of
10 houses is selected and the observations are as
shown in the table below.
Example -1
Observation   House price ($1000s)   House size (square feet)
1             245                    1400
2             312                    1600
3             279                    1700
4             308                    1875
5             199                    1100
6             219                    1550
7             405                    2350
8             324                    2450
9             319                    1425
10            255                    1700
Example -1
• Taking house price as the response variable and
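house size as the explanatory variable, find:
a) The regression model for this relationship.
b) The price for a house of 2,000 ft².
• Answers:

a) The regression model is

$\hat{Y} = 98.24833 + 0.10977X$

b) Ŷ = 98.24833 + 0.10977(2000) ≈ 317.79 (in $1000s), i.e. about $317,790.
(Rounding the coefficients to 98.25 and 0.1098 before substituting gives $317,850.)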
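As a check on the answers above, the following Python sketch applies the least-squares slope and intercept formulas, assumed to be the method behind this example, to the ten houses in the table. The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


size_sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price_k = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # $1000s

a, b = fit_line(size_sqft, price_k)
print(f"Y-hat = {a:.5f} + {b:.5f} X")  # Y-hat = 98.24833 + 0.10977 X
predicted = a + b * 2000  # price in $1000s for a 2,000 sq ft house
print(f"Predicted price: ${predicted * 1000:,.0f}")  # Predicted price: $317,784
```

With unrounded coefficients the prediction is about $317,784; printed answers that differ slightly come from rounding the coefficients before substituting X = 2000.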
Example -2
• The following table shows the number of motor
registrations in a certain territory over a period of 5
years and the sales of motor tyres by a firm in that
territory for the same period.
Year   Motor registrations   Number of tyres sold
1      600                   1,250
2      630                   1,100
3      720                   1,300
4      750                   1,350
5      800                   1,500
Example -2
• Find the regression equation to estimate the sales
of tyres when the number of motor registrations is known.
Estimate the sales of tyres when registrations number 850.

• Answers:
a) The regression equation is

$\hat{Y} = 255.04 + 1.4928X$

b) Ŷ = 255.04 + 1.4928(850) ≈ 1,524 tyres.
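The same kind of check for Example 2, this time written with the computational sum formulas (Σx, Σy, Σxy, Σx²) often used when working by hand. This sketch assumes the answers were obtained by ordinary least squares.

```python
registrations = [600, 630, 720, 750, 800]    # X
tyres_sold = [1250, 1100, 1300, 1350, 1500]  # Y
n = len(registrations)

sum_x = sum(registrations)
sum_y = sum(tyres_sold)
sum_xy = sum(x * y for x, y in zip(registrations, tyres_sold))
sum_x2 = sum(x * x for x in registrations)

# Computational form of the least-squares slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

estimate_850 = a + b * 850
print(f"Y-hat = {a:.2f} + {b:.4f} X")  # Y-hat = 255.04 + 1.4928 X
print(round(estimate_850))             # 1524
```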
Interpreting the Regression Line Equation
• Let’s combine all these parts of a linear
regression equation and see how to interpret
them. Below, DV denotes the dependent variable
and IV the independent variable.
• Coefficient sign: indicates whether the
dependent variable increases (+) or decreases (-)
as the IV increases.
• Coefficient value: represents the average
change in the DV given a one-unit increase in the
IV.
• Constant: the predicted value of the DV when the
IV equals zero.
Example.
Weight (kg) = -114.3 + 106.5 Height (m)
• The coefficient sign is positive, meaning that
weight tends to increase as height increases.
Additionally, the coefficient is 106.5. This value
indicates that if height increases by 1 m,
weight increases by an average of 106.5 kg.
However, the heights in our data span a range of only
0.4 m, so a full one-meter increase lies outside the
data; it is more sensible to consider a fraction of a
meter. For example, with an additional 0.1 m, you’d
expect a 10.65 kg increase.
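The arithmetic can be checked directly: because the equation is a straight line, the predicted change depends only on the slope and the size of the step in height, not on the starting height. The heights 1.60 m and 1.70 m below are hypothetical values chosen for illustration, and `predicted_weight_kg` is a hypothetical helper name.

```python
def predicted_weight_kg(height_m):
    """Fitted line from the example: Weight (kg) = -114.3 + 106.5 * Height (m)."""
    return -114.3 + 106.5 * height_m


# Compare two hypothetical heights 0.1 m apart.
change = predicted_weight_kg(1.70) - predicted_weight_kg(1.60)
print(f"{change:.2f} kg")  # 10.65 kg (= 106.5 * 0.1)
```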
End
Coefficient of Determination
• The coefficient of determination R² is the proportion
(often expressed as a percentage) of the variation in
the response variable that is explained by variation
in the predictor variable.
[Figure: R², the coefficient of determination. Total variation in Y is split into variation explained by the X variable (SS due to regression) and variation due to error (sum of squared residuals); R² is the proportion of variation explained by the X variable.]
Coefficient of Determination
• The coefficient of determination is the ratio of
the explained variation to the total variation.
• The symbol for the coefficient of determination
is R².

$R^2 = \dfrac{\text{Explained variation}}{\text{Total variation}}$

• Another way to arrive at the value for R² is to
square the correlation coefficient.
R² = (coefficient of correlation)²
Coefficient of Determination
• To determine R² for the simple linear regression
model, simply square the value of the linear
correlation coefficient.
NOTE: This method does not work for regression
equations that have more than one predictor variable.
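A minimal sketch, using the house data from Example 1, showing that for simple linear regression the two routes to R² agree: the ratio of explained to total variation, and the square of the correlation coefficient. It assumes least-squares fitting; the variable names are illustrative.

```python
import math

size_sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price_k = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(size_sqft)
x_bar, y_bar = sum(size_sqft) / n, sum(price_k) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(size_sqft, price_k))
s_xx = sum((x - x_bar) ** 2 for x in size_sqft)
s_yy = sum((y - y_bar) ** 2 for y in price_k)  # total variation (SST)

# Least-squares line and the variation it explains (SSR).
b = s_xy / s_xx
a = y_bar - b * x_bar
fitted = [a + b * x for x in size_sqft]
ssr = sum((f - y_bar) ** 2 for f in fitted)

r2_ratio = ssr / s_yy                  # explained / total variation
r = s_xy / math.sqrt(s_xx * s_yy)      # correlation coefficient
print(f"{r2_ratio:.4f} {r ** 2:.4f}")  # 0.5808 0.5808
```

With two or more predictors the ratio form still applies, but R² no longer equals the square of a single correlation coefficient, which is the point of the NOTE above.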
R² - Example
A market analyst for Chinese Toys on a certain
island collected the following data.
Advertisement ($) Sales (Units)
1 1
2 1
3 2
4 2
5 4
Calculate and interpret the coefficient of
determination.
R² - Example
r² = (coefficient of correlation)²
r² = (0.904)²
r² = 0.817

Interpretation: About 81.7% of the sample variation in Sales (y)
can be explained by using Ad $ (x) to predict Sales (y) in the linear model.
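To reproduce these figures, the Python sketch below computes r for the advertising data and squares it. Squaring the unrounded r gives exactly 49/60 ≈ 0.8167, which agrees with 0.817 after rounding.

```python
import math

ad_spend = [1, 2, 3, 4, 5]      # Advertisement ($)
sales_units = [1, 1, 2, 2, 4]   # Sales (units)

n = len(ad_spend)
x_bar, y_bar = sum(ad_spend) / n, sum(sales_units) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ad_spend, sales_units))
s_xx = sum((x - x_bar) ** 2 for x in ad_spend)
s_yy = sum((y - y_bar) ** 2 for y in sales_units)

# Correlation coefficient, then the coefficient of determination.
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")  # r = 0.904, r^2 = 0.817
```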
