M. Amir Hossain PhD: Course No: EMBA 502: Business Mathematics and Statistics

This document discusses correlation and regression analysis. It defines correlation as a measure of the relationship between two variables, and regression as developing an equation to express their relationship. Key points covered include: scatter diagrams are used to visualize the relationship; the correlation coefficient r measures the strength of the linear relationship; and the regression equation is estimated to minimize the error and can be used to predict dependent variable values based on independent variables.

Course no: EMBA 502: Business Mathematics and Statistics

Topic: Correlation and Regression

M. Amir Hossain PhD


Correlation and Regression

• The most commonly used forms of bivariate statistical analysis
• Useful in making business and economic decisions
• Helpful in identifying the nature of the relationship among many business and economic variables
• Recognize that there is a quantifiable relationship between two or more variables
• One variable depends on another and can be determined by it
Correlation and Regression
The variables:
• Students' GPAs and the amount of time they spend studying
• A firm's sales and its expenditure on advertisement
Dependent variable and independent variable
• Deciding which variable is dependent and which is independent is crucial
• Usually
X : Independent variable
Y : Dependent variable
Scatter Diagram
• A plot of the paired observations of X and Y on a graph
• Graphically shows the relationship between two variables
• Common practice is to place the dependent variable on the Y-axis and the independent variable on the X-axis

Ex. Sales and advertisement expenditures (in million Taka) of a firm in different months:

Sales          3  6  4  6  3  5  4
Advertisement  2  4  2  3  1  3  2.5
[Figure: Scatter Diagram]
Correlation Analysis

• Correlation Analysis: A group of statistical techniques used to measure the strength of the relationship (correlation) between two variables.
• Scatter Diagram: A chart that portrays the relationship between the two variables of interest.
• Dependent Variable: The variable that is being predicted or estimated.
• Independent Variable: The variable that provides the basis for estimation. It is the predictor variable.
The Coefficient of Correlation, r

The Coefficient of Correlation (r) is a measure of the strength of the linear relationship between two variables.
• It requires interval or ratio-scaled data (variables).
• It can range from -1.00 to 1.00.
• Values of -1.00 or 1.00 indicate perfect correlation.
• Values close to 0.0 indicate little or no linear correlation.
• Negative values indicate an inverse relationship and positive values indicate a direct relationship.
[Figures: four scatter plots of Y against X, both axes running from 0 to 10, illustrating Perfect Negative Correlation, Perfect Positive Correlation, Zero Correlation, and Strong Positive Correlation]
Formula for r

$$r = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
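The computational formula for r can be checked with a few lines of Python. This is a minimal sketch rather than a definitive implementation: the function name `pearson_r` is an assumption made for illustration, and the data are the monthly sales and advertisement figures from the scatter diagram example.

```python
from math import sqrt

def pearson_r(x, y):
    """Coefficient of correlation from the raw sums of X, Y, XY, X^2 and Y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

advertisement = [2, 4, 2, 3, 1, 3, 2.5]  # X, in million Taka
sales = [3, 6, 4, 6, 3, 5, 4]            # Y, in million Taka

r = pearson_r(advertisement, sales)
print(f"r = {r:.3f}")
```

Because the formula is symmetric in X and Y, swapping the two lists gives the same r.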

Coefficient of Determination
The Coefficient of Determination, r², is the proportion of the total variation in the dependent variable Y that is explained or accounted for by the variation in the independent variable X.
The coefficient of determination is the square of the coefficient of correlation, and ranges from 0 to 1.
Example: For the sales and advertisement expense data given earlier, r ≈ 0.889 and r² ≈ 0.791.
About 79.1% of the variation in sales can be explained by the variation in advertisement expenses.
Example: The following sample observations were randomly selected:
X: 4 5 3 6 7
Y: 4 6 5 7 8
Determine the coefficient of correlation and the coefficient of determination, and interpret them.
Ans: Here, n = 5, ∑X = 25, ∑Y = 30, ∑X² = 135, ∑Y² = 190, ∑XY = 159, r = 0.9, r² = 0.81.
Comment: 81% of the variation in Y can be explained by X.
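Substituting the sums from the answer into the formula for r, with n = 5, shows where the result comes from. The working below is a sketch of the arithmetic only:

```latex
\[
r = \frac{5(159) - (25)(30)}
         {\sqrt{\left[5(135) - 25^2\right]\left[5(190) - 30^2\right]}}
  = \frac{795 - 750}{\sqrt{(50)(50)}}
  = \frac{45}{50}
  = 0.9,
\qquad
r^2 = (0.9)^2 = 0.81
\]
```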
Example: Dan Ireland, the student body president at Toledo State
University, is concerned about the cost of textbooks. To provide
insight into the problem he selects a sample of eight textbooks
currently on sale in the bookstores. He decides to study the
relationship between the number of pages in the text and the cost.
Compute the correlation coefficient.
No. of Pages Price (in $)
500 28
700 25
800 33
600 24
400 23
500 27
600 21
800 31
Calculate r and comment on the relationship between the variables. Also compute r².
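One way to work this exercise is sketched below in Python. It uses the deviation-from-mean form of r, which is algebraically equivalent to the raw-sums formula; the variable names are assumptions chosen for illustration.

```python
from math import sqrt

pages = [500, 700, 800, 600, 400, 500, 600, 800]  # X: number of pages
price = [28, 25, 33, 24, 23, 27, 21, 31]          # Y: price in dollars

n = len(pages)
x_bar = sum(pages) / n
y_bar = sum(price) / n

# Sums of squares and cross-products of deviations from the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, price))
s_xx = sum((x - x_bar) ** 2 for x in pages)
s_yy = sum((y - y_bar) ** 2 for y in price)

r = s_xy / sqrt(s_xx * s_yy)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")  # r = 0.614, r^2 = 0.377
```

On these eight books the correlation is positive but only moderate: roughly 38% of the variation in price is accounted for by the number of pages.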
Regression Analysis
• In regression analysis an equation is developed to express the relationship between the dependent and independent variables
• In linear regression the equation is assumed to be linear
Purpose: to determine the regression equation; it is used to predict the value of the dependent variable (Y) based on the independent variable (X).
Procedure: select a sample from the population and list the paired data for each observation; draw a scatter diagram to give a visual portrayal of the relationship; determine the regression equation.
Regression Analysis

General form of linear regression model

Y = a + bX + e
Where,
Y : dependent variable
a : intercept term
b : slope of the line
X : independent variable
e : error term
We want to estimate a and b such that ∑e² is minimum
Regression Analysis

Linear Regression Model

• The relationship between X and Y is described by a linear function
• Changes in Y are assumed to be caused by changes in X
• Linear regression population equation model

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

• Where $\beta_0$ and $\beta_1$ are the population model coefficients and $\varepsilon$ is a random error term.
Simple Linear Regression Model
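The population regression model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where $Y_i$ is the dependent variable, $\beta_0$ is the population Y intercept, $\beta_1$ is the population slope coefficient, $X_i$ is the independent variable, and $\varepsilon_i$ is the random error term. $\beta_0 + \beta_1 X_i$ is the linear component and $\varepsilon_i$ is the random error component.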
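Simple Linear Regression Model

[Figure: the line $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ plotted with Y against X, with intercept $\beta_0$ and slope $\beta_1$. For a given $X_i$, the random error $\varepsilon_i$ is the gap between the observed value of Y and the predicted value of Y on the line.]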
Regression Analysis

We estimate β0 and β1 such that ∑e² is minimum

The error sum of squares ∑e² will be minimum if

$$\hat{\beta}_1 = b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad \hat{\beta}_0 = b_0 = \bar{y} - b_1 \bar{x}$$
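The least squares estimates (the slope as the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept as the mean of y minus the slope times the mean of x) can be computed directly. The sketch below is illustrative: the function name `least_squares` is an assumption, and the data are the sales and advertisement figures from the scatter diagram example.

```python
def least_squares(x, y):
    """Return (b0, b1), the least squares intercept and slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1

advertisement = [2, 4, 2, 3, 1, 3, 2.5]  # X, in million Taka
sales = [3, 6, 4, 6, 3, 5, 4]            # Y, in million Taka

b0, b1 = least_squares(advertisement, sales)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # b0 = 1.474, b1 = 1.182
```

Read this way, each additional million Taka of advertisement is associated with an estimated increase of about 1.18 million Taka in average sales.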
Simple Linear Regression Equation

The simple linear regression equation provides an estimate of the population regression line:

$$\hat{y}_i = b_0 + b_1 x_i$$

where $\hat{y}_i$ is the estimated (or predicted) y value for observation i, $b_0$ is the estimate of the regression intercept, $b_1$ is the estimate of the regression slope, and $x_i$ is the value of x for observation i.
Interpretation of the Slope and the Intercept

• b0 is the estimated average value of y when the value of x is zero (if x = 0 is in the range of observed x values)

• b1 is the estimated change in the average value of y as a result of a one-unit change in x
Prediction

• The regression equation can be used to predict a value for y, given a particular x

• For a specified value, $x_{n+1}$, the predicted value is

$$\hat{y}_{n+1} = b_0 + b_1 x_{n+1}$$
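Prediction amounts to fitting the line once and then evaluating it at the new x. The following is a minimal sketch under stated assumptions: `fit_line` and `predict` are illustrative names, the data are the sales and advertisement figures from the scatter diagram example, and the new advertisement value of 3.5 million Taka is hypothetical.

```python
def fit_line(x, y):
    """Least squares intercept b0 and slope b1 of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b1 * x_bar, b1

def predict(b0, b1, x_new):
    """Predicted y for a specified x, using y_hat = b0 + b1 * x."""
    return b0 + b1 * x_new

advertisement = [2, 4, 2, 3, 1, 3, 2.5]  # X, in million Taka
sales = [3, 6, 4, 6, 3, 5, 4]            # Y, in million Taka

b0, b1 = fit_line(advertisement, sales)
x_new = 3.5  # hypothetical advertisement spend, inside the observed range 1 to 4
y_new = predict(b0, b1, x_new)
print(f"predicted sales = {y_new:.2f} million Taka")  # 5.61
```

The new x is deliberately kept inside the range of observed advertisement values; predictions far outside that range rest on the untested assumption that the same line continues to hold.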


Regression Analysis

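The error sum of squares ∑e² will be minimum if

$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

• These estimates are known as least squares estimates
• The sign of $b_1$ is the same as that of the correlation coefficient r

Estimated value of dependent variable: $\hat{y}_i = b_0 + b_1 x_i$

Estimated error is: $e_i = y_i - \hat{y}_i$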


Regression Analysis

Y' is the average predicted value of Y for any X.

a is the Y-intercept, or the estimated Y value when X = 0.

b is the slope of the line, or the average change in Y' for each change of one unit in X.
Regression Analysis (Coefficient of determination)

r² = proportion (often quoted as a percentage) of total variation in the dependent variable explained by the independent variable.
From a linear regression model one can write

$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\text{Total variation} - \text{Unexplained variation}}{\text{Total variation}}$$
Regression Analysis (Coefficient of determination)

Total variation (TSS) = $\sum (y - \bar{y})^2$

Unexplained variation (ESS) = $\sum (y - \hat{y})^2$

Explained variation (RSS) = $\sum (\hat{y} - \bar{y})^2$

Coefficient of determination (r²) = RSS / TSS


Regression Analysis (Coefficient of determination)

$$r^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$$

Standard error of estimate:

$$S_{Y \cdot X} = \sqrt{\frac{ESS}{n - 2}}$$
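The decomposition of total variation can be verified numerically. The sketch below follows the naming used on these slides, where ESS is the unexplained (error) sum of squares and RSS is the explained (regression) sum of squares; some textbooks use the two abbreviations the other way round. The data are the sales and advertisement figures from the scatter diagram example, and the variable names are assumptions.

```python
from math import sqrt

advertisement = [2, 4, 2, 3, 1, 3, 2.5]  # X, in million Taka
sales = [3, 6, 4, 6, 3, 5, 4]            # Y, in million Taka

n = len(sales)
x_bar = sum(advertisement) / n
y_bar = sum(sales) / n

# Least squares slope and intercept
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(advertisement, sales))
      / sum((x - x_bar) ** 2 for x in advertisement))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in advertisement]

tss = sum((y - y_bar) ** 2 for y in sales)               # total variation
ess = sum((y - f) ** 2 for y, f in zip(sales, fitted))   # unexplained variation
rss = sum((f - y_bar) ** 2 for f in fitted)              # explained variation

r_squared = rss / tss
s_yx = sqrt(ess / (n - 2))  # standard error of estimate

print(f"TSS = {tss:.3f}, ESS = {ess:.3f}, RSS = {rss:.3f}")
print(f"r^2 = {r_squared:.3f}, S_yx = {s_yx:.3f}")
```

Up to floating-point rounding, RSS + ESS equals TSS, so RSS / TSS and 1 - ESS / TSS give the same r².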
