Simple Linear Regression and Correlation
Business Statistics
For SOB students
2024
4/15/2024 1
Topic-2
Correlation
Introduction
Positive or direct relationship
❑ X and Y variables are said to have a positive or
direct linear relationship when Y increases as X
increases and Y decreases as X decreases.
Negative (indirect or inverse) relationship
❑ X and Y variables are said to have a negative
linear relationship when Y decreases as X increases
and Y increases as X decreases.
Zero or no relationship
❑ X and Y variables are said to have zero or no
relationship when changes in X (either increase
or decrease) do not determine changes in Y.
Non-linear relationship
❑ X and Y variables are said to have a non-linear
relationship when changes in X (either increase
or decrease) do not correspond to a constant
change in Y.
Measurement of correlation
❑ Correlation can be measured using Karl
Pearson’s coefficient of correlation, r, which is
given by:
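A standard way to write Pearson's coefficient is the deviation form below. The notation (X̄ and Ȳ for the sample means, i running over the n observations) is an assumption and may differ from the notation used in class.

```latex
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \; \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
```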
❑ Alternatively, the correlation coefficient is given by:
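An equivalent computational form, which works directly from the column totals n, ΣX, ΣY, ΣX², ΣY² and ΣXY, is sketched below. It is algebraically identical to the deviation form; the exact layout is an assumption.

```latex
r = \frac{n \sum XY - \sum X \sum Y}
         {\sqrt{\left[ n \sum X^2 - \left( \sum X \right)^2 \right]
                \left[ n \sum Y^2 - \left( \sum Y \right)^2 \right]}}
```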
Properties of r
Assumptions of Karl Pearson’s coefficient of
correlation
Example
No.   Wing length (X, cm)   Tail length (Y, cm)
1     10.4                  7.4
2     10.8                  7.6
3     11.1                  7.9
4     10.2                  7.2
5     10.3                  7.4
6     10.2                  7.1
7     10.7                  7.4
8     10.5                  7.2
9     10.8                  7.8
10    11.2                  7.7
11    10.6                  7.8
12    11.4                  8.3
Solution
No.   X      Y     X²       Y²      XY
1     10.4   7.4   108.16   54.76   76.96
2     10.8   7.6   116.64   57.76   82.08
3     11.1   7.9   123.21   62.41   87.69
4     10.2   7.2   104.04   51.84   73.44
5     10.3   7.4   106.09   54.76   76.22
6     10.2   7.1   104.04   50.41   72.42
7     10.7   7.4   114.49   54.76   79.18
8     10.5   7.2   110.25   51.84   75.60
9     10.8   7.8   116.64   60.84   84.24
10    11.2   7.7   125.44   59.29   86.24
11    10.6   7.8   112.36   60.84   82.68
12    11.4   8.3   129.96   68.89   94.62
n = 12   Sum(X) = 128.2   Sum(Y) = 90.8   Sum(X²) = 1371.32   Sum(Y²) = 688.4   Sum(XY) = 971.37
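The column totals above can be checked, and r computed, with a short script. This is a minimal sketch using only the Python standard library; the function name `pearson_r` and the variable names are illustrative choices, not part of the lecture material.

```python
import math

# Wing length (X, cm) and tail length (Y, cm) from the example table.
wing = [10.4, 10.8, 11.1, 10.2, 10.3, 10.2, 10.7, 10.5, 10.8, 11.2, 10.6, 11.4]
tail = [7.4, 7.6, 7.9, 7.2, 7.4, 7.1, 7.4, 7.2, 7.8, 7.7, 7.8, 8.3]


def pearson_r(x, y):
    """Pearson's r from the raw sums (computational form)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(wing, tail)
print(round(r, 2))  # prints 0.87
```

The intermediate quantities are 12(971.37) − (128.2)(90.8) = 15.88 in the numerator and √(20.60 × 16.16) ≈ 18.25 in the denominator.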
❑ Therefore, the coefficient of correlation is r = 0.87.
❑ Since r = 0.87, there is a very strong direct
(positive) linear relationship between the wing
length and the tail length of these birds.
❑ That is, birds with longer wings tend to have
longer tails, and vice versa.
[Figure: scatter plot showing the relationship between wing length and tail length of the birds]
Simple Linear Regression
Introduction
Simple Linear Regression Model/Equation
The concept of a straight line
Example
❑ The age of a sparrow is believed to be one of
the factors that determine its wing length.
Suppose you have been provided with age and
wing length data from a sample of 13 birds, as
shown in the following table.
i. Present the data with the aid of a scatter plot
ii. Draw a straight line to fit the points
No.   Age (days)   Wing length (cm)
1     3.0          1.4
2     4.0          1.5
3     5.0          2.2
4     6.0          2.4
5     8.0          3.1
6     9.0          3.2
7     10.0         3.2
8     11.0         3.9
9     12.0         4.1
10    14.0         4.7
11    15.0         4.5
12    16.0         5.2
13    17.0         5.0
[Figure: scatter plot of the age and wing length data]
[Figure: scatter plot of the age and wing length data with a straight line drawn to fit the points]
The challenge of the straight line
Introduction of the regression model
❑ The general form of a simple linear regression model is
as follows:
Y = β0 + β1X + ε
Y – Dependent variable
X – Independent variable
β0 – Intercept coefficient (model parameter)
β1 – Slope coefficient (model parameter)
ε – Residual, error, stochastic, or disturbance term
Terms in the regression model
β0 – Intercept coefficient
- The value of Y when X = 0.
- The average (expected) value of the dependent
variable Y without the influence of the explanatory
variable X.
β1 – Slope coefficient
- Expresses the rate of change in Y per unit change in X.
- The average (expected) change in the dependent
(explained) variable Y brought about by a one-unit
change in the independent variable X.
Error term (ε)
- Also called the residual, stochastic, or disturbance term.
- Captures the influence of other variables not
included in the model (apart from the given
independent variable).
Interpretation of regression parameters
Example
❑ Suppose you are provided with the following
estimated simple linear regression models and
are required to interpret them.
Ŷ = 4.35 + 1.56X
Ŷ = −2.75 − 0.675X
Solution.
First equation: Ŷ = 4.35 + 1.56X
Intercept term/parameter
❑ The estimated value of the dependent variable (Y) is
4.35 units when X = 0, that is, in the absence of any
influence of the independent variable (X).
Slope coefficient/parameter
❑ The dependent variable (Y) increases by 1.56 units, on
average, when the independent variable (X) increases by
one unit, and vice versa (positive linear relationship).
Second equation: Ŷ = −2.75 − 0.675X
Intercept term/parameter
❑ The estimated value of the dependent variable (Y) is
−2.75 units when X = 0, that is, in the absence of any
influence of the independent variable (X).
Slope coefficient/parameter
❑ The dependent variable (Y) decreases by 0.675 units, on
average, when the independent variable (X) increases by
one unit, and vice versa (negative linear relationship).
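The two interpretations can be illustrated numerically. The sketch below codes the two estimated models implied by the solution (intercepts 4.35 and −2.75, slopes 1.56 and −0.675). The function names and the trial value `x0 = 10.0` are arbitrary choices made for illustration.

```python
def model_1(x):
    # First estimated model: Y-hat = 4.35 + 1.56 X
    return 4.35 + 1.56 * x


def model_2(x):
    # Second estimated model: Y-hat = -2.75 - 0.675 X
    return -2.75 - 0.675 * x


x0 = 10.0  # arbitrary illustrative value of X

for name, model in (("model 1", model_1), ("model 2", model_2)):
    intercept = model(0.0)                # fitted value when X = 0
    change = model(x0 + 1) - model(x0)    # effect of a one-unit rise in X
    print(name, "intercept:", round(intercept, 3),
          "change per unit of X:", round(change, 3))
```

Whatever value of `x0` is used, the change per unit of X equals the slope, which is what makes a straight-line model easy to interpret.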
Assumptions of the Simple linear regression
Fitting the simple regression line
Ordinary Least Square Method
Estimation of model parameters
Estimation of β0 and β1 by using OLS
Example
❑ The age of a sparrow is believed to be one of the
factors that determine its wing length. Suppose you
have been provided with age and wing length data from
a sample of 13 birds, as shown in the following table.
No.   Age (days)   Wing length (cm)
1     3.0          1.4
2     4.0          1.5
3     5.0          2.2
4     6.0          2.4
5     8.0          3.1
6     9.0          3.2
7     10.0         3.2
8     11.0         3.9
9     12.0         4.1
10    14.0         4.7
11    15.0         4.5
12    16.0         5.2
13    17.0         5.0
Solution
❑ The estimated simple linear regression model is
given by: Ŷ = β̂0 + β̂1X, where β̂0 and β̂1 are the
estimates of β0 and β1.
❑ The parameters in the model are calculated as;
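The standard OLS estimators for the simple linear regression model can be written in terms of the column totals n, ΣX, ΣY, ΣX² and ΣXY. The hat notation and the layout below are assumptions and may differ from the form used in class.

```latex
\hat{\beta}_1 = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left( \sum X \right)^2},
\qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
```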
No.   X      Y     X²    XY
1     3.0    1.4   9     4.2
2     4.0    1.5   16    6.0
3     5.0    2.2   25    11.0
4     6.0    2.4   36    14.4
5     8.0    3.1   64    24.8
6     9.0    3.2   81    28.8
7     10.0   3.2   100   32.0
8     11.0   3.9   121   42.9
9     12.0   4.1   144   49.2
10    14.0   4.7   196   65.8
11    15.0   4.5   225   67.5
12    16.0   5.2   256   83.2
13    17.0   5.0   289   85.0
n = 13   Sum(X) = 130   Sum(Y) = 44.4   Sum(X²) = 1562   Sum(XY) = 514.8
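The estimates can be reproduced from the table with a few lines of code. This is a minimal sketch using the usual OLS formulas for one explanatory variable; the function name `ols_fit` and the variable names are illustrative.

```python
# Age (X, days) and wing length (Y, cm) for the 13 sparrows.
age = [3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0, 11.0, 12.0, 14.0, 15.0, 16.0, 17.0]
wing = [1.4, 1.5, 2.2, 2.4, 3.1, 3.2, 3.2, 3.9, 4.1, 4.7, 4.5, 5.2, 5.0]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


b0, b1 = ols_fit(age, wing)
print(round(b0, 4), round(b1, 4))  # prints 0.7131 0.2702
```

With the totals in the table, the slope is (13 × 514.8 − 130 × 44.4) / (13 × 1562 − 130²) = 920.4 / 3406 ≈ 0.2702, and the intercept is 44.4/13 − 0.2702 × 10 ≈ 0.7131.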
❑ In other words
Data, Estimated(fitted) value, Residual
Example
❑ Refer to the sparrow example. Use the
estimated regression model to prepare a table
consisting of the following columns: the data (X and Y),
the estimated (fitted) value Ŷ, and the residual (Y − Ŷ).
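One way to build such a table is sketched below. It refits the line from the sparrow data, then lists each observation with its fitted value and residual (observed minus fitted). The column layout and the variable names are illustrative assumptions.

```python
# Age (X, days) and wing length (Y, cm) for the 13 sparrows.
age = [3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0, 11.0, 12.0, 14.0, 15.0, 16.0, 17.0]
wing = [1.4, 1.5, 2.2, 2.4, 3.1, 3.2, 3.2, 3.9, 4.1, 4.7, 4.5, 5.2, 5.0]

# Least-squares estimates from deviations about the means.
n = len(age)
x_bar = sum(age) / n
y_bar = sum(wing) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, wing))
s_xx = sum((x - x_bar) ** 2 for x in age)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# One row per bird: X, Y, fitted value, residual.
rows = []
for x, y in zip(age, wing):
    fitted = b0 + b1 * x
    rows.append((x, y, fitted, y - fitted))

print(f"{'X':>5} {'Y':>5} {'Fitted':>8} {'Residual':>9}")
for x, y, fitted, resid in rows:
    print(f"{x:5.1f} {y:5.1f} {fitted:8.4f} {resid:9.4f}")
```

Because the model includes an intercept, the residuals sum to zero (up to rounding), which is a quick check on the table.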
Total variability of an outcome variable
❑ The total variability of an outcome variable is
the sum of the squared deviations of the
dependent variable (Y) from its mean.
❑ The total variability of the dependent variable (Y)
can be partitioned into:
i. Explained variability (variability due to the
estimated regression model)
ii. Unexplained variability (variability due to
errors or residuals)
❑ The total variability of the dependent variable Y is
also known as the Total Sum of Squares (TSS) or Sum
of Squares Total (SSTotal).
❑ The variability due to the regression model is
also known as the Explained Sum of Squares (ESS) or
Sum of Squares Regression (SSRegression).
❑ The variability due to error or residual is also
known as the Residual Sum of Squares (RSS) or Sum
of Squares Residual (SSResidual).
❑ Mathematically, the partition of the total variability
of the dependent variable Y can be presented
as follows:
TSS = ESS + RSS
Σ(Y − Ȳ)² = Σ(Ŷ − Ȳ)² + Σ(Y − Ŷ)²
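The partition can be verified numerically on the sparrow data. The sketch below fits the least-squares line, computes the three sums of squares from their definitions, and compares the total with the sum of the explained and residual parts. Variable names are illustrative.

```python
# Age (X, days) and wing length (Y, cm) for the 13 sparrows.
age = [3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0, 11.0, 12.0, 14.0, 15.0, 16.0, 17.0]
wing = [1.4, 1.5, 2.2, 2.4, 3.1, 3.2, 3.2, 3.9, 4.1, 4.7, 4.5, 5.2, 5.0]

# Least-squares estimates.
n = len(age)
x_bar = sum(age) / n
y_bar = sum(wing) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(age, wing))
      / sum((x - x_bar) ** 2 for x in age))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in age]

# Sums of squares from their definitions.
tss = sum((y - y_bar) ** 2 for y in wing)                # total
ess = sum((f - y_bar) ** 2 for f in fitted)              # explained
rss = sum((y - f) ** 2 for y, f in zip(wing, fitted))    # residual

print(round(tss, 4), round(ess + rss, 4))
```

The identity holds exactly only for a least-squares line that includes an intercept; for an arbitrary line drawn by eye, the two sides generally differ.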
❑ Furthermore:
Coefficient of determination
❑ The coefficient of determination is the
proportion or percentage of the total
variability in Y (dependent variable) that is
explained or accounted for by a fitted
regression model.
❑ The coefficient of determination is a measure
of the goodness of fit of the regression model.
❑ The coefficient of determination is denoted by
R².
❑ Mathematically, the coefficient of determination
is computed as:
R² = ESS / TSS = 1 − RSS / TSS
❑ The components that are used to compute
the coefficient of determination can also be
obtained as;
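Commonly used shortcut formulas that work from the column totals are given below. They follow from the definitions of TSS and ESS for a least-squares line, but whether these are the exact forms intended here is an assumption.

```latex
\mathrm{TSS} = \sum Y^2 - \frac{\left( \sum Y \right)^2}{n},
\qquad
\mathrm{ESS} = \hat{\beta}_1^{\,2} \left[ \sum X^2 - \frac{\left( \sum X \right)^2}{n} \right],
\qquad
\mathrm{RSS} = \mathrm{TSS} - \mathrm{ESS}
```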
❑ Note: In simple linear regression, the
coefficient of determination equals the
square of the coefficient of correlation
between the independent and the
dependent variable. This is expressed as:
R² = r²
Interpretation
Usually: 0 ≤ R² ≤ 1
❑ An R² very close to 1 implies a very good
fit; in other words, the variability in Y (the dependent
variable) is largely explained by the variability of
the X’s in the model (the independent variables).
i. Coefficient of determination
Computation of R2
Solution
❑ The estimated linear regression model was given
by Ŷ = β̂0 + β̂1X,
❑ where β̂1 = 0.2702.
❑ Now, consider the next table.
No.   X      Y     X²    Y²      XY
1     3.0    1.4   9     1.96    4.2
2     4.0    1.5   16    2.25    6.0
3     5.0    2.2   25    4.84    11.0
4     6.0    2.4   36    5.76    14.4
5     8.0    3.1   64    9.61    24.8
6     9.0    3.2   81    10.24   28.8
7     10.0   3.2   100   10.24   32.0
8     11.0   3.9   121   15.21   42.9
9     12.0   4.1   144   16.81   49.2
10    14.0   4.7   196   22.09   65.8
11    15.0   4.5   225   20.25   67.5
12    16.0   5.2   256   27.04   83.2
13    17.0   5.0   289   25.00   85.0
n = 13   Sum(X) = 130   Sum(Y) = 44.4   Sum(X²) = 1562   Sum(Y²) = 171.3   Sum(XY) = 514.8
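Using only the column totals in the table above, R² can be computed as sketched below. The shortcut quantities `s_xx`, `s_yy` and `s_xy` (corrected sums of squares and cross-products) are names introduced here for illustration. The script also checks that R² equals the square of the correlation coefficient.

```python
import math

# Column totals from the table.
n = 13
sum_x, sum_y = 130.0, 44.4
sum_x2, sum_y2, sum_xy = 1562.0, 171.3, 514.8

# Corrected sums of squares and cross-products.
s_xx = sum_x2 - sum_x ** 2 / n       # variation in X about its mean
s_yy = sum_y2 - sum_y ** 2 / n       # total sum of squares (TSS)
s_xy = sum_xy - sum_x * sum_y / n    # co-variation of X and Y

b1 = s_xy / s_xx                     # slope estimate
ess = b1 * s_xy                      # explained sum of squares
r_squared = ess / s_yy               # coefficient of determination
r = s_xy / math.sqrt(s_xx * s_yy)    # Pearson correlation

print(round(b1, 4), round(r_squared, 4))  # prints 0.2702 0.9733
```

Here TSS ≈ 19.657 and ESS ≈ 19.132, so about 97% of the variability in wing length is explained by age in this sample.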
❑ Coefficient of determination can be obtained as;