
Statistical Analysis: Linear Regression

This document discusses linear regression analysis. It distinguishes between correlation, which deals with the relationship between two variables, and regression, which uses a relationship to predict or estimate values. The document explains how to fit a linear regression line to data using the least squares method to minimize vertical deviations from the line. It describes how to compute the regression coefficients to generate the linear regression equation and make predictions. Examples are provided on using linear regression to model relationships, such as fetal growth based on age.

Uploaded by

Dafter Khembo
Copyright
© All Rights Reserved

STATISTICAL ANALYSIS

LINEAR REGRESSION
Dr. Dafter Khembo

Learning Objectives

By the end of this lesson, students should be able to:


1. Distinguish between correlation and regression
2. Describe the linear regression model
3. Explain the method of least squares
4. Compute regression coefficients
5. Predict the response variable

Correlation & Regression
• In the previous lesson, we studied the linear correlation between two variables.
Correlation deals with the relationship between two quantitative variables measured on the same individuals. It describes:
1) the direction of the relationship (positive or negative)
2) the strength of the relationship (r ranges from -1 to +1)
3) in hypothesis testing, whether |r| exceeds the critical value

In linear regression, the concept is developed further in two respects.

If you have a significant correlation:
1. You can build a statistical model of the relationship between the variables (draw a straight line).
2. You can use the model to forecast, predict or estimate.
• Regression is the formal statistical procedure for using a relationship to forecast, predict or estimate a score (data point).

Correlation vs. Regression

Regression Analysis
• Regression analysis means estimating, forecasting or
predicting the unknown value of one variable from the
known value(s) of the other variable(s).
• It involves:
1. Checking for a significant linear correlation (r) between x and y.
2. If there is NOT a significant linear correlation between x and y, we cannot use regression to predict y.
3. If there IS a significant linear correlation between x and y, the best predicted y-value is found by substituting the x-value into the regression equation and calculating ŷ.
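The three steps above can be sketched in Python. This is a minimal illustration, not part of the original lesson: `pearson_r` and `can_use_regression` are hypothetical helper names, the data are the TV-hours and test-score values from the example later in this lesson, and the critical value 0.576 (n = 12, df = 10, α = 0.05, two-tailed) is assumed from a standard table of critical values of r.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r, computed with the summation formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def can_use_regression(x, y, critical_r):
    """Steps 1-3: regression is used only if |r| exceeds the critical value."""
    r = pearson_r(x, y)
    return r, abs(r) > critical_r

# TV hours (x) and Monday test scores (y) for 12 students
hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]

# ASSUMED critical value of r for n = 12 (df = 10), alpha = 0.05, two-tailed,
# taken from a standard table; it is not given in this lesson.
r, significant = can_use_regression(hours, scores, critical_r=0.576)
print(f"r = {r:.3f}, significant: {significant}")
```

Here r ≈ -0.831 and |r| > 0.576, so the correlation is significant and regression may be used to predict test scores from TV hours.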

The Independent and Dependent Variables
• x is the independent variable (predictor variable). It is the variable under
the investigator’s control.
• y is the dependent variable (response variable): the variable which the investigator is trying to estimate or predict. Its predicted value is written ŷ ("y-hat").
• Note the different formats that are used:

$$y = a + bx \qquad \hat{y} = b_0 + b_1 x$$

  - ŷ (y-hat) is the "dependent" or "response" variable, because it depends on, or responds to, the value of x.
  - b0 (or a) is the y-intercept: the value at which the regression line crosses the vertical axis.
  - b1 (or b) is the slope of the regression line: the amount of change in ŷ for every 1-unit change in x.
  - x is the "independent" or "predictor" variable, because it acts independently to predict the value of ŷ.
Regression Line
• To illustrate fitting a straight line, consider the paired sample data below on advertising expenditure and sales for each of n = 5 months of a year.

Month     Expenditure, x (K1000)   Sales, y (K1000)
April     10                       14
May       12                       17
June      15                       23
July      20                       21
August    23                       25

First, draw a scatter diagram to check whether the relationship is linear.

Scatter Diagram

• We note a positive relationship, and possibly a high degree of correlation.
• The points on the diagram are best described by a straight line.

Fitting a Regression Line
• If all the data points lay on a straight line, it would be simple to draw that line on the scatter plot.
• This is not the case with real data.
• If the points on the scatter diagram are best described by a straight line, the next step is to fit a straight line to them.
• On the whole, the line must lie as close as possible to every data point on the scatter diagram.
• Since a line fitted in this way best approximates all the points on the scatter diagram, it is known as the line of best fit.
• A line of best fit can be fitted by means of:
  1. the freehand drawing method, and
  2. the least squares method.

Fitting a Regression Line: Freehand Drawing
• After a careful inspection of the spread of the data points on the scatter diagram, a straight line can be drawn through the points such that, on the whole, it is as close as possible to every point.
• The major drawback is that the slope of a line drawn this way will vary from person to person because of subjectivity.
• Consequently, values of the dependent variable estimated from such a line may not be as accurate and precise as those based on the least squares line of best fit.

Fitting a Regression Line: Least Squares Method
• The line of best fit has all the data points as close to it as possible.
• The least squares method fits the line of best fit by minimizing the sum of the squares of the vertical deviations of each observed y-value from the fitted line.
• It derives an equation for the line that best models the relationship between the two variables.
• The equation has the mathematical form y = a + bx, where y is the value of the dependent variable, x is the value of the independent variable, a is the intercept of the regression line on the y-axis (the value of y when x = 0), and b is the slope of the regression line.
• Since a straight line is completely defined by its intercept a and slope b, fitting the line reduces to computing these two constants.
• Once a and b are known, the predicted y-value for each value of x is obtained by substituting that x-value into the linear equation.
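To make the least squares criterion concrete, the sketch below (an illustration, not from the source) computes the sum of squared vertical deviations (SSE) for the least squares line through the sales data and for a hypothetical line someone might draw freehand, y = 9 + 0.7x, chosen purely for comparison. The helper names `sse` and `least_squares_line` are hypothetical.

```python
def sse(x, y, a, b):
    """Sum of squared vertical deviations of observed y from the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def least_squares_line(x, y):
    """Intercept a and slope b that minimise sse(), via deviations from the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

expenditure = [10, 12, 15, 20, 23]  # x, K1000
sales = [14, 17, 23, 21, 25]        # y, K1000

a_ls, b_ls = least_squares_line(expenditure, sales)
# Hypothetical "freehand" line, chosen by eye purely for comparison
a_fh, b_fh = 9.0, 0.7

print(f"least squares: y = {a_ls:.2f} + {b_ls:.3f}x, SSE = {sse(expenditure, sales, a_ls, b_ls):.2f}")
print(f"freehand:      y = {a_fh:.2f} + {b_fh:.3f}x, SSE = {sse(expenditure, sales, a_fh, b_fh):.2f}")
```

The least squares line has SSE ≈ 20.20, slightly smaller than the 20.42 of the freehand line; by construction, no other straight line gives a smaller SSE for this data.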

Scatter Diagram

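Least Squares Line
[Figure: the fitted line y = a + bx with intercept a on the y-axis; e_i marks the vertical deviation of a data point from the line.]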

Modelling a Straight Line
[Figure: the line y = a + bx crosses the y-axis at a, and y changes by b for every 1 unit increase in x.]

• By using the least squares method (a procedure that minimizes the sum of the squared vertical deviations of the plotted points from a straight line), we are able to construct a best-fitting straight line through the scatter diagram points and then formulate a regression equation of the form:

$$y = a + bx, \qquad b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \bar{y} - b\bar{x}$$
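The two formulas translate directly into code. A minimal sketch with a hypothetical function name, using the summation form of b exactly as written above and applied to the monthly sales data:

```python
def regression_coefficients(x, y):
    """Return (a, b) for the least squares line y = a + b*x.

    Uses the summation form:
      b = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
      a = y_bar - b*x_bar
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists of at least two values")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = sum_y / n - b * (sum_x / n)
    return a, b

# Monthly advertising expenditure (x) and sales (y), both in K1000
a, b = regression_coefficients([10, 12, 15, 20, 23], [14, 17, 23, 21, 25])
print(f"y = {a:.2f} + {b:.3f}x")
```

For the sales data this gives a ≈ 8.61 and b ≈ 0.712.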

Month     Expenditure, x (K1000)   Sales, y (K1000)   xy     x²     y²
April     10                       14                 140    100    196
May       12                       17                 204    144    289
June      15                       23                 345    225    529
July      20                       21                 420    400    441
August    23                       25                 575    529    625

∑x = 80   ∑y = 100   ∑xy = 1684   ∑x² = 1398   ∑y² = 2080

Generating The Least Squares Equation
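A worked sketch of the arithmetic, substituting n = 5 and the column totals from the sales table into the least squares formulas:

```latex
\begin{aligned}
\bar{x} &= \tfrac{80}{5} = 16, \qquad \bar{y} = \tfrac{100}{5} = 20 \\
b &= \frac{5(1684) - 80(100)}{5(1398) - 80^2} = \frac{8420 - 8000}{6990 - 6400} = \frac{420}{590} \approx 0.712 \\
a &= \bar{y} - b\bar{x} = 20 - 0.712 \times 16 \approx 8.61 \\
\hat{y} &\approx 8.61 + 0.712x
\end{aligned}
```

So each additional K1000 of advertising expenditure is associated with an estimated increase of about K712 in monthly sales.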

Uses of Regression Lines
• The least squares regression line may be used to estimate a value of the dependent variable given a value of the independent variable.
• The value of the independent variable (x) should be within the range of the given data.
• The predicted value of the dependent variable (y) is only an estimate.
• Even if the regression line fits the data well, this does not show that the relationship holds for values outside the range observed in the given experiment.
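One way to respect the guideline that x should lie within the range of the given data is to make the prediction function refuse to extrapolate. This is a design choice for illustration, not a rule stated in the source; `make_predictor` is a hypothetical name and the data are the sales figures from earlier.

```python
def make_predictor(x, y):
    """Fit y = a + b*x by least squares; return a predict() that refuses to extrapolate."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    x_min, x_max = min(x), max(x)

    def predict(x_new):
        # Only estimate inside the observed range of the independent variable.
        if not x_min <= x_new <= x_max:
            raise ValueError(f"x = {x_new} is outside the data range [{x_min}, {x_max}]")
        return a + b * x_new

    return predict

# Sales data from earlier (expenditure x and sales y, both in K1000)
predict_sales = make_predictor([10, 12, 15, 20, 23], [14, 17, 23, 21, 25])
print(round(predict_sales(18), 2))
```

`predict_sales(18)` returns about 21.42 (K1000), while `predict_sales(40)` raises a `ValueError` because 40 lies outside the observed expenditure range of 10 to 23.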

Assumptions
1. We are investigating only linear relationships
2. For each x value, y is a random variable having a normal (bell-shaped) distribution. All of these y distributions have the same variance.
• Results are not seriously affected if departures from normality and equal variances are not too extreme.

Practical Example (from the Medical Field)
• Data from a study of foetal development
• The date of conception (and hence the age) of the foetus is known accurately
• The height of the foetus (excluding the legs) is obtained from an ultrasound scan
• Age and length of the foetus are clearly related
• The aim is to model the length and age data and use the model to assess whether a foetus of known age is growing at an appropriate rate

Growth of a Foetus

Graphical Assessment of Data

Linear Regression Model
• From the scatterplot, it would appear that age and length are strongly related, possibly in a linear way.
• A straight line can be expressed mathematically in the form

$$y = a + bx$$

where b is the slope, or gradient, of the line, and a is the intercept of the line with the y-axis.

Fitted Line

Interpretation of Results
• The regression equation is

$$\text{length} = -2.66 + 0.12 \times \text{age}$$

• This implies that as the age of the foetus increases by one day, the length increases by 0.12 cm. For a foetus of age 85 days, the estimated length would be

$$\text{length} = -2.66 + (0.12 \times 85) \approx 7.51\ \text{cm}$$

• A prediction interval gives the range of values within which the value for an individual is likely to lie; for a foetus of age 85 days it is 7.01 to 8.08 cm.
• A measured length outside this range suggests that the foetus of known age is probably not growing at an appropriate rate:
  - If the measured length is < 7.01 cm, there is evidence that the foetus is not growing as it should.
  - If the measured length is > 8.08 cm, is the foetus larger than expected? Is the actual age (and due date) wrong?
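The interpretation above can be sketched as a small decision rule. This sketch assumes the rounded coefficients and the 85-day prediction interval (7.01 to 8.08 cm) exactly as reported; the interval applies only to an 85-day-old foetus, the measured lengths are hypothetical, and the function names are hypothetical.

```python
# Rounded coefficients as reported: length (cm) = -2.66 + 0.12 * age (days)
INTERCEPT, SLOPE = -2.66, 0.12
# Prediction interval for an 85-day-old foetus, as reported (cm)
PI_LOW_85, PI_HIGH_85 = 7.01, 8.08

def predicted_length(age_days):
    """Estimated foetal length in cm for a given age in days."""
    return INTERCEPT + SLOPE * age_days

def assess_growth(measured_cm, low=PI_LOW_85, high=PI_HIGH_85):
    """Compare a measured length with the prediction interval for that age."""
    if measured_cm < low:
        return "below interval: evidence of not growing as it should"
    if measured_cm > high:
        return "above interval: larger than expected, or recorded age may be wrong"
    return "within interval: consistent with appropriate growth"

print(f"predicted length at 85 days: {predicted_length(85):.2f} cm")
for measured in (6.8, 7.6, 8.3):  # hypothetical measured lengths, for illustration only
    print(measured, "->", assess_growth(measured))
```

With the rounded coefficients, the prediction at 85 days is 7.54 cm rather than the reported 7.51 cm; the small gap presumably comes from the reported value having been computed with unrounded coefficients, which are not given.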

Exercise
• Use the calculated least squares linear regression line to estimate the length of a foetus at the following gestation times:
(a) 2 days
(b) 60 days
(c) 100 days
(d) 300 days

Exercise
A sample of 6 persons was selected, and their ages (x variable) and weights (y variable) are presented in the following table. Find the regression equation. What is the predicted weight when age is (i) 8.5 years and (ii) 7.5 years?

Serial no.   Age, x (years)   Weight, y (kg)
1            7                12
2            6                8
3            8                12
4            5                10
5            6                11
6            9                13

Answer

Serial no.   Age (x)   Weight (y)   xy     x²     y²
1            7         12           84     49     144
2            6         8            48     36     64
3            8         12           96     64     144
4            5         10           50     25     100
5            6         11           66     36     121
6            9         13           117    81     169
Total        41        66           461    291    742

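Answer

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{6(461) - 41(66)}{6(291) - 41^2} = \frac{2766 - 2706}{1746 - 1681} = \frac{60}{65} \approx 0.923$$

$$a = \bar{y} - b\bar{x} = \frac{66}{6} - 0.923 \times \frac{41}{6} = 11 - 0.923 \times 6.8333 \approx 4.693$$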

$$\hat{y}(x) = 4.693 + 0.923x$$

$$\hat{y}(8.5) = 4.693 + 0.923 \times 8.5 \approx 12.54\ \text{kg}$$

$$\hat{y}(7.5) = 4.693 + 0.923 \times 7.5 \approx 11.62\ \text{kg}$$
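The hand calculation can be checked in Python. This sketch uses exact fractions (the standard library's `fractions.Fraction`) so that no rounding happens until the final output:

```python
from fractions import Fraction

age = [7, 6, 8, 5, 6, 9]          # x, years
weight = [12, 8, 12, 10, 11, 13]  # y, kg

n = len(age)
sum_x, sum_y = sum(age), sum(weight)
sum_xy = sum(x * y for x, y in zip(age, weight))
sum_x2 = sum(x * x for x in age)

# Exact slope and intercept: no intermediate rounding
b = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
a = Fraction(sum_y, n) - b * Fraction(sum_x, n)

print(f"b = {float(b):.3f}, a = {float(a):.3f}")
for x_new in (Fraction(17, 2), Fraction(15, 2)):  # 8.5 and 7.5 years
    print(f"age {float(x_new)}: predicted weight {float(a + b * x_new):.2f} kg")
```

The exact coefficients are b = 12/13 ≈ 0.923 and a = 61/13 ≈ 4.692; the hand calculation's 4.693 arises from using the rounded slope 0.923. Both predictions, 12.54 kg and 11.62 kg, agree with the hand calculation.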

Regression Line
Example:
The following data represent the number of hours that 12 different students watched television during the weekend, and each student's score on a test taken the following Monday.

a) Find the equation of the regression line.
b) Use the equation to find the expected test score for a student who watches 9 hours of TV.

Hours, x        0     1     2     3     3     5     5     5     6     7     7     10
Test score, y   96    85    82    74    95    68    76    84    58    65    75    50
xy              0     85    164   222   285   340   380   420   348   455   525   500
x²              0     1     4     9     9     25    25    25    36    49    49    100
y²              9216  7225  6724  5476  9025  4624  5776  7056  3364  4225  5625  2500

∑x = 54   ∑y = 908   ∑xy = 3724   ∑x² = 332   ∑y² = 70836
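A sketch of parts (a) and (b) in Python, using the summation formula for b and a = ȳ - b x̄ from earlier in the lesson. This is an illustration, not the author's worked answer:

```python
hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]

n = len(hours)
sum_x, sum_y = sum(hours), sum(scores)              # 54, 908
sum_xy = sum(x * y for x, y in zip(hours, scores))  # 3724
sum_x2 = sum(x * x for x in hours)                  # 332

# (a) least squares slope and intercept
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n
print(f"y-hat = {a:.2f} + ({b:.2f})x")

# (b) expected score for 9 hours of TV (9 lies inside the observed range 0..10)
print(f"expected score after 9 hours of TV: {a + b * 9:.1f}")
```

This gives ŷ ≈ 93.97 - 4.07x and an expected score of about 57.4 for 9 hours of TV (with the rounded coefficients by hand, 93.97 - 4.07 × 9 ≈ 57.34).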
Residuals
After verifying that the linear correlation between two variables is
significant, next we determine the equation of the line that can be
used to predict the value of y for a given value of x.
[Figure: observed data points above and below the fitted line, with the vertical distances d1, d2 and d3 between each observed y-value and the predicted y-value on the line.]

For a given x-value,
d = (observed y-value) - (predicted y-value)

For each data point, di is the difference between the observed y-value and the predicted y-value on the line at that x-value. These differences are called residuals.
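The residuals for the sales data can be computed directly from the definition d = (observed y-value) - (predicted y-value). In this sketch (not from the source), the line is fitted by least squares first; for such a line with an intercept, the residuals sum to zero, up to floating-point error.

```python
expenditure = [10, 12, 15, 20, 23]  # x (K1000)
sales = [14, 17, 23, 21, 25]        # observed y (K1000)

n = len(expenditure)
x_bar, y_bar = sum(expenditure) / n, sum(sales) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(expenditure, sales))
s_xx = sum((x - x_bar) ** 2 for x in expenditure)
b = s_xy / s_xx
a = y_bar - b * x_bar

# d = (observed y-value) - (predicted y-value) for each data point
residuals = [y - (a + b * x) for x, y in zip(expenditure, sales)]
for x, d in zip(expenditure, residuals):
    print(f"x = {x:>2}: d = {d:+.2f}")
print(f"sum of residuals: {sum(residuals):.6f}")
print(f"sum of squared residuals: {sum(d * d for d in residuals):.2f}")
```

The residuals are about -1.73, -0.15, +3.71, -1.85 and +0.02, and their squares sum to about 20.20, the minimum achievable by any straight line for this data.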
Multiple Regression Equation
Multiple regression analysis is a straightforward extension of simple regression analysis that allows more than one independent variable.
It is useful because, in some instances, a better prediction of a dependent (response) variable can be obtained by using more than one independent (explanatory) variable.
For example, a more accurate prediction of Monday's test grade in the television example might be made by also considering the number of other classes a student is taking and the student's previous knowledge of the test material.

* Because the mathematics associated with this concept is complicated, technology is generally used to calculate the multiple regression equation.
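As the note says, software is normally used. A minimal sketch with NumPy's `numpy.linalg.lstsq`, assuming the predictors are supplied as the columns of a matrix; `fit_multiple_regression` is a hypothetical name. The extra predictors in the test-grade example (other classes taken, prior knowledge) are not given in the lesson, so the usage below only checks that a single predictor reproduces the simple regression for the TV data.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Least squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    X holds one row per observation and one column per independent variable.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A leading column of 1s lets the model estimate the intercept b0.
    design = np.column_stack([np.ones(X.shape[0]), X])
    coeffs, _residuals, _rank, _sv = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Sanity check with one predictor: should match the simple regression for the TV data.
hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]
coeffs = fit_multiple_regression([[h] for h in hours], scores)
print(np.round(coeffs, 2))
```

With more predictors, each extra variable is simply an extra column of `X`, and the returned vector gains one coefficient per column.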

Multiple Regression

Coefficient of Determination
The coefficient of determination is the proportion of the total variation in the dependent variable that is explained by the variation in the independent variable:

$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$

Example:
The correlation coefficient for the data on the number of hours students watched television and their test scores is r ≈ -0.831. Find the coefficient of determination.

$$r^2 = (-0.831)^2 \approx 0.691$$

About 69.1% of the variation in the test scores can be explained by the variation in the hours of TV watched. About 30.9% of the variation is unexplained.
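The ratio can also be computed directly as explained variation over total variation, where total variation is Σ(y - ȳ)² and explained variation is Σ(ŷ - ȳ)². This sketch (not from the source) fits the least squares line to the TV data and confirms r² ≈ 0.691:

```python
hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]

n = len(hours)
x_bar, y_bar = sum(hours) / n, sum(scores) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
     / sum((x - x_bar) ** 2 for x in hours))
a = y_bar - b * x_bar
predicted = [a + b * x for x in hours]

total_variation = sum((y - y_bar) ** 2 for y in scores)         # sum of (y - y_bar)^2
explained_variation = sum((p - y_bar) ** 2 for p in predicted)  # sum of (y_hat - y_bar)^2
r2 = explained_variation / total_variation

print(f"r^2 = {r2:.3f}")
print(f"explained: {r2:.1%}, unexplained: {1 - r2:.1%}")
```

For simple linear regression this ratio equals the square of the correlation coefficient, which is why both routes give about 0.691.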