
L5 - Simple Linear Regression Students

The document summarizes key concepts in linear regression and correlation analysis including: 1) Scatter plots are used to show the relationship between two variables and correlation measures the strength of this linear relationship from -1 to 1. 2) Simple linear regression finds the linear relationship between one independent and dependent variable using the estimated regression line equation ŷ = b0 + b1x. 3) The coefficient of determination, R2, indicates how well the regression line represents the data, with values closer to 1 showing a better fit.

Uploaded by

Kelyn Kok

Lecture 5

Linear Regression
and Correlation
Analysis
Chapter 14, Groebner

Chapter Goals
After completing this chapter, you should be able to:
• Understand scatter plots
• Understand and calculate the correlation between two variables
• Calculate and interpret the simple correlation between two variables
• Calculate and interpret the intercept and gradient of a simple linear regression equation for a set of data
• Calculate the coefficient of determination, R², and indicate whether the model is a good fit for the data
Scatter Plots and Correlation
• A scatter plot (or scatter diagram) is used to show the
relationship between two variables
• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
– It is concerned only with the strength of the relationship

Scatter Plot Examples
[Figure: scatter plots of y against x illustrating linear and curvilinear relationships, strong and weak relationships, and no relationship.]
Correlation Coefficient, r
• Correlation measures the strength of the linear
association between two variables
• The sample correlation coefficient, r is a
measure of the strength of the linear
relationship between two variables, based on
sample observations

Features of r
• Unit free
• Range between -1 and 1
• The closer to -1, the stronger the negative linear
relationship
• The closer to 1, the stronger the positive linear
relationship
• If |r| > 0.5, this indicates a strong relationship
• The closer to 0, the weaker the linear relationship

Examples of Approximate r Values
[Figure: five scatter plots of y against x with r = -1, r = -0.6, r = 0, r = +0.3 and r = +1.]
Calculating the Correlation Coefficient

Sample correlation coefficient:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} $$

or the algebraic equivalent:

$$ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} $$

where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
Example:
Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Find the coefficient of correlation.

Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27
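
As a rough check on a hand calculation, the algebraic formula for r can be applied to the Reed Auto data in a few lines of Python. This is only a sketch: the names `tv_ads`, `cars_sold` and `sample_correlation` are illustrative and do not come from the lecture.

```python
import math

# Reed Auto data from the example
tv_ads = [1, 3, 2, 1, 3]           # x: number of TV ads
cars_sold = [14, 24, 18, 17, 27]   # y: number of cars sold


def sample_correlation(x, y):
    """Sample correlation coefficient r using the algebraic (sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = sample_correlation(tv_ads, cars_sold)
print(f"r = {r:.4f}")  # prints "r = 0.9366"
```

Here the sums are Σx = 10, Σy = 100, Σxy = 220, Σx² = 24 and Σy² = 2114, so r = 100 / √(20 × 570), a strong positive linear relationship.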
Significance Test for Correlation
• Hypotheses
H0: ρ = 0 (no correlation)
HA: ρ ≠ 0 (correlation exists)
The Greek letter ρ (rho) represents the population correlation coefficient.

• Test statistic

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

(with n – 2 degrees of freedom)
t-distribution table
Conf. Level 0.1 0.3 0.5 0.7 0.8 0.9 0.95 0.98 0.99
One Tail 0.45 0.35 0.25 0.15 0.10 0.05 0.025 0.01 0.005
Two Tails 0.90 0.70 0.50 0.30 0.20 0.10 0.05 0.02 0.01
d.f. Values of t
1 0.1584 0.5095 1.0000 1.9626 3.0777 6.3137 12.7062 31.8200 63.6559
2 0.1421 0.4447 0.8165 1.3860 1.8860 2.9200 4.3030 6.9650 9.9250
3 0.1366 0.4242 0.7650 1.2500 1.6380 2.3530 3.1820 4.5410 5.8410
. . . . . . . . . .
. . . . . . . . . .
12 0.1283 0.3947 0.6950 1.0830 1.3560 1.7820 2.1790 2.6810 3.0550
13 0.1281 0.3940 0.6940 1.0790 1.3500 1.7710 2.1600 2.6500 3.0120
14 0.1280 0.3933 0.6920 1.0760 1.3450 1.7610 2.1450 2.6240 2.9770
15 0.1278 0.3928 0.6910 1.0740 1.3410 1.7530 2.1310 2.6020 2.9470
16 0.1277 0.3923 0.6900 1.0710 1.3370 1.7460 2.1200 2.5830 2.9210
17 0.1276 0.3919 0.6890 1.0690 1.3330 1.7400 2.1100 2.5670 2.8980
18 0.1274 0.3915 0.6880 1.0670 1.3300 1.7340 2.1010 2.5520 2.8780
19 0.1274 0.3912 0.6880 1.0660 1.3280 1.7290 2.0930 2.5390 2.8610
20 0.1273 0.3909 0.6870 1.0640 1.3250 1.7250 2.0860 2.5280 2.8450
21 0.1272 0.3906 0.6860 1.0630 1.3230 1.7210 2.0800 2.5180 2.8310
22 0.1271 0.3904 0.6860 1.0610 1.3210 1.7170 2.0740 2.5080 2.8190
23 0.1271 0.3902 0.6850 1.0600 1.3190 1.7140 2.0687 2.5000 2.8070
24 0.1270 0.3900 0.6850 1.0590 1.3180 1.7109 2.0639 2.4922 2.7970
25 0.1269 0.3898 0.6840 1.0580 1.3160 1.7080 2.0595 2.4850 2.7870
26 0.1269 0.3896 0.6840 1.0580 1.3150 1.7060 2.0560 2.4790 2.7790

Example
Test the coefficient of correlation between the number of TV ads and the number of cars sold at the 1% level of significance.
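
One possible way to work through this test is sketched below. It assumes r is computed from the five Reed Auto observations and takes the two-tailed critical value for d.f. = 3 at the 1% level from the lecture's t-distribution table; the function name `correlation_t_statistic` is illustrative.

```python
import math


def correlation_t_statistic(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# Assumed inputs: r for the five Reed Auto observations, from the algebraic
# formula (numerator 100, denominator sqrt(20 * 570)).
r = 100 / math.sqrt(20 * 570)
n = 5

t_stat = correlation_t_statistic(r, n)

# Two-tailed critical value for alpha = 0.01 and d.f. = n - 2 = 3,
# read from the t-distribution table in this lecture.
t_critical = 5.8410

reject_h0 = abs(t_stat) > t_critical
print(f"t = {t_stat:.3f}, critical value = {t_critical}, reject H0: {reject_h0}")
```

Under these assumptions t is about 4.63, which lies between -5.841 and 5.841, so H0 is not rejected at the 1% level despite the large sample r. With only five observations the test has little power.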
Regression Analysis
• Regression analysis is used to:
– Predict the value of a dependent variable based on the value
of at least one independent variable (regression equation)
– Explain the impact of changes in an independent variable on
the dependent variable (R2 or r2)

Dependent variable: the variable we wish to explain


Independent variable: the variable used to explain the
dependent variable

Simple Linear Regression Model

• Only one independent variable, x


• Relationship between x and y is described
by a linear function
• Changes in y are assumed to be caused by
changes in x

Types of Regression Models
[Figure: four scatter plots illustrating a positive linear relationship, a negative linear relationship, a relationship that is not linear, and no relationship.]

The estimated simple linear regression equation

ŷ = b0 + b1x

• The graph is called the estimated regression line.
• b0 is the y intercept of the line.
• b1 is the slope of the line.
• ŷ is the estimated value of y for a given x value.
The Regression Equation
• The formulas for b1 (slope) and b0 (intercept) are:

$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$

algebraic equivalent for b1:

$$ b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} $$

and

$$ b_0 = \bar{y} - b_1 \bar{x} $$

Simple Linear Regression Example
• Reed Auto wishes to examine the relationship between the number of TV ads and the number of cars sold
• A random sample of 5 sales is selected
– Dependent variable (y) = number of cars sold
– Independent variable (x) = number of TV ads

Find the least squares regression equation

Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27
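
The algebraic formulas for b1 and b0 can be applied to this table directly. The following is a minimal sketch for checking a hand calculation; `least_squares_line` and the variable names are illustrative, not part of the lecture.

```python
# Reed Auto data from the table
tv_ads = [1, 3, 2, 1, 3]           # x: number of TV ads
cars_sold = [14, 24, 18, 17, 27]   # y: number of cars sold


def least_squares_line(x, y):
    """Return (b0, b1) for the estimated regression line y_hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    # Slope: algebraic equivalent of sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: y_bar - b1 * x_bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares_line(tv_ads, cars_sold)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")  # prints "y_hat = 10.00 + 5.00 x"
```

With Σxy = 220, Σx = 10, Σy = 100 and Σx² = 24, the slope is (220 - 200) / (24 - 20) = 5 and the intercept is 20 - 5 × 2 = 10. Each additional TV ad is associated with an estimated 5 more cars sold.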
Coefficient of Determination, R²
• The coefficient of determination, or R², is the proportion of the total variation in the dependent variable (y) that is explained by the independent variable (x). If the value of R² is close to 1, it indicates a 'good fit' model.
• From example, R² =

$$ R^2 = \frac{SSR}{SST}, \quad \text{where } 0 \le R^2 \le 1 $$

• SST = total sum of squares = $\sum (y - \bar{y})^2$
– Measures the variation of the y_i values around their mean ȳ
• SSE = error sum of squares = $\sum (y - \hat{y})^2$
– Unexplained variation attributable to factors other than the relationship between x and y
• SSR = regression sum of squares = $\sum (\hat{y} - \bar{y})^2$
– Explained variation attributable to the relationship between x and y
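
To make the three sums of squares concrete, the sketch below computes them for the Reed Auto data (TV ads as x, cars sold as y) after fitting the least squares line. The variable names are illustrative and the fit is recomputed inside the block so that it stands alone.

```python
# Reed Auto data: x = number of TV ads, y = number of cars sold
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates (deviation form of the formulas)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Fitted values on the estimated regression line
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))    # unexplained variation
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)             # explained variation

r_squared = ssr / sst
print(f"SST = {sst:.0f}, SSE = {sse:.0f}, SSR = {ssr:.0f}, R^2 = {r_squared:.4f}")
```

For this data SST = 114, SSE = 14 and SSR = 100, so SST = SSR + SSE and R² = 100 / 114 ≈ 0.877, which equals the square of the sample correlation coefficient r ≈ 0.9366.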

Significance of the regression slope coefficient

H0: β1 = 0 (no linear relationship between x and y)
H1: β1 ≠ 0 (there is a linear relationship between x and y)

The test statistic is calculated using:

$$ t = \frac{b_1}{s_{b_1}} $$

where b1 is the slope coefficient and s_b1 is the standard error of the slope.

If the t test statistic > the critical t value, or the t test statistic < -(critical t value), reject H0.
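
The slide does not say how the standard error of the slope is obtained. The sketch below assumes the usual simple-regression formulas, s_e = √(SSE / (n - 2)) and s_b1 = s_e / √Σ(x - x̄)², and applies the test to the Reed Auto data. The 5% significance level is an arbitrary choice for illustration; the critical value is the two-tailed 0.05 entry for d.f. = 3 in the lecture's t table.

```python
import math

# Reed Auto data: x = number of TV ads, y = number of cars sold
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Assumed formulas (not given on the slide):
# s_e  = sqrt(SSE / (n - 2))   standard error of the estimate
# s_b1 = s_e / sqrt(Sxx)       standard error of the slope
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))
s_b1 = s_e / math.sqrt(sxx)

t_stat = b1 / s_b1

# Two-tailed critical value for alpha = 0.05 (assumed level), d.f. = n - 2 = 3
t_critical = 3.1820

reject_h0 = t_stat > t_critical or t_stat < -t_critical
print(f"s_b1 = {s_b1:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Under these assumptions t is about 4.63, which exceeds 3.182, so H0 would be rejected at the 5% level. In simple linear regression this t value matches the t statistic for testing the correlation coefficient on the same data.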
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.762114627
R Square             0.580543792
Adjusted R Square    0.5283192906
Standard Error       41.330323650
Observations         10

ANOVA
              df    SS            MS            F           Significance F
Regression    1     18934.9352    18934.935     11.08476    0.010394
Residual      8     13665.577     1708.19565
Total         9     32600.5000

In the SS column, the Regression row is SSR = Σ(ŷ - ȳ)², the Residual row is SSE = Σ(y - ŷ)², and the Total row is SST = Σ(y - ȳ)².

              Coefficients    Standard Error    t Stat        P-value
Intercept     98.248333       58.033478584      1.69200588    0.12896016
Sq Feet       0.109768        0.032969          3.2943067     0.010394
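
Reading the Coefficients column as b0 (Intercept) and b1 (Sq Feet) gives the estimated regression equation ŷ = 98.248333 + 0.109768 × (Sq Feet). A small sketch of using it for prediction follows; the function name `predict` and the input of 2000 square feet are hypothetical, and the units of ŷ are not stated in the output.

```python
# Coefficients read from the Excel SUMMARY OUTPUT
b0 = 98.248333   # Intercept
b1 = 0.109768    # Sq Feet (slope)


def predict(sq_feet):
    """Estimated value of y for a given x, using y_hat = b0 + b1 * x."""
    return b0 + b1 * sq_feet


# Hypothetical input chosen only for illustration
y_hat = predict(2000)
print(f"y_hat = {y_hat:.3f}")
```

The P-value of 0.010394 for the slope is below 0.05, so at the 5% level the output supports a linear relationship between Sq Feet and the dependent variable. R Square of about 0.58 says that roughly 58% of the variation in y is explained by x.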
A study was recently done in which the following regression output
was generated using Excel.
SUMMARY OUTPUT
