Simple Linear Regression Analysis

Regression analysis is used to understand the relationship between two variables and predict the value of one variable based on another. A regression model contains an independent (predictor) variable and a dependent (response) variable. Linear regression estimates the coefficients of the linear equation that best predicts the dependent variable from the independent variables. The method of least squares is used to determine the regression line that minimizes the sum of the squared residuals.

Regression Analysis

Regression analysis is used to:
1) understand the relation between two variables;
2) predict the value of one variable based on another variable.

A regression model is comprised of a dependent (response) variable and an independent (predictor) variable.

Independent Variable(s) → Dependent Variable (prediction relationship)
Regression Analysis

Linear regression estimates the coefficients of the linear equation, involving one or more independent variables, that best predict the value of the dependent variable.

If you believe that none of your predictor variables is correlated with the errors in your dependent variable, you can use the linear regression procedure.
Simple Linear Regression

The scatter diagram is used to graphically investigate the relationship between the dependent and independent variables: a plot of all (X_i, Y_i) pairs.

[Figure: scatter plot of Y (0-100) against X (0-60)]
Types of Regression Models

[Figure: four scatter-plot panels — positive linear relationship, negative linear relationship, relationship not linear, no relationship]
Simple Linear Regression Model

Regression models are used to test whether a relationship exists between variables, that is, to use one variable to predict another. However, there is a random error that cannot be predicted.

Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i

where Y_i is the dependent (response) variable, X_i is the independent (predictor/explanatory) variable, \beta_0 is the Y-intercept, \beta_1 is the slope, and \varepsilon_i is the random error.
Population Linear Regression Model

Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i

The observed value Y_i differs from the population regression line \mu_{Y|X} = \beta_0 + \beta_1 X_i by the random error \varepsilon_i.

[Figure: population regression line with an observed value and its random error \varepsilon_i marked]
Sample Linear Regression Model

\hat{y}_i = b_0 + b_1 x_i

\hat{y}_i = predicted value of Y for observation i
x_i = value of X for observation i
b_0 = sample Y-intercept, used as an estimate of the population \beta_0
b_1 = sample slope, used as an estimate of the population \beta_1
Sample Linear Regression Model

Sample data are used to estimate the true values for the intercept and slope:

\hat{y}_i = b_0 + b_1 x_i

The difference between the actual value of Y and the predicted value (using sample data) is known as the error (residual):

error = actual value − predicted value = y_i − \hat{y}_i
Sample Linear Regression Model

\hat{y}_i = b_0 + b_1 x_i

b_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}

b_0 = \bar{y} - b_1 \bar{x}
Table 3.1. Intelligence Test Scores and Freshmen Chemistry Grades

Student   Test Score (x)   Chemistry Grade (y)
   1           65                 85
   2           50                 74
   3           55                 76
   4           65                 90
   5           55                 85
   6           70                 87
   7           65                 94
   8           70                 98
   9           55                 81
  10           70                 91
  11           50                 76
  12           55                 74
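As a sketch, the least-squares formulas above can be applied to the Table 3.1 data directly in Python (the variable names are mine, not from the slides):

```python
# Table 3.1 data: intelligence test scores (x) and chemistry grades (y)
x = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]
n = len(x)

# b1 = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n  # b0 = ybar - b1 * xbar

print(round(b1, 3), round(b0, 3))  # 0.897 30.043
```

Full-precision arithmetic gives b_0 ≈ 30.043 (the value SPSS reports); the slides' 30.056 comes from rounding b_1 to 0.897 before computing b_0.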
Figure 3.1. Scatter diagram with regression line

[Figure: Chemistry Grade (70-100) plotted against Intelligence Test Score (40-75), with the fitted line \hat{y}_i = b_0 + b_1 x_i; the point estimates of b_0 and b_1 are determined using the method of least squares]
Measures of Variation: The Sum of Squares

SST = \sum (Y_i - \bar{Y})^2    (total sum of squares)
SSR = \sum (\hat{Y}_i - \bar{Y})^2    (regression sum of squares)
SSE = \sum (Y_i - \hat{Y}_i)^2    (error sum of squares)

where \hat{Y}_i = b_0 + b_1 X_i.
Method of Least Squares

SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

Differentiating \sum_{i=1}^{n} e_i^2 with respect to b_0 and b_1 and equating the derivatives to zero yields

b_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}

b_0 = \bar{y} - b_1 \bar{x}
Method of Least Squares

The slope formula can be written equivalently in centered (deviation) form:

b_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
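The two forms are algebraically identical; a quick numerical check on the Table 3.1 data (a sketch, names mine):

```python
x = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# centered (deviation) form
b1_centered = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
               / sum((xi - xbar) ** 2 for xi in x))

# raw-sums form
b1_raw = ((n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y))
          / (n * sum(xi ** 2 for xi in x) - sum(x) ** 2))

print(round(b1_centered, 3), abs(b1_centered - b1_raw) < 1e-9)  # 0.897 True
```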
Applying the least-squares formulas to the data of Table 3.1 gives

b_1 = 0.897
b_0 = 30.056

\hat{y}_i = b_0 + b_1 x_i
\hat{y}_i = 30.056 + 0.897 x_i
Figure 3.1. Scatter diagram with regression line

[Figure: Chemistry Grade (70-100) against Intelligence Test Score (40-75), with the fitted line \hat{y}_i = 30.056 + 0.897 x_i]

The slope of 0.897 means that for each increase of one unit in Intelligence Test Score (X), the Chemistry Grade (Y) is estimated to increase by 0.897 units.
Using SPSS

Graphs → Scatter → Simple. To add the regression line, use the SPSS Chart Editor: Chart → Options → Fit Line → Regression.

[Figure: scatter plot of Chemistry Grade (70-100) against Test Score (40-80) with the regression (prediction) line; Rsq = 0.7438]
Using SPSS

Analyze → Regression → Linear

Coefficients(a)
                   Unstandardized        Standardized
                   Coefficients          Coefficients
Model              B         Std. Error  Beta      t       Sig.
1   (Constant)     30.043    10.137                2.964   .014
    Test Score       .897      .167      .862      5.389   .000
a. Dependent Variable: Chemistry Grade

\hat{y}_i = 30.043 + 0.897 x_i
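As a small usage sketch of the fitted equation (the function name and the example score of 60 are mine, not from the slides):

```python
# Fitted model from the SPSS output: y-hat = 30.043 + 0.897 * x
b0, b1 = 30.043, 0.897

def predict_grade(test_score):
    """Predicted Chemistry Grade for a given Test Score."""
    return b0 + b1 * test_score

print(round(predict_grade(60), 2))  # 83.86
```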
Using SPSS

Analyze → Regression → Linear

Model Summary(b)
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .862(a)  .744       .718                4.319
a. Predictors: (Constant), Test Score
b. Dependent Variable: Chemistry Grade

R is the coefficient of correlation, R Square is the coefficient of determination, and the standard error of the estimate is the standard deviation around the regression line.

ANOVA(b)
Model           Sum of Squares   df   Mean Square   F        Sig.
1   Regression       541.693      1      541.693    29.036   .000(a)
    Residual         186.557     10       18.656
    Total            728.250     11
a. Predictors: (Constant), Test Score
b. Dependent Variable: Chemistry Grade

The ANOVA table reports the measures of variation: SSR = 541.693, SSE = 186.557, SST = 728.250.
Testing the Significance of b_1

Similar to a test on r in the one-predictor case:

t = (0.8972136 - 0)/0.1665043 = 5.39

H_0: \beta_1 = 0 is rejected, i.e. the regression line has a nonzero slope.
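The standard error and t above can be reproduced from the raw data using se(b_1) = \sqrt{MSE / S_{xx}} with MSE = SSE/(n-2) — a sketch; the formula and names are the standard ones, not taken from the slides:

```python
import math

x = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)                # residual mean square
se_b1 = math.sqrt(mse / Sxx)       # standard error of the slope
t = b1 / se_b1

print(round(se_b1, 4), round(t, 2))  # 0.1665 5.39
```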
Variance Explained - r^2

r^2 tells us the proportion of variance in Y which is explained by X:

r^2 = \frac{SS_{regression}}{SS_{total}} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}

• a ratio reflecting the proportion of variance captured by our model relative to the overall variance in our data
• highly interpretable: r^2 = .50 means 50% of the variance in Y is explained by X
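The sums of squares for the Table 3.1 data can be checked against the SPSS ANOVA table (a sketch, names mine):

```python
x = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)                # total
ssr = sum((yh - ybar) ** 2 for yh in yhat)             # regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # error

r2 = ssr / sst
print(round(sst, 2), round(ssr, 3), round(r2, 3))  # 728.25 541.693 0.744
```

Note that SST = SSR + SSE, the decomposition the formula above relies on.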
Linear Regression Assumptions

For linear models:

1. Normality
   - Y values are normally distributed for each X
   - the probability distribution of the error is normal
2. Homoscedasticity (constant variance)
3. Independence of errors
Variation of Errors Around the Regression Line

y values are normally distributed around the regression line, and for each x value the "spread" or variance around the regression line is the same.

[Figure: normal error densities f(e) with equal spread centered on the regression line at X1 and X2]
Residual Analysis

Purposes:
• examine linearity
• evaluate violations of assumptions

Graphical analysis of residuals:
• plot residuals vs. X_i values
• residual = difference between the actual Y_i and the predicted \hat{Y}_i

Studentized residuals:
• allow consideration for the magnitude of the residuals
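A sketch of the residual quantities listed above for the one-predictor case, assuming the usual definitions (standardized residual e_i/s; internally studentized residual e_i/(s·sqrt(1 - h_ii)) with leverage h_ii = 1/n + (x_i - x̄)²/S_xx) — these formulas and names are my additions, not from the slides:

```python
import math

x = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar

e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # raw residuals
s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))   # root MSE

standardized = [ei / s for ei in e]
leverage = [1 / n + (xi - xbar) ** 2 / Sxx for xi in x]
studentized = [ei / (s * math.sqrt(1 - h)) for ei, h in zip(e, leverage)]
```

Studentizing divides by sqrt(1 - h_ii) < 1, so each studentized residual is at least as large in magnitude as the corresponding standardized one.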
Residual Analysis for Linearity

[Figure: two residual-vs-X plots — a patterned (curved) residual plot indicating "not linear" and a patternless one indicating "linear"]
Residual Analysis for Homoscedasticity

[Figure: two standardized-residual-vs-X plots — a funnel-shaped spread indicating heteroscedasticity and an even band indicating homoscedasticity]
Using Standardized Residuals

After fitting the model, predict:
• the Chemistry Grade (fitted value)
• the residual
• the studentized residual
• the standardized residual
Residual Analysis for Normality

kdensity r, normal
swilk r

[Figure: kernel density estimate of the residuals (kernel = epanechnikov, bandwidth = 2.25), range −10 to 10, overlaid with the normal density — approximately normal]
Residual Analysis for Linearity

scatter r X, yline(0)

[Figure: residuals (−5 to 5) vs. Test Score (50-70), scattered evenly about the zero line — linear]
Residual Analysis for Homoscedasticity

scatter r1 X, yline(0)   (using standardized residuals)
scatter sr X, yline(0)   (using studentized residuals)

[Figure: standardized and studentized residuals (−2 to 2) vs. Test Score (50-70), even bands about zero — homoscedasticity]
Residual Analysis for Homoscedasticity

hettest

→ Homoscedasticity
Residual Analysis for Independence

scatter r obs, yline(0)

[Figure: residuals (−5 to 5) vs. observation number (0-15), no pattern across observations — independent]
Residual Analysis for Independence

Durbin-Watson statistic. The D-W statistic is defined as

d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}

where 0 ≤ d ≤ 4; values of d near 2 indicate independent errors.
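A sketch of the D-W computation on the example regression's residuals, assuming the observations are in time/collection order (names mine):

```python
# Table 3.1 data in observation order
x = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # residuals

# d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2
d = sum((e[t] - e[t - 1]) ** 2 for t in range(1, n)) / sum(et ** 2 for et in e)
print(round(d, 2))  # 0 <= d <= 4; values near 2 suggest uncorrelated errors
```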