Lecture 12 Regression Edited

The document discusses using simple linear regression to model the relationship between psychology students' final grades in Psych 284 and their starting salaries. It finds a significant positive correlation between the two variables and calculates a regression equation to predict starting salary based on final grade. The regression equation and a graph of the best fit line are presented to demonstrate how final grades can be used to predict starting salary.

Uploaded by

lamita

Regression

✤ Correlational analyses quantify the relationship between two or more naturally occurring variables.

✤ Simple linear regression analyses produce a hypothetical model of the relationship between 2 variables that allows us to predict the value of one variable based on the other.

✤ What if AUB was interested in predicting the starting salaries of psychology students based on their final grades in Psych 284?

✤ This would only be relevant given a significant correlation between the variables.

r = .97, p < .05, right-tailed

Grades   Annual Salary (USD)
51       13,508
84       26,156
81       26,000
63       17,156
74       22,256
88       29,600
53       15,680
79       22,940
86       29,000
71       19,544
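
The correlation reported above can be checked with a short R sketch. The vector names `grade` and `salary` are illustrative assumptions, and the right-tailed option is chosen to match the "right-tailed" label on the slide.

```r
# Data transcribed from the table above; the vector names are illustrative.
grade  <- c(51, 84, 81, 63, 74, 88, 53, 79, 86, 71)
salary <- c(13508, 26156, 26000, 17156, 22256, 29600, 15680, 22940, 29000, 19544)

# Right-tailed Pearson test: H1 is that the true correlation is greater than 0.
ct <- cor.test(grade, salary, alternative = "greater", method = "pearson")

r_value <- unname(ct$estimate)  # sample correlation coefficient r
p_value <- ct$p.value           # one-sided p-value

print(round(r_value, 2))
print(p_value < 0.05)
```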
Linear Model

Regression analyses give us a way to model the relationship between two or more variables using a straight line.

No line can fit all the data perfectly. The deviation between the line and a data point is called a residual.

[Figure: Relationship between Psyc 284 Final Grades and Starting Salary; x-axis: Final Grades, y-axis: Annual Salary (USD); residuals marked]
Line of Best Fit

The line of best fit is the line that minimizes the total squared residuals.

ŷi = b0 + b1(xi)

There are formulas for obtaining a slope and intercept for a given set of data. Or, we can use R.
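
Both routes can be compared directly. The sketch below assumes the standard least-squares formulas (slope = sum of cross-products of deviations divided by the sum of squared deviations of x; intercept = mean of y minus slope times mean of x), which the slides mention but do not spell out. The variable names are illustrative.

```r
# Data from the lecture's table; names are illustrative.
grade  <- c(51, 84, 81, 63, 74, 88, 53, 79, 86, 71)
salary <- c(13508, 26156, 26000, 17156, 22256, 29600, 15680, 22940, 29000, 19544)

# Route 1: closed-form least-squares formulas (assumed standard formulas).
x_bar <- mean(grade)
y_bar <- mean(salary)
b1 <- sum((grade - x_bar) * (salary - y_bar)) / sum((grade - x_bar)^2)  # slope
b0 <- y_bar - b1 * x_bar                                                # intercept

# Route 2: let R estimate the same line.
fit <- lm(salary ~ grade)

print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

Both routes should give the same intercept and slope up to floating-point error.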
The terms of the regression equation ŷi = b0 + b1(xi):

ŷi: the predicted value of Y
b0: the Y-intercept, the predicted value of y when x is 0
b1: the slope, the change in the predicted value of y for every 1-unit change in x
Regression Equation

ŷi = b0 + b1(xi)

In this example, the regression equation is:

ŷi = -7634.4 + 408.5(xi)

You can use this regression line to predict the salary of someone based on their Psych 284 final grade.

Based on the data, what would we predict is the salary for someone who completed Psyc 284 with:

a 51? ŷi = -7634.4 + 408.5(51) = $13,199.10/yr
an 88? ŷi = -7634.4 + 408.5(88) = $28,313.60/yr
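
The two predictions can be reproduced with a small helper. `predict_salary` is a hypothetical function name introduced only for illustration; it uses the rounded coefficients from the slide, so its results match the slide's arithmetic rather than an unrounded model fit.

```r
# Hypothetical helper using the rounded coefficients shown on the slide.
predict_salary <- function(grade, b0 = -7634.4, b1 = 408.5) {
  b0 + b1 * grade
}

pred <- predict_salary(c(51, 88))
print(pred)
```

With unrounded coefficients from `lm()`, the predictions would differ by a dollar or two.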
Predicted value of Y: ŷi = b0 + b1(xi)

Obtained value of Y: yi = b0 + b1(xi) + ei

ei is the residual, the difference between predicted Y and obtained Y.

A regression equation produces a line that minimizes the residuals ei (as a sum of squares).
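
A minimal sketch of the residuals for the lecture's data, assuming the model is fitted with `lm()`. The competing line (intercept -7000, slope 400) is an arbitrary, made-up comparison used only to illustrate that the fitted line has the smaller sum of squared residuals.

```r
# Data from the lecture's table; names are illustrative.
grade  <- c(51, 84, 81, 63, 74, 88, 53, 79, 86, 71)
salary <- c(13508, 26156, 26000, 17156, 22256, 29600, 15680, 22940, 29000, 19544)

fit <- lm(salary ~ grade)

predicted <- fitted(fit)        # predicted value of Y for each student
e <- salary - predicted         # residual: obtained Y minus predicted Y

sse_best  <- sum(e^2)           # squared residuals around the fitted line
# Arbitrary competing line, for comparison only (not from the lecture).
sse_other <- sum((salary - (-7000 + 400 * grade))^2)

print(round(e, 1))
print(c(best = sse_best, other = sse_other))
```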
Regression Analysis

The regression line is only a model based on the data.

This model might not reflect reality.

We need some way of testing how well the model fits the observed
data.

How?
Regression Analysis

Measure how much better our regression model is than a model based
on the mean.

The mean represents the best guess for Y when we have no other
information.

If grades are a significant predictor of salary, then we would expect our regression line to better fit the data than the mean line.

If grades are NOT a significant predictor of salary, then modeling the relationship using a regression line will not be an improvement over the mean.
SUM OF SQUARES
Comparing residuals of two competing models:
1. mean
2. regression equation
TOTAL SUM OF SQUARES (SST)
A number that represents total variation in Y irrespective of X (i.e., as if b1 = 0).
Total squared deviations between data and mean.
Average Salary = $22,184
MODEL SUM OF SQUARES (SSM)
A number that represents systematic variation.
Total squared deviations between regression line and mean.
RESIDUAL SUM OF SQUARES (SSR)
A number that represents unsystematic variation.
Total squared deviations between data and regression line.
TOTAL VARIANCE IN Y, THE OUTCOME VARIABLE (SST)
=
SYSTEMATIC VARIANCE ACCOUNTED FOR BY THE REGRESSION MODEL (SSM)
+
UNSYSTEMATIC VARIANCE NOT ACCOUNTED FOR BY THE MODEL (SSR)
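
The decomposition can be checked numerically on the lecture's data. This is a sketch with illustrative variable names; the three sums follow the definitions given on the preceding slides.

```r
# Data from the lecture's table; names are illustrative.
grade  <- c(51, 84, 81, 63, 74, 88, 53, 79, 86, 71)
salary <- c(13508, 26156, 26000, 17156, 22256, 29600, 15680, 22940, 29000, 19544)

fit <- lm(salary ~ grade)
y_bar <- mean(salary)

sst <- sum((salary - y_bar)^2)        # data vs. mean
ssm <- sum((fitted(fit) - y_bar)^2)   # regression line vs. mean
ssr <- sum((salary - fitted(fit))^2)  # data vs. regression line

print(c(SST = sst, SSM = ssm, SSR = ssr))
print(ssm / sst)  # proportion of total variation accounted for by the model
```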
HYPOTHESIS TESTING

Null Hypothesis H0: There is no relationship between the two variables. b1 = 0

Alternate Hypothesis H1: There is a significant relationship between the two variables. b1 ≠ 0

1) Overall regression model fits the data better than a model based on the
mean (F-test).
2) Individual predictors are significantly related to the outcome variable
(t-test).
If b1 = 0, then SSM will also be close to 0, so applying a regression equation does not systematically explain much of the total variation in the outcome.

If b1 ≠ 0, then SSM will be larger than 0, implying that a portion of the total variation in the outcome is systematically explained by applying a regression equation.
EXPLAINING VARIATION
b1 will rarely be exactly 0. We need to figure out whether the amount of
systematic, explained variation is substantially greater than the leftover,
unsystematic, unexplained variation.

In other words, is SSM larger than SSR?

To test whether the regression model accounts for more systematic variation (SSM) than unsystematic, error variation (SSR), we use an F test.

F = Systematic Variance / Unsystematic Variance
MEAN SQUARES
Because sums of squares are affected by the sample size, we use the “average” sum of squares, called mean squares.

MSM = SSM / dfM, where dfM = k = # of predictors
MSR = SSR / dfR, where dfR = n - k - 1
F = MSM / MSR
F test
If H0 is true, the regression model does not account for significantly more systematic variation than unsystematic variation. Thus, we would expect MSM to be no larger than MSR, and F to be close to 1 or smaller.
If H0 is false, the regression model does account for significantly more systematic variation than unsystematic variation. Thus, we would expect MSM to be significantly larger than MSR, and F greater than 1.

For simple linear regression (one predictor):

MSM = SSM / dfM, where dfM = 1
MSR = SSR / dfR, where dfR = n - 2
F = MSM / MSR
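
As a sketch, the mean squares and F for the lecture's data can be computed by hand and compared with the value R reports. Variable names are illustrative.

```r
# Data from the lecture's table; names are illustrative.
grade  <- c(51, 84, 81, 63, 74, 88, 53, 79, 86, 71)
salary <- c(13508, 26156, 26000, 17156, 22256, 29600, 15680, 22940, 29000, 19544)

fit <- lm(salary ~ grade)
n <- length(salary)  # sample size
k <- 1               # number of predictors

ssm <- sum((fitted(fit) - mean(salary))^2)
ssr <- sum(resid(fit)^2)

msm <- ssm / k            # dfM = k = 1
msr <- ssr / (n - k - 1)  # dfR = n - 2 = 8
f_value <- msm / msr

print(f_value)
print(summary(fit)$fstatistic)  # value, numerator df, denominator df
```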
F distribution
The F distribution is a family of distributions based on the degrees of
freedom associated with MSM and MSR (and therefore the sample size).

Our F test compares the obtained F against the critical F associated with a given alpha (typically .05).

If the null hypothesis is true, what is the probability of obtaining an F value at least as large as ours? If it is less than alpha (.05), I would infer that the null hypothesis is not true.
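
A short sketch of that comparison for the lecture's data, assuming alpha = .05 and the degrees of freedom 1 and 8 that follow from one predictor and ten students. `qf()` and `pf()` are base R functions for the F distribution.

```r
# Data from the lecture's table; names are illustrative.
grade  <- c(51, 84, 81, 63, 74, 88, 53, 79, 86, 71)
salary <- c(13508, 26156, 26000, 17156, 22256, 29600, 15680, 22940, 29000, 19544)

fit <- lm(salary ~ grade)
f_value <- summary(fit)$fstatistic[["value"]]

alpha  <- 0.05
f_crit <- qf(1 - alpha, df1 = 1, df2 = 8)                    # critical F
p_value <- pf(f_value, df1 = 1, df2 = 8, lower.tail = FALSE) # P(F >= obtained F | H0)

print(c(obtained = f_value, critical = f_crit))
print(p_value < alpha)
```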
Assessing the importance of the predictor variable
It is not enough to establish that our regression model results in a better fit than a model based on the mean. We also need to evaluate the importance of the actual predictor.

If the predictor is not related to the outcome, we would expect b1 = 0.

Again, b1 is rarely equal to zero. We need a way to figure out whether b1 is significantly different from 0. For this, we use a t-test.

Is the coefficient significantly different from zero?

ŷi = -7634.4 + 408.5(xi)

We convert b1 into a t-value, and that t-value is assessed against a t-distribution.

t = (b1 - 0) / SEb1

Here 0 is the value of b1 under H0, and t is the t value associated with our b1.

The standard error of b1 (SEb1) is a measure of the variability among all the b1 values that could be obtained if we collected all possible samples of X and Y.

If the b1 values are similar, SE will be small.
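
The slides do not give a formula for the standard error. The sketch below assumes the usual simple-regression estimate, SEb1 = sqrt(MSR / sum of squared deviations of x), and checks the resulting t against the one R reports. Variable names are illustrative.

```r
# Data from the lecture's table; names are illustrative.
grade  <- c(51, 84, 81, 63, 74, 88, 53, 79, 86, 71)
salary <- c(13508, 26156, 26000, 17156, 22256, 29600, 15680, 22940, 29000, 19544)

fit <- lm(salary ~ grade)
b1  <- coef(fit)[["grade"]]
n   <- length(salary)

msr   <- sum(resid(fit)^2) / (n - 2)                  # residual mean square
se_b1 <- sqrt(msr / sum((grade - mean(grade))^2))     # assumed standard formula
t_value <- (b1 - 0) / se_b1                           # 0 is b1 under H0

print(c(SE = se_b1, t = t_value))
print(summary(fit)$coefficients["grade", ])
```

In simple linear regression, the square of this t equals the F from the overall model test.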

t distribution
The t distribution is a family of distributions based on the degrees of freedom. In simple linear regression analyses, the degrees of freedom associated with the t-test are n - 2.

Our t test compares the obtained t against the critical t associated with a given alpha (typically .05).

Assuming the null hypothesis is true, what is the probability of obtaining a t value at least as extreme as ours? If it is less than alpha (.05), I would infer that the null hypothesis is not true.
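
A sketch of the same decision for the lecture's data, assuming a two-tailed test at alpha = .05 with n - 2 = 8 degrees of freedom. `qt()` and `pt()` are base R functions for the t distribution.

```r
# Data from the lecture's table; names are illustrative.
grade  <- c(51, 84, 81, 63, 74, 88, 53, 79, 86, 71)
salary <- c(13508, 26156, 26000, 17156, 22256, 29600, 15680, 22940, 29000, 19544)

fit <- lm(salary ~ grade)
t_value <- summary(fit)$coefficients["grade", "t value"]
df <- length(salary) - 2

alpha   <- 0.05
t_crit  <- qt(1 - alpha / 2, df = df)                       # two-tailed critical t
p_value <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE) # two-tailed p-value

print(c(obtained = t_value, critical = t_crit))
print(p_value < alpha)
```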
Regression in R - Function and Output
lm(Outcome ~ Predictor)
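
The output shown on the original slide is not reproduced here. As a sketch, the call can be run on the lecture's data as follows; the data frame name `psyc` and the column names `grade` and `salary` are illustrative assumptions.

```r
# Data from the lecture's table; object and column names are illustrative.
psyc <- data.frame(
  grade  = c(51, 84, 81, 63, 74, 88, 53, 79, 86, 71),
  salary = c(13508, 26156, 26000, 17156, 22256, 29600, 15680, 22940, 29000, 19544)
)

# Outcome ~ Predictor
fit <- lm(salary ~ grade, data = psyc)

s <- summary(fit)
print(s)                   # coefficients, t-tests, R-squared, F-test

r_squared <- s$r.squared   # proportion of variability accounted for
print(round(coef(fit), 1))
```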
Reporting and Interpreting Regression Output
Based on these data, we reject the null hypothesis. Psyc 284 grades significantly predict one’s starting salary, t(8) = 12.12, p < .001, two-tailed. 94.8% of the variability in starting salary is accounted for by Psyc 284 grade. The regression model fits the data well overall, F(1, 8) = 146.9, p < .001.
Assumptions of a hypothesis test for a simple linear regression

Outcome data are independent.

Relationship between the variables is linear.
Outcome variable must be continuous.

Residuals are normally distributed.

Equality of variance (homoscedasticity).
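
The slides do not say how to check these assumptions. One possible sketch, assuming a Shapiro-Wilk test is an acceptable check of residual normality, is shown below; with only ten observations such a test has little power, so it is a rough screen at best.

```r
# Data from the lecture's table; names are illustrative.
grade  <- c(51, 84, 81, 63, 74, 88, 53, 79, 86, 71)
salary <- c(13508, 26156, 26000, 17156, 22256, 29600, 15680, 22940, 29000, 19544)

fit <- lm(salary ~ grade)
res <- resid(fit)

# Normality of residuals: Shapiro-Wilk test (one possible check, not from the lecture).
sw <- shapiro.test(res)
print(sw)

# Residuals listed against fitted values, to eyeball linearity and equal spread.
print(data.frame(fitted = round(fitted(fit)), residual = round(res)))
```

For a visual check, `plot(fit, which = 1)` draws residuals against fitted values and `plot(fit, which = 2)` draws a normal Q-Q plot of the residuals.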
