
STT153A

Lecture 4. Regression Analysis


Assumption Checking
The following are assumptions about the error terms (residuals) in a regression model:
• Independence (checked through the Durbin-Watson test; satisfied)
• Normality (checked through a histogram (visual) or a chi-square test (formal); satisfied)
  Chi-square test:
  H0: the residuals follow a normal distribution
  Ha: the residuals do not follow a normal distribution
• Homoscedasticity, i.e., constant variance (checked through plots or Levene's test)

Additionally, we also have these assumptions (a code sketch of all the checks follows the list):

• Linearity (checked through plots)
• No multicollinearity (satisfied)
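As a rough illustration outside Statistica, here is a minimal Python sketch of these checks using statsmodels and scipy. The DataFrame `df`, the file name `jobprof.csv`, and the use of Shapiro-Wilk in place of the slides' chi-square test are all assumptions for the sketch:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor
from scipy import stats

# Hypothetical data file; column names follow the slides.
df = pd.read_csv("jobprof.csv")

X = sm.add_constant(df[["TEST1", "TEST2", "TEST3", "TEST4"]])
model = sm.OLS(df["JOB_PROF"], X).fit()
resid = model.resid

# Independence: Durbin-Watson statistic (values near 2 suggest no autocorrelation).
print("Durbin-Watson:", durbin_watson(resid))

# Normality: Shapiro-Wilk as the formal test (the lecture uses a chi-square test).
print("Shapiro-Wilk:", stats.shapiro(resid))

# No multicollinearity: variance inflation factors (values near 1 are ideal).
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))
```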
Homoscedasticity
Your data needs to show homoscedasticity: the variance along the line of best fit should remain similar as you move along the line.

[Figure: Graphical summary for JOB_PROF — histogram with median, inter-quartile range and non-outlier range, plus mean with 95% confidence and prediction intervals. Shapiro-Wilk p = 0.642; Mean = 92.20; Std. Dev. = 19.42; Variance = 377; Std. Err. of Mean = 3.885; Skewness = 0.107; Valid N = 25; Minimum = 58; Lower quartile = 78; Median = 94; Upper quartile = 109; Maximum = 127; 95% confidence for Std. Dev.: 15.17 to 27.02; 95% confidence for Mean: 84.18 to 100; 95% prediction interval for an observation: 51.32 to 133.]

Statistica path: Statistics -> Basic Statistics -> Descriptive -> Graph 2 -> Variable
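A common graphical check is a residuals-versus-fitted plot: under homoscedasticity the points scatter evenly around zero with no funnel shape. A minimal matplotlib sketch, reusing the fitted `model` from the sketch above:

```python
import matplotlib.pyplot as plt

# Residuals should spread evenly around zero across the fitted values;
# a widening or narrowing band suggests heteroscedasticity.
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs. fitted values")
plt.show()
```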
Homoscedasticity or Homogeneity of Variances (Levene's Test)
Homogeneous means the same in structure or composition. This test gets its name from the null hypothesis, where we claim that the distributions of the responses are the same (homogeneous) across groups.
Ho: The distributions of responses are the same across groups.
Ha: At least one of the distributions is different.
Decision: The p-value is 0.886, which is not significant; hence we fail to reject the null hypothesis.

Conclusion: The distribution of responses for the job proficiency rating is the same across groups. In other words, the variance is the same throughout the population.
Statistica path: Statistics -> General Linear Models (GLM) -> Advanced Linear/Nonlinear Models -> General Linear Models -> OK
Variables -> select the dependent and independent variables -> OK -> OK
More results -> Assumptions tab -> Levene's test (ANOVA)
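For reference, the same test is available in scipy. A minimal sketch, assuming the ratings have been split into groups beforehand (the slide does not name the grouping variable, so `group_a` and `group_b` below are hypothetical):

```python
from scipy import stats

# Hypothetical groups of job proficiency ratings; in the lecture the
# grouping happens inside Statistica's GLM module.
group_a = [94, 78, 109, 58, 127, 92]
group_b = [88, 101, 76, 95, 110, 84]

stat, p = stats.levene(group_a, group_b)
print(f"Levene W = {stat:.3f}, p = {p:.3f}")
# A p-value like the slide's 0.886 (> 0.05) means we fail to reject H0
# of equal variances across groups.
```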
Linearity
There needs to be a linear relationship between (a) the dependent variable and each of your
independent variables, and (b) the dependent variable and the independent variables collectively.

[Figure: Scatterplot of TEST1 vs. JOB_PROF with fitted line TEST1 = 53.796 + 0.53757 * JOB_PROF and 0.95 confidence band; correlation r = 0.51441.]
[Figure: Scatterplot of TEST2 vs. JOB_PROF with fitted line TEST2 = 65.921 + 0.44250 * JOB_PROF and 0.95 confidence band; correlation r = 0.49701.]
[Figure: Scatterplot of TEST3 vs. JOB_PROF with fitted line TEST3 = 63.091 + 0.40899 * JOB_PROF and 0.95 confidence band; correlation r = 0.89706.]
[Figure: Scatterplot of TEST4 vs. JOB_PROF with fitted line TEST4 = 50.621 + 0.47787 * JOB_PROF and 0.95 confidence band; correlation r = 0.86939.]
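These plots come from Statistica; an equivalent sketch with matplotlib and numpy, reusing the `df` DataFrame assumed earlier:

```python
import matplotlib.pyplot as plt
import numpy as np

# One scatterplot per test score, with a least-squares line and the
# correlation coefficient in the title, mirroring the slides.
for col in ["TEST1", "TEST2", "TEST3", "TEST4"]:
    x, y = df["JOB_PROF"], df[col]
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    plt.figure()
    plt.scatter(x, y)
    plt.plot(x, intercept + slope * x)
    plt.title(f"{col} = {intercept:.3f} + {slope:.5f} * JOB_PROF (r = {r:.5f})")
    plt.xlabel("JOB_PROF")
    plt.ylabel(col)
plt.show()
```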
Comparison of Models
Consider the following variables from the Jobprof data:

Dependent Variable:
Job Proficiency Rating
Independent Variables:
Test 1
Test 2
Test 3
Test 4
Model Comparison
MODEL 1: SLRM (Test 1)
• R-Sq = 26.46%
• Adj R-Sq = 23.26%
MODEL 2: (Test 1 and Test 2)
• R-Sq = 46.41%
• Adj R-Sq = 41.54%
MODEL 3: full model
• R-Sq = 96.29%
• Adj R-Sq = 95.54%
Model Comparison
MODEL 4: significant predictors (Tests 1, 3, and 4) / FORWARD STEPWISE
• R-Sq = 96.15%
• Adj R-Sq = 95.60%
MODEL 5: BACKWARD STEPWISE (Tests 1, 3, and 4)
• R-Sq = 96.15%
• Adj R-Sq = 95.60%
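These fit statistics come from Statistica; a sketch of how the same comparison could be reproduced with statsmodels (reusing the `df` DataFrame and column names assumed earlier):

```python
import statsmodels.formula.api as smf

# Fit the models from the slides and report their fit statistics.
formulas = {
    "Model 1 (Test 1)":          "JOB_PROF ~ TEST1",
    "Model 2 (Tests 1-2)":       "JOB_PROF ~ TEST1 + TEST2",
    "Model 3 (full)":            "JOB_PROF ~ TEST1 + TEST2 + TEST3 + TEST4",
    "Model 4/5 (Tests 1, 3, 4)": "JOB_PROF ~ TEST1 + TEST3 + TEST4",
}
for name, formula in formulas.items():
    fit = smf.ols(formula, data=df).fit()
    print(f"{name}: R-Sq = {fit.rsquared:.2%}, Adj R-Sq = {fit.rsquared_adj:.2%}")
```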
Comparison of Models
Consider the following variables from the GPA data:

Dependent Variable:
First-Year GPA (Y)
Independent Variables:
SATMath (X1)
SATVerbal (X2)
HSMath (X3)
HSEnglish (X4)
Model Comparison
MODEL 1: SLRM (SATMath (X1))
• R-Sq = 72.17%
• Adj R-Sq = 70.62%
• RMSE =
MODEL 2: SATMath (X1) and SATVerbal (X2)
• R-Sq = 81.10%
• Adj R-Sq = 78.88%
• RMSE =
MODEL 3: full model
• R-Sq = 85.28%
• Adj R-Sq = 81.35%
• RMSE =
Model Comparison
MODEL 4: FORWARD STEPWISE
• R-Sq = 85.04%
• Adj R-Sq = 82.23% (optimal Adj R-Sq with the fewest predictors)
• RMSE = 0.23
MODEL 5: BACKWARD STEPWISE
• R-Sq = 71.17%
• Adj R-Sq = 70.62%
• RMSE =
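Statistica runs the stepwise search internally; for comparison, a sketch of forward and backward selection with scikit-learn. The `gpa` DataFrame, its file name, its column names, and the choice of two selected features are assumptions, and SequentialFeatureSelector scores subsets by cross-validated fit rather than the p-value rules of classical stepwise regression, so the selected sets may differ:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

# Hypothetical file and column names for the GPA data.
gpa = pd.read_csv("gpa.csv")
X = gpa[["SATMath", "SATVerbal", "HSMath", "HSEnglish"]]
y = gpa["FirstYearGPA"]

for direction in ("forward", "backward"):
    sfs = SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=2, direction=direction
    )
    sfs.fit(X, y)
    print(direction, "selected:", list(X.columns[sfs.get_support()]))
```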
Residual Analysis and Model Diagnostics
Root Mean Squared Error
The Root Mean Squared Error (RMSE) is one of the two main performance indicators for a regression model; the other is the Mean Squared Error (MSE). RMSE measures the average difference between the values predicted by a model and the actual values, providing an estimate of how well the model predicts the target value (accuracy).

The lower the value of the Root Mean Squared Error, the better the model. A perfect model (a hypothetical model that always predicts the exact expected value) would have a Root Mean Squared Error of 0.

The Root Mean Squared Error has the advantage of being expressed in the same units as the predicted column, which makes it easy to interpret. If you are trying to predict an amount in dollars, then the Root Mean Squared Error can be interpreted as the amount of error in dollars.
RMSE value interpretation
The closer RMSE is to 0, the more accurate the model. But RMSE is returned on the same scale as the target you are predicting, so there is no general rule for how to interpret ranges of values; your value can only be evaluated within your dataset.

Let's unpack this by looking at an example.

An RMSE of 1,000 for a house price prediction model is most likely good, because house prices tend to be over $100,000. However, the same RMSE of 1,000 for a height prediction model is terrible, as the average height is around 175 cm.
Root Mean Squared Error

$$\mathrm{RMSE} = \sqrt{\sum_{i=1}^{n} \frac{(\mathrm{Obs}_i - \mathrm{Pred}_i)^2}{n}}$$
Example 1

Temperature °C (x)   Ice Cream Sales (y)
14.2                 215
16.4                 325
11.9                 185
15.2                 332
18.5                 406
22.1                 522
19.4                 412
25.1                 614
23.4                 544
18.1                 421
22.6                 445
17.2                 408
Interpretation: An RMSE of 34 dollars is good enough, given that the sample mean of ice cream sales is 402 dollars.
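As a worked sketch, the line of best fit and the RMSE for the table above can be computed with numpy. The slide does not show the fitted model, so the least-squares fit below is an assumption and the result may differ slightly from the quoted 34:

```python
import numpy as np

temp  = np.array([14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4,
                  25.1, 23.4, 18.1, 22.6, 17.2])
sales = np.array([215, 325, 185, 332, 406, 522, 412,
                  614, 544, 421, 445, 408])

# Least-squares line and its predictions.
slope, intercept = np.polyfit(temp, sales, 1)
pred = intercept + slope * temp

# RMSE = square root of the mean squared observed-minus-predicted difference.
rmse = np.sqrt(np.mean((sales - pred) ** 2))
print(f"RMSE = {rmse:.1f}, sample mean of sales = {sales.mean():.0f}")
```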
Example 2
Use the GPA data and the best model from earlier, Model 4, with predictors SATMath (x1) and SATVerbal (x2) for the dependent variable First-Year GPA (y).
The model is ŷ = 0.002185 x1 + 0.001312 x2.

Interpretation: An RMSE of 0.23 is good enough, given that the sample mean of First-Year GPA is 2.50 and the scale for GPA is 1.0 to 4.0.
Obtaining the difference between observed and predicted values (the residuals) using Statistica:
Statistics -> Multiple Regression -> Variables: dependent = First-Year GPA; independent = SATMath, SATVerbal, HSMath, HSEnglish -> OK -> Advanced -> check the Advanced options box -> OK
Stepwise tab -> Method: Forward stepwise (Model 4) -> OK
Residuals/assumptions/prediction tab -> click Perform residual analysis
Save tab -> Save residuals and predicted -> First-Year GPA -> OK
Copy columns 1 to 3 and paste them into Excel to solve for the RMSE.
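The final Excel step can equally be done in a few lines of Python. A sketch assuming the saved columns were exported to a file named `gpa_residuals.csv` with headers `Observed` and `Predicted` (both names hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical export of Statistica's observed and predicted columns.
cols = pd.read_csv("gpa_residuals.csv")
rmse = np.sqrt(np.mean((cols["Observed"] - cols["Predicted"]) ** 2))
print(f"RMSE = {rmse:.2f}")
```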
Cook's distance
A measure for identifying outliers/influential points.

Threshold (rule of thumb) = 4/n

Example: GPA data
Compute Cook's distance for Model 4 of the GPA data; 4/n = 4/20 = 0.2.
Remove observations with Cook's D greater than 4/n = 4/20 = 0.2.
Fit Model 4 again to the new dataset.
Calculate R-Sq, Adj R-Sq, and RMSE.

MODEL 4: FORWARD STEPWISE (GPA after removal of outliers)
R-Sq = 90.72%
Adj R-Sq = 87.86%
RMSE = 0.168
Interpretation: An RMSE of 0.168 is good, given that the sample mean of First-Year GPA is 2.51 and the scale for GPA is 1.0 to 4.0.
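A sketch of the same procedure with statsmodels, reusing the `gpa` DataFrame assumed earlier and the Model 4 predictors named in Example 2 (statsmodels adds an intercept by default, which is an assumption here):

```python
import numpy as np
import statsmodels.formula.api as smf

# Fit Model 4 (predictors from Example 2).
fit = smf.ols("FirstYearGPA ~ SATMath + SATVerbal", data=gpa).fit()

# Cook's distance for each observation; cooks_distance returns
# (distances, p-values), and we only need the distances.
cooks_d = fit.get_influence().cooks_distance[0]

# Rule-of-thumb threshold 4/n, then refit on the remaining rows.
kept = gpa[cooks_d <= 4 / len(gpa)]
refit = smf.ols("FirstYearGPA ~ SATMath + SATVerbal", data=kept).fit()

rmse = np.sqrt(np.mean(refit.resid ** 2))
print(f"R-Sq = {refit.rsquared:.2%}, Adj R-Sq = {refit.rsquared_adj:.2%}, "
      f"RMSE = {rmse:.3f}")
```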
Homework Number 2 (due May 26)
1. Compute the RMSE for Models 1, 2, 3, and 5 of the GPA data.
   • Groups 1-3: Model 1 (simple linear model using SATMath (X1))
   • Groups 4-5: Model 2 (SATMath and SATVerbal)
   • Groups 6-7: Model 3 (full model)
   • Groups 8-10: Model 5 (backward selection)
2. Remove observations with Cook's D greater than 4/n.
3. Fit Model 4 again to the new dataset.
4. Calculate R-Sq, Adj R-Sq, and RMSE.
