Lecture 3. Part 1 - Regression Analysis

Using stepwise regression on data from an aptitude test measuring job proficiency, the analysis found:
• Forward and backward stepwise selection identified the same best-fitting model, using Test3, Test1, and Test4 as predictors.
• The final regression equation explained 96.15% of the variation in job proficiency.
• Assumption checks found that the residuals were independent and normally distributed with constant variance, and that the predictors were not multicollinear.


STT153A

Lecture 3. Regression Analysis


Variable Selection and Model Building
• Forward stepwise
Starting with the single best predictor, we add predictors one at a time until the added "explanatory power" is negligible.

• Backward stepwise
Starting with all possible predictors (the full model), we delete "insignificant" predictors one at a time.
Forward stepwise
• Data File. This example is based on the example data file Job_prof.sta (from Neter, Wasserman, and Kutner, 1989, page 473). Open this data file by selecting Open Examples from the File menu (classic menus) or from the Open menu on the Home tab (ribbon bar); it is in the Datasets folder. The first four variables (Test1-Test4) represent four different aptitude tests that were administered to each of the 25 applicants for entry-level clerical positions in a company. Regardless of their test scores, all 25 applicants were hired. Once their probationary period had expired, each of these employees was evaluated and given a job proficiency rating (variable Job_prof).

• Research problem. Using stepwise regression, we will identify the variables (or subset of variables) that best predict job proficiency. Thus, the dependent variable will be Job_prof, and variables Test1-Test4 will be the independent (predictor) variables.

When Test2 was evaluated, its F value was less than the F-to-enter value of 1.0; therefore, it was not entered into the model.
Forward stepwise
Now, according to the Forward stepwise regression procedure, the
subset of aptitude tests (independent variables) that best predicts the
job proficiency score (dependent variable) contains Test3, Test1, and
Test4. Therefore, the regression equation appears as follows:
y = b₀ + b₁x₃ + b₂x₁ + b₃x₄

The final regression equation is:

y = −124.200 + 1.357x₃ + 0.296x₁ + 0.517x₄

The reported p-value is 0.000 (i.e., p < 0.001), which is less than 0.05, meaning the model is significant.

R-squared is 0.9615

Interpretation: 96.15% of the variation in the job proficiency rating can be explained by the model with variables Test3, Test1, and Test4.
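
For readers working outside Statistica, here is a minimal sketch of the same forward stepwise procedure in Python with statsmodels, assuming Job_prof.sta has been exported to a CSV file (the file name job_prof.csv is hypothetical). It uses the fact that the partial F statistic for a single added predictor equals the square of its t statistic.

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("job_prof.csv")          # hypothetical export of Job_prof.sta
y = df["Job_prof"]
candidates = ["Test1", "Test2", "Test3", "Test4"]
selected = []
F_TO_ENTER = 1.0                          # threshold used in the lecture's run

while candidates:
    best_var, best_f = None, F_TO_ENTER
    for var in candidates:
        X = sm.add_constant(df[selected + [var]])
        fit = sm.OLS(y, X).fit()
        f_value = fit.tvalues[var] ** 2   # partial F = t^2 for one predictor
        if f_value > best_f:
            best_var, best_f = var, f_value
    if best_var is None:
        break                             # no candidate exceeds F-to-enter
    selected.append(best_var)
    candidates.remove(best_var)

final = sm.OLS(y, sm.add_constant(df[selected])).fit()
print("Selected predictors:", selected)   # per the lecture: Test3, Test1, Test4
print("R-squared:", final.rsquared)       # per the lecture: about 0.9615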
Backward stepwise
Using the same data as in the forward stepwise example, apply backward stepwise and observe the result.
The final regression equation is:
y = −124.382 + 1.306x₃ + 0.296x₁ + 0.520x₄
The reported p-value is 0.000 (i.e., p < 0.001), which is less than 0.05, meaning the model is significant.
R-squared is 0.9555
Interpretation: 95.55% of the variation in the job proficiency rating can be explained by the model with variables Test3, Test1, and Test4.
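
The backward pass can be sketched the same way, again under the hypothetical job_prof.csv export: start from the full model and repeatedly drop the predictor with the smallest partial F until every remaining predictor clears the F-to-remove threshold (taken here to be 1.0, matching the F-to-enter above; the slides do not state it explicitly).

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("job_prof.csv")          # hypothetical export of Job_prof.sta
y = df["Job_prof"]
remaining = ["Test1", "Test2", "Test3", "Test4"]
F_TO_REMOVE = 1.0                         # assumed; not stated in the slides

while remaining:
    fit = sm.OLS(y, sm.add_constant(df[remaining])).fit()
    partial_f = fit.tvalues[remaining] ** 2   # partial F = t^2 per predictor
    worst = partial_f.idxmin()                # weakest remaining predictor
    if partial_f[worst] >= F_TO_REMOVE:
        break                                 # everything left is worth keeping
    remaining.remove(worst)

print("Retained predictors:", remaining)  # per the lecture: Test3, Test1, Test4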
Example 2.
Compare the forward stepwise and backward stepwise results for the FIES data.
[Screenshots: forward and backward stepwise outputs]
Model Diagnostics / Assumption Checking
Assumption Checking
The following are assumptions about the error terms (residuals) in a regression model:
• Independence (through the Durbin-Watson test; Satisfied)
• Normality (through a histogram (visual) or a chi-square test (formal); Satisfied)
• Chi-square test:
H0: the residuals follow a normal distribution
Ha: the residuals do not follow a normal distribution
• Homoscedasticity = constant variance (checked through plots or Levene's test; see the sketch after this list)

Additionally, we also have these assumptions:

• Linearity (through plots)
• No multicollinearity (Satisfied)
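
The slides check constant variance by eye or with Levene's test; a common regression-specific alternative in code is the Breusch-Pagan test, sketched below as a substitute for, not a reproduction of, the lecture's procedure (the job_prof.csv export is hypothetical).

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

df = pd.read_csv("job_prof.csv")          # hypothetical export of Job_prof.sta
X = sm.add_constant(df[["Test1", "Test3", "Test4"]])
fit = sm.OLS(df["Job_prof"], X).fit()

# H0: the residuals have constant variance (homoscedasticity).
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
print(f"Breusch-Pagan p-value: {lm_pvalue:.4f}")  # p > 0.05 -> H0 not rejected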
Independence of observation

A rule of thumb is that test statistic values in the range of 1.5 to 2.5 are relatively normal. Values outside of this range could be cause for concern. Field (2009) suggests that values under 1 or more than 3 are a definite cause for concern.
Independence of observation using Durbin-Watson
Statistics -> Multiple Regression
-> Variables -> Dependent variable (Job_prof), independent variables (1 2 3 4) -> OK
-> Advanced tab, click the Advanced Options box -> OK -> OK
Click the Residuals/assumptions/prediction tab -> click Perform residual analysis
Click Advanced -> Durbin-Watson statistic
Independence of observation

Interpretation: The Durbin-Watson statistic is 1.148347 for the job proficiency rating, which is below the rule-of-thumb range (1.5-2.5). The respondents in the data are different individuals, with the assumption that they do not affect one another's answers. The value 1.14 can be tolerated.

Note: The assumption of independence of observations should be satisfied at data collection; for example, each respondent should answer the questionnaire only once, since a respondent answering more than once can lead to dependence of observations.
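
The same statistic can be computed outside Statistica; a minimal sketch with statsmodels, assuming the hypothetical job_prof.csv export and the full four-test model used in the menu path above:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

df = pd.read_csv("job_prof.csv")          # hypothetical export of Job_prof.sta
X = sm.add_constant(df[["Test1", "Test2", "Test3", "Test4"]])
fit = sm.OLS(df["Job_prof"], X).fit()

dw = durbin_watson(fit.resid)             # values near 2 suggest independence
print(f"Durbin-Watson statistic: {dw:.6f}")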
Normality of residuals by histogram
Statistics -> Multiple Regression
-> Variables -> Dependent variable (Job_prof), independent variables (1 2 3 4) -> OK
-> Advanced tab, click the Advanced Options box -> OK -> OK
Click the Residuals/assumptions/prediction tab -> click Perform residual analysis
Click Normal plot of residuals
Normality of residuals by scatter plot of residuals

H0: the residuals follow a normal distribution

[Figure: Normal Probability Plot of Residuals (Expected Normal Value vs. Residuals)]

Therefore, the residuals follow a normal distribution.

https://online.stat.psu.edu/stat501/lesson/4/4.6
Normality of residuals by histogram

[Figure: Distribution of raw residuals with expected normal curve (No. of obs vs. Residuals)]

https://online.stat.psu.edu/stat501/lesson/4/4.6/4.6.1
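
Both visual checks above (the normal probability plot and the residual histogram) can be reproduced with scipy and matplotlib; a minimal sketch under the same hypothetical CSV-export assumption:

import pandas as pd
import statsmodels.api as sm
import scipy.stats as stats
import matplotlib.pyplot as plt

df = pd.read_csv("job_prof.csv")          # hypothetical export of Job_prof.sta
X = sm.add_constant(df[["Test1", "Test2", "Test3", "Test4"]])
resid = sm.OLS(df["Job_prof"], X).fit().resid

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
stats.probplot(resid, dist="norm", plot=ax1)   # points near the line -> normal
ax1.set_title("Normal Probability Plot of Residuals")
ax2.hist(resid, bins=8, edgecolor="black")     # histogram of raw residuals
ax2.set_title("Distribution of Raw Residuals")
ax2.set_xlabel("Residuals")
ax2.set_ylabel("No of obs")
plt.tight_layout()
plt.show()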
Normality using chi-square
Statistics -> Basic Statistics -> Tables and banners
Specify tables (select variables) -> Job_prof -> Test1 Test2 Test3 Test4 -> OK -> OK
Normality using chi-square
Options -> Expected frequencies
-> Pearson & M-L chi-square
-> Advanced -> Detailed two-way tables -> OK
Normality using chi-square
Chi-square test:
H0: the residuals follow a normal distribution
Ha: the residuals do not follow a normal distribution

Interpretation: None of the p-values is less than 0.05, hence none is statistically significant; therefore, we fail to reject the null hypothesis. The residuals follow a normal distribution.

https://www.youtube.com/watch?v=vn5a5lAL54I
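
As a code analogue of this formal check (the slides use Statistica's Tables and Banners output; the sketch below instead bins the residuals and compares observed counts with those expected under a fitted normal, an assumed equivalent rather than the same menu path):

import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm

df = pd.read_csv("job_prof.csv")          # hypothetical export of Job_prof.sta
X = sm.add_constant(df[["Test1", "Test2", "Test3", "Test4"]])
resid = sm.OLS(df["Job_prof"], X).fit().resid

# Bin the residuals and compare observed counts with counts expected
# under a normal distribution fitted to the residuals.
mu, sd = resid.mean(), resid.std(ddof=1)
edges = np.linspace(resid.min(), resid.max(), 6)      # 5 bins
observed, _ = np.histogram(resid, bins=edges)
expected = len(resid) * np.diff(stats.norm.cdf(edges, loc=mu, scale=sd))
expected *= observed.sum() / expected.sum()           # match total counts

# ddof=2 because the normal's mean and sd were estimated from the data.
chi2, p = stats.chisquare(observed, expected, ddof=2)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")  # p > 0.05 -> fail to reject H0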
Multicollinearity
This is when predictors (independent variables) are correlated with each other; that is, they are redundant.

Depending on the situation, it may not be a problem for your model if only a slight or moderate collinearity issue occurs. However, it is strongly advised to solve the issue if a severe collinearity issue exists (e.g., correlation > 0.8 between two variables, or variance inflation factor (VIF) > 20).
Multicollinearity

• Looking among the independent variables, Test1 to Test4, none of the independent variables has a correlation of more than 0.80. Although Test3 and Test4 have a correlation of 0.7820, this is acceptable.
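
A minimal sketch of these checks in code (pairwise correlations plus the VIFs that the earlier slide names as the formal criterion), again assuming the hypothetical job_prof.csv export:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("job_prof.csv")          # hypothetical export of Job_prof.sta
tests = df[["Test1", "Test2", "Test3", "Test4"]]

print(tests.corr().round(4))              # flag any pairwise correlation > 0.8

X = sm.add_constant(tests)                # VIFs computed with an intercept
for i, name in enumerate(X.columns):
    if name != "const":
        vif = variance_inflation_factor(X.values, i)
        print(f"VIF({name}) = {vif:.2f}")  # flag VIF > 20 per the slide's rule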

You might also like