Assignment 3

The document outlines a series of regression analyses performed on various datasets to determine the best predictors for cost and salary. It discusses model selection using AIC, the significance of variables, and the presence of multicollinearity in the data. Additionally, it evaluates the impact of smoking on lung capacity, concluding that smoking has a significant overall effect despite individual variables not being significant at the 5% level.

Uploaded by

詠芯謝

Question 1

(a) The best model contains PAPER and MACHINE, since this pair of predictors
gives the lowest AIC.

(b) Step 1:

-COST ~ 1

-COST ~ MACHINE

-COST ~ PAPER

-COST ~ OVERHEAD

-COST ~ LABOR

Comparing the models above, the best one is COST ~ MACHINE, because its AIC is
the smallest and is lower than that of the current model (COST ~ 1).

Step 2:

-COST ~ MACHINE

-COST ~ MACHINE+PAPER

-COST ~ MACHINE+OVERHEAD

-COST ~ MACHINE+LABOR

We further add PAPER to the model: the best model is COST ~ MACHINE+PAPER,
because its AIC is the smallest and is also lower than that of the current model.

Step 3:

-COST ~ MACHINE+PAPER

-COST ~ MACHINE+PAPER+OVERHEAD

-COST ~ MACHINE+PAPER+LABOR

Neither candidate has a lower AIC than the current model, so we terminate the
procedure. The final model consists of MACHINE and PAPER.
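The three steps above can be sketched in R with step(). This is only an illustration under stated assumptions: the assignment's data file is not shown, so the data frame sim below is simulated (hypothetical values, with COST generated from PAPER and MACHINE only), and the printed AIC values will not match the assignment's.

```r
# Simulated stand-in data (NOT the assignment's data set).
set.seed(1)
n <- 30
sim <- data.frame(
  PAPER    = runif(n, 500, 1000),
  MACHINE  = runif(n, 200, 400),
  OVERHEAD = runif(n, 50, 150),
  LABOR    = runif(n, 100, 300)
)
# In this simulation COST depends on PAPER and MACHINE only.
sim$COST <- 60 + 0.95 * sim$PAPER + 2.4 * sim$MACHINE + rnorm(n, sd = 10)

# Start from the intercept-only model and add one predictor at a time,
# stopping when no addition lowers the AIC.
null_model <- lm(COST ~ 1, data = sim)
forward <- step(null_model,
                scope = list(lower = ~ 1,
                             upper = ~ PAPER + MACHINE + OVERHEAD + LABOR),
                direction = "forward", trace = 0)

selected <- attr(terms(forward), "term.labels")
print(selected)
print(forward$anova)  # AIC after each accepted step
```

Setting trace = 1 instead prints the candidate table for every step, which corresponds to the lists of models compared in Steps 1 to 3.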

(c) The estimated regression line of the final model:

COST = 59.4318 + 0.9489(PAPER) + 2.3864(MACHINE)

(d)

R² = 0.9987, adjusted R² = 0.9986

Residual standard error = 10.98

(e) Yes, the same variables are included in the final regression model: both the
all-possible-regressions and the backward elimination procedures return the same model.

Question 2

(a) Because of the large number of variables, this approach may not be practically
feasible: there are 2^8 = 256 candidate models to consider, so the running time
will be long. It is therefore better to use forward selection.
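The count in (a) can be checked with a few lines of R. The comparison with forward selection is a standard worst-case count (at most p candidates in the first step, p − 1 in the second, and so on); the variable names are illustrative only.

```r
p <- 8  # number of candidate predictors

# Every predictor is either in or out of the model.
n_models <- 2^p

# Same count, grouped by model size (0, 1, ..., p predictors).
by_size <- choose(p, 0:p)

# Forward selection fits at most p + (p - 1) + ... + 1 candidate models,
# in addition to the intercept-only starting model.
max_forward_fits <- p * (p + 1) / 2

print(n_models)
print(sum(by_size))
print(max_forward_fits)
```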

(b)

The three variables PROD, FOV and HOUSE are suggested for the most suitable
model; this model has R² = 0.7613, which indicates a fair fit.

(c)

With AIC as the criterion, the best set of variables in the final regression is
PROD, FOV and HOUSE. Yes, I agree with my method, because it returns the same model.

Question 3

(a)

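# Load the salary data
dataset <- read.csv("hwk3q3.csv")

# Regress SALARY on YEARS, treating POSITION, EDUCAT and GENDER as categorical
model <- lm(SALARY ~ YEARS + as.factor(POSITION) +
              as.factor(EDUCAT) + as.factor(GENDER), data = dataset)
summary(model)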

##
## Call:
## lm(formula = SALARY ~ YEARS + as.factor(POSITION) + as.factor(EDUCAT) +
##     as.factor(GENDER), data = dataset)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1410.3  -204.5  -103.4   230.3   752.1
##
## Coefficients: (1 not defined because of singularities)
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)           1320.86     411.76   3.208  0.00272 **
## YEARS                   20.38      41.65   0.489  0.62736
## as.factor(POSITION)2   186.91     479.54   0.390  0.69889
## as.factor(POSITION)3  -223.54     409.34  -0.546  0.58820
## as.factor(POSITION)4  1437.47     521.08   2.759  0.00888 **
## as.factor(POSITION)5  2301.07     518.38   4.439 7.52e-05 ***
## as.factor(EDUCAT)2     133.16     321.02   0.415  0.68063
## as.factor(EDUCAT)3    -685.85     477.76  -1.436  0.15932
## as.factor(EDUCAT)4         NA         NA      NA       NA
## as.factor(GENDER)1     231.36     338.49   0.684  0.49842
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 495 on 38 degrees of freedom
## Multiple R-squared:  0.7504, Adjusted R-squared:  0.6979
## F-statistic: 14.28 on 8 and 38 DF,  p-value: 2.407e-09

The estimated coefficient of as.factor(GENDER)1 is 231.36: holding the other
variables fixed, the average monthly salary of male employees is estimated to be
231.36 higher than that of female employees.

(b) The expected residual degrees of freedom are d.f. = 47 − 1 − 1 − 1 − 4 − 3 = 37
(47 observations, minus the intercept, YEARS, one GENDER dummy, four POSITION
dummies and three EDUCAT dummies). This does not match the R output, which reports 38.

which(dataset$POSITION == 4 | dataset$POSITION == 5)

## [1]  4  7  8 10 15 16 20 21 24 26 30 33 34 35 41 42 43 45 46 47

which(dataset$EDUCAT == 3 | dataset$EDUCAT == 4)

## [1]  4  7  8 10 15 16 20 21 24 26 30 33 34 35 41 42 43 45 46 47

I(POSITION = 4) + I(POSITION = 5) = I(EDUCAT = 3) + I(EDUCAT = 4)

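The two outputs are identical, which indicates that the 20 employees who are
chemists or in management are exactly the 20 employees with a bachelor's or
master's degree. The model therefore has perfect multicollinearity. R drops one of
the linearly dependent dummies (as.factor(EDUCAT)4 is reported as NA), so only 9
coefficients are estimated and the residual degrees of freedom are 47 − 9 = 38.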
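The effect of this dependency on lm() can be reproduced on simulated data. The sketch below does not use hwk3q3.csv: the data frame sim is hypothetical and is built so that POSITION is 4 or 5 exactly when EDUCAT is 3 or 4, which is the relationship found above. The SALARY, YEARS and GENDER values are random and carry no meaning.

```r
# Simulated data (NOT hwk3q3.csv): 47 rows with the same dummy-variable
# dependency, POSITION in {4, 5} exactly when EDUCAT in {3, 4}.
set.seed(3)
n <- 47
position <- rep(1:5, length.out = n)
educat <- ifelse(position >= 4, 3 + seq_len(n) %% 2, 1 + seq_len(n) %% 2)
sim <- data.frame(
  SALARY   = rnorm(n, mean = 2000, sd = 500),
  YEARS    = runif(n, 0, 20),
  POSITION = position,
  EDUCAT   = educat,
  GENDER   = rbinom(n, 1, 0.5)
)

# The two index sets coincide, as in the assignment's which() output.
same_rows <- identical(which(sim$POSITION %in% 4:5),
                       which(sim$EDUCAT %in% 3:4))

fit <- lm(SALARY ~ YEARS + as.factor(POSITION) +
            as.factor(EDUCAT) + as.factor(GENDER), data = sim)

n_aliased <- sum(is.na(coef(fit)))  # coefficients R could not estimate
resid_df  <- df.residual(fit)       # n minus the number of estimated coefficients

print(same_rows)
print(n_aliased)
print(resid_df)
```

Calling alias(fit) on such a model lists the linear dependency that caused the dropped coefficient.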

(d)

# Full model includes GENDER; the reduced model drops it
full_model <- lm(SALARY ~ YEARS + as.factor(POSITION) +
                   as.factor(EDUCAT) + as.factor(GENDER), data = dataset)
reduced_model <- lm(SALARY ~ YEARS + as.factor(POSITION) +
                      as.factor(EDUCAT), data = dataset)
# Partial F-test for the GENDER term
anova(reduced_model, full_model)

## Analysis of Variance Table
##
## Model 1: SALARY ~ YEARS + as.factor(POSITION) + as.factor(EDUCAT)
## Model 2: SALARY ~ YEARS + as.factor(POSITION) + as.factor(EDUCAT) + as.factor(GENDER)
##   Res.Df     RSS Df Sum of Sq      F Pr(>F)
## 1     39 9424448
## 2     38 9309984  1    114464 0.4672 0.4984

H0: βMALE = 0, H1: βMALE ≠ 0

F = 0.4672, p-value = 0.4984

α = 0.05, d.f. = (1, 38), C.V. = 4.10

The p-value is well above the 0.05 significance level (equivalently, F = 0.4672 is
below the critical value 4.10), so we do not reject H0. GENDER is not significant
at the 5% level, which means that the provided dataset gives no significant
evidence of gender discrimination in salary.
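As a check on the numbers above, the F statistic, the 5% critical value and the p-value can be recomputed from the two residual sums of squares in the ANOVA table. This is a minimal sketch; the variable names are illustrative.

```r
# Values copied from the ANOVA table.
rss_reduced <- 9424448  # model without GENDER
rss_full    <- 9309984  # model with GENDER
df_full     <- 38       # residual d.f. of the full model
q           <- 1        # number of restrictions tested

# Partial F statistic: drop in RSS per restriction over the full-model MSE.
f_stat <- ((rss_reduced - rss_full) / q) / (rss_full / df_full)

# 5% critical value and p-value from the F(1, 38) distribution.
crit  <- qf(0.95, df1 = q, df2 = df_full)
p_val <- pf(f_stat, df1 = q, df2 = df_full, lower.tail = FALSE)

print(round(c(F = f_stat, critical = crit, p = p_val), 4))
```

Because only one coefficient is tested, this F statistic equals the square of the t value for as.factor(GENDER)1 in the summary output, up to rounding.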

Question 4

(a)

Estimated regression line for smokers:

LungCap = (1.05157 + 0.22601) + (0.55823 − 0.0597) Age

= 1.27758 + 0.49853 Age

Estimated regression line for non-smokers:

LungCap = 1.05157 + 0.55823 Age
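The arithmetic behind the two lines can be written out in R. The coefficient labels below are illustrative names, not the labels from the original R output (which is not shown here); the values are the ones used in the equations above.

```r
# Coefficients as used in the equations above (labels are illustrative).
b <- c(intercept = 1.05157, age = 0.55823, smoke = 0.22601, age_smoke = -0.0597)

# For smokers the Smoke dummy is 1, so its coefficient shifts the intercept
# and the interaction coefficient shifts the slope.
smoker_intercept <- b[["intercept"]] + b[["smoke"]]
smoker_slope     <- b[["age"]] + b[["age_smoke"]]

# Fitted lung capacity at a given age for either group (smoker = 0 or 1).
fitted_lungcap <- function(age, smoker) {
  (b[["intercept"]] + b[["smoke"]] * smoker) +
    (b[["age"]] + b[["age_smoke"]] * smoker) * age
}

print(c(smoker_intercept, smoker_slope))
print(fitted_lungcap(age = 10, smoker = c(0, 1)))
```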

(b) Estimated slope coefficient of Age = 0.55823

For non-smokers, a one-unit increase in age is associated with an average
increase of 0.55823 in lung capacity.

(d) No, they are not individually significant at the 5% level: the p-value of smoke
is 0.823 and the p-value of age is 0.377, and both are greater than the 5%
significance level.

(e)

H0: βSmoke = βAge:Smoke = 0

H1: at least one β ≠ 0

F = 6.4186, p-value = 0.001726

α = 0.05, d.f. = (2, 721)

Since p-value < α, we reject H0: the overall effect of smoking is significant.
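The reported p-value follows from the F statistic and its degrees of freedom. The short sketch below recomputes it, together with the 5% critical value, which is not stated above; the variable names are illustrative.

```r
f_joint <- 6.4186  # joint F statistic for Smoke and Age:Smoke
df1 <- 2           # number of coefficients tested
df2 <- 721         # residual degrees of freedom

p_joint    <- pf(f_joint, df1, df2, lower.tail = FALSE)
crit_joint <- qf(0.95, df1, df2)

print(signif(p_joint, 4))
print(round(crit_joint, 3))
print(f_joint > crit_joint)  # TRUE means H0 is rejected at the 5% level
```

A joint test can reject even when neither coefficient is individually significant, which is the pattern seen in parts (d) and (e).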
