Assignment 4

This document describes an assignment involving building a logistic regression model to predict extramarital affairs using demographic and survey data. Students are asked to perform variable selection, make predictions on new data, and evaluate a research paper involving predictive modeling.

WEEK 11 Activity (Assignment 4) – 14 marks

Group members: Serag Elganga, Mustafa Haider, Cole Lavoie

1) You previously used the dataset called “affairs”, which contains cross-sectional data from a survey conducted by
Psychology Today in 1969. The variables include:

Variable        Description
affairs         number of extramarital affairs within the past year
gender          factor indicating gender
age             age in years
yearsmarried    number of years in current marriage
children        whether the person has children in the current marriage
religiousness   religiousness scale: 1 = anti, 2 = not at all, 3 = slightly, 4 = somewhat, 5 = very
education       education scale: 9 = grade school, 12 = high school graduate, 14 = some college, 16 = college graduate, 17 = some graduate work, 18 = master’s degree, 20 = Ph.D., M.D., or other advanced degree
occupation      occupation coded according to Hollingshead classification
rating          self rating of marriage: 1 = very unhappy, 2 = somewhat unhappy, 3 = average, 4 = happier than average, 5 = very happy

You used the glm() function and the “affairs” dataset to make a logistic regression model with a new variable
(binaryaffair) as the outcome and age, yearsmarried, religiousness, and rating as the significant predictors. You started
with a model that used all of the predictors except for occupation (Week 7 activity).

a) Start again with all of the predictors except for occupation. Use backward selection to identify the important
predictors, using p<0.05 for the significance level. Which predictors are retained in your final model? (2 marks)

ANSWER:

1. Gender
2. Age
3. Years married
4. Religiousness
5. Rating

b) Paste the script for your final model here. (1 mark)

ANSWER:

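Assuming that the affairs dataset (including the binaryaffair variable) has been loaded, this is the script for the final
model. Note that step() removes terms by comparing AIC rather than by applying a p<0.05 threshold, so the retained
predictors should be checked against the p-values reported by summary().

# Full model: all predictors except occupation
initial_model <- glm(binaryaffair ~ gender + age + yearsmarried + children +
                       religiousness + education + rating,
                     family = binomial, data = affairs)

# Backward elimination; step() compares candidate models by AIC
final_model <- step(initial_model, direction = "backward")

summary(final_model)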
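
Because the question asks for backward selection at p<0.05 while step() works from AIC, a p-value-based elimination can be sketched as follows. This is a minimal illustration, not the assignment's required method: the data frame sim is simulated (hypothetical values standing in for the affairs dataset), and the loop uses likelihood-ratio p-values from drop1() as the removal criterion.

```r
# Simulated stand-in for the affairs data (hypothetical values, illustration only)
set.seed(1)
n <- 500
sim <- data.frame(
  age           = round(runif(n, 18, 57)),
  yearsmarried  = round(runif(n, 0, 15)),
  religiousness = sample(1:5, n, replace = TRUE),
  education     = sample(c(9, 12, 14, 16, 17, 18, 20), n, replace = TRUE),
  rating        = sample(1:5, n, replace = TRUE)
)
# By construction the outcome depends on yearsmarried, religiousness and rating only
lp <- 1 + 0.1 * sim$yearsmarried - 0.3 * sim$religiousness - 0.5 * sim$rating
sim$binaryaffair <- rbinom(n, 1, plogis(lp))

model <- glm(binaryaffair ~ age + yearsmarried + religiousness + education + rating,
             family = binomial, data = sim)

repeat {
  # Stop if no predictors are left to test
  if (length(attr(terms(model), "term.labels")) == 0) break

  # Likelihood-ratio test for dropping each term; the first row is "<none>"
  tests <- drop1(model, test = "Chisq")
  pvals <- tests[["Pr(>Chi)"]][-1]
  names(pvals) <- rownames(tests)[-1]

  # Stop once every remaining term is significant at the 0.05 level
  if (max(pvals) < 0.05) break

  # Otherwise remove the least significant term and refit
  worst <- names(pvals)[which.max(pvals)]
  model <- update(model, as.formula(paste(". ~ . -", worst)))
}

summary(model)
```

With the real data, the same loop could be run on initial_model; the predictors it keeps may differ from those kept by step(), since the two criteria are not equivalent.
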

2) Can we use a model to make predictions in R? Can we find out the likelihood that someone has had an affair? An example
of how to do this (with the affairs dataset) can be found at the following website. Scroll down to the section called
“Predict the outcome using new data”. Your prediction script will not be the same, but you can use the code presented
to figure out how to do it. (Note that fit.reduced is the name of the model used in the example.)

https://towardsdatascience.com/how-to-do-logistic-regression-in-r-456e9cfec7cd

a) Let’s use sample data to make a prediction:

gender male
age 41
yearsmarried 7
children yes
religiousness 2
education 18
occupation (missing data – the coding system was confusing)
rating 5

Based on your model, what is the probability that the sample had an affair within the past year? (1 mark)

ANSWER:
The predicted probability is 0.1448248, or about 14%.

b) Paste your script here. If I already have the affairs dataset with the binaryaffair variable, I should be able to
paste your script into RStudio and get the same results that you have. (3 marks)

ANSWER:
Note: the model used here is the one created in question 3 e) of the Week 7 activity, where only the significant
variables were included. In the script, it is called “logistic_model_2”.

# Reduced model with the significant predictors only
logistic_model_2 <- glm(binaryaffair ~ age + yearsmarried + religiousness + rating,
                        family = binomial, data = affairs)

# Sample person; columns that are not in the model are ignored by predict()
newdataset <- data.frame(
  gender = "male",
  age = 41,
  yearsmarried = 7,
  children = "yes",
  religiousness = 2,
  education = 18,
  rating = 5
)

# type = "response" returns a probability instead of log-odds
newdataset$prob <- predict(logistic_model_2, newdata = newdataset,
                           type = "response")

print(newdataset$prob)
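
To show what predict() with type = "response" computes, the sketch below reproduces the probability by hand: the log-odds are the intercept plus each coefficient times the sample value, and the probability is 1 / (1 + exp(-log-odds)). The data frame sim is simulated (hypothetical values, not the affairs dataset), so the resulting number is not comparable to the 0.1448248 reported above.

```r
# Simulated stand-in for the affairs data (hypothetical values, illustration only)
set.seed(42)
n <- 400
sim <- data.frame(
  age           = round(runif(n, 18, 57)),
  yearsmarried  = round(runif(n, 0, 15)),
  religiousness = sample(1:5, n, replace = TRUE),
  rating        = sample(1:5, n, replace = TRUE)
)
lp <- 0.5 + 0.1 * sim$yearsmarried - 0.3 * sim$religiousness - 0.4 * sim$rating
sim$binaryaffair <- rbinom(n, 1, plogis(lp))

fit <- glm(binaryaffair ~ age + yearsmarried + religiousness + rating,
           family = binomial, data = sim)

new_person <- data.frame(age = 41, yearsmarried = 7, religiousness = 2, rating = 5)

# Log-odds by hand: intercept plus coefficient times value for each predictor
b <- coef(fit)
log_odds <- b["(Intercept)"] + b["age"] * 41 + b["yearsmarried"] * 7 +
  b["religiousness"] * 2 + b["rating"] * 5

# Inverse logit: plogis(x) is 1 / (1 + exp(-x))
prob_manual  <- unname(plogis(log_odds))
prob_predict <- unname(predict(fit, newdata = new_person, type = "response"))

print(c(manual = prob_manual, predict = prob_predict))
```

The two values agree, which confirms that the reported probability comes directly from the fitted coefficients and the sample's predictor values.
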

c) Do you think that this prediction is valid? Do you have any concerns about the data used to make the model or
about the sample? Explain. (2 marks)

ANSWER:

The validity of this prediction depends on various factors, including the representativeness and quality of the data
used to train the model, potential biases in the sample, and the assumptions underlying the logistic regression
framework. I would have concerns about the data/sample if the original dataset is not sufficiently diverse or if there
are unaccounted-for variables that could influence the outcome but are not included in the model. Additionally, the
model's predictive accuracy should be assessed using independent validation data to ensure its generalizability to new
observations.
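
The check on independent validation data mentioned above can be sketched as a simple hold-out split. This is an illustration under stated assumptions: sim is simulated (hypothetical values, not the affairs dataset), the 70/30 split and the 0.5 classification cutoff are arbitrary choices, and accuracy is only one of several possible measures.

```r
# Simulated stand-in for the affairs data (hypothetical values, illustration only)
set.seed(7)
n <- 600
sim <- data.frame(
  age           = round(runif(n, 18, 57)),
  yearsmarried  = round(runif(n, 0, 15)),
  religiousness = sample(1:5, n, replace = TRUE),
  rating        = sample(1:5, n, replace = TRUE)
)
lp <- 0.5 + 0.1 * sim$yearsmarried - 0.3 * sim$religiousness - 0.4 * sim$rating
sim$binaryaffair <- rbinom(n, 1, plogis(lp))

# Hold out 30% of the rows; the model never sees them during fitting
train_rows <- sample(seq_len(n), size = round(0.7 * n))
train <- sim[train_rows, ]
test  <- sim[-train_rows, ]

fit <- glm(binaryaffair ~ age + yearsmarried + religiousness + rating,
           family = binomial, data = train)

# Predicted probabilities on the held-out rows, classified at a 0.5 cutoff
test_prob <- predict(fit, newdata = test, type = "response")
test_pred <- as.integer(test_prob > 0.5)

# Share of held-out cases classified correctly, plus the confusion table
accuracy  <- mean(test_pred == test$binaryaffair)
confusion <- table(predicted = test_pred, actual = test$binaryaffair)

print(accuracy)
print(confusion)
```

When the outcome is unbalanced, accuracy should be compared with the share of the majority class in the held-out data, since a model that always predicts "no affair" can score well on accuracy alone.
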

3) For this last question, each group is required to go and find a new research paper that involves building a model with
multiple predictors. Each group’s paper must be unique; any group that selects the same paper as another group will be
given a 0. Papers that have been used for presentations are not eligible. If your paper does not allow you to answer the
following questions, I suggest finding another paper. Attach a PDF of the paper with your assignment submission.
For your model, answer:

a) If you had to remove one predictor/feature/variable from the model, which one would it be and why? (1 mark)

ANSWER:

Based on the research paper, if I had to remove one predictor/variable from the model, I would consider removing
"marijuana use" as a predictive feature for both generalized anxiety disorder (GAD) and major depressive disorder
(MDD). The inclusion of marijuana use as a predictor could introduce bias into the model and affect the validity of the
predictions. Factors such as varying frequencies of use, different strains of marijuana, and individual differences in
response to marijuana could complicate the interpretation of this variable. Additionally, the legality and social
acceptance of marijuana use may vary across different populations, further complicating its predictive value.

b) If you could replace one predictor/feature/variable in the model, which one would it be and why? (1 mark)

ANSWER:

If I could replace one predictor/feature/variable in the model described in the research paper, I would consider
replacing "satisfaction with living conditions" with a more objective measure of socioeconomic status or household
environment. While satisfaction with living conditions may capture some aspects of well-being, it is inherently
subjective and may not fully reflect the broader socioeconomic and environmental factors that can significantly impact
mental health outcomes such as GAD and MDD.

c) Critically evaluate the predictors in the model; what issues do you see with data collection, accuracy, careless
responding, or any other issues with data integrity or missingness? This could include biases in the sampled
population, for example. (2 marks)

ANSWER:

The predictive model described in the research paper exhibits several notable issues that warrant critical evaluation.
The reliance on a sample of undergraduate students from a single university introduces sampling bias, potentially
limiting the generalizability of the findings to the broader population. Self-reported measures, particularly for sensitive
variables like lifestyle behaviors, may be prone to response bias and inaccuracies. Missing data, ranging from <1% to
36%, raises concerns about data completeness and the potential impact of imputation methods on the results.
Feature engineering, while beneficial for enhancing model performance, introduces the risk of overfitting and spurious
correlations if not carefully validated. Moreover, the limited sensitivity and specificity of the screening questionnaire
for MDD and GAD may affect the accuracy of the predictive model. Confounding variables and the complexity of
machine learning models further complicate interpretation and generalizability.
