SDSC3006 - Assignment 2

The document outlines Assignment 2 for SDSC 3006 Fundamentals of Machine Learning I, due on October 29, 2023. It includes various questions related to logistic regression, LDA, QDA, bootstrap sampling, k-fold cross-validation, and simulated data analysis. The assignment requires students to perform calculations, produce summaries, and analyze data using different machine learning techniques.

Uploaded by

jackyko0319

SDSC 3006 Fundamentals of Machine Learning I

Assignment 2

Deadline: Sunday, October 29 @ 10:00 PM

1. Suppose we collect data for a group of students in a statistics class with variables 𝑋1 = hours
studied, 𝑋2 = undergrad GPA, and 𝑌 = receive an A. We fit a logistic regression and produce
estimated coefficients 𝛽̂0 = −6, 𝛽̂1 = 0.05, 𝛽̂2 = 1.
(a) Estimate the probability that a student who studies for 40 h and has an undergrad GPA of 3.5
gets an A in the class.
(b) How many hours would the student in part (a) need to study to have a 50% chance of getting
an A in the class?
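
As a hedged illustration of the arithmetic behind Question 1, the sketch below evaluates the logistic function at the stated coefficients and inverts the log-odds equation for hours studied. The function names `logistic_probability` and `hours_for_probability` are hypothetical helpers introduced here, not part of the assignment or of any library.

```python
import math


def logistic_probability(b0, b1, b2, x1, x2):
    """Return P(Y = 1 | x1, x2) under a two-predictor logistic model."""
    z = b0 + b1 * x1 + b2 * x2  # linear predictor (log-odds)
    return 1.0 / (1.0 + math.exp(-z))


def hours_for_probability(b0, b1, b2, x2, p):
    """Solve the log-odds equation for x1 given a target probability p."""
    target_log_odds = math.log(p / (1.0 - p))
    return (target_log_odds - b0 - b2 * x2) / b1


# Coefficients as stated in Question 1.
B0, B1, B2 = -6.0, 0.05, 1.0

prob_a = logistic_probability(B0, B1, B2, x1=40, x2=3.5)
hours_needed = hours_for_probability(B0, B1, B2, x2=3.5, p=0.5)
print(f"P(A) = {prob_a:.4f}")
print(f"hours for a 50% chance = {hours_needed:.1f}")
```

A 50% chance corresponds to log-odds of zero, so part (b) reduces to solving a linear equation in hours studied.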

2. Answer the following questions about the differences between LDA and QDA.
(a) If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the
training set? On the test set?
(b) If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on
the training set? On the test set?
(c) In general, as the sample size n increases, do we expect the test prediction accuracy of QDA
relative to LDA to improve, decline, or be unchanged? Why?
(d) True or False: Even if the Bayes decision boundary for a given problem is linear, we will
probably achieve a superior test error rate using QDA rather than LDA because QDA is flexible
enough to model a linear decision boundary. Justify your answer.

3. This question should be answered using the Weekly data set (ISLP package). This data is similar
in nature to the Smarket data from this chapter’s lab, except that it contains 1089 weekly returns
for 21 years, from the beginning of 1990 to the end of 2010.
(a) Produce some numerical and graphical summaries of the Weekly data. Do there appear to be
any patterns?
(b) Use the full data set to perform a logistic regression with Direction as the response and the five
lag variables plus Volume as predictors. Use the summary function to print the results. Do any of
the predictors appear to be statistically significant? If so, which ones?
(c) Compute the confusion matrix and overall fraction of correct predictions. Explain what the
confusion matrix is telling you about the types of mistakes made by logistic regression.
(d) Now fit the logistic regression model using a training data period from 1990 to 2008, with Lag2
as the only predictor. Compute the confusion matrix and the overall fraction of correct predictions
for the held out data (that is, the data from 2009 and 2010).
(e) Repeat (d) using LDA.
(f) Repeat (d) using QDA.
(g) Repeat (d) using KNN with K = 1.
(h) Which of these methods appears to provide the best results on this data?
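
Parts (c) through (g) of Question 3 all ask for a confusion matrix and an overall fraction of correct predictions. The sketch below shows one way to compute both from two lists of labels using only the standard library. The helpers `confusion_matrix` and `accuracy` are hypothetical names defined here, and the labels are made up for illustration; they are not results from the Weekly data.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels=("Down", "Up")):
    """Count (actual, predicted) pairs; rows are actual, columns predicted."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Overall fraction of correct predictions (diagonal over total)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Made-up labels for illustration only; these are NOT Weekly results.
actual = ["Up", "Up", "Down", "Up", "Down", "Down", "Up", "Up"]
predicted = ["Up", "Up", "Up", "Up", "Down", "Up", "Down", "Up"]

cm = confusion_matrix(actual, predicted)
print("rows = actual (Down, Up), columns = predicted (Down, Up)")
for row in cm:
    print(row)
print("accuracy:", accuracy(cm))
```

Reading the off-diagonal cells separately shows which kind of mistake dominates, which is what part (c) asks you to explain.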

4. Suppose that we obtain a bootstrap sample from a set of n observations.

(a) What is the probability that the first bootstrap observation is not the jth observation from the
original sample? Justify your answer.
(b) What is the probability that the second bootstrap observation is not the jth observation from
the original sample?
(c) Argue that the probability that the jth observation is not in the bootstrap sample is (1 − 1/n)^n.
(d) When n = 5, what is the probability that the jth observation is in the bootstrap sample?
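
The probability in parts (c) and (d) can be checked numerically. The sketch below, which assumes sampling with replacement and equal weights, compares the closed form 1 − (1 − 1/n)^n with a small Monte Carlo estimate. The function names, the trial count, and the seed are arbitrary choices made for this illustration.

```python
import random


def prob_in_bootstrap(n):
    """Exact probability that a given observation appears in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def simulate_prob_in_bootstrap(n, j=0, trials=20000, seed=0):
    """Monte Carlo estimate: draw n indices with replacement, check whether j appears."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.randrange(n) for _ in range(n)]
        if j in sample:
            hits += 1
    return hits / trials


exact = prob_in_bootstrap(5)
estimate = simulate_prob_in_bootstrap(5)
print(f"exact: {exact:.5f}, simulated: {estimate:.5f}")
```

The simulated value should land close to the exact one, with the gap shrinking as `trials` grows.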

5. Answer the following questions about k-fold cross-validation.

(a) Explain how k-fold cross-validation is implemented.
(b) What are the advantages and disadvantages of k-fold cross-validation relative to:
i. The validation set approach?
ii. LOOCV?
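
To make the mechanics in Question 5 concrete, here is a minimal sketch of k-fold cross-validation written with the standard library only. It assumes the usual recipe: shuffle the indices, split them into k roughly equal folds, hold each fold out once, and average the k held-out mean squared errors. The helper names and the toy mean-only model are hypothetical and serve only to exercise the loop.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def k_fold_cv_mse(xs, ys, k, fit, predict, seed=0):
    """Average of the k held-out mean squared errors."""
    folds = k_fold_indices(len(xs), k, seed)
    fold_mses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared = [(ys[i] - predict(model, xs[i])) ** 2 for i in held_out]
        fold_mses.append(sum(squared) / len(squared))
    return sum(fold_mses) / k


# Toy model for illustration: always predict the training mean of y.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


xs = list(range(10))
ys = [2.0 * x for x in xs]
cv_estimate = k_fold_cv_mse(xs, ys, k=5, fit=fit_mean, predict=predict_mean)
print("5-fold CV estimate of test MSE:", cv_estimate)
```

Setting k equal to the number of observations makes every fold a single point, which is LOOCV; a single random split into two parts corresponds to the validation set approach.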

6. Perform cross-validation on a simulated data set and answer the questions.

(a) Generate a simulated data set as follows:
> set.seed(1)
> y=rnorm(100)
> x=rnorm(100)
> y=x-2*x^2+rnorm(100)
In this data set, what is n and what is p? Write out the model used to generate the data in equation
form.
(b) Create a scatterplot of X against Y. Comment on what you find.
(c) Set a random seed, and then compute the LOOCV errors that result from fitting the following
four models using least squares:
i. Y = β0 + β1X + ε
ii. Y = β0 + β1X + β2X^2 + ε
iii. Y = β0 + β1X + β2X^2 + β3X^3 + ε
iv. Y = β0 + β1X + β2X^2 + β3X^3 + β4X^4 + ε.
Note you may find it helpful to use the data.frame() function to create a single data set containing
both X and Y.

(d) Repeat (c) using another random seed, and report your results. Are your results the same as
what you got in (c)? Why?
(e) Which of the models in (c) had the smallest LOOCV error? Is this what you expected? Explain
your answer.
(f) Comment on the statistical significance of the coefficient estimates that result from fitting each
of the models in (c) using least squares. Do these results agree with the conclusions drawn based
on the cross-validation results?
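
The LOOCV procedure in Question 6 can be sketched without any external packages. The code below is an illustrative assumption-laden version, not the assignment's reference solution: it simulates data of the same form (Y = X − 2X^2 + noise) with Python's `random` module, so the numbers will differ from those produced by the R commands in part (a), and it fits each polynomial by solving the normal equations with a small hand-written Gaussian elimination. All function names are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X'X) beta = X'y."""
    rows = [[x ** p for p in range(degree + 1)] for x in xs]
    size = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return solve_linear_system(xtx, xty)


def predict_polynomial(coef, x):
    """Evaluate beta0 + beta1*x + beta2*x^2 + ... at x."""
    return sum(c * x ** p for p, c in enumerate(coef))


def loocv_mse(xs, ys, degree):
    """Leave each point out once, refit, and average the squared errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        coef = fit_polynomial(train_x, train_y, degree)
        errors.append((ys[i] - predict_polynomial(coef, xs[i])) ** 2)
    return sum(errors) / len(errors)


# Simulated data of the same form as part (a); the seed is an arbitrary choice.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(100)]
ys = [x - 2 * x ** 2 + rng.gauss(0, 1) for x in xs]

loocv_errors = {d: loocv_mse(xs, ys, d) for d in range(1, 5)}
for d, err in loocv_errors.items():
    print(f"degree {d}: LOOCV MSE = {err:.3f}")
```

Once the data are generated, the LOOCV loop itself contains no random step, since every observation is held out exactly once. The normal equations are used here only to keep the example dependency-free; a numerical library's least-squares routine would be the more robust choice in practice.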
