HW5 Solution
1. This problem involves the OJ data set which is part of the ISLR package.
(a) Create a training set containing a random sample of 80% of the observations, and a
test set containing the remaining observations.
Answer: Using the following code:
> library(ISLR)
> set.seed(1234)
> train = sample(dim(OJ)[1], dim(OJ)[1]*0.8)
> OJ.train = OJ[train, ]
> OJ.test = OJ[-train, ]
(b) Fit a support vector classifier to the training data using cost=0.01, with Purchase
as the response and the other variables as predictors. Use the summary()
function to produce summary statistics, and describe the results obtained. What
are the training and test error rates?
Answer: The support vector classifier creates 465 support vectors out of the 856
training observations (80% of the data). Of these, 233 belong to level CH and the
remaining 232 belong to level MM.
Comparing the predicted and actual labels on the training and test sets, the training
error rate is 17.1% and the test error rate is about 15%.
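Since the summary output and confusion matrices are not reproduced here, the following
sketch shows how these numbers could be obtained (object names such as svm.linear are
illustrative, and the e1071 package is assumed):
> library(e1071)
> svm.linear = svm(Purchase ~ ., data = OJ.train, kernel = "linear", cost = 0.01)
> summary(svm.linear)                       # number of support vectors per class
> train.pred = predict(svm.linear, OJ.train)
> mean(train.pred != OJ.train$Purchase)     # training error rate
> test.pred = predict(svm.linear, OJ.test)
> mean(test.pred != OJ.test$Purchase)       # test error rate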
(c) Use the tune() function to select an optimal cost. Consider values in the range
0.01 to 10. Compute the training and test error rates using this new value for
cost.
Answer: We set the candidate cost sequence to cost = 10^seq(-2, 1, by = 0.25).
The tuning result shows that the optimal cost is about 0.316.
We refit the support vector classifier using this optimal cost. With the best cost, the
training error decreases to 16.7%, but the test error increases slightly to 16.4%.
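A sketch of the tuning step (again with illustrative object names; the exact
cross-validation results depend on the random seed):
> set.seed(1234)
> tune.linear = tune(svm, Purchase ~ ., data = OJ.train, kernel = "linear",
+                    ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
> summary(tune.linear)                      # CV error for each candidate cost
> best.cost = tune.linear$best.parameters$cost
> svm.best = svm(Purchase ~ ., data = OJ.train, kernel = "linear", cost = best.cost)
> mean(predict(svm.best, OJ.train) != OJ.train$Purchase)   # training error
> mean(predict(svm.best, OJ.test) != OJ.test$Purchase)     # test error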
(d) Repeat parts (b) through (c) using a support vector machine with a radial kernel.
Use the default value for gamma.
Answer: The radial basis kernel with the default gamma creates 400 support vectors, of
which 203 belong to level CH and the remaining 197 to level MM. The classifier has a
training error of 15.8% and a test error of 14.5%, both improvements over the linear
kernel.
We now use cross-validation to find the optimal gamma, with the candidate sequence
10^seq(-2, 1, by = 0.25). The optimal gamma is 1.78. Compared with the default gamma
(1/p, where p is the number of predictors), tuning slightly decreases the training
error to 15.2% and slightly increases the test error to 15.4%. The performance is
still better than the linear kernel.
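A sketch of the radial-kernel fits (object names illustrative):
> svm.radial = svm(Purchase ~ ., data = OJ.train, kernel = "radial")   # default gamma
> summary(svm.radial)
> mean(predict(svm.radial, OJ.train) != OJ.train$Purchase)   # training error
> mean(predict(svm.radial, OJ.test) != OJ.test$Purchase)     # test error
> set.seed(1234)
> tune.radial = tune(svm, Purchase ~ ., data = OJ.train, kernel = "radial",
+                    ranges = list(gamma = 10^seq(-2, 1, by = 0.25)))
> best.gamma = tune.radial$best.parameters$gamma
> svm.radial2 = svm(Purchase ~ ., data = OJ.train, kernel = "radial", gamma = best.gamma)
> mean(predict(svm.radial2, OJ.train) != OJ.train$Purchase)  # training error after tuning
> mean(predict(svm.radial2, OJ.test) != OJ.test$Purchase)    # test error after tuning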
(e) Repeat parts (b) through (c) using a support vector machine with a polynomial
kernel. Set degree=2.
Answer: The summary shows that the polynomial kernel produces 495 support vectors, of
which 252 belong to level CH and the remaining 243 to level MM. This kernel gives a
training error of 18.5% and a test error of 15.9%, both higher than the errors
produced by the radial and linear kernels.
Using the same candidate cost sequence, we find that the optimal cost for the
polynomial kernel is 10. Tuning reduces the training error to 15.9% and the test error
to 15.4%, which is similar to the radial kernel and better than the linear kernel.
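A sketch of the polynomial-kernel fits (object names illustrative):
> svm.poly = svm(Purchase ~ ., data = OJ.train, kernel = "polynomial", degree = 2)
> summary(svm.poly)
> mean(predict(svm.poly, OJ.train) != OJ.train$Purchase)     # training error
> mean(predict(svm.poly, OJ.test) != OJ.test$Purchase)       # test error
> set.seed(1234)
> tune.poly = tune(svm, Purchase ~ ., data = OJ.train, kernel = "polynomial", degree = 2,
+                  ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
> svm.poly2 = svm(Purchase ~ ., data = OJ.train, kernel = "polynomial", degree = 2,
+                 cost = tune.poly$best.parameters$cost)
> mean(predict(svm.poly2, OJ.train) != OJ.train$Purchase)    # training error after tuning
> mean(predict(svm.poly2, OJ.test) != OJ.test$Purchase)      # test error after tuning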
(f) Overall, which approach seems to give the best results on this data?
Answer: Overall, the radial basis kernel produces the lowest misclassification error
on both the training and test data.
2. We have seen that we can fit an SVM with a non-linear kernel in order to perform
classification using a non-linear decision boundary. We will now see that we can also
obtain a non-linear decision boundary by performing logistic regression using non-
linear transformations of the features.
(a) Generate a data set with n = 700 and p = 2, such that the observations belong to
two classes with a quadratic decision boundary between them. For instance, you
can do this as follows:
> x1 = runif(700) - 0.5
> x2 = runif(700) - 0.5
(b) Plot the observations, colored according to their class labels. Your plot should
display x1 on the x-axis, and x2 on the y-axis.
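A minimal sketch of how (a) and (b) could be carried out, assuming the quadratic
boundary y = 1*(x1^2 - x2^2 > 0) (this particular boundary is an assumption, not given
above):
> set.seed(1234)
> x1 = runif(700) - 0.5
> x2 = runif(700) - 0.5
> y = 1 * (x1^2 - x2^2 > 0)                  # assumed quadratic decision boundary
> plot(x1, x2, col = ifelse(y == 1, "red", "blue"), pch = 19,
+      xlab = "x1", ylab = "x2")             # observations colored by true class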
(c) Fit a logistic regression model to the data, using x1 and x2 as predictors. Apply
this model to the training data in order to obtain a predicted class label for each
training observation. Plot the observations, colored according to the predicted
class labels. The decision boundary should be linear.
Answer: The logistic regression output shows that both variables are insignificant
for predicting y at the 0.05 significance level.
We plot the true labels and the predicted labels (obtained with a probability threshold
of 0.5). The decision boundary is clearly linear, and the predictions are poor.
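A sketch of the linear-logistic fit and the plot of predicted labels (object names,
including the data frame dat, are illustrative):
> dat = data.frame(x1 = x1, x2 = x2, y = as.factor(y))
> glm.lin = glm(y ~ x1 + x2, data = dat, family = binomial)
> summary(glm.lin)                           # both coefficients insignificant
> prob.lin = predict(glm.lin, dat, type = "response")
> pred.lin = ifelse(prob.lin > 0.5, 1, 0)    # 0.5 probability threshold
> plot(x1, x2, col = ifelse(pred.lin == 1, "red", "blue"), pch = 19)   # linear boundary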
(d) Now fit a logistic regression model to the data using non-linear functions of x1
and x2 as predictors (e.g. x1^2, x1 × x2, log(x2), and so forth). Apply this model to
the training data in order to obtain a predicted class label for each training
observation. Plot the observations, colored according to the predicted class
labels. The decision boundary should be obviously non-linear. If it is not, then
repeat (a)-(d) until you come up with an example in which the predicted class
labels are obviously non-linear.
Answer: We fit the model using poly(x1, 3), the square of x2, and the interaction term
between x1 and x2. The resulting non-linear decision boundary closely resembles the
true decision boundary.
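A sketch of the non-linear logistic fit described above (object names illustrative;
dat is the data frame from part (c)):
> glm.nl = glm(y ~ poly(x1, 3) + I(x2^2) + x1:x2, data = dat, family = binomial)
> prob.nl = predict(glm.nl, dat, type = "response")
> pred.nl = ifelse(prob.nl > 0.5, 1, 0)      # 0.5 probability threshold
> plot(x1, x2, col = ifelse(pred.nl == 1, "red", "blue"), pch = 19)    # non-linear boundary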
(f) Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each
training observation. Plot the observations, colored according to the predicted
class labels.
Answer: We fit an SVM with a radial kernel and gamma = 1. As shown, the non-linear
decision boundary implied by the predicted labels closely resembles the true decision
boundary.
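A sketch of the radial-kernel SVM fit (object names illustrative; dat is the data
frame from part (c), with y stored as a factor so svm() performs classification):
> svm.rad = svm(y ~ x1 + x2, data = dat, kernel = "radial", gamma = 1)
> pred.svm = predict(svm.rad, dat)
> plot(x1, x2, col = ifelse(pred.svm == "1", "red", "blue"), pch = 19)  # boundary close to the true one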
(g) Comment on your results.
Answer: This experiment reinforces the idea that SVMs with a non-linear kernel are
extremely powerful at finding non-linear decision boundaries. Both logistic regression
without non-linear or interaction terms and SVMs with a linear kernel fail to find the
decision boundary. Adding non-linear and interaction terms to logistic regression
gives it roughly the same power as a radial basis kernel. However, picking the right
terms involves manual effort and tuning, which can become prohibitive with a large
number of features. Radial basis kernels, on the other hand, only require tuning a
single parameter, gamma, which can easily be done using cross-validation.