
ISYE 6501 - Homework 1

2024-01-15

Question 2.1 Describe a situation for which a classification model would be appropriate. List some (up to 5) predictors that you might use.

Recently, I found myself in the process of shopping for new auto insurance and filled out an online intake
form on an insurance company’s website. What struck me was the commonality in the predictors used by
every auto insurance company. These factors play a pivotal role in shaping insurance premiums and coverage.
Here are five key aspects that stood out during the form-filling process:

1. Driving History: The questions about past driving records, encompassing accidents, traffic violations,
and claims history, were quite detailed. It was evident that this information significantly influences an
insurance company’s assessment of an applicant’s risk level.
2. Age and Gender: The intake form delved into specifics about age and gender. It appears that these
demographic factors consistently factor into insurers’ statistical evaluations of risk, potentially leading
to varied premiums for younger or male drivers.
3. Vehicle Type: Notably, there was a focus on the make and model of my vehicle. This suggests that
insurers use this data to gauge the overall risk associated with a particular vehicle, considering repair
costs and safety features.
4. Annual Mileage: Questions about the number of miles driven annually were prominent. The emphasis on mileage indicates that insurers view higher mileage as a potential risk factor, likely tied to an increased likelihood of accidents and, subsequently, higher insurance rates.
5. Credit Score: Surprisingly, the intake form sought information on my credit score. It seems that
some insurers use credit history as a predictor, linking a higher credit score to responsible financial
behavior and, potentially, responsible driving habits.
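The five predictors above are narrative observations, but the setup can be sketched as an actual classification model. Below is a minimal, purely hypothetical illustration in base R: all column names and values are invented, and a logistic regression (glm) stands in for whatever model an insurer actually uses.

```r
# Hypothetical sketch: the five predictors above as columns of a small
# synthetic data frame, fed to a logistic-regression classifier (base R).
# Every name and value here is made up for illustration only.
set.seed(42)
n <- 100
insurance <- data.frame(
  prior_claims   = rpois(n, 1),                       # driving history
  age            = sample(18:75, n, replace = TRUE),  # age (gender omitted)
  vehicle_risk   = runif(n),                          # proxy for vehicle type
  annual_mileage = rnorm(n, 12000, 3000),             # annual mileage
  credit_score   = sample(500:850, n, replace = TRUE) # credit score
)
# Synthetic label: more prior claims and mileage -> more likely "high risk"
insurance$high_risk <- as.integer(
  insurance$prior_claims + insurance$annual_mileage / 10000 + rnorm(n) > 2
)
fit  <- glm(high_risk ~ ., data = insurance, family = binomial)
pred <- as.integer(predict(fit, type = "response") > 0.5)
mean(pred == insurance$high_risk)  # training accuracy on the synthetic data
```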

Question 2.2

Q 2.2.1 Using the support vector machine function ksvm contained in the R package kernlab, find a good classifier for this data...

Install and load “kernlab”

rm(list = ls())
#install.packages("kernlab")
if(!require(pacman)) install.packages("pacman")

## Loading required package: pacman

library(pacman)
p_load(kernlab, tinytex)

Load and read credit_card_data.txt to dataframe

file_path <- "~/Georgia Tech - OMSA/ISYE 6501/hw1/data 2.2/credit_card_data.txt"


data <- read.table(file_path, stringsAsFactors = FALSE, header=FALSE)
head(data)

## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
## 1 1 30.83 0.000 1.25 1 0 1 1 202 0 1
## 2 0 58.67 4.460 3.04 1 0 6 1 43 560 1
## 3 0 24.50 0.500 1.50 1 1 0 1 280 824 1
## 4 1 27.83 1.540 3.75 1 0 5 0 100 3 1
## 5 1 20.17 5.625 1.71 1 1 0 1 120 0 1
## 6 1 32.08 4.000 2.50 1 1 0 0 360 0 1

#tail(data)

Call ksvm
ksvm(x, y, type, kernel, ...)

model <- ksvm(as.matrix(data[,1:10]), as.factor(data[,11]),
              type="C-svc", kernel="vanilladot", C=99, scaled=TRUE)

## Setting default kernel parameters

model

## Support Vector Machine object of class "ksvm"
##
## SV type: C-svc (classification)
## parameter : cost C = 99
##
## Linear (vanilla) kernel function.
##
## Number of Support Vectors : 192
##
## Objective Function Value : -17709.08
## Training error : 0.136086

#attributes(model)

Calculate a1...am

• model@xmatrix[[1]] contains the support vectors.
• model@coef[[1]] contains the coefficients associated with the support vectors.
• The expression colSums(model@xmatrix[[1]] * model@coef[[1]]) computes the weighted sum of the support vectors for each feature, which represents the coefficients (excluding the bias term) of the decision function.

a <- colSums(model@xmatrix[[1]] * model@coef[[1]])
a

## V1 V2 V3 V4 V5
## -0.0010701658 -0.0010813457 -0.0016245504 0.0027619336 1.0049404299
## V6 V7 V8 V9 V10
## -0.0027308622 0.0001028633 -0.0005668291 -0.0012766361 0.1063990379

Calculate a0

• model@b contains the bias term of the decision function.
• The expression a0 <- -model@b calculates the negation of the bias term.

a0 <- -model@b
a0

## [1] 0.08152318
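With a and a0 in hand, points can be classified directly via the sign of the linear decision function a · x + a0. A minimal base-R sketch with placeholder values for a, a0, and the data; with the real model, you would plug in the a and a0 computed above and apply them to the scaled predictors, since ksvm was called with scaled = TRUE.

```r
# Sketch of manual classification with a recovered linear decision function.
# X, a, and a0 below are hypothetical placeholders, not the real values.
set.seed(1)
X  <- matrix(rnorm(20), nrow = 4)   # 4 points, 5 features (made up)
a  <- c(0.5, -0.2, 0.1, 0.0, 0.3)   # placeholder coefficients
a0 <- 0.08                          # placeholder intercept
Xs <- scale(X)                      # mirror ksvm's scaled = TRUE
scores      <- as.vector(Xs %*% a) + a0
pred_manual <- ifelse(scores > 0, 1, 0)
pred_manual
```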

See what the model predicts

pred <- predict(model, data[,1:10])
pred

## [1] 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [38] 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
## [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
## [223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## [260] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## [297] 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [482] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [519] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
## [556] 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
## [593] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## [630] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Levels: 0 1

See what fraction of the model’s predictions match the actual classification

sum(pred == data[,11]) / nrow(data)

## [1] 0.8639144

First observation: while an accuracy of 86.39% might be good in many contexts, it’s crucial
to consider the specific nature of credit card application decisions and any associated risks or
consequences.
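Since approving a bad credit risk and rejecting a good applicant carry different costs, a confusion table is more informative than accuracy alone. A quick base-R sketch, with short hypothetical predicted/actual vectors standing in for pred and data[,11]:

```r
# Hypothetical predicted and actual labels (stand-ins for pred and data[,11])
actual <- c(1, 1, 0, 0, 1, 0)
pred   <- c(1, 0, 0, 0, 1, 1)
# Cross-tabulate predictions against actuals: off-diagonal cells are the
# two different kinds of errors (false approvals vs. false rejections)
table(Predicted = pred, Actual = actual)
mean(pred == actual)   # overall accuracy hides the asymmetry
```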

#p_load(caret)
# Define a range of C values
C_values <- c(0.01, 0.1, 1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
# Initialize variables to store best accuracy and corresponding C value
best_accuracy <- 0
best_C <- NULL
# Loop over each C value
for (C in C_values){
  # Train the model
  svm_model <- ksvm(
    x = as.matrix(data[, 1:10]),
    y = as.factor(data[, 11]),
    type = "C-svc",
    kernel = "vanilladot",
    C = C,
    scaled = TRUE
  )

  # Evaluate on the full dataset (no separate validation set is used here)
  predictions <- predict(svm_model, newdata = as.matrix(data[, 1:10]))
  accuracy <- mean(predictions == data[, 11])

  # Check if the current C value gives a better accuracy
  if (accuracy > best_accuracy) {
    best_accuracy <- accuracy
    best_C <- C
  }
}

My attempt to use a loop to find a better C:

## Setting default kernel parameters
## Setting default kernel parameters
## Setting default kernel parameters
## Setting default kernel parameters
## Setting default kernel parameters
## Setting default kernel parameters
## Setting default kernel parameters
## Setting default kernel parameters
## Setting default kernel parameters
## Setting default kernel parameters
## Setting default kernel parameters
## Setting default kernel parameters
## Setting default kernel parameters

# Print results
print(paste("Best C Value:", best_C))

## [1] "Best C Value: 0.01"

print(paste("Test Accuracy with Best C:", best_accuracy))

## [1] "Test Accuracy with Best C: 0.863914373088685"

Conclusion: Even though my code runs without error, it was not able to find a better accuracy than 86.39%, and the best C it found was 0.01. I suspect something is wrong with my logic; most likely, because I did not split my dataset into training and test data, every C is being evaluated on the same data the model was trained on.
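One way to address that suspicion is a random train/validation split, fitting on one part and scoring C on the other. A minimal base-R sketch of the index bookkeeping (654 is the number of rows in this credit data, matching the length of the prediction vector above):

```r
# Sketch: hold out ~20% of rows for validation (indices only)
set.seed(6501)
n <- 654                                  # rows in credit_card_data.txt
train_idx <- sample(seq_len(n), size = round(0.8 * n))
val_idx   <- setdiff(seq_len(n), train_idx)
# Then, for each C, fit ksvm on data[train_idx, ] and compute accuracy on
# data[val_idx, ], keeping the C with the best *validation* accuracy.
length(train_idx)
length(val_idx)
```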

Q 2.2.2 Try other (nonlinear) kernels as well; they can sometimes be useful and might provide
better predictions than vanilladot.

#attempt with rbfdot
rbfdot_model <- ksvm(as.matrix(data[,1:10]), as.factor(data[,11]),
                     type="C-svc", kernel="rbfdot", C=95, scaled=TRUE)

rbfdot_model

## Support Vector Machine object of class "ksvm"
##
## SV type: C-svc (classification)
## parameter : cost C = 95
##
## Gaussian Radial Basis kernel function.
## Hyperparameter : sigma = 0.101540052980955
##
## Number of Support Vectors : 246
##
## Objective Function Value : -8230.715
## Training error : 0.044343

#attempt with polydot
polydot_model <- ksvm(as.matrix(data[,1:10]), as.factor(data[,11]),
                      type="C-svc", kernel="polydot", C=95, scaled=TRUE)

## Setting default kernel parameters

polydot_model

## Support Vector Machine object of class "ksvm"
##
## SV type: C-svc (classification)
## parameter : cost C = 95
##
## Polynomial kernel function.
## Hyperparameters : degree = 1 scale = 1 offset = 1
##
## Number of Support Vectors : 189
##
## Objective Function Value : -16993.56
## Training error : 0.136086

Observation: the “rbfdot” kernel seems to do a better job, with a training error of 0.044343, while the “polydot” kernel (with its default degree = 1) produces essentially the same statistics as “vanilladot”, since a degree-1 polynomial kernel is effectively linear.

Q 2.2.3 Using the k-nearest-neighbors classification function kknn contained in the R kknn package, suggest a good value of k, and show how well it classifies the data points in the full data set.

p_load(kknn)
# Pick an index i; it can be any row in the dataset
i <- 200
# Specify the range of k values to test
k_values <- c(1, 5, 10, 15, 20, 25, 30)

# Initialize variables to store best accuracy and corresponding k value
best_accuracy <- 0
best_k <- NULL

# Iterate over each k value
for (k in k_values) {
  # Train the model
  model_knn <- kknn(
    V11 ~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9 + V10,
    data[-i, ],
    data[i, ],
    k = k,
    distance = 2,
    kernel = "optimal",
    scale = TRUE
  )

  # Obtain the predicted value for the held-out observation
  predicted_values <- fitted.values(model_knn)

  # Evaluate model performance (note: this compares a single prediction
  # against all 653 training responses, recycling the comparison)
  accuracy <- mean(predicted_values == data[-i, ]$V11)

  # Check if the current k value gives a better accuracy
  if (accuracy > best_accuracy) {
    best_accuracy <- accuracy
    best_k <- k
  }
}

# Print the best k value and corresponding accuracy
cat("Best k value:", best_k, "\n")

## Best k value: 1

cat("Best Accuracy:", best_accuracy, "\n")

## Best Accuracy: 0.4517611

kknn(formula, train, test, k = 7, distance = 2, kernel = "optimal", scale = TRUE, ...)

• Response Variable (V11): the variable you are trying to predict.
• Predictor Variables (V1 to V10): the variables used as features for prediction.
• Training Data (data[-i, ]): the dataset used for training the model, excluding the observation at index i.
• Test Data (data[i, ]): a single observation (index i) for which you want to predict the response variable.
• k: the number of nearest neighbors to consider (varied over k_values in the loop above).
• Distance Metric (distance = 2): Euclidean distance is used as the distance metric.
• Kernel Function (kernel = "optimal"): the “optimal” kernel weights neighbors according to their distance rather than treating them equally.
• Scaling (scale = TRUE): the predictor variables are scaled to have zero mean and unit variance.
• The fitted.values method returns the predicted value(s) of the response variable for the test observation(s); here, fitted.values(model_knn) gives the prediction for the single held-out row i.

Conclusion: Using the for loop with the range of k values (1, 5, 10, 15, 20, 25, 30), it seems that the best k value is 1 and the best accuracy is 0.4517611, which is suspiciously low and suggests a problem with how the accuracy is computed.
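Two plausible causes of the low accuracy: V11 is numeric in the formula, so kknn performs regression and returns continuous fitted values that rarely equal 0 or 1 exactly; and the single test prediction is compared against all 653 training responses, recycling the comparison. A minimal sketch of the thresholding part of the fix, with hypothetical stand-in values for kknn's fitted values:

```r
# Hypothetical continuous fitted values (stand-ins for fitted.values(model_knn))
fitted_vals <- c(0.2, 0.8, 0.6, 0.1)
actual      <- c(0, 1, 0, 0)
# Threshold at 0.5 before comparing to the 0/1 labels
pred_class <- as.integer(round(fitted_vals))
accuracy   <- mean(pred_class == actual)
accuracy   # -> 0.75 on this toy example
# With the real data, loop i over every row (leave-one-out), predict row i
# from data[-i, ], round the fitted value, and average the matches over i.
```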
