
ICT583 Data Science Applications

Tutorial 9
Classification - Machine learning models
SVM

After completing this lab, you should know:


a. How to prepare the training and testing dataset
b. How to build support vector machines in R - libsvm

Do you still remember how support vector machines work?

The support vector machine constructs a hyperplane (or set of hyperplanes) that
maximizes the margin width between the two classes in a high-dimensional space.
The cases that define this hyperplane are the support vectors, as shown in the
following figure:
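
To make this concrete, here is a minimal sketch (on toy two-dimensional data, not the
churn dataset; names such as toy and toy.model are illustrative) that fits a linear SVM
with e1071 and inspects which cases end up as support vectors:

library(e1071)
set.seed(1)
# Two well-separated Gaussian clusters in two dimensions
x = rbind(matrix(rnorm(40, mean = 0), ncol = 2),
          matrix(rnorm(40, mean = 3), ncol = 2))
toy = data.frame(x1 = x[, 1], x2 = x[, 2],
                 y = factor(rep(c("A", "B"), each = 20)))
toy.model = svm(y ~ ., data = toy, kernel = "linear", cost = 1)
toy.model$index       # row indices of the cases selected as support vectors
plot(toy.model, toy)  # decision boundary; support vectors are drawn as crosses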

Building a classification model requires a training dataset to train the classifier
and a testing dataset to evaluate its prediction performance. We will use the
customer churn dataset as the input data and split it into training and testing
datasets.

# Retrieve the churn dataset:
install.packages("modeldata")
library(modeldata)
data(mlc_churn)

Understand your dataset first!
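
For example, a quick first look could include the following (a minimal sketch using base R functions):

dim(mlc_churn)          # number of rows and columns
str(mlc_churn)          # attribute names and types
summary(mlc_churn)      # per-attribute summaries
table(mlc_churn$churn)  # class distribution of the target attribute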

# We can remove the state, area_code, and account_length attributes, which are not
# appropriate as classification features:
mlc_churn = mlc_churn[, !names(mlc_churn) %in% c("state", "area_code",
                                                 "account_length")]

# Then, we split 70 percent of the data into the training dataset and 30 percent of
# the data into the testing dataset using the sample function:

# Set random seed
set.seed(123)

Note: the set.seed() function in R is used to make results reproducible, i.e. it
produces the same sample again and again. When we generate random numbers without
set.seed(), a different sample is produced at each execution.
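
A minimal illustration of this behaviour (the vector 1:10 and the seed value are arbitrary):

set.seed(123)
sample(1:10, 3)  # some sample of three numbers
set.seed(123)
sample(1:10, 3)  # exactly the same three numbers again
sample(1:10, 3)  # a different sample, because the seed was not reset
set.seed(123)    # reset the seed so the split below stays reproducible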

ind = sample(2, nrow(mlc_churn), replace = TRUE, prob = c(0.7, 0.3))
trainset = mlc_churn[ind == 1, ]
testset = mlc_churn[ind == 2, ]

# Lastly, use dim to explore the dimensions of both the training and testing
# datasets:

dim(trainset)
dim(testset)

What is the disadvantage of this training-testing data partition strategy? How can
we improve it?
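
One common answer is that the performance estimate depends on a single random split,
so it can vary noticeably from split to split. A possible improvement is k-fold
cross-validation; the sketch below uses the cross argument of e1071's svm function,
which reports 10-fold cross-validated accuracy (cv.model is an illustrative name):

library(e1071)
cv.model = svm(churn ~ ., data = mlc_churn, kernel = "radial", cost = 1,
               gamma = 1/ncol(mlc_churn), cross = 10)
summary(cv.model)  # includes the total and per-fold cross-validation accuracies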

We train the SVM using the following steps:

# Load the e1071 package:
library(e1071)

# Train the support vector machine using the svm function with trainset as the
# input dataset and churn as the classification category:

model = svm(churn ~ ., data = trainset, kernel = "radial", cost = 1,
            gamma = 1/ncol(trainset))

# Finally, you can obtain overall information about the built model with summary:

summary(model)
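
The cost and gamma values used above are reasonable starting points rather than tuned
choices. If you want to search over them, one option is a small grid search with
tune.svm from e1071 (the grid below is only illustrative and can be slow on the full
training set):

tuned = tune.svm(churn ~ ., data = trainset,
                 gamma = 10^(-3:-1), cost = 10^(0:2))
summary(tuned)          # cross-validated error for each gamma/cost combination
tuned$best.parameters   # the best combination found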

Predict the labels of the testing dataset using the trained support vector machine model:

svm.pred = predict(model, testset[, !names(testset) %in% c("churn")])

# Then, you can use the table function to generate a classification table from the
# prediction results and the labels of the testing dataset:

svm.table=table(svm.pred, testset$churn)
svm.table
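
Overall accuracy can also be read straight off this table (a minimal sketch; the rows
are predictions and the columns are the true labels):

accuracy = sum(diag(svm.table)) / sum(svm.table)
accuracy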
# Now, you can use confusionMatrix from the caret package to measure the
# prediction performance based on the classification table:

library(caret)
confusionMatrix(svm.table)
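
If caret is not available, similar metrics can be derived by hand from svm.table; the
sketch below assumes "yes" (churned) is the positive class:

TP = svm.table["yes", "yes"]  # churners predicted as churners
FP = svm.table["yes", "no"]   # non-churners predicted as churners
FN = svm.table["no", "yes"]   # churners predicted as non-churners
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1        = 2 * precision * recall / (precision + recall)
c(precision = precision, recall = recall, F1 = f1)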
