SVM, Neural Network and Random Forest in R
Unit-3
EXPERIMENT-8
Import CSV data file and implement training and testing of Support Vector Machine.
Machine Learning
• Machine learning is the science of getting computers to act by feeding them data and letting
them learn on their own, without being explicitly programmed to do so.
• The key to machine learning is the data. Machines learn much as humans do: just as we need
to collect information and data in order to learn, machines must be fed data in order to
learn and make decisions.
• To understand Machine learning, let’s consider an example. Let’s say you want a machine to
predict the value of a stock. In such situations, you just feed the machine with relevant data.
After that, you must create a model which is used to predict the value of the stock.
Types Of Machine Learning
• 1. Supervised learning
• Supervised means to oversee or direct a certain activity and make sure it’s done correctly. In this
type of learning the machine learns under guidance.
• Just as our teachers guided and taught us at school, in supervised learning you feed the
model a set of data called training data, which contains both input data and the corresponding
expected output. The training data acts as a teacher and teaches the model the correct output
for a particular input so that it can make accurate decisions when later presented with new data.
• 2. Unsupervised learning
• Unsupervised means to act without anyone’s supervision or direction.
• In unsupervised learning, the model is given a data set which is neither labeled nor classified. The
model explores the data and draws inferences from data sets to define hidden structures from
unlabeled data.
• An analogy for unsupervised learning is an adult like you or me: we don’t need a guide to help
us with our daily activities; we figure things out on our own without any supervision.
• 3. Reinforcement learning
• Reinforcement means to establish or encourage a pattern of behavior. Let’s say you were
dropped off on an isolated island. What would you do?
• Initially, you’d panic and be unsure of what to do, where to get food, how to live and
so on. But after a while you would have to adapt: you must learn how to live on the island,
adapt to the changing climate, and learn what to eat and what not to eat.
• You’re following what is known as the trial-and-error (hit and trial) concept: because you’re
new to these surroundings, the only way to learn is to gain experience and then learn from it.
• This is what reinforcement learning is. It is a learning method wherein an agent (you, stuck
on an island) interacts with its environment (island) by producing actions and discovers
errors or rewards.
What Is SVM?
• SVM (Support Vector Machine) is a supervised machine learning algorithm which is mainly
used to classify data into different classes. Unlike most algorithms, SVM makes use of a
hyperplane which acts like a decision boundary between the various classes.
• SVM can be used to generate multiple separating hyperplanes such that the data is divided
into segments and each segment contains only one kind of data.
• Advantages
1. SVM is a supervised learning algorithm. This means that SVM trains on a set of labeled
data. SVM studies the labeled training data and then classifies any new input data
depending on what it learned in the training phase.
2. A main advantage of SVM is that it can be used for both classification and regression
problems. Though SVM is mainly known for classification, the SVR (Support Vector
Regressor) is used for regression problems.
3. SVM can be used for classifying non-linear data by using the kernel trick. The kernel trick
means transforming the data into another (typically higher-dimensional) space in which there
is a clear dividing margin between the classes, after which a hyperplane can easily be drawn
between the various classes of data.
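The idea behind the kernel trick can be sketched in base R. The data and the feature map below are hypothetical, chosen only for illustration: points that cannot be split by any single threshold in one dimension become separable by a straight line once a squared coordinate is added. A real kernel SVM does not build the mapped features explicitly; it works with inner products in the mapped space, so this sketch shows only the geometric intuition.

```r
# Hypothetical 1-D data: class "inner" lies near zero, class "outer" on both sides of it.
x <- c(-3, -2.5, -2, -0.5, 0, 0.5, 2, 2.5, 3)
label <- ifelse(abs(x) > 1, "outer", "inner")

# In one dimension, try every threshold between neighbouring points and
# check whether it puts exactly one class on each side.
sorted_x <- sort(x)
thresholds <- (head(sorted_x, -1) + tail(sorted_x, -1)) / 2
separable_1d <- any(sapply(thresholds, function(t) {
  left <- label[x < t]
  right <- label[x > t]
  length(unique(left)) == 1 && length(unique(right)) == 1
}))

# Explicit feature map phi(x) = (x, x^2): the second coordinate lifts the outer points.
phi <- cbind(x1 = x, x2 = x^2)

# In the lifted space the horizontal line x2 = 1 acts as a separating hyperplane.
predicted <- ifelse(phi[, "x2"] > 1, "outer", "inner")
separable_2d <- all(predicted == label)

print(separable_1d)  # FALSE: no single threshold separates the classes
print(separable_2d)  # TRUE: a straight line separates them after the mapping
```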
How Does SVM Work?
• In order to understand how SVM works let’s consider a scenario.
• For a second, pretend you own a farm and you have a problem–you need to set up a fence to
protect your rabbits from a pack of wolves. But where do you build your fence?
• One way to get around the problem is to build a classifier based on the position of the
rabbits and wolves in your pasture.
• If you do that and draw a decision boundary between the rabbits and the wolves, you get a
line that separates the two groups. Now you can clearly build a fence along this line.
• In simple terms, this is exactly how SVM works. It draws a decision boundary, i.e. a hyperplane
between any two classes in order to separate them or classify them.
• The basic principle behind SVM is to draw a hyperplane that best separates the two classes. In our case
the two classes are the rabbits and the wolves.
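What "best separates" means can be made concrete with a small base R sketch. SVM prefers the hyperplane with the largest margin, that is, the largest distance to the closest point of either class. The coordinates and the two candidate fences below are made up for illustration, and the helper `margin` is a hypothetical name, not part of any package.

```r
# Hypothetical pasture coordinates for four rabbits and four wolves.
animals <- data.frame(
  x = c(1, 2, 2, 1.5, 6, 7, 6.5, 7.5),
  y = c(1, 2, 0.5, 1.5, 5, 6, 6.5, 5.5),
  species = c(rep("rabbit", 4), rep("wolf", 4))
)

# Encode the classes as -1 (rabbit) and +1 (wolf).
target <- ifelse(animals$species == "wolf", 1, -1)

# Geometric margin of the line w[1]*x + w[2]*y + b = 0:
# the smallest signed distance from any animal to the line.
# A negative value would mean at least one animal is on the wrong side.
margin <- function(w, b) {
  scores <- drop(as.matrix(animals[, c("x", "y")]) %*% w) + b
  min(target * scores) / sqrt(sum(w^2))
}

# Two candidate fences that both separate the animals.
fence_a <- list(w = c(1, 0), b = -3)    # vertical line x = 3
fence_b <- list(w = c(1, 1), b = -7.5)  # diagonal line x + y = 7.5

margin_a <- margin(fence_a$w, fence_a$b)
margin_b <- margin(fence_b$w, fence_b$b)

print(c(fence_a = margin_a, fence_b = margin_b))
```

Both fences keep the rabbits and wolves apart, but the diagonal one leaves more room on either side, so it is the one a maximum-margin classifier would prefer of the two.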
What is a Support Vector in SVM?
Implementation(BASIC)
Implementation(Train and Test)
• Link: dataset of Social network ads from file Social.csv (geeksforgeeks.org)
• #Importing the dataset
• dataset = read.csv('social.csv') # if this command fails, import the file directly using
the import option of your IDE
• # Taking columns 3-5
• dataset = dataset[3:5] # if the import option named the data frame social, use social[3:5]
• # Encoding the target feature as factor (Creating factor for Purchased)
• dataset$Purchased = factor(dataset$Purchased, levels = c(0, 1))
• # Splitting the dataset into the Training set and Test set
• install.packages('caTools')
• library(caTools)
• set.seed(123) # fix the random seed so that the split is reproducible
• split = sample.split(dataset$Purchased, SplitRatio = 0.75) # TRUE marks the 75% of rows
used for training
• training_set = subset(dataset, split == TRUE)
• test_set = subset(dataset, split == FALSE)
• # Feature Scaling
• training_set[-3] = scale(training_set[-3])
• test_set[-3] = scale(test_set[-3])
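With the data split and scaled, the training and testing step itself could look like the following sketch. It rests on several assumptions that are not stated above: it uses the `svm` function from the e1071 package (which must be installed), it runs on synthetic stand-in data rather than the real Social.csv file, and it uses a plain random split in base R instead of `sample.split`. It also scales the test set with the mean and standard deviation of the training set, a common variation on the independent scaling shown above.

```r
library(e1071)  # assumed to be installed: install.packages('e1071')

set.seed(123)

# Synthetic stand-in for the three columns used above (not the real Social.csv data).
n <- 200
age <- round(runif(n, 18, 60))
salary <- round(runif(n, 15000, 150000))
purchased <- as.integer(age / 60 + salary / 150000 + rnorm(n, sd = 0.1) > 1)
dataset <- data.frame(Age = age, EstimatedSalary = salary,
                      Purchased = factor(purchased, levels = c(0, 1)))

# Plain random 75/25 split in base R.
train_idx <- sample(seq_len(n), size = 0.75 * n)
training_set <- dataset[train_idx, ]
test_set <- dataset[-train_idx, ]

# Scale both sets using the training-set mean and standard deviation.
train_scaled <- scale(training_set[-3])
test_scaled <- scale(test_set[-3],
                     center = attr(train_scaled, "scaled:center"),
                     scale = attr(train_scaled, "scaled:scale"))
training_set[-3] <- train_scaled
test_set[-3] <- test_scaled

# Train a linear-kernel SVM classifier on the training set.
classifier <- svm(Purchased ~ ., data = training_set,
                  type = "C-classification", kernel = "linear")

# Test: predict the held-out rows and compare with the true labels.
y_pred <- predict(classifier, newdata = test_set[-3])
cm <- table(actual = test_set$Purchased, predicted = y_pred)
accuracy <- sum(diag(cm)) / sum(cm)

print(cm)
print(accuracy)
```

The confusion matrix `cm` counts correct predictions on its diagonal, so the accuracy is the diagonal sum divided by the total number of test rows.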
• Then, we use lapply to apply the normalize function to every column of our existing data
(the dataset loaded into R is named mydata):
• maxmindf <- as.data.frame(lapply(mydata, normalize))
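The `normalize` function is used above without being shown. A minimal sketch, assuming it performs min-max normalization (which the name `maxmindf` suggests), could be the following. The small `mydata` data frame here is a hypothetical stand-in, not the dataset from the text.

```r
# Min-max normalization: rescales a numeric vector to the range [0, 1].
# Assumed definition; note that a constant column would cause division by zero.
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical stand-in for mydata, with numeric columns only.
mydata <- data.frame(age = c(20, 35, 50),
                     income = c(30000, 45000, 90000))

# Apply normalize to every column and collect the result as a data frame.
maxmindf <- as.data.frame(lapply(mydata, normalize))
print(maxmindf)
```

After this step every column has a minimum of 0 and a maximum of 1, so variables measured on very different scales contribute comparably to the model.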
• We base our training data (trainset) on 80% of the observations. The test data
(testset) is based on the remaining 20% of observations.
• # Training and Test Data
• trainset <- maxmindf[1:160, ]