BE02000041 Funda of AI Unit 3 Basics of ML

This document provides an overview of the fundamentals of machine learning, including its definitions, types (supervised, unsupervised, and reinforcement learning), and the machine learning life cycle. It discusses the importance of learning from examples, the role of models, features, and the process of building and evaluating machine learning models. Additionally, it highlights practical applications of machine learning in various domains such as recommendation systems, spam filtering, and autonomous vehicles.

Fundamental of AI

(BE02000041)

Unit – 3
Basics of Machine Learning

Prof. Hitesh D. Rajput


Asst. Prof., Computer Engineering Department,
L. D. College of Engineering, Ahmedabad
Outline
• Learning from examples
• Forms of Learning: Supervised learning, Unsupervised learning, Reinforcement learning
• Simple Models: Linear regression, Logistic regression, Support Vector Machines (SVM), etc.
Learning from Examples
• An agent is learning if it improves its performance after making
observations about the world.
• Learning can range from the trivial, such as writing down a
shopping list, to the profound, as when Albert Einstein inferred
a new theory of the universe.
• When the agent is a computer, we call it machine learning: a
computer observes some data, builds a model based on the
data, and uses the model as both a hypothesis about the world
and a piece of software that can solve problems.
Learning from Examples (Contd..)
• Why would we want a machine to learn? Why not just
program it the right way to begin with?
• There are two main reasons:
• First, the designers cannot anticipate all possible future
situations. For example, a robot designed to navigate
mazes must learn the layout of each new maze it
encounters; a program for predicting stock market prices
must learn to adapt when conditions change from boom
to bust.
• Second, sometimes the designers have no idea how to
program a solution themselves.
• Most people are good at recognizing the faces of family
members, but they do it subconsciously, so even the best
programmers don’t know how to program a computer to
accomplish that task, except by using machine learning
algorithms.
Forms of Learning
• Any component of an agent program can be improved
by machine learning.
• The improvements, and the techniques used to make
them, depend on these factors:
• Which component is to be improved.
• What prior knowledge the agent has, which influences
the model it builds.
• What data, and what feedback on that data, are available.
Formal Definition of ML
• Tom Mitchell provides a more modern definition: “A
computer program is said to learn from experience E
with respect to some class of tasks T and
performance measure P, if its performance at tasks
in T, as measured by P, improves with experience E.”
Formal Definition of ML
The Task, T
• If we want a robot to be able to walk, then walking is the task.
• “Learning is our means of attaining the ability to perform the
task”
• We could program the robot to learn to walk, or we could
directly write a program that specifies how to walk manually.
The Performance Measure, P
• In order to evaluate a machine learning algorithm, we must
measure its performance.
• For tasks such as classification, we often measure the accuracy
of the model.
• Accuracy is just the proportion of examples for which the
model produces the correct output.
Formal Definition of ML
The Experience, E
• Machine learning algorithms can be broadly categorized as
unsupervised or supervised by what kind of experience they
are allowed to have during the learning process.
• Unsupervised learning algorithms experience a dataset
containing many features, then learn useful properties of the
structure of this dataset.
• Supervised learning algorithms experience a dataset
containing features, but each example is also associated with a
label or target.
Example
• Take the everyday case of the decision problem of discriminating spam email from non-spam email.
• How would you write a program to filter emails as they come into your email account and decide whether to put each one in the spam folder or the inbox folder?
• In our spam/non-spam example, the examples (E) are the emails we have collected. The task (T) is a decision problem (called classification) of marking each email as spam or not and putting it in the correct folder. Our performance measure (P) would be something like accuracy as a percentage (correct decisions divided by total decisions made, multiplied by 100), between 0% (worst) and 100% (best).
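A minimal Python sketch of the performance measure P described above: accuracy as a percentage. The function name `accuracy_percent` and the five labels are hypothetical, chosen only for illustration (1 = spam, 0 = not spam).

```python
def accuracy_percent(predicted, actual):
    """Return the percentage of predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual) * 100

# Hypothetical labels for five emails: 1 = spam, 0 = not spam.
actual = [1, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0]
print(accuracy_percent(predicted, actual))  # 80.0
```

Four of the five decisions are correct, so P = 4 / 5 × 100 = 80%.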
Forms of Learning
• In supervised learning, the agent observes input-output pairs and learns a function that maps from input to output.
• For example, the inputs could be camera images, each one accompanied
by an output saying “bus” or “pedestrian,” etc. An output like this is called
a label. The agent learns a function that, when given a new image, predicts
the appropriate label.
• In unsupervised learning, the agent learns patterns in the input without any explicit feedback. The most common unsupervised learning task is clustering: detecting potentially useful clusters of input examples.
• For example, when shown millions of images taken from the Internet, a
computer vision system can identify a large cluster of similar images which
an English speaker would call “cats.”
• In reinforcement learning, the agent learns from a series of reinforcements: rewards and punishments.
• For example, at the end of a chess game the agent is told that it has won (a
reward) or lost (a punishment). It is up to the agent to decide which of the
actions prior to the reinforcement were most responsible for it, and to
alter its actions to aim towards more rewards in the future.
Forms of Learning
• The input is a factored representation - a vector of attribute
values.
• It is also possible for the input to be any kind of data structure,
including atomic and relational.
• When the output is one of a finite set of values (such as
sunny/cloudy/rainy or true/false), the learning problem is
called classification.
• When it is a number (such as tomorrow’s temperature,
measured either as an integer or a real number), the learning
problem has the name regression.
Difference Between Artificial Intelligence,
Machine Learning and Deep Learning
How it differs from traditional programming
[Figure: once learned/trained, INPUT → Learned ML Program → OUTPUT]
Need for ML
Machine Learning Everywhere
• Netflix’s Recommendation Engine: The core of Netflix is its famous recommendation engine. Over 75% of what you watch is recommended by Netflix, and these recommendations are made by implementing Machine Learning.

• Facebook’s Auto-tagging feature: The logic behind Facebook’s DeepFace face verification system is Machine Learning and Neural Networks. DeepFace studies the facial features in an image to tag your friends and family.
• Amazon’s Alexa: Alexa, which is based on Natural Language Processing and Machine Learning, is an advanced Virtual Assistant that does more than just play songs on your playlist. It can book you an Uber, connect with other IoT devices at home, track your health, etc.
• Google’s Spam Filter: Gmail makes use of Machine Learning to filter
out spam messages. It uses Machine Learning algorithms and
Natural Language Processing to analyze emails in real-time and
classify them as either spam or non-spam.
Machine Learning Everywhere
• Automatic Language Translation in Google Translate
• Faster route selection in Google Maps
• Driverless/Self-driving car
• Smartphone with face recognition
• Medical Diagnosis
• Speech Recognition
• Chatbot
• Ads Recommendation System
• Netflix Recommendation System
• Auto friend tagging suggestion in Facebook
• Stock market trading
• Fraud Detection
• Weather Prediction
• Machine Learning in Agriculture
ML Terminology
• Model: Also known as “hypothesis”, a machine learning
model is the mathematical representation of a real-world
process. A machine learning algorithm along with the training
data builds a machine learning model.
• Feature: A feature is a measurable property or parameter of
the data-set.
• Feature Vector: It is a set of multiple numeric features. We
use it as an input to the machine learning model for training
and prediction purposes.
• Training: An algorithm takes a set of data known as “training
data” as input. The learning algorithm finds patterns in the
input data and trains the model for expected results (target).
The output of the training process is the machine learning
model.
ML Terminology
• Prediction: Once the machine learning model is ready, it can
be fed with input data to provide a predicted output.
• Target (Label): The value that the machine learning model has
to predict is called the target or label.
• Overfitting: When a model is too complex for the amount of training data available, it learns the noise and inaccurate data entries along with the underlying pattern. The model then fits the training data very closely but fails to characterize new data correctly.
• Underfitting: It is the scenario when the model fails to capture the underlying trend in the input data. It reduces the accuracy of the machine learning model. In simple terms, the model or the algorithm does not fit the data well enough.
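To make over- and underfitting concrete, here is a small NumPy sketch (an illustration, not part of the original notes) that fits polynomials of degree 1, 3 and 9 to ten noisy points drawn from a hypothetical sine-shaped trend. Typically, degree 1 underfits (high error on both sets), while degree 9 passes through every training point, so its training error is essentially zero, yet it usually does worse than degree 3 on the held-out test points. The exact numbers depend on the random seed.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_curve(x):
    """Hypothetical underlying trend the data is generated from."""
    return np.sin(2 * np.pi * x)

# A small noisy training set and a separate test set from the same process.
x_train = np.linspace(0, 1, 10)
y_train = true_curve(x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0.05, 0.95, 50)
y_test = true_curve(x_test) + rng.normal(0, 0.2, x_test.size)

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, test MSE {results[degree][1]:.4f}")
```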
Machine Learning Process
Machine Learning Life Cycle
Step 1: Define the objective of the Problem Statement
• At this step, we must understand what exactly needs to be
predicted.
• For example, the objective is to predict the possibility of rain by
studying weather conditions.
• At this stage, it is also essential to take mental notes on what
kind of data can be used to solve this problem or the type of
approach you must follow to get to the solution.
Step 2: Data Gathering
• At this stage, you must be asking questions such as,
• What kind of data is needed to solve this problem?
• Is the data available?
• How can I get the data?
• Once you know the types of data that are required, you must understand how you can obtain this data. Data collection can be done manually or by web scraping.
Machine Learning Life Cycle

Step 3: Data Preparation


• The data you collected is almost never in the right format. You will encounter many inconsistencies in the data set, such as missing values, redundant variables, and duplicate values.
• Removing such inconsistencies is essential because they might lead to wrong computations and predictions.
• Therefore, at this stage, you scan the data set for any inconsistencies and fix them then and there.
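A minimal sketch of this step in plain Python, assuming hypothetical weather records stored as dictionaries with `None` marking a missing value. The hypothetical `clean` helper drops a redundant column (`temp_f`, which repeats `temp_c` in Fahrenheit), rows with missing values, and exact duplicates. Dropping rows is only one option; filling missing values (imputation) is another.

```python
# Hypothetical raw weather records; None marks a missing value.
raw = [
    {"date": "d1", "temp_c": 31.0, "humidity": 40, "temp_f": 87.8},
    {"date": "d2", "temp_c": None, "humidity": 85, "temp_f": None},
    {"date": "d1", "temp_c": 31.0, "humidity": 40, "temp_f": 87.8},  # duplicate row
    {"date": "d3", "temp_c": 24.0, "humidity": 90, "temp_f": 75.2},
]

def clean(records, redundant=("temp_f",)):
    """Drop redundant columns, rows with missing values, and duplicate rows."""
    seen, cleaned = set(), []
    for rec in records:
        # Remove redundant columns (temp_f carries the same information as temp_c).
        row = {k: v for k, v in rec.items() if k not in redundant}
        # Skip rows that still contain a missing value.
        if any(v is None for v in row.values()):
            continue
        # Skip rows identical to one already kept.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned

print(clean(raw))
```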
Machine Learning Life Cycle

Step 4: Exploratory Data Analysis


• EDA or Exploratory Data Analysis is the brainstorming stage
of Machine Learning.
• Data Exploration involves understanding the patterns and
trends in the data. At this stage, all the useful insights are
drawn and correlations between the variables are
understood.
• For example, in the case of predicting rainfall, we know
that there is a strong possibility of rain if the temperature
has fallen low. Such correlations must be understood and
mapped at this stage.
Machine Learning Life Cycle
Step 5: Building a Machine Learning Model

• All the insights and patterns derived during Data Exploration are
used to build the Machine Learning Model.
• This stage always begins by splitting the data set into two parts: training data and testing data.
• The training data will be used to build and analyze the model.
The logic of the model is based on the Machine Learning
Algorithm that is being implemented.
• In the case of predicting rainfall, since the output will be in the
form of True (if it will rain tomorrow) or False (no rain
tomorrow), we can use a Classification Algorithm such as Logistic
Regression.
• Choosing the right algorithm depends on the type of problem
you’re trying to solve, the data set and the level of complexity of
the problem.
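One way to perform the split, sketched in plain Python. `train_test_split` here is a hypothetical helper written for this example (scikit-learn provides a function with the same name, which this is not), and the 80/20 ratio and seed are arbitrary choices.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (train, test)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))  # stand-in for 100 labelled examples
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```

Shuffling before splitting avoids a test set that contains only the last (for example, most recent) records.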
Machine Learning Life Cycle

Step 6: Model Evaluation & Optimization

• After building a model using the training data set, it is finally time to put the model to the test.
• The testing data set is used to check the efficiency of the
model and how accurately it can predict the outcome.
• Once the accuracy is calculated, any further improvements
in the model can be implemented at this stage.
• Methods like parameter tuning and cross-validation can be
used to improve the performance of the model.
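Cross-validation can be sketched as follows: the data is divided into k folds, and each fold serves once as the validation set while the remaining folds are used for training. The helper `k_fold_indices` is hypothetical and only produces index lists; the model would be trained and evaluated once per fold, and the k scores averaged. In practice the data is usually shuffled first.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation over n examples."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # The first n % k folds get one extra example so every index is used exactly once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

for train_idx, val_idx in k_fold_indices(10, 3):
    print(len(train_idx), val_idx)
```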
Machine Learning Life Cycle

Step 7: Predictions
• Once the model is evaluated and improved, it is finally used
to make predictions.
• The final output can be a Categorical variable (e.g., True or False) or it can be a Continuous Quantity (e.g., the predicted value of a stock).
• In our case, for predicting the occurrence of rainfall, the
output will be a categorical variable.
Types of Machine Learning

• There are three main types:
– Supervised ML
– Unsupervised ML
– Reinforcement ML
Types of Machine Learning Algorithms
Supervised ML
• Supervised learning is a class of problems that uses a
model to learn the mapping between the input and
target variables.
• Applications consisting of the training data describing the
various input variables and the target variable are known
as supervised learning tasks.
• Let the input variables be x and the target variable be y. A supervised learning algorithm tries to learn a hypothesis function, a mapping given by the expression y = f(x).
Supervised ML
• The learning process here is monitored, or supervised. Since we already know the output, the algorithm is corrected each time it makes a prediction, to optimize the results.
• Models are fit on training data, which consists of both the input and the output variables, and are then used to make predictions on test data.
• Only the inputs are provided during the test phase; the outputs produced by the model are compared with the held-back target values, and this comparison is used to estimate the performance of the model.
• There are basically two types of supervised problems:
– Classification – which involves prediction of a class label
– Regression – that involves the prediction of a numerical
value
Supervised ML
• Examples of algorithms used include Logistic
Regression, Nearest Neighbor, Naive Bayes, Decision
Trees, Linear Regression, Support Vector Machines
(SVM), Neural Networks.
Unsupervised ML
• In unsupervised learning, there is no prior information about the expected output, hence the model tries to learn by itself, recognizing patterns and extracting the relationships among the data.
• Unlike supervised learning, there is no supervisor or teacher to drive the model. Unsupervised learning operates only on the input variables.
• There are no target variables to guide the learning process. The goal here is to interpret the underlying patterns in the data in order to gain a better understanding of it.
Unsupervised ML
• There are two main categories in unsupervised learning:
– Clustering – where the task is to find the different groups in the data.
– Density Estimation – which tries to estimate the distribution of the data.
Reinforcement Learning
• Reinforcement learning is a type of problem in which an agent operates in an environment and acts based on the feedback, or reward, that the environment gives it.
• The rewards can be either positive or negative. The agent then proceeds in the environment based on the rewards gained.
• The reinforcement agent determines the steps to perform a particular task. There is no fixed training dataset here; the machine learns from its own experience.
Reinforcement Learning

• Playing a game is a classic example of a reinforcement learning problem, where the agent’s goal is to achieve a high score.
• It makes successive moves in the game based on the feedback given by the environment, which may be in the form of rewards or penalties.
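The reward-driven loop can be illustrated with a multi-armed bandit, a simplified, single-state special case of reinforcement learning (not the game setting above). In this hypothetical sketch, each action pays a reward of 1 with a hidden probability; the agent mostly repeats the action with the best average reward so far (exploitation) but occasionally tries a random one (exploration, with probability `epsilon`). The probabilities, step count and `epsilon` are arbitrary choices.

```python
import random

def epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Estimate the value of each action of a multi-armed bandit from rewards alone.

    reward_probs[a] is the hidden probability that action a earns a reward of 1.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    counts = [0] * n
    values = [0.0] * n  # running average reward observed for each action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: try a random action
        else:
            action = max(range(n), key=lambda a: values[a])  # exploit the best estimate
        reward = 1 if rng.random() < reward_probs[action] else 0  # feedback from the environment
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]  # incremental mean
    return values, counts

values, counts = epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(len(values)), key=lambda a: values[a])
print(best, counts)
```

Over many steps the agent should settle on the action with the highest hidden reward probability and choose it most often.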
Semi-Supervised ML

• Labelling data is expensive, as it requires the knowledge of skilled human experts.
• The input data is a combination of both labelled and unlabelled data.
• The model makes predictions by learning the underlying patterns on its own.
• It is a mix of both classification and clustering problems.
Regression vs Classification
Regression
• Regression finds correlations between dependent and
independent variables. Therefore, regression algorithms help
predict continuous variables such as house prices, market
trends, weather patterns, oil and gas prices (a critical task
these days!), etc.
• The Regression algorithm’s task is finding the mapping
function so we can map the input variable of “x” to the
continuous output variable of “y.”
• Examples:
– Weather forecasting
– House price prediction
– Loan amount prediction
– Car price prediction
Regression Algorithms

• Simple Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
Classification
• On the other hand, Classification is an algorithm that finds
functions that help divide the dataset into classes based on various
parameters.
• When using a Classification algorithm, a computer program gets
taught on the training dataset and categorizes the data into various
categories depending on what it learned.
• Classification algorithms find the mapping function to map the input “x” to a discrete output “y”. The algorithms estimate discrete values (for example, binary values such as 0 and 1, yes and no, or true and false) based on a particular set of independent variables.
• Some classification algorithms, such as logistic regression, predict the probability of an event occurring by fitting data to a logit function.
• Examples: spam email classification, predicting the willingness of bank customers to pay their loans, and identifying cancer tumor cells.
Classification Algorithms

• Logistic Regression
• K-Nearest Neighbors
• Support Vector Machines
• Kernel SVM
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification
Difference
Regression Algorithm | Classification Algorithm
--- | ---
In Regression, the output variable must be of continuous nature or real value. | In Classification, the output variable must be a discrete value.
The task of the regression algorithm is to map the input value (x) to the continuous output variable (y). | The task of the classification algorithm is to map the input value (x) to the discrete output variable (y).
Regression algorithms are used with continuous data. | Classification algorithms are used with discrete data.
Difference
Regression Algorithm | Classification Algorithm
--- | ---
In Regression, we try to find the best-fit line, which can predict the output more accurately. | In Classification, we try to find the decision boundary, which can divide the dataset into different classes.
Regression algorithms can be used to solve regression problems such as weather prediction, house price prediction, etc. | Classification algorithms can be used to solve classification problems such as identification of spam emails, speech recognition, identification of cancer cells, etc.
Regression algorithms can be further divided into linear and non-linear regression. | Classification algorithms can be divided into binary classifiers and multi-class classifiers.
Different Algorithms used in ML
• Regression Algorithm
• Instance based algorithm
• Regularization
• Decision Tree
• Bayesian
• Clustering
• Association rule learning
• Artificial Neural Network (ANN)
• Deep Learning (DL)
• Dimensionality Reduction
• Ensemble
Regression Algorithm

• Regression is a process that is concerned with identifying the relationship between the target output variables and the input features to make predictions about new data.
• Some popular regression algorithms are: Simple Linear Regression, Lasso Regression, Logistic Regression, Multivariate Regression, and Multiple Regression.
Instance-based Algorithms:
• These belong to the family of learning methods that compare new instances of the problem with those in the training data to find the best match, and make a prediction accordingly.
• The top instance-based algorithms are:
k-Nearest Neighbor, Learning Vector
Quantization, Self-Organizing Map,
Locally Weighted Learning, and
Support Vector Machines.
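As an example of instance-based learning, here is a minimal k-Nearest Neighbor classifier in plain Python, assuming Euclidean distance and a majority vote among the k closest training points. The points, labels and `knn_predict` name are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (2, 2)))  # A
print(knn_predict(points, labels, (6, 5)))  # B
```

There is no training step: all the work happens at prediction time, when the query is compared with every stored instance.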
Regularization
• Regularization constrains the learning process so that the model does not depend too heavily on any particular set of features; it moderates the model’s complexity.
• A penalty on the size of the weights attached to the features shrinks them, which prevents certain features from dominating the prediction process.
• This technique helps to prevent the problem of overfitting in machine learning.
• The various regularization algorithms are Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), and Least-Angle Regression (LARS).
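A sketch of ridge regression (L2 regularization) with NumPy, using the closed-form solution $w = (X^T X + \alpha I)^{-1} X^T y$ on synthetic data. The penalty strength `alpha = 10.0` is an arbitrary choice, and no intercept is fitted, for brevity. With `alpha > 0` the weights are shrunk towards zero compared with ordinary least squares. LASSO has no closed form and needs an iterative solver, so it is not shown.

```python
import numpy as np

def fit_linear(X, y, alpha=0.0):
    """Least-squares weights; alpha > 0 adds the ridge penalty alpha * ||w||^2."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])  # hypothetical "true" weights
y = X @ true_w + rng.normal(scale=0.5, size=30)

w_ols = fit_linear(X, y)                # ordinary least squares
w_ridge = fit_linear(X, y, alpha=10.0)  # ridge regression
print(np.round(w_ols, 2))
print(np.round(w_ridge, 2))
```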
Decision Tree Algorithms

• These methods construct a tree-based model from the decisions made by examining the values of the attributes.
• Decision trees are used for both classification and regression
problems.
• Some of the well-known decision tree algorithms are:
Classification and Regression Tree, C4.5 and C5.0,
Conditional Decision Trees, Chi-squared Automatic Interaction
Detection and Decision Stump.
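A decision stump, the simplest tree in the list above, makes a single split on one attribute. This plain-Python sketch chooses the threshold that minimizes the weighted Gini impurity of the two resulting groups; Gini impurity is one common split criterion (CART uses it). The temperature/rain data and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_stump(xs, ys):
    """Find the threshold on one numeric feature that minimizes weighted Gini impurity."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold midway between consecutive values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (t, score)
    return best

# Hypothetical data: temperature (°C) vs. whether it rained the next day.
temps = [12, 14, 15, 22, 25, 28]
rain = ["yes", "yes", "yes", "no", "no", "no"]
print(best_stump(temps, rain))  # (18.5, 0.0)
```

A full decision tree repeats this search recursively inside each group until the groups are pure or a depth limit is reached.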
Bayesian Algorithms

• These algorithms apply Bayes’ theorem to classification and regression problems.
• They include Naive Bayes, Gaussian Naive Bayes, Multinomial Naive Bayes, Bayesian Belief Network, Bayesian Network and Averaged One-Dependence Estimators.
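Bayes’ theorem, $P(H \mid E) = P(E \mid H)\,P(H) / P(E)$, can be worked through for a single word in a spam filter. The probabilities below are made up for illustration. Naive Bayes extends this by multiplying the likelihoods of many features, assuming they are independent given the class.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(H | E) by Bayes' theorem for a binary hypothesis H."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)  # P(E)
    return likelihood * prior / evidence

# Hypothetical numbers: 20% of mail is spam; the word "free" appears in
# 60% of spam messages and in 5% of non-spam messages.
p = posterior(prior=0.2, likelihood=0.6, likelihood_given_not=0.05)
print(round(p, 3))  # 0.75
```

Here P(E) = 0.6 × 0.2 + 0.05 × 0.8 = 0.16, so P(spam | "free") = 0.12 / 0.16 = 0.75.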
Clustering Algorithms
• Clustering algorithms involve the
grouping of data points into clusters.
• All the data points that are in the same
group share similar properties and,
data points in different groups have
highly dissimilar properties.
• Clustering is an unsupervised learning
approach and is mostly used for
statistical data analysis in many fields.
• Algorithms like k-Means, k-Medians,
Expectation Maximization, Hierarchical
Clustering, and Density-Based Spatial
Clustering of Applications with Noise
fall under this category.
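A minimal sketch of k-Means (Lloyd’s algorithm) in NumPy: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. The initialization (k randomly chosen data points) and the toy data are assumptions for illustration.

```python
import numpy as np

def k_means(points, k, iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm): returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if its cluster is empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

points = np.array([[1, 1], [1.5, 2], [1, 1.5], [8, 8], [8.5, 9], [9, 8]], dtype=float)
centroids, labels = k_means(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is that the first three and the last three points end up in different groups.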
Association Rule Learning Algorithms:

• Association rule learning is a rule-based learning method for identifying the relationships between variables in a very large dataset.
• Association rule learning is employed predominantly in market basket analysis.
• The most popular algorithms are: Apriori algorithm and Eclat algorithm.
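Support and confidence, the two quantities behind association rules, can be computed directly for a handful of hypothetical shopping baskets. This brute-force sketch only checks pairs of items, and the 0.4 minimum-support threshold is an arbitrary choice; Apriori and Eclat exist precisely to avoid enumerating every itemset on large datasets.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)

items = sorted(set().union(*transactions))
for a, b in combinations(items, 2):
    s = support({a, b})
    if s >= 0.4:  # minimum-support threshold
        print(f"{{{a}}} -> {{{b}}}: support={s:.2f}, confidence={confidence({a}, {b}):.2f}")
```

For example, {eggs} → {milk} has confidence 1.0: every basket with eggs also contains milk.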
Artificial Neural Network Algorithms:
• Artificial neural network algorithms are based on the biological neurons in the human brain.
• They belong to the class of complex pattern matching and prediction processes in classification and regression problems.
• Some of the popular artificial neural network algorithms are: Perceptron, Multilayer Perceptrons, Stochastic Gradient Descent, Back-Propagation, Hopfield Network, and Radial Basis Function Network.
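The Perceptron, the first algorithm in the list, can be sketched in a few lines of plain Python. This version uses the classic perceptron update rule with a step activation and learns the logical AND function, which is linearly separable; the learning rate and epoch limit are arbitrary choices.

```python
def predict(w, b, x):
    """Step activation: output 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Classic perceptron rule for labels in {0, 1}; returns (weights, bias)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, labels):
            update = lr * (target - predict(w, b, x))  # zero when the prediction is right
            if update:
                w = [wi + update * xi for wi, xi in zip(w, x)]
                b += update
                errors += 1
        if errors == 0:  # converged: every sample classified correctly
            break
    return w, b

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]  # logical AND
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```

A single perceptron cannot learn XOR, which is not linearly separable; that limitation is what motivates multilayer perceptrons trained with back-propagation.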
Deep Learning Algorithms:
• These are modern versions of artificial neural networks that can handle very large and complex datasets of labeled data.
• Deep learning algorithms are tailored to handle text, image, audio and video data. Deep learning uses self-taught learning constructs with many hidden layers to handle big data, and it requires powerful computational resources.
• Some of the popular deep learning algorithms are: Convolutional Neural Networks, Recurrent Neural Networks, Deep Boltzmann Machines, Auto-Encoders, Deep Belief Networks and Long Short-Term Memory Networks.
Dimensionality Reduction Algorithms:
• Dimensionality reduction algorithms exploit the intrinsic structure of data in an unsupervised manner to express the data using a reduced set of information.
• They convert high-dimensional data into a lower-dimensional representation, which can then be used in supervised learning methods like classification and regression.
• Some of the well known dimensionality reduction algorithms
include Principal Component Analysis, Principal Component
Regression, Linear Discriminant Analysis, Quadratic Discriminant
Analysis, Mixture Discriminant Analysis, Flexible Discriminant
Analysis and Sammon Mapping.
• It is commonly used in the fields that deal with high-dimensional
data, such as speech recognition, signal processing,
bioinformatics, etc. It can also be used for data visualization, noise
reduction, cluster analysis, etc.
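Principal Component Analysis can be sketched with NumPy’s SVD: center the data, take the top right-singular vectors as principal directions, and project onto them. The synthetic 3-D data below is an assumption for illustration; it varies mostly along one direction, so a single component should keep nearly all of its variance.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S**2 / (len(X) - 1)  # variance along each principal direction
    return X_centered @ components.T, explained_variance[:n_components]

rng = np.random.default_rng(0)
t = rng.normal(size=200)
# 3-D data that mostly varies along the direction (1, 2, -1), plus a little noise.
X = np.column_stack([t, 2 * t, -t]) + rng.normal(scale=0.1, size=(200, 3))
Z, var = pca(X, n_components=1)
print(Z.shape)  # (200, 1)
```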
Ensemble Algorithms:
• Ensemble methods are models made
up of various weaker models that are
trained separately and the individual
predictions of the models are
combined using some method to get
the final overall prediction.
• The quality of the output depends on
the method chosen to combine the
individual results.
• Some of the popular methods are:
Random Forest, Boosting,
Bootstrapped Aggregation, AdaBoost,
Stacked Generalization, Gradient
Boosting Machines, Gradient Boosted
Regression Trees and Weighted
Average.
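The simplest way to combine separately trained classifiers is a majority vote over their predictions (averaging plays the same role for regression). The three prediction lists below are hypothetical outputs, not real models.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine several models' class predictions for one example by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of three separately trained classifiers on four examples.
model_a = ["spam", "ham", "spam", "ham"]
model_b = ["spam", "spam", "ham", "ham"]
model_c = ["ham", "ham", "spam", "ham"]
combined = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]
print(combined)  # ['spam', 'ham', 'spam', 'ham']
```

Each model is wrong on some examples, but as long as their mistakes differ, the vote can correct individual errors.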
Simple Linear Regression
• Our objective is to study the relationship between
two variables X and Y.
• One way is by means of regression.
• Regression analysis is the process of estimating a
functional relationship between X and Y. A
regression equation is often used to predict a
value of Y for a given value of X.
• Another way to study relationship between two
variables is correlation. It involves measuring the
direction and the strength of the linear
relationship.
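Correlation is commonly measured with the Pearson correlation coefficient, $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$, which ranges from -1 (perfect negative linear relationship) to 1 (perfect positive). A plain-Python sketch with made-up inputs:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: direction and strength of a linear relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```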
First-Order Linear Model = Simple Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where
$Y$ = dependent variable
$X$ = independent variable
$\beta_0$ = y-intercept
$\beta_1$ = slope of the line
$\varepsilon$ = error variable
Simple Linear Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
This model is
– Simple: only one X
– Linear in the parameters: No parameter appears
as exponent or is multiplied or divided by another
parameter
– Linear in the predictor variable (X): X appears
only in the first power.

Examples
• Multiple Linear Regression:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
• Polynomial Linear Regression:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$
• Linear Regression:
$\log_{10}(Y_i) = \beta_0 + \beta_1 X_i + \beta_2 \exp(X_i) + \varepsilon_i$
• Nonlinear Regression:
$Y_i = \beta_0 / (1 + \beta_1 \exp(\beta_2 X_i)) + \varepsilon_i$
(Linear or nonlinear in parameters)
Deterministic Component of Model
[Figure: the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, showing the y-intercept $\hat{\beta}_0$ and the slope $\hat{\beta}_1 = \Delta y / \Delta x$]
Mathematical vs Statistical Relation
[Figure: scatterplot of y against x with the fitted line $\hat{y} = -5.3562 + 3.3988x$]
Error
• The scatterplot shows that the points are not on a line, so in addition to the relationship we also describe the error:
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$
• The Y’s are the response (or dependent) variable, the x’s are the predictors (or independent variables), and the ε’s are the errors. We assume that the errors are normally distributed, mutually independent, and have variance $\sigma^2$.
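Features of Normal Error Regression Model
• $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• If $\varepsilon_i$ is normally distributed, $\varepsilon_i \sim N(0, \sigma^2)$, then $Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$
• This does not imply that the collection of $Y_i$ values is normally distributed, because each $Y_i$ has its own mean $\beta_0 + \beta_1 X_i$.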
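Fitted Regression Equation and Residuals
• $\hat{Y}_i = b_0 + b_1 X_i$
– $b_0$ is the estimated intercept
– $b_1$ is the estimated slope
• $e_i$: residual for the ith case
• $e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_i)$
[Figure: residual at X = 82, with $\hat{Y}_{82} = b_0 + b_1 \cdot 82$ and $e_{82} = Y_{82} - \hat{Y}_{82}$]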
Least Squares Solution
$b_1 = \dfrac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$
$b_0 = \bar{Y} - b_1 \bar{X}$
• These are also the maximum likelihood estimators for the normal error model.
Least Squares:
Minimize $\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
• The quantities $R_i = y_i - \hat{y}_i$ are called the residuals. If we assume a normal error, they should look normal.
Error: $Y_i - E(Y_i)$, unknown;
Residual: $R_i = y_i - \hat{y}_i$, estimated, i.e. known
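The least squares formulas for $b_1$ and $b_0$ translate directly into code. The data below is hypothetical (roughly $y = 1 + 2x$ with noise). The residuals of a least squares line with an intercept always sum to zero (up to rounding), which makes a handy sanity check.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # hypothetical data
b0, b1 = least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # R_i = y_i - y_hat_i
print(round(b0, 3), round(b1, 3))  # 1.09 1.97
```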
Logistic Regression
Binary classification:
• Two classes Y = {0,1}
• Goal is to learn how to correctly classify the
input into one of these two classes
– Class 0 – labeled as 0
– Class 1 – labeled as 1

• We would like to learn f : X → {0,1}


• Since the output must be 0 or 1, we cannot directly
use a linear model to estimate f(xi).
• Furthermore, we would like to estimate the probability $\Pr(y_i = 1 \mid x_i)$ as $f(x_i)$. Let’s call it $p_i$.
• We model the log odds of the probability $p_i$ as:
$\ln\left(\frac{p_i}{1 - p_i}\right) = \beta \cdot x_i$, where $g(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right)$ is the logit function
• By applying the inverse of $g(\cdot)$, the logistic function, we get back $p_i$:
$\text{logit}^{-1}\left[\ln\left(\frac{p_i}{1 - p_i}\right)\right] = p_i$
• Applying it on the RHS as well, we get
$p_i = \text{logit}^{-1}(\beta \cdot x_i) = \frac{1}{1 + e^{-\beta \cdot x_i}}$
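A minimal NumPy sketch of fitting this model, assuming $x_i$ is augmented with a constant 1 so that $\beta$ includes an intercept, and that $\beta$ is found by batch gradient descent on the average log-loss (one of several possible fitting methods; the learning rate and number of epochs are arbitrary). The study-hours data is hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, the inverse of the logit."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit beta by batch gradient descent on the average log-loss."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend a column of 1s for the intercept
    beta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ beta)                  # p_i = Pr(y_i = 1 | x_i)
        beta -= lr * Xb.T @ (p - y) / len(y)    # gradient of the average log-loss
    return beta

def predict_proba(beta, X):
    """Estimated Pr(y = 1 | x) for each row of X."""
    return sigmoid(np.column_stack([np.ones(len(X)), X]) @ beta)

# Hypothetical 1-D data: hours studied vs. pass (1) / fail (0).
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])
beta = fit_logistic(X, y)
print(np.round(predict_proba(beta, np.array([[1.0], [4.0]])), 3))
```

Thresholding the estimated probability at 0.5 turns it into the 0/1 class label.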
Summary

• Representation:
Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning © The MIT Press
What is a Support Vector Machine?
• “Support Vector Machine” (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges. However, it is mostly used in classification problems.
• In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate.
• Then, we perform classification by finding the hyperplane that best differentiates the two classes.
Support Vector Machine
• Support Vector Machines are generally considered a classification approach, but they can be employed for both classification and regression problems.
• They can easily handle multiple continuous and categorical variables.
• SVM constructs a hyperplane in multidimensional space to separate different classes. SVM generates the optimal hyperplane in an iterative manner, which is used to minimize an error.
• The core idea of SVM is to find a maximum marginal hyperplane (MMH) that best divides the dataset into classes.
Decision Vectors
Definitions

Support Vectors
• Support vectors are the data points that are closest to the hyperplane. These points define the separating line, because the margin is calculated from them.
• These points are the most relevant to the construction of the classifier.
Hyperplane
• A hyperplane is a decision plane which separates a set of objects having different class memberships.
Definitions

• Margin
– A margin is the gap between the two lines passing through the closest points of each class.
– It is calculated as the perpendicular distance from the line to the support vectors, or closest points.
– A larger margin between the classes is considered a good margin; a smaller margin is a bad margin.
How SVM works?
• The main objective is to segregate the given dataset in the best possible way.
• The distance between the nearest points of the two classes, measured perpendicular to the hyperplane, is known as the margin.
• The objective is to select a hyperplane with the maximum possible margin between support vectors in the given dataset.
SVM searches for the maximum marginal hyperplane in the following steps:
• Generate hyperplanes which segregate the classes in the best way.
• Select the right hyperplane, the one with the maximum segregation from the nearest data points of either class.
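The selection step can be illustrated by measuring the margin of candidate hyperplanes $w \cdot x + b = 0$ on a small, hypothetical, linearly separable dataset. This sketch only compares hyperplanes supplied by hand; an actual SVM finds the maximum marginal hyperplane by solving a quadratic optimization problem. Here the line $x_1 + x_2 = 5.5$ gives a margin of about 3.54, versus 2 for the vertical line $x_1 = 3$, and its support vectors are the points at indices 1, 2 and 3.

```python
import numpy as np

# Hypothetical, linearly separable 2-D data with labels -1 and +1.
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

def margin_of(w, b):
    """Margin of the hyperplane w.x + b = 0: perpendicular gap between the lines
    through the closest point of each class, plus the indices of those points."""
    w = np.asarray(w, dtype=float)
    dist = y * (X @ w + b) / np.linalg.norm(w)  # signed distance, > 0 on the correct side
    if np.any(dist <= 0):
        raise ValueError("this hyperplane does not separate the classes")
    pos_min, neg_min = dist[y == 1].min(), dist[y == -1].min()
    support = np.flatnonzero(np.isclose(dist, np.where(y == 1, pos_min, neg_min)))
    return pos_min + neg_min, support

for w, b in [((1.0, 0.0), -3.0), ((1.0, 1.0), -5.5)]:
    margin, support = margin_of(w, b)
    print(w, b, round(margin, 3), support)
```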
Non-linear and inseparable planes

• Some problems can’t be solved using a linear hyperplane.
• In such a situation, SVM uses a kernel trick to transform the input space to a higher dimensional space, as shown on the right.
• The data points are plotted on the x-axis and z-axis, where z is the squared sum of x and y: $z = x^2 + y^2$.
• Now you can easily segregate these points using linear separation.
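The mapping $z = x^2 + y^2$ can be checked numerically. In this sketch, hypothetical points on a small circle (one class) and on a larger circle around it (the other class) cannot be separated by a straight line in the $(x, y)$ plane, but after adding $z$ the plane $z = 1$ separates them. The radii and angles are arbitrary choices.

```python
import math

# Hypothetical 2-D points: class 0 on a circle of radius 0.5, class 1 on a circle of radius 2.
inner = [(0.5 * math.cos(a), 0.5 * math.sin(a)) for a in (0, 1.5, 3.0, 4.5)]
outer = [(2.0 * math.cos(a), 2.0 * math.sin(a)) for a in (0.5, 2.0, 3.5, 5.0)]

def lift(point):
    """Add a third coordinate z = x^2 + y^2 (squared distance from the origin)."""
    x, y = point
    return (x, y, x * x + y * y)

# In 3-D the plane z = 1 separates the classes: inner points have z = 0.25, outer points z = 4.
threshold = 1.0
print(all(lift(p)[2] < threshold for p in inner))  # True
print(all(lift(p)[2] > threshold for p in outer))  # True
```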
High Dimensional Space Mapping
SVM Kernels

• The SVM algorithm is implemented in practice using a kernel. A kernel transforms an input data space into the required form.
• SVM uses a technique called the kernel trick. Here, the kernel takes a low-dimensional input space and transforms it into a higher dimensional space (more precisely, it computes inner products in that higher-dimensional space without constructing the space explicitly).
• In other words, it converts a non-separable problem into a separable one by adding more dimensions.
• It is most useful in non-linear separation problems. The kernel trick helps you build a more accurate classifier.
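The kernel trick can be verified on a small case: for the degree-2 polynomial kernel $k(u, v) = (u \cdot v)^2$ on 2-D inputs, the kernel value equals the ordinary dot product after the explicit mapping $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ into 3-D space. The kernel gets the same number without ever building $\phi$. The input vectors are arbitrary.

```python
import math

def phi(v):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(u, v):
    """Homogeneous polynomial kernel of degree 2: k(u, v) = (u . v)^2."""
    return (u[0] * v[0] + u[1] * v[1]) ** 2

u, v = (1.0, 2.0), (3.0, 4.0)
explicit = sum(a * b for a, b in zip(phi(u), phi(v)))  # dot product in the 3-D feature space
print(poly_kernel(u, v), round(explicit, 6))  # 121.0 121.0
```

For higher degrees or more input features the explicit map grows quickly, while the kernel stays a single dot product followed by a power, which is why SVMs work with kernels directly.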
