Lecture 3

Support Vector Machines is a supervised machine learning algorithm used for classification and regression tasks. It performs classification by transforming data into a higher dimension and finding the optimal hyperplane that separates classes with the maximum margin. The objective is to maximize the margin between different classes to improve classification accuracy.


Supervised Learning

Shahadat Hoshen
Lecturer,
Dept. of CSE, NUBTK
Support Vector Machines
 Support Vector Machines (SVM) is a supervised machine learning algorithm used for classification and regression tasks, and is effective in dealing with complex, high-dimensional datasets.
 SVM performs classification by first transforming the
training records into higher dimensions. Then, within the
new dimension, it searches for the best decision boundary
that separates the training records of one class from others.
 The main objective of SVMs is to find the best hyperplane
that separates the data points of different classes in a way
that maximizes the margin, or the distance, between the
classes. This hyperplane serves as the decision boundary
for classifying new, unseen data points.
Support Vector Machines
 The maximum marginal hyperplane is calculated from the records that fall on the margin hyperplanes (called "Support Vectors") and is used as the decision boundary.
Support Vector Machines
 From the figure, we see that the records distributed between two class values are linearly separable; however, there can be an infinite number of lines separating them.
 The target of an SVM is to find the best line (best decision boundary) that will help minimize classification error for unseen records.
Non-linear SVM
 Non-linear SVMs extend the concept of SVMs to handle non-linear decision boundaries by mapping the input features into a higher-dimensional space.
 This is done through a process called the kernel trick.
 The kernel trick allows the algorithm to compute the dot product of the data points in this higher-dimensional space without explicitly calculating the transformation, saving computational resources.
 The choice of kernel function determines the shape of the decision boundary.
Different types of kernels
1. Linear Kernel
2. Polynomial Kernel
3. Radial Basis Function (RBF) or Gaussian Kernel

Study in detail
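The kernel trick described above can be illustrated with a small sketch. This uses the degree-2 polynomial kernel and its explicit feature map phi, which are standard textbook examples rather than anything from the slides: the dot product computed via the kernel in the original 2-D space matches the dot product computed after explicitly mapping to 3-D.

```python
import numpy as np

def phi(x):
    # explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, y):
    # kernel trick: same value, computed entirely in the input space
    return (x @ y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

explicit = phi(x) @ phi(y)       # dot product in the higher-dimensional space
via_kernel = poly_kernel(x, y)   # no explicit transformation needed
print(explicit, via_kernel)      # both 121.0
```

The kernel never builds phi(x), which is what saves computation when the feature space is large.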
Linear vs Non-linear SVM
When data can be separated by drawing a straight line (a hyperplane), we use a Linear SVM. When data cannot be separated with a straight line, we use a Non-Linear SVM.
Linear vs Non-linear SVM

Linear SVM | Non-Linear SVM
It can be easily separated with a linear line. | It cannot be easily separated with a linear line.
Data is classified with the help of a hyperplane. | We use kernels to make non-separable data separable.
Data can be easily classified by drawing a straight line. | We map data into a high-dimensional space to classify.
Problem 1 (Linear SVM)
Here are the data points (the table of points is an image in the original slide; the support vectors used in the solution below are s1 = (1, 0) in the negative class and s2 = (3, 1), s3 = (3, -1) in the positive class):

Find the appropriate decision boundary among these data.
Solution:
Each support vector is augmented with a 1 as a bias component:
s̃1 = (1, 0, 1), s̃2 = (3, 1, 1), s̃3 = (3, -1, 1)
Linear SVM Solve
The multipliers α1, α2, α3 satisfy:

α1 (s̃1 · s̃1) + α2 (s̃2 · s̃1) + α3 (s̃3 · s̃1) = -1
α1 (s̃1 · s̃2) + α2 (s̃2 · s̃2) + α3 (s̃3 · s̃2) = +1
α1 (s̃1 · s̃3) + α2 (s̃2 · s̃3) + α3 (s̃3 · s̃3) = +1

Substituting the augmented vectors:

α1 (1 + 0 + 1) + α2 (3 + 0 + 1) + α3 (3 + 0 + 1) = -1
α1 (3 + 0 + 1) + α2 (9 + 1 + 1) + α3 (9 - 1 + 1) = +1
α1 (3 + 0 + 1) + α2 (9 - 1 + 1) + α3 (9 + 1 + 1) = +1

which simplifies to:

2α1 + 4α2 + 4α3 = -1
4α1 + 11α2 + 9α3 = 1
4α1 + 9α2 + 11α3 = 1

Solving this system gives:

α1 = -3.5
α2 = 0.75
α3 = 0.75
Linear SVM Solve
w̃ = Σi αi s̃i
   = -3.5 (1, 0, 1) + 0.75 (3, 1, 1) + 0.75 (3, -1, 1)
   = (1, 0, -2)

So w = (1, 0), b = -2.
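The system of equations above can be verified numerically. A minimal sketch with numpy, using the augmented support vectors from the worked example: the Gram matrix of pairwise dot products reproduces the 3×3 system, and the weighted sum of support vectors recovers w and b.

```python
import numpy as np

# augmented support vectors from the worked example (bias component appended)
s = np.array([[1.0,  0.0, 1.0],
              [3.0,  1.0, 1.0],
              [3.0, -1.0, 1.0]])
targets = np.array([-1.0, 1.0, 1.0])   # class labels of the support vectors

# Gram matrix of pairwise dot products: G[i, j] = s_i . s_j
G = s @ s.T
alphas = np.linalg.solve(G, targets)

# w~ is the alpha-weighted sum of the augmented support vectors
w_tilde = alphas @ s
print(alphas)   # [-3.5  0.75  0.75]
print(w_tilde)  # [ 1.  0. -2.]  -> w = (1, 0), b = -2
```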
Problem 2 (Linear SVM)
Here are some data points (the table is an image in the original slide):

Find the appropriate decision boundary among these data.
Ensemble learning
 Ensemble learning refers to the technique of combining
the predictions of multiple models to improve overall
performance and generalization.
 The idea is that by aggregating the predictions of diverse
models, the ensemble can often achieve better results
than any individual model.
 Ensemble methods are commonly used in various
machine learning tasks, including classification,
regression, and anomaly detection.
 There are two main types of ensemble learning: bagging
and boosting.
Ensemble learning
 Bagging (bootstrap aggregating)
Bagging creates different training subsets by sampling the training data with replacement; each model trains on one subset, and the final output is based on majority voting.
 Boosting
Boosting combines weak learners into a strong learner by training models sequentially, each correcting the errors of the previous one, so that the final model has the highest accuracy. Examples: AdaBoost, XGBoost.
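The mechanics of bagging (bootstrap sampling plus majority voting) can be sketched in pure Python/numpy. The 1-D dataset and the trivial threshold "stump" base learner here are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy 1-D training set: the true label is 1 when x > 5
X = np.array([1, 2, 3, 4, 6, 7, 8, 9], dtype=float)
y = (X > 5).astype(int)

def fit_stump(Xb, yb):
    # tiny weak learner: pick the threshold with the fewest training errors
    best_t, best_err = Xb[0], len(yb) + 1
    for t in Xb:
        err = np.sum((Xb > t).astype(int) != yb)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# bagging: each stump trains on a bootstrap sample (drawn with replacement)
thresholds = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))  # sampling with replacement
    thresholds.append(fit_stump(X[idx], y[idx]))

def predict(x):
    # final output by majority vote over all stumps
    votes = [int(x > t) for t in thresholds]
    return int(sum(votes) > len(votes) / 2)

print(predict(0.0), predict(10.0))  # 0 1
```

Each bootstrap sample gives a slightly different stump; the vote smooths out their individual errors, which is exactly the aggregation step that Random Forest also relies on.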
Random Forest
 Random Forest is an ensemble learning algorithm that combines the predictions of multiple decision trees to improve overall accuracy, and it can handle complex problems.
 It builds a number of decision trees on various subsets of the given dataset and combines their outputs to improve the predictive accuracy on that dataset.
 Rather than depending on one tree, it takes the prediction from each tree and predicts the final output based on the majority vote of those predictions. For regression, it uses the average of the predictions.
 It uses the bagging method.
Random Forest

HW: Working Procedure


Regression
 Regression determines the statistical relationship between a
dependent variable and one or more independent variables
which are used to predict real or continuous values.
 The ultimate goal of the regression algorithm is to plot a best-fit line or curve through the data.
 There are several types of regression: Linear Regression, Multiple Linear Regression, and Polynomial Regression.
 In simple words, "Regression shows a line or curve that
passes through all the data points on target-predictor graph
in such a way that the vertical distance between the data
points and the regression line is minimum."
 The distance between data points and line tells whether a
model has captured a strong relationship or not.
Self-study: Classification vs Regression
Regression
 Example: Suppose there is a marketing company A that runs various advertisements every year and gets sales in return. The list below (shown in the original slide) records the advertisements made by the company in the last 5 years and the corresponding sales:

 Now, the company wants to spend $200 on advertisement in the year 2019 and wants to know the prediction about the sales for this year. To solve such prediction problems in machine learning, we need regression analysis.
Application of regression
• Forecasting continuous outcomes like house prices,
stock prices, or sales.
• Predicting the success of future retail sales or marketing
campaigns to ensure resources are used effectively.
• Predicting customer or user trends, such as on streaming
services or e-commerce websites.
• Analyzing datasets to establish the relationships between
variables and output.
• Predicting interest rates or stock prices from a variety of
factors.
• Creating time series visualizations.
Simple Linear Regression
 Linear regression finds the linear relationship between the dependent variable and one independent variable using a best-fit straight line.
 Generally, a linear model predicts by simply computing a weighted sum of the input features, plus a constant called the bias term (also called the intercept term).
 In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the nature of the regression line is linear.
Example-1
Consider the following table consisting of five weeks' sales data (the table is an image in the original slide).
Apply the linear regression technique to predict sales of the 7th and 12th week.
The linear equation is: y = a + bx
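Since the slide's sales table is an image, here is a small sketch of the technique with hypothetical weekly sales numbers (any five week/sales pairs would work the same way): fit the line by least squares, then evaluate it at weeks 7 and 12.

```python
import numpy as np

# hypothetical five weeks' sales data (the slide's actual table is an image)
weeks = np.array([1, 2, 3, 4, 5], dtype=float)
sales = np.array([1.2, 1.8, 2.6, 3.2, 3.8])

# least-squares fit of the linear equation y = a*x + b
a, b = np.polyfit(weeks, sales, 1)

# predict sales for the 7th and 12th week by extrapolating the fitted line
week7 = a * 7 + b
week12 = a * 12 + b
print(week7, week12)  # 5.16 and 8.46 for these illustrative numbers
```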
Example-2
 By using linear regression, predict the glucose level for age 55.
Multiple Linear regression
• Multiple linear regression refers to a technique that uses two or more independent variables to predict the outcome of a dependent variable.
• It achieves a better fit than simple linear regression when multiple independent variables are involved.
• The equation for multiple linear regression is similar to the equation for a simple linear equation, i.e., y(x) = b0 + b1x1, plus the additional weights and inputs for the other features, represented by bnxn.
• The formula for multiple linear regression looks like:
y(x) = b0 + b1x1 + b2x2 + … + bnxn
Multiple Linear regression: Example
Suppose we have the following dataset containing the length, width, and price of carpets. Predict the price of the carpet whose length is 65 and width is 23 using multiple linear regression.

Length (x1) | Width (x2) | Price (y)
60 | 22 | 140
62 | 25 | 155
67 | 24 | 159
70 | 20 | 179
71 | 15 | 192
72 | 14 | 200
75 | 14 | 212
78 | 11 | 215
Multiple Linear regression: Solution
 So, for length 65 and width 23, the price will be:

y = -6.867 + (3.14 × 65) - (1.65 × 23) = -6.867 + 204.1 - 37.95 ≈ 159.283
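The carpet regression can be checked with numpy's least-squares solver. Note that the full-precision coefficients differ slightly from the slide's rounded -6.867, 3.14, and -1.65, so the predicted price comes out near, but not exactly at, the slide's figure:

```python
import numpy as np

# carpet data from the slide: length (x1), width (x2), price (y)
x1 = np.array([60, 62, 67, 70, 71, 72, 75, 78], dtype=float)
x2 = np.array([22, 25, 24, 20, 15, 14, 14, 11], dtype=float)
y  = np.array([140, 155, 159, 179, 192, 200, 212, 215], dtype=float)

# design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones_like(x1), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# predicted price for length 65 and width 23
price = b0 + b1 * 65 + b2 * 23
print(round(price, 2))  # close to the slide's answer of ~159.7
```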
Logistic Regression
 Logistic regression is a supervised machine learning algorithm mainly used for classification tasks, where the goal is to predict the probability that an instance belongs to a given class or not.
 It is a statistical method used for building machine learning models where the dependent variable is dichotomous, i.e. binary: it works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not spam, etc.
 It has the ability to provide probabilities and classify new data using continuous and discrete datasets.
Logistic Regression
 Logistic regression uses the sigmoid function, a mathematical function used to map the predicted values to probabilities.
 It maps any real value to another value within the range of 0 and 1, forming an "S"-shaped curve; that curve is called the sigmoid function or the logistic function.
 The observed outcome is either 0 or 1, depending on whether the event happens or not.
 For binary predictions, you can divide the population into two groups with a cut-off of 0.5: if the hypothesis is above 0.5, the instance is considered to belong to group A, and below, to group B.
Sigmoid function: σ(z) = 1 / (1 + e^(-z))
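A minimal sketch of the sigmoid and the 0.5 cut-off described above:

```python
import numpy as np

def sigmoid(z):
    # maps any real value into (0, 1), forming the S-shaped logistic curve
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))   # 0.5 -- exactly at the cut-off
print(sigmoid(4.0))   # ~0.982, above 0.5 -> group A
print(sigmoid(-4.0))  # ~0.018, below 0.5 -> group B
```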
Type of Logistic Regression
On the basis of the categories, Logistic Regression can be
classified into three types:
1. Binomial: In binomial Logistic regression, there can be only
two possible types of dependent variables, such as 0 or 1,
Pass or Fail, etc.
2. Multinomial: In multinomial Logistic regression, there can
be 3 or more possible unordered types of the dependent
variable, such as “cat”, “dogs”, or “sheep”. In this case, the
softmax function is used in place of the sigmoid function.
3. Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of dependent variables, such as "low", "Medium", or "High".
Application of LR
 In health care, logistic regression can be used to predict if a tumor is likely to be benign or malignant.
 In the financial industry, logistic regression can be used to predict if a transaction is fraudulent or not.
 In marketing, logistic regression can be used to predict if a targeted audience will respond or not.
Example-1
Example-2
 Here 5 students' data is given: you know the number of hours they studied (x) and whether they passed (1) or failed (0) the exam (y).

Hours Studied (x) | Pass/Fail (y)
2 | 0
3 | 0
4 | 0
5 | 1
6 | 1

 Use logistic regression to predict whether a student who studies 10 hours will pass or fail.
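This problem can be solved with a simple gradient-descent sketch of logistic regression on the slide's data (a plain numerical fit, not the iteratively-reweighted method a statistics package would use; the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# the slide's data: hours studied vs pass (1) / fail (0)
x = np.array([2, 3, 4, 5, 6], dtype=float)
y = np.array([0, 0, 0, 1, 1], dtype=float)

# fit w and b by gradient descent on the log-loss
w, b, lr = 0.0, 0.0, 0.1
for _ in range(20000):
    p = sigmoid(w * x + b)          # current predicted probabilities
    w -= lr * np.mean((p - y) * x)  # gradient step for the weight
    b -= lr * np.mean(p - y)        # gradient step for the bias

p10 = sigmoid(w * 10 + b)
print(p10 > 0.5)  # True: a student who studies 10 hours is predicted to pass
```

The fitted decision boundary falls between 4 and 5 hours, matching the labels, so 10 hours lands well inside the "pass" region.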
Optimization algorithm
Optimization algorithms are methods or procedures used to
find the best solution from all feasible solutions.
 Genetic algorithm
 Particle Swarm Optimization (PSO)
 Gradient Descent
 Stochastic Gradient Descent
 Simulated Annealing
Genetic-algorithm
 A genetic algorithm is a search heuristic that is inspired
by Charles Darwin’s theory of natural evolution.
 This algorithm reflects the process of natural selection
where the fittest individuals are selected for
reproduction in order to produce offspring for the next
generation.
 Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems, and are also applied in areas such as image processing.
Genetic-algorithm
Five phases are considered in a genetic algorithm.
I. Initial population
II. Fitness function
III. Selection
IV. Crossover
V. Mutation
Initial Population
 The process begins with a set of individuals called a Population. Each individual is a solution to the problem you want to solve.
 An individual is characterized by a set of parameters (variables) known as Genes. Genes are joined into a string to form a Chromosome (solution).
Fitness Function
 The fitness function determines how fit an individual
is (the ability of an individual to compete with other
individuals). It gives a fitness score to each individual.
The probability that an individual will be selected for
reproduction is based on its fitness score.
Selection
 The idea of selection phase is to select the fittest
individuals and let them pass their genes to the next
generation.
 Two pairs of individuals (parents) are selected based

on their fitness scores. Individuals with high fitness


have more chances to be selected for reproduction.
Crossover
 Crossover is the most significant phase in a genetic
algorithm. For each pair of parents to be mated,
a crossover point is chosen at random from within the
genes.
Mutation
 In certain new offspring formed, some of their genes

can be subjected to a mutation with a low random


probability. This implies that some of the bits in the
bit string can be flipped.
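The five phases can be put together in a tiny pure-Python sketch on the classic "OneMax" toy problem (evolve a bit string toward all ones, with fitness = number of 1-bits). The problem, population size, and rates are illustrative choices, not from the slides:

```python
import random

random.seed(42)

GENES, POP, GENERATIONS = 8, 20, 60

def fitness(chromosome):
    # II. fitness function: score an individual's ability to compete
    return sum(chromosome)

# I. initial population of random chromosomes (bit strings)
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]

for _ in range(GENERATIONS):
    def pick():
        # III. selection: fitter of two random individuals wins (tournament)
        a, b = random.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    next_gen = []
    while len(next_gen) < POP:
        p1, p2 = pick(), pick()
        # IV. crossover at a random point within the genes
        point = random.randint(1, GENES - 1)
        child = p1[:point] + p2[point:]
        # V. mutation: flip each bit with a low random probability
        child = [g ^ 1 if random.random() < 0.02 else g for g in child]
        next_gen.append(child)
    population = next_gen

best = max(population, key=fitness)
print(fitness(best))  # typically reaches 8, the all-ones optimum
```

Selection pressure plus crossover quickly concentrates good genes in the population; the small mutation rate keeps exploring without destroying good solutions.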
Thank You
