AI Chapter 3 Part 3

The document discusses different machine learning algorithms, including support vector machines (SVM), ensemble methods, random forests, and k-nearest neighbors (KNN). SVM searches for the optimal separating hyperplane between classes, while ensemble methods such as bagging and boosting combine multiple models to improve performance. Random forests build decision trees on randomly selected subsets of features and data.

Artificial Intelligence

Institute of Technology
University of Gondar
Biomedical Engineering Department

By Ewunate Assaye (MSc.)


Supervised Learning

Outline:
» SVM

» Ensemble Methods

» Random Forest

» KNN

SVM—Support Vector Machines

» Classification method for both linear and nonlinear data.


» Transforms nonlinear training data into a higher dimension
» In the new dimension, it searches for the linear optimal separating hyperplane (i.e., the "decision boundary")
» SVM finds this hyperplane using support vectors ("essential" training tuples) and margins (defined by the support vectors)

Non-linear SVM

Transformed into linear hyperplane

SVM—History and Applications

» Vapnik and colleagues (1992)—groundwork from Vapnik & Chervonenkis' statistical learning theory in the 1960s

» Features: training can be slow, but accuracy is high owing to their ability to model complex nonlinear decision boundaries (margin maximization)

» Used for: classification (SVM) and numeric prediction (SVR)

» Applications:

  o handwritten digit recognition, object recognition, speaker identification, benchmarking time-series prediction tests

SVM—General Philosophy

[Figure: two separating hyperplanes, one with a small margin and one with a large margin; the support vectors define the margins]
SVM—When Data Is Linearly Separable

Let the data D be (X1, y1), …, (X|D|, y|D|), where Xi is a training tuple with associated class label yi. Each yi can take one of two values, +1 or -1, corresponding to the classes buys-computer = yes and buys-computer = no, respectively.

There are infinitely many lines (hyperplanes) separating the two classes, but we want to find the best one (the one that minimizes classification error on unseen data).

SVM searches for the hyperplane with the largest margin, i.e., the maximum marginal hyperplane (MMH).
Linear SVM: Separable Case

» A linear SVM is a classifier that searches for a hyperplane with the largest margin

Linear decision boundary

» Consider a binary classification problem consisting of N training examples

» Each example is denoted by a tuple (xi, yi), i = 1, 2, …, N

  ✓ where xi = (xi1, xi2, …, xid)^T corresponds to the attribute set of the ith example. By convention, let yi ∈ {-1, 1} denote its class label.

» The decision boundary of a linear classifier can be written in the following form:

  ✓ w · x + b = 0

» where w and b are parameters of the model


Linear SVM

» A two-dimensional training set consisting of squares and circles.

» A decision boundary that bisects the training examples into their respective classes is illustrated with a solid line.

» If we label all the squares as class +1 and all the circles as class -1, then we can predict the class label y for any test example z in the following way:

  ✓ y = +1 if w · z + b > 0, and y = -1 if w · z + b < 0
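» As a small illustration (not part of the original slides), the sign rule above can be coded directly; the values of w and b below are hypothetical placeholders.

```python
import numpy as np

w = np.array([1.0, -1.0])   # hypothetical weight vector
b = -0.5                    # hypothetical bias term

def predict(z):
    """Return +1 (square) or -1 (circle) depending on the sign of w . z + b."""
    return 1 if np.dot(w, z) + b > 0 else -1

print(predict(np.array([2.0, 0.5])))   # +1
print(predict(np.array([0.0, 2.0])))   # -1
```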
Margin of a Linear Classifier

» The margin of the decision boundary is given by the distance between two parallel hyperplanes

» We can rescale the parameters w and b of the decision boundary so that the two parallel hyperplanes bi1 and bi2 can be expressed as follows:

  o bi1 : w · x + b = 1

  o bi2 : w · x + b = -1

» To compute the margin, let x1 be a data point located on bi1 and x2 be a data point located on bi2

» By substituting these points into the two equations, the margin d can be computed by subtracting the second equation from the first:

  o w · (x1 - x2) = 2, so d = 2 / ||w||

Learning a linear SVM model

» The training phase of SVM involves estimating the parameters w and b of the decision boundary from the training data.

» The parameters must be chosen in such a way that the following two conditions are met:

  o w · xi + b ≥ 1 if yi = 1

  o w · xi + b ≤ -1 if yi = -1

» Together these conditions can be written as yi(w · xi + b) ≥ 1; maximizing the margin is then equivalent to minimizing ||w||² / 2 subject to these constraints.
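» A minimal sketch (not the lecture's own code) of how these quantities can be obtained in practice with scikit-learn; the toy data below is an assumed, linearly separable example, and a very large C approximates the hard-margin case.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: class +1 on the right, class -1 on the left (illustrative values)
X = np.array([[2.0, 1.0], [3.0, 2.0], [3.0, 0.5],
              [-2.0, 1.0], [-3.0, 0.0], [-2.5, -1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin

w = clf.coef_[0]                   # parameters of w . x + b = 0
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)   # distance between w.x+b = +1 and w.x+b = -1

print("w =", w, " b =", b)
print("margin =", margin)
print("support vectors:\n", clf.support_vectors_)
```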
Support Vector Machines
[Figure: two candidate decision boundaries, B1 and B2, that both separate the training set]

» Which one is better? B1 or B2?

» How do you define better?
Support Vector Machines
[Figure: decision boundaries B1 and B2 with their margins, bounded by the parallel hyperplanes b11, b12 and b21, b22]

» Find the hyperplane that maximizes the margin => B1 is better than B2


Linear SVM: Nonseparable Case

» What if the problem is not linearly separable?

Linear SVM: Nonseparable Case

» What if the problem is not linearly separable?


o Introduce slack variables ξi

  ✓ Need to minimize:

      L(w) = ||w||² / 2 + C (Σ_{i=1}^{N} ξi)^k

  ✓ Subject to:

      w · xi + b ≥ 1 - ξi   if yi = 1
      w · xi + b ≤ -1 + ξi  if yi = -1

  ✓ If k is 1 or 2, this leads to the same objective function as the linear SVM but with different constraints
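» A hedged sketch of the role of the penalty term C: it weights the slack sum, so a small C tolerates more margin violations while a large C penalizes them heavily. The overlapping data and the C values below are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian classes: not linearly separable
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: support vectors={clf.n_support_.sum()}, "
          f"training accuracy={clf.score(X, y):.2f}")
```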

Nonlinear Support Vector Machines

» What if the decision boundary is not linear?

  o The data set is generated in such a way that all the circles are clustered near the center of the diagram and all the squares are distributed farther away from the center.

  o Instances of the data set can be classified by an equation of their distance from the center, so the true decision boundary is a circle rather than a straight line.

Attribute Transformation

A nonlinear transformation Φ is needed to map the data from its original feature
space into a new space where the decision boundary becomes linear
Learning a Nonlinear SVM Model

» A nonlinear mapping Φ(x) is used to transform a given data set; after the transformation, we need to construct a linear decision boundary that separates the instances into their respective classes.

» The linear decision boundary in the transformed space has the following form:

  o w · Φ(x) + b = 0

» The learning task for a nonlinear SVM can be formalized as the following optimization problem:

  o minimize ||w||² / 2 subject to yi (w · Φ(xi) + b) ≥ 1 for all i
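» In software the mapping Φ is usually applied implicitly through a kernel function. A minimal sketch follows; the data set (which mimics the circles-near-the-center example) and the hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# One class near the center, the other farther away, as in the slide's example
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_clf = SVC(kernel="linear").fit(X_train, y_train)           # no good linear boundary
rbf_clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)  # implicit nonlinear mapping

print("linear kernel accuracy:", linear_clf.score(X_test, y_test))
print("RBF kernel accuracy:   ", rbf_clf.score(X_test, y_test))
```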
Ensemble Methods

Methods for Constructing an Ensemble Classifier

» The ensemble of classifiers can be constructed in many ways:

By manipulating the training set:

» In this approach, multiple training sets are created by resampling the original data
according to some sampling distribution.

» A classifier is then built from each training set using a particular learning algorithm.

» Bagging and Boosting are two examples of ensemble methods that manipulate their
training sets.
Methods for Constructing an Ensemble Classifier

By manipulating the input features:

» A subset of input features is chosen to form each training set.

» The subset can be either chosen randomly or based on the recommendation of domain experts.

» Random forest is an ensemble method that manipulates its input features and uses decision trees as its base classifiers.
Methods for Constructing an Ensemble Classifier

By manipulating the learning algorithm

» Many learning algorithms can be manipulated in such a way that applying the
algorithm several times on the same training data may result in different models.

» For example, an artificial neural network can produce different models by changing its network topology or the initial weights of the links between neurons.
Bagging (Bootstrap Aggregation)

» Bagging is a technique that repeatedly samples (with replacement) from a data set.

» These samples are similar since they are all drawn from the same original data, but they also differ slightly due to chance.

» A learning algorithm is unstable if small changes in the training set cause large differences in the generated learner; that is, the learning algorithm has high variance.

» Bagging improves generalization error by reducing the variance of the base classifiers.
Bagging

» Assume that we have a training set.

» We generate, say, B = 3 data sets by bootstrapping (sampling with replacement), as sketched below:
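» A minimal sketch of the bootstrapping step, using an assumed toy training set of eight values:

```python
import numpy as np

rng = np.random.default_rng(42)
training_set = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])  # assumed toy data

B = 3
for b in range(B):
    # Sample indices with replacement: each bootstrap set has the same size as the original
    idx = rng.integers(0, len(training_set), size=len(training_set))
    print(f"bootstrap sample {b + 1}:", training_set[idx])
```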

Bagging

» The performance of bagging depends on the stability of the base classifier.

» Bagging uses the bootstrap to generate n training sets, trains n base learners, and then combines their outputs during testing.

» In other words: fit classification or regression models to bootstrap samples from the data and combine them by voting (classification) or averaging (regression).
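» A hedged sketch of bagging in scikit-learn; the breast-cancer data set is an illustrative assumption, and BaggingClassifier combines its trees by voting (its default base estimator is a decision tree).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(n_estimators=50, random_state=0)  # default base: decision tree

print("single tree  accuracy:", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees accuracy:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```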
Random Forest

In this random forest, two of the decision trees predict class B, so by majority vote the ensemble output becomes class B.
Random Forest

» Random forests can be built using bagging in tandem with random selection of attributes: each tree is grown on a bootstrap sample of the data using a random subset of the features.

» A random forest combines the predictions made by multiple decision trees (base learner models), where each tree is generated based on the values of an independent set of random vectors.

» During classification, each tree votes and the most popular class is returned.

» Random forests are comparable in accuracy to AdaBoost, yet are more robust to errors and outliers.

» For each tree grown on a bootstrap sample, the error rate on the observations left out of that bootstrap sample is called the out-of-bag (OOB) error rate.

» Adding more trees does not cause a random forest to overfit, so overfitting is not a serious problem in practice.
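» A minimal sketch of a random forest with the out-of-bag estimate mentioned above; the data set and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",   # random subset of features considered at each split
    oob_score=True,        # evaluate each tree on the samples it did not see
    random_state=0,
)
forest.fit(X, y)

print("OOB accuracy:", forest.oob_score_)
print("OOB error   :", 1.0 - forest.oob_score_)
```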


Random Forest

[Figure: random forest applied to spam classification]
Boosting

» Boosting is a process that uses a set of machine learning algorithms to combine weak learners into a strong learner in order to increase the accuracy of the model.

How do boosting algorithms work?

» The basic principle behind boosting algorithms is to generate multiple weak learners and combine their predictions to form one strong rule.

Step 1: The base algorithm reads the data and assigns an equal weight to each observation.

Step 2: Incorrectly predicted observations are passed to the next base learner with higher weights on these incorrect predictions.

Step 3: Repeat Step 2 until the ensemble can classify the training data with sufficient accuracy (a sketch follows below).
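» A hedged sketch of this procedure with AdaBoost in scikit-learn; the data set and the number of estimators are illustrative assumptions, and the default base learner is a depth-1 decision tree (a stump).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

stump = DecisionTreeClassifier(max_depth=1)                     # a single weak learner
boosted = AdaBoostClassifier(n_estimators=100, random_state=0)  # 100 reweighted stumps

print("single stump accuracy:", cross_val_score(stump, X, y, cv=5).mean())
print("AdaBoost accuracy    :", cross_val_score(boosted, X, y, cv=5).mean())
```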
Types of Boosting

1. Adaptive Boosting (AdaBoost)

   o AdaBoost follows the general boosting procedure described above, increasing the weights of misclassified observations at each round.

Types of Boosting

2. Gradient Boosting

   o Each new learner is fit to the residual errors of the current ensemble; XGBoost is a widely used gradient boosting implementation.
k-Nearest Neighbor Classification (kNN)

» KNN stores all available cases and classifies new cases based on a similarity measure.
» Unlike all the previous learning methods, kNN does not build a model from the training data; for this reason it is called a lazy learner.
» To classify a test instance d, its k-neighborhood (the k closest training instances) is examined.
» K in KNN is a parameter that refers to the number of nearest neighbors to include in the majority voting process.

k-Nearest Neighbor Classification (kNN)

» Requires three things:

  – The set of labeled records

  – A distance metric to compute the distance between records

  – The value of k, the number of nearest neighbors to retrieve

» To classify an unknown record (see the sketch below):

  – Compute its distance to the training records

  – Identify the k nearest neighbors

  – Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
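» A minimal sketch of this procedure with scikit-learn; the iris data set and k = 5 are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)   # "training" only stores the labeled records

print("test accuracy:", knn.score(X_test, y_test))
```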
How do we choose K?
When do we use the KNN algorithm?
How does the KNN algorithm work?
Example

» We have data from a questionnaire survey (asking people's opinions) and objective testing with two attributes (acid durability and strength) to classify whether a special paper tissue is good or not. Here are four training samples.

  X1 = Acid Durability (seconds)   X2 = Strength (kg/m2)   Y = Classification
                7                            7                    Bad
                7                            4                    Bad
                3                            4                    Good
                1                            4                    Good

» Now the factory produces a new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7.

  o Without undertaking another expensive survey, can we guess the quality of the new tissue? Use the squared Euclidean distance as the similarity measure and K = 3.
Solution

  X1 = Acid Durability   X2 = Strength   Squared distance to       Rank (by      Included in   Y = Category
  (seconds)              (kg/m2)         query instance (3, 7)     min. dist.)   3-NN?         of NN

      7                      7           (7-3)² + (7-7)² = 16          3             Yes           Bad

      7                      4           (7-3)² + (4-7)² = 25          4             No            -

      3                      4           (3-3)² + (4-7)² =  9          1             Yes           Good

      1                      4           (1-3)² + (4-7)² = 13          2             Yes           Good

» Use a simple majority of the categories of the nearest neighbors as the predicted value for the query instance. We have 2 Good and 1 Bad; since 2 > 1, we conclude that the new paper tissue that passed the laboratory test with X1 = 3 and X2 = 7 belongs to the Good category.
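» The same result can be reproduced with a few lines of NumPy (a sketch using exactly the four training samples and the query (3, 7) from the example above):

```python
import numpy as np

X = np.array([[7, 7], [7, 4], [3, 4], [1, 4]])   # training samples from the table
y = np.array(["Bad", "Bad", "Good", "Good"])     # their labels
query = np.array([3, 7])
K = 3

sq_dist = ((X - query) ** 2).sum(axis=1)   # [16, 25, 9, 13]
nearest = np.argsort(sq_dist)[:K]          # indices of the 3 closest samples
labels, counts = np.unique(y[nearest], return_counts=True)

print("squared distances:", sq_dist)
print("3-NN labels      :", y[nearest])
print("prediction       :", labels[np.argmax(counts)])   # Good
```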
k-Nearest Neighbor Classification (kNN)

» kNN can deal with complex and arbitrary decision boundaries.
» Despite its simplicity, researchers have shown that the classification accuracy of kNN can be quite strong and, in many cases, as accurate as more elaborate methods.
» kNN is slow at classification time.
» kNN does not produce an understandable model.

Assignment 2

1. Write Python implementations of SVM, clustering, and value-based machine learning methods.

Submit via [email protected] before July 12, 2022


Quiz 3
