AI Chapter 3 Part 3
Institute of Technology
University of Gondar
Biomedical Engineering Department
Outline:
» SVM
» Ensemble Methods
» Random Forest
» KNN
SVM—Support Vector Machines
Non-linear SVM
(Figure: data that is not linearly separable in the original space is transformed so that the decision boundary becomes a linear hyperplane.)
SVM—History and Applications
» Features: training can be slow, but accuracy is high owing to the ability of SVMs to model complex nonlinear decision boundaries (margin maximization)
» Applications: classification and numeric prediction tasks such as handwritten digit recognition, object recognition, and speaker identification
SVM—General Philosophy
(Figure: a separating hyperplane, its margin, and the support vectors that lie closest to the decision boundary.)
SVM—When Data Is Linearly Separable
Let the data D be (X1, y1), …, (X|D|, y|D|), where Xi is a training tuple and yi is its associated class label. Each yi can take one of two values, +1 or -1, corresponding to the classes buys-computer = yes and buys-computer = no, respectively.
There are infinitely many lines (hyperplanes) separating the two classes, but we want to find the best one (the one that minimizes classification error on unseen data).
SVM searches for the hyperplane with the largest margin, i.e., the maximum marginal hyperplane (MMH).
Linear SVM: Separable Case
» A linear SVM is a classifier that searches for a hyperplane with the largest margin
✓ where xi = (xi1, xi2, …, xid)^T corresponds to the attribute set of the ith example. By convention, let yi ∈ {-1, +1} denote its class label.
» The decision boundary of a linear classifier can be written in the following form:
✓ w · x + b = 0
» We can rescale the parameters w and b of the decision boundary so that the two parallel hyperplanes bi1 and bi2 can be expressed as follows:
o bi1 : w · x + b = 1
o bi2 : w · x + b = -1
» The margin of the decision boundary is given by the distance between these two hyperplanes.
» To compute the margin, let x1 be a data point located on bi1 and x2 be a data point located on bi2.
Margin of a Linear Classifier
» By substituting these points into the two equations and subtracting the second equation from the first, we get w · (x1 - x2) = 2; projecting x1 - x2 onto the direction of w then gives the margin d = 2/‖w‖.
» The training phase of SVM involves estimating the parameters w and b of the decision boundary from the training data.
» The parameters must be chosen in such a way that the following two conditions are met:
o w · xi + b ≥ 1 if yi = 1
o w · xi + b ≤ -1 if yi = -1
» Together, these conditions can be written as yi (w · xi + b) ≥ 1 for every training example (xi, yi).
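As an illustration (not part of the original slides), below is a minimal sketch of a linear SVM using scikit-learn; the toy data and the choice of a large C to approximate the separable case are assumptions made for demonstration. After fitting, the parameters w and b and the margin 2/‖w‖ can be read off the trained model.

```python
# Minimal sketch of a linear SVM on assumed toy data (separable case).
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes labelled -1 and +1 (assumed data)
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],   # class -1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A large C approximates the separable (hard-margin) case
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                    # w of the decision boundary w.x + b = 0
b = clf.intercept_[0]               # b of the decision boundary
margin = 2.0 / np.linalg.norm(w)    # margin d = 2 / ||w||

print("w =", w, "b =", b)
print("margin =", margin)
print("support vectors:\n", clf.support_vectors_)
```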
Support Vector Machines
(Figure: two candidate decision boundaries B1 and B2, each shown with its pair of margin hyperplanes, b11/b12 and b21/b22, and the corresponding margin.)
Linear SVM: Non-separable Case
» When the two classes are not perfectly separable, slack variables are introduced so that some training examples are allowed to violate the margin; the objective then trades off margin width against training errors (the soft-margin formulation).
Nonlinear Support Vector Machines
Attribute Transformation
A nonlinear transformation Φ is needed to map the data from its original feature
space into a new space where the decision boundary becomes linear
Learning a Nonlinear SVM Model
» Φ(𝑥), to transform a given data set, after the transformation, we need to construct a
linear decision boundary that will separate the instances into their respective classes.
» The linear decision boundary in the transformed space has the following form:
o w. Φ 𝑥 + 𝑏 = 0
» The learning task for a nonlinear SVM can be formalized as the following optimization problem:
o minimize ‖w‖²/2 over w and b, subject to yi (w · Φ(xi) + b) ≥ 1 for every training example (xi, yi).
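As a hedged sketch (not from the slides), the example below trains a nonlinear SVM with scikit-learn's RBF kernel, which evaluates dot products in the transformed space without computing Φ(x) explicitly (the kernel trick); the synthetic circular data set and the hyperparameters C and gamma are assumptions.

```python
# Sketch of a nonlinear SVM: the RBF kernel plays the role of Phi(x).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
# Circular class boundary: not linearly separable in the original space
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5, 1, -1)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # kernel trick instead of explicit Phi
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```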
Ensemble Methods
» An ensemble method constructs a set of base classifiers from the training data and classifies a new example by taking a vote (or weighted vote) on the predictions of the base classifiers.
Methods for Constructing an Ensemble Classifier
» One way to construct an ensemble is to manipulate the training set: multiple training sets are created by resampling the original data according to some sampling distribution.
» A classifier is then built from each training set using a particular learning algorithm.
» Bagging and Boosting are two examples of ensemble methods that manipulate their
training sets.
Methods for Constructing an Ensemble Classifier
» Random forest is an ensemble method that manipulates its input features and
uses decision trees as its base classifiers
Methods for Constructing an Ensemble Classifier
» Many learning algorithms can be manipulated in such a way that applying the
algorithm several times on the same training data may result in different models.
» These samples are similar since they are all drawn from the same original data, but they also differ slightly due to chance.
Bagging
» Bagging uses bootstrap sampling to generate n training sets, trains n base learners (one per training set), and then, during testing, averages their outputs (or takes a majority vote for classification).
In the illustrated random forest, two decision trees predict class B, so by majority vote the output becomes class B.
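The following is a minimal sketch of bagging with scikit-learn's BaggingClassifier; the breast-cancer data set and the number of base learners are assumptions for illustration. By default the base learner is a decision tree, and the ensemble prediction is obtained by voting over the models trained on the bootstrap samples.

```python
# Sketch of bagging: n bootstrap samples -> n base learners -> majority vote.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bag = BaggingClassifier(
    n_estimators=50,   # n bootstrap training sets / base learners (decision trees by default)
    bootstrap=True,    # sample the training set with replacement
    random_state=0,
)
bag.fit(X_train, y_train)
print("test accuracy:", bag.score(X_test, y_test))
```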
Random Forest
» Random forests can be built using bagging in tandem with random selection of attributes and random samples of the data set.
» A random forest combines the predictions made by multiple decision trees (base-learner models), where each tree is generated based on the values of an independent set of random vectors.
» During classification, each tree votes and the most popular class is returned.
» Random forests are comparable in accuracy to AdaBoost, yet are more robust to errors and
outliers
» For each tree grown on a bootstrap sample, the error rate for observations left out of the
bootstrap sample is called the out-of-bag error rate.
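Below is a hedged sketch of a random forest in scikit-learn; the data set and hyperparameters are assumptions. Setting oob_score=True reports the out-of-bag estimate discussed above (as an accuracy rather than an error rate).

```python
# Sketch of a random forest: bagging plus random attribute selection at each split.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(
    n_estimators=100,      # number of trees, each grown on a bootstrap sample
    max_features="sqrt",   # random subset of attributes considered at each split
    oob_score=True,        # estimate generalization from out-of-bag observations
    random_state=0,
)
rf.fit(X, y)
print("out-of-bag accuracy:", rf.oob_score_)
```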
Boosting
» Boosting is a process that uses a set of machine learning algorithms to combine weak learners into a strong learner in order to increase the accuracy of the model.
» The basic principle behind boosting algorithms is to generate multiple weak learners and combine their predictions to form one strong rule.
Step 1: The base algorithm reads the data and assigns equal weight to each sample observation.
Step 2: Falsely predicted observations are passed to the next base learner, with a higher weight assigned to these incorrect predictions.
Step 3: Repeat Step 2 until the algorithm can correctly classify the output.
Types of Boosting
1. AdaBoost (Adaptive Boosting)
2. Gradient Boosting
3. XGBoost (Extreme Gradient Boosting)
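As a sketch of the boosting idea, using AdaBoost (the first of the types listed above), scikit-learn's AdaBoostClassifier reweights misclassified observations between rounds; the data set and parameters below are assumptions for illustration.

```python
# Sketch of boosting with AdaBoost: weak learners (decision stumps by default)
# are trained sequentially, with higher weight on previously misclassified samples.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boost = AdaBoostClassifier(n_estimators=100, learning_rate=1.0, random_state=0)
boost.fit(X_train, y_train)
print("test accuracy:", boost.score(X_test, y_test))
```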
k-Nearest Neighbor Classification (kNN)
» kNN stores all available cases and classifies new cases based on a similarity measure.
» Unlike all the previous learning methods, kNN does not build a model from the training data; for this reason it is called a lazy learner.
» To classify a test instance d, define its k-neighborhood: the k training instances closest to d.
» K in kNN is a parameter that refers to the number of nearest neighbors to include in the majority voting process.
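A minimal kNN sketch with scikit-learn is shown below; the iris data set and k = 5 are assumptions for illustration. Note that fit() merely stores the training data, which is why kNN is called a lazy learner.

```python
# Sketch of kNN: store the training data, classify by majority vote among k neighbors.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5 nearest neighbors
knn.fit(X_train, y_train)                  # "training" just stores the data
print("test accuracy:", knn.score(X_test, y_test))
```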
k-Nearest Neighbor Classification (kNN)
(Figure: an unknown record classified according to its nearest neighbors.)
» We have data from a questionnaire survey (asking people's opinions) and from objective testing, with two attributes (acid durability and strength), used to classify whether a special paper tissue is good or not. Here are four training samples:

X1   X2   Y (classification)
7    7    Bad
7    4    Bad
3    4    Good
1    4    Good
» Now the factory produces a new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7.
o Without undertaking another expensive survey, can we guess the goodness of the new tissue? Use the squared Euclidean distance for the similarity measurement and K = 3.
Solution
» Compute the squared Euclidean distance of each training sample to the query (3, 7), rank the samples by distance, and keep the K = 3 nearest:

X1   X2   Squared distance to (3, 7)   Rank   In 3-NN?   Y (classification)
7    7    (7-3)² + (7-7)² = 16         3      Yes        Bad
7    4    (7-3)² + (4-7)² = 25         4      No         -
3    4    (3-3)² + (4-7)² = 9          1      Yes        Good
1    4    (1-3)² + (4-7)² = 13         2      Yes        Good
» Use the simple majority of the categories of the nearest neighbors as the prediction for the query instance. We have 2 Good and 1 Bad; since 2 > 1, we conclude that the new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7 belongs to the Good category.
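The worked example above can be reproduced with a few lines of NumPy, using the same four training samples, the squared Euclidean distance, and K = 3:

```python
# Reproduce the paper-tissue example: squared Euclidean distance, k = 3, majority vote.
import numpy as np
from collections import Counter

X_train = np.array([[7, 7], [7, 4], [3, 4], [1, 4]])
y_train = np.array(["Bad", "Bad", "Good", "Good"])
query = np.array([3, 7])                      # the new paper tissue: X1 = 3, X2 = 7

dist = ((X_train - query) ** 2).sum(axis=1)   # squared distances: [16, 25, 9, 13]
nearest = np.argsort(dist)[:3]                # indices of the 3 nearest neighbors
vote = Counter(y_train[nearest]).most_common(1)[0][0]
print(dist, "->", vote)                       # majority of {Good, Good, Bad} -> "Good"
```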
Assignment 2
1. Write Python implementations of SVM, clustering, and value-based machine learning methods.