Classifiers (Support Vector Machines, Decision Trees, Nearest Neighbor Classification)

Pattern Recognition (60014703-3)

Lecture 3

(Support Vector Machines, Decision Trees, Nearest
Neighbor Classification)

Instructor: Amany Al Luhaybi

Source: Bing Liu, UIC

What is Learning?
  Herbert Simon: “Learning is any process by
which a system improves performance from
experience.”
  “A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E.”
– Tom Mitchell

  Learning is essential for unknown environments.
  Learning is useful as a system construction
method, i.e., expose the agent to reality rather than
trying to write it down.
  Learning modifies the agent's decision
mechanisms to improve performance.

Supervised learning
 Like human learning from past experiences.
 A computer does not have “experiences”.
 A computer system learns from data, which
represent some “past experiences” of an
application domain.
  Our focus: learn a target function that can be used
to predict the values of a discrete class attribute.
  The task is commonly called supervised learning,
classification, or inductive learning.

The data and the goal
 Data: A set of data records (also called
examples, instances or cases) described by
 k attributes: A1, A2, … Ak.
  a class: Each example is labelled with a
pre-defined class.
 Goal: To learn a classification model from the
data that can be used to predict the classes
of new (future, or test) cases/instances.
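
The data description above can be sketched in code. This is only an illustrative representation, not something prescribed by the lecture: the names `Example`, `attributes`, and `label`, and the attribute values themselves, are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple

class Example(NamedTuple):
    """One data record: k attribute values A1..Ak plus a class label."""
    attributes: Tuple[str, ...]   # values of A1, A2, ..., Ak
    label: Optional[str]          # pre-defined class; None for a new, unlabeled case

# Hypothetical labeled examples with k = 3 attributes (illustrative values only).
training_data = [
    Example(("young", "false", "fair"), "No"),
    Example(("middle", "true", "good"), "Yes"),
]

# A new (future, or test) case: attributes are known, the class is to be predicted.
new_case = Example(("old", "false", "good"), None)

k = len(training_data[0].attributes)
classes = sorted({ex.label for ex in training_data})
print(k, classes)
```

The goal stated above then amounts to learning a function from `attributes` to `label` using `training_data`, and applying it to cases such as `new_case`.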

An example: data (loan application)
[Table: loan application records, each labelled approved or not]

An example: the learning task
 Learn a classification model from the data
 Use the model to classify future loan applications
 Yes (approved) and
 No (not approved)
  What is the class for the following case/instance?

Supervised vs. unsupervised
 Supervised learning: classification is seen as
supervised learning from examples.
  Supervision: The data (observations,
measurements, etc.) are labeled with pre-defined
classes, as if a “teacher” had provided the classes.
 Test data are classified into these classes too.
 Unsupervised learning (clustering)
 Class labels of the data are unknown
 Given a set of data, the task is to establish the
existence of classes or clusters in the data

Supervised learning process: two steps
  Learning (training): Learn a model using the
training data
  Testing: Test the model using unseen test
data to assess the model accuracy
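
The two steps can be sketched as follows. The "model" here is a deliberately trivial lookup table chosen only to keep the example short; it is an assumption for illustration and not one of the classifiers covered in this lecture. The toy data and the function names `train`, `predict`, and `accuracy` are likewise hypothetical. Accuracy is computed as the fraction of test cases classified correctly.

```python
from collections import Counter

def train(training_data):
    """Learning step: build a model from labeled (x, y) pairs.

    For each attribute value x, remember the most common class seen with it;
    fall back to the overall majority class for values never seen in training.
    """
    by_value = {}
    for x, y in training_data:
        by_value.setdefault(x, Counter())[y] += 1
    default = Counter(y for _, y in training_data).most_common(1)[0][0]
    table = {x: counts.most_common(1)[0][0] for x, counts in by_value.items()}
    return table, default

def predict(model, x):
    """Classify one case with the learned model."""
    table, default = model
    return table.get(x, default)

def accuracy(model, test_data):
    """Testing step: fraction of unseen test cases classified correctly."""
    correct = sum(1 for x, y in test_data if predict(model, x) == y)
    return correct / len(test_data)

# Hypothetical toy data: one attribute, two classes.
training_data = [("small", "No"), ("small", "No"), ("large", "Yes"),
                 ("large", "Yes"), ("medium", "No")]
test_data = [("small", "No"), ("large", "Yes"), ("medium", "Yes"), ("huge", "No")]

model = train(training_data)
test_accuracy = accuracy(model, test_data)
print(test_accuracy)
```

Note that the test data are never passed to `train`; they are used only to assess the finished model.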

Fundamental assumption of learning
Assumption: The distribution of training
examples is identical to the distribution of test
examples (including future unseen examples).

  In practice, this assumption is often violated to
a certain degree.
 Strong violations will clearly result in poor
classification accuracy.
 To achieve good accuracy on the test data,
training examples must be sufficiently
representative of the test data.
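
A small sketch of why this assumption matters. Everything here is hypothetical: a one-attribute problem, a ground-truth rule of the form "Yes if x >= cutoff", and a learner that simply picks a threshold between the two classes. The same learned threshold is evaluated on test data that follow the training rule and on test data whose labels follow a different rule.

```python
def learn_threshold(training_data):
    """Learn a cutoff halfway between the largest 'No' value and the smallest 'Yes' value."""
    max_no = max(x for x, y in training_data if y == "No")
    min_yes = min(x for x, y in training_data if y == "Yes")
    return (max_no + min_yes) / 2

def accuracy(threshold, data):
    """Fraction of cases whose predicted class (Yes iff x >= threshold) matches the label."""
    correct = sum(1 for x, y in data if ("Yes" if x >= threshold else "No") == y)
    return correct / len(data)

def make_data(xs, true_cutoff):
    """Label each value with a hypothetical ground-truth rule: Yes iff x >= true_cutoff."""
    return [(x, "Yes" if x >= true_cutoff else "No") for x in xs]

training = make_data(range(0, 100, 2), true_cutoff=50)   # training examples
same = make_data(range(1, 100, 2), true_cutoff=50)       # test data following the same rule
shifted = make_data(range(1, 100, 2), true_cutoff=70)    # test data following a different rule

threshold = learn_threshold(training)
acc_same = accuracy(threshold, same)
acc_shifted = accuracy(threshold, shifted)
print(threshold, acc_same, acc_shifted)
```

In this sketch the model is unchanged between the two evaluations; only the test distribution differs, and accuracy drops on the shifted data.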

Classification: Definition
  In classification, we predict labels y (classes) for
inputs x
  Given a collection of records (training set)
  Each record contains a set of attributes; one of the attributes is the class.
  Find a model for the class attribute as a function of the
values of the other attributes.
  Goal: previously unseen records should be
assigned a class as accurately as possible.
  A test set is used to determine the accuracy of the model. Usually, the
given data set is divided into training and test sets, with the training set used to
build the model and the test set used to validate it.
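
Dividing a given data set into training and test sets might look like the sketch below. The function name `train_test_split`, the 70/30 proportion, and the fixed `seed` are assumptions made for illustration; the lecture does not prescribe a particular split.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records: (attribute value, class).
records = [(i, "Yes" if i % 2 else "No") for i in range(10)]
training_set, test_set = train_test_split(records)
print(len(training_set), len(test_set))
```

Shuffling before splitting is a common precaution against records being stored in some meaningful order, which would make the two sets unrepresentative of each other.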

Illustrating Classification Task
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Training Set (used to learn the model)

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Test Set (the learned model is applied to these records: deduction)
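
To make the task concrete, the sketch below encodes the two tables and fills in the "?" entries with a simple nearest-neighbor rule. This is an illustrative stand-in for "the model", not the method used to produce the original figure. The distance function is an assumption (one unit per differing categorical value, plus the Attrib3 gap scaled by 100K), and Tid 1 is left out because its class value is missing from the table as given.

```python
# Training records: (Attrib1, Attrib2, Attrib3 in thousands, Class).
# Tid 1 is omitted because its class value is missing from the table as given.
training_set = [
    ("No", "Medium", 100, "No"),   # Tid 2
    ("No", "Small", 70, "No"),     # Tid 3
    ("Yes", "Medium", 120, "No"),  # Tid 4
    ("No", "Large", 95, "Yes"),    # Tid 5
    ("No", "Medium", 60, "No"),    # Tid 6
    ("Yes", "Large", 220, "No"),   # Tid 7
    ("No", "Small", 85, "Yes"),    # Tid 8
    ("No", "Medium", 75, "No"),    # Tid 9
    ("No", "Small", 90, "Yes"),    # Tid 10
]

# Test records keyed by Tid: class unknown ("?") and to be predicted.
test_set = {
    11: ("No", "Small", 55),
    12: ("Yes", "Medium", 80),
    13: ("Yes", "Large", 110),
    14: ("No", "Small", 95),
    15: ("No", "Large", 67),
}

def distance(a, b):
    """Assumed mixed distance: 1 per differing categorical value,
    plus the Attrib3 gap divided by 100 (an arbitrary scaling)."""
    return (a[0] != b[0]) + (a[1] != b[1]) + abs(a[2] - b[2]) / 100

def predict(record):
    """Deduction step: return the class of the closest training record."""
    nearest = min(training_set, key=lambda row: distance(record, row))
    return nearest[3]

predictions = {tid: predict(rec) for tid, rec in test_set.items()}
print(predictions)
```

A different distance function or a different classifier could assign different classes to the same test records; the predictions depend on the modelling choices, not only on the data.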

Examples of Classification Task
  Predicting tumor cells as benign or malignant
  Classifying credit card transactions
as legitimate or fraudulent
  Categorizing news stories as finance,
weather, entertainment, sports, etc.

Issues: Data Preparation
 Data cleaning
 Preprocess data in order to reduce noise and handle
missing values
 Relevance analysis (feature selection)
 Remove the irrelevant or redundant attributes
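
A rough sketch of both preparation steps on hypothetical records. The choices made here are assumptions for illustration only: missing values are filled with the most common value in the column, and "irrelevant" is reduced to the crudest case, an attribute that takes the same value in every record. Noise reduction is not covered by this sketch.

```python
from collections import Counter

def fill_missing(rows, missing=None):
    """Data cleaning: replace missing values with the column's most common value."""
    columns = list(zip(*rows))
    fills = [Counter(v for v in col if v is not missing).most_common(1)[0][0]
             for col in columns]
    return [tuple(fills[i] if v is missing else v for i, v in enumerate(row))
            for row in rows]

def drop_constant_columns(rows):
    """Relevance analysis (very crude): drop attributes that take a single
    value everywhere, since they cannot help separate the classes."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if len(set(col)) > 1]
    return [tuple(row[i] for i in keep) for row in rows]

# Hypothetical records: (Attrib1, Attrib2, Attrib3); None marks a missing value.
rows = [
    ("Yes", "Large", "A"),
    ("No", None, "A"),
    ("No", "Small", "A"),
    ("No", "Small", "A"),
]
cleaned = drop_constant_columns(fill_missing(rows))
print(cleaned)
```

Real feature selection usually also looks for attributes that are redundant with one another or uninformative about the class, which a constant-column check does not detect.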

Resources: Datasets

 UCI Repository:
 UCI KDD Archive:
 Statlib:
 Delve:

Classification Techniques

  Decision Tree based Methods
  Rule-based Methods
 Memory based reasoning
 Neural Networks
 Naïve Bayes and Bayesian Belief Networks
 Support Vector Machines


