Pattern Recognition (60014703-3)
Lecture 3
Classifiers
(Support Vector Machines, Decision Trees, Nearest
Neighbor Classification)
Instructor: Amany Al Luhaybi
Source: Bing Liu, UIC
What is Learning?
Herbert Simon: “Learning is any process by
which a system improves performance from
experience.”
“A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E.”
– Tom Mitchell
Learning
Learning is essential for unknown environments.
Learning is useful as a system construction
method,
i.e., expose the agent to reality rather than trying to
write the solution down by hand.
Learning modifies the agent's decision
mechanisms to improve performance
Supervised learning
Like human learning from past experiences.
A computer does not have “experiences”.
A computer system learns from data, which
represent some “past experiences” of an
application domain.
Our focus: learn a target function that can be used
to predict the values of a discrete class attribute
The task is commonly called: Supervised learning,
classification, or inductive learning.
The data and the goal
Data: A set of data records (also called
examples, instances or cases) described by
k attributes: A1, A2, … Ak.
A class: each example is labelled with a pre-defined class.
Goal: To learn a classification model from the
data that can be used to predict the classes
of new (future, or test) cases/instances.
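One way to picture the data described above is as a list of records, each holding k attribute values plus a class label. The sketch below is only illustrative: the attribute names, the values, and the `Example` type are hypothetical and are not taken from the lecture's data.

```python
from typing import NamedTuple, Optional, Tuple


class Example(NamedTuple):
    """One data record: k attribute values plus an optional class label."""
    attributes: Tuple[str, ...]   # values of A1, A2, ..., Ak
    label: Optional[str]          # pre-defined class, or None if not yet known


# Hypothetical attribute names and records, invented for illustration (k = 3).
ATTRIBUTE_NAMES = ("Age", "Has_job", "Own_house")

training_data = [
    Example(("young", "false", "false"), "No"),
    Example(("middle", "true", "true"), "Yes"),
    Example(("old", "false", "true"), "Yes"),
]

# A new (future, or test) case: same attributes, class still to be predicted.
new_case = Example(("young", "true", "false"), None)

# The set of pre-defined classes seen in the training data.
classes = sorted({ex.label for ex in training_data})
```

The goal of learning is then a function that maps the `attributes` of `new_case` to one of the values in `classes`.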
An example: data (loan application)
Approved or not
An example: the learning task
Learn a classification model from the data
Use the model to classify future loan applications
into
Yes (approved) and
No (not approved)
What is the class for the following case/instance?
Supervised vs. unsupervised
Learning
Supervised learning: classification is seen as
supervised learning from examples.
Supervision: The data (observations,
measurements, etc.) are labeled with pre-defined
classes. It is as if a “teacher” provides the classes
(supervision).
Test data are classified into these classes too.
Unsupervised learning (clustering)
Class labels of the data are unknown
Given a set of data, the task is to establish the
existence of classes or clusters in the data
Supervised learning process: two
steps
Learning (training): Learn a model using the
training data
Testing: Test the model using unseen test
data to assess the model accuracy
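The two steps can be sketched with a 1-nearest-neighbor classifier, chosen here only because it is short to write. The data points and function names below are hypothetical, and one test label is deliberately set so that the model gets it wrong.

```python
import math


def train_1nn(training_data):
    """Learning step: for 1-NN the 'model' is just the stored training examples."""
    return list(training_data)


def predict_1nn(model, x):
    """Return the label of the stored example closest to x (Euclidean distance)."""
    nearest = min(model, key=lambda example: math.dist(example[0], x))
    return nearest[1]


def accuracy(model, test_data):
    """Testing step: fraction of test examples whose predicted label is correct."""
    correct = sum(1 for x, y in test_data if predict_1nn(model, x) == y)
    return correct / len(test_data)


# Hypothetical 2-D points with class labels, invented for illustration.
training_data = [((1.0, 1.0), "No"), ((1.5, 2.0), "No"),
                 ((5.0, 5.0), "Yes"), ((6.0, 5.5), "Yes")]
# Unseen test data; the last point is closer to the "No" examples but labelled "Yes".
test_data = [((1.2, 1.1), "No"), ((5.5, 5.0), "Yes"), ((3.0, 3.0), "Yes")]

model = train_1nn(training_data)            # step 1: learning (training)
test_accuracy = accuracy(model, test_data)  # step 2: testing
print(f"accuracy on unseen test data: {test_accuracy:.2f}")
```

The test data are never shown to `train_1nn`, so the measured accuracy estimates how the model behaves on cases it has not seen.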
Fundamental assumption of learning
Assumption: The distribution of training
examples is identical to the distribution of test
examples (including future unseen examples).
In practice, this assumption is often violated to
a certain degree.
Strong violations will clearly result in poor
classification accuracy.
To achieve good accuracy on the test data,
training examples must be sufficiently
representative of the test data.
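A small sketch can make the effect of a violated assumption concrete. It assumes a hypothetical one-attribute problem and a very simple threshold classifier. Neither the data nor the classifier comes from the lecture.

```python
def learn_threshold(examples):
    """Learn a cut point halfway between the mean of each class."""
    yes = [x for x, y in examples if y == "Yes"]
    no = [x for x, y in examples if y == "No"]
    return (sum(yes) / len(yes) + sum(no) / len(no)) / 2


def accuracy(threshold, examples):
    """Fraction of examples classified correctly by 'Yes if x > threshold'."""
    correct = sum(
        1 for x, y in examples if ("Yes" if x > threshold else "No") == y
    )
    return correct / len(examples)


# Hypothetical training data: "No" values lie near 2, "Yes" values near 8.
train = [(1.0, "No"), (2.0, "No"), (3.0, "No"),
         (7.0, "Yes"), (8.0, "Yes"), (9.0, "Yes")]

# Test data drawn from the same kind of distribution as the training data.
similar_test = [(1.5, "No"), (2.5, "No"), (7.5, "Yes"), (8.5, "Yes")]
# Test data whose attribute values have all drifted upward by 4.
shifted_test = [(x + 4.0, y) for x, y in similar_test]

threshold = learn_threshold(train)
acc_similar = accuracy(threshold, similar_test)
acc_shifted = accuracy(threshold, shifted_test)
print(threshold, acc_similar, acc_shifted)
```

With these numbers the learned threshold is 5.0. Accuracy is 1.0 on the test set that resembles the training data and drops to 0.5 on the shifted one, because every shifted "No" example now falls above the threshold.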
Classification: Definition
In classification, we predict labels y (classes) for
inputs x
Given a collection of records (training set)
Each record contains a set of attributes; one of the attributes is the class.
Find a model for class attribute as a function of the
values of other attributes.
Goal: previously unseen records should be
assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the
given data set is divided into training and test sets, with training set used to
build the model and test set used to validate it.
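The division into training and test sets can be sketched as a simple holdout split. The function names, the 30% test fraction, the fixed seed, and the records are assumptions made for illustration. The "model" is a majority-class baseline, used only to have something to evaluate.

```python
import random
from collections import Counter


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training_set, test_set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def majority_class(training_set):
    """Baseline 'model': always predict the most frequent training class."""
    return Counter(label for _, label in training_set).most_common(1)[0][0]


def accuracy(predicted_label, test_set):
    """Fraction of test records whose true class equals the predicted label."""
    hits = sum(1 for _, label in test_set if label == predicted_label)
    return hits / len(test_set)


# Hypothetical records: (attribute values, class), invented for illustration.
records = [((i,), "Yes" if i % 3 == 0 else "No") for i in range(10)]

training_set, test_set = holdout_split(records)
baseline = majority_class(training_set)      # built from the training set only
test_accuracy = accuracy(baseline, test_set)  # validated on the test set only
print(len(training_set), len(test_set), baseline, test_accuracy)
```

Any real classifier should beat this baseline on the test set. If it does not, the attributes are carrying little usable information about the class.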
Illustrating Classification Task
Training Set -> Learning algorithm -> Learn Model (Induction) -> Model
Model -> Apply Model (Deduction) -> Test Set

Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
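The induction and deduction steps can be tried on the records shown in this slide. The sketch below uses a one-rule (1R) learner as a stand-in, since the slide does not name a learning algorithm. It picks the single categorical attribute whose value-to-majority-class rule makes the fewest training errors. Attrib3 is numeric and is ignored here for simplicity. The function names and the `default` fallback are assumptions.

```python
from collections import Counter, defaultdict

# Records from the slide: (Attrib1, Attrib2, Attrib3 in thousands, Class).
training_set = [
    ("Yes", "Large", 125, "No"), ("No", "Medium", 100, "No"),
    ("No", "Small", 70, "No"), ("Yes", "Medium", 120, "No"),
    ("No", "Large", 95, "Yes"), ("No", "Medium", 60, "No"),
    ("Yes", "Large", 220, "No"), ("No", "Small", 85, "Yes"),
    ("No", "Medium", 75, "No"), ("No", "Small", 90, "Yes"),
]
# Test records (Tid 11 to 15): class unknown.
test_set = [
    ("No", "Small", 55), ("Yes", "Medium", 80), ("Yes", "Large", 110),
    ("No", "Small", 95), ("No", "Large", 67),
]


def learn_one_rule(records, attribute_indices):
    """Induction: return (attribute index, value -> class rule, training errors)."""
    best = None
    for idx in attribute_indices:
        counts = defaultdict(Counter)
        for record in records:
            counts[record[idx]][record[-1]] += 1
        # For each attribute value, predict its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for r in records if rule[r[idx]] != r[-1])
        if best is None or errors < best[2]:
            best = (idx, rule, errors)
    return best


def apply_rule(model, record, default="No"):
    """Deduction: look up the record's attribute value in the learned rule."""
    idx, rule, _ = model
    return rule.get(record[idx], default)  # default covers unseen values


model = learn_one_rule(training_set, attribute_indices=[0, 1])
predictions = [apply_rule(model, record) for record in test_set]
print(model, predictions)
```

On this data the learner selects Attrib2 with the rule Small -> Yes, Medium -> No, Large -> No, which misclassifies 2 of the 10 training records. A decision tree would go further by splitting again inside each branch and by using Attrib3.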
Examples of Classification Task
Predicting tumor cells as benign or malignant
Classifying credit card transactions
as legitimate or fraudulent
Categorizing news stories as finance,
weather, entertainment, sports, etc.
Issues: Data Preparation
Data cleaning
Preprocess data in order to reduce noise and handle
missing values
Relevance analysis (feature selection)
Remove the irrelevant or redundant attributes
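Both preparation steps can be sketched on a tiny hypothetical table. The choices below are assumptions, not the lecture's method: missing values (marked `None`) are filled with the most common value in their column, and an attribute is treated as irrelevant when it takes the same value in every row. The sketch also assumes every column has at least one observed value.

```python
from collections import Counter


def fill_missing(rows, missing=None):
    """Data cleaning: replace missing entries by the column's most common value."""
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not missing]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [
        tuple(modes[j] if row[j] is missing else row[j] for j in range(n_cols))
        for row in rows
    ]


def drop_constant_attributes(rows):
    """Relevance analysis: drop attributes that have a single value in all rows."""
    keep = [j for j in range(len(rows[0])) if len({row[j] for row in rows}) > 1]
    return [tuple(row[j] for j in keep) for row in rows], keep


# Hypothetical attribute rows (no class column), invented for illustration.
rows = [
    ("young", "own", "x"),
    (None, "rent", "x"),
    ("young", "rent", "x"),
    ("old", None, "x"),
]

cleaned = fill_missing(rows)
reduced, kept = drop_constant_attributes(cleaned)
print(cleaned)
print(reduced, kept)
```

The third column never varies, so it cannot help separate the classes and is removed. Detecting redundant attributes, such as two columns that always agree, would need an additional check.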
Resources: Datasets
UCI Repository:
http://www.ics.uci.edu/~mlearn/MLRepository.html
UCI KDD Archive:
http://kdd.ics.uci.edu/summary.data.application.html
Statlib: http://lib.stat.cmu.edu/
Delve: http://www.cs.utoronto.ca/~delve/
Classification Techniques
Decision Tree based Methods
Rule-based Methods
Memory based reasoning
Neural Networks
Naïve Bayes and Bayesian Belief Networks
Support Vector Machines