
Learning Based Vision - II

Fundamentals of Computer Vision – Week 3


Assist. Prof. Özge Öztimur Karadağ
ALKÜ – Department of Computer Engineering
Alanya
Last Week
• A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance
at tasks in T, as measured by P, improves with experience E.
• Task: regression, classification…
• Experience: dataset, Learning: Supervised vs. Unsupervised
• Performance: Mean Squared Error…
• Python example on regression
Today
• Definition of classification
• Types of classifiers
• Example applications
Classification
• In classification problems, each entity in some domain can be placed in one
of a discrete set of categories: yes/no, friend/foe, good/bad/indifferent,
blue/red/green, etc.
• Given a training set of labeled entities, develop a rule for assigning labels to
entities in a test set
• Many variations on this theme:
• binary classification
• multi-category classification
• non-exclusive categories
• ranking
• Many criteria to assess rules and their predictions
• overall errors
• costs associated with different kinds of errors
• operating points
Representation of Objects
• Each object to be classified is represented as a pair
(x, y):
• where x is a description of the object (see examples of
data types in the following slides)
• where y is a label (assumed binary for now)
• Success or failure of a machine learning classifier
often depends on choosing good descriptions of
objects
• the choice of description can also be viewed as a learning
problem, but good human intuitions are often needed
here
Data Types
• Vectorial data:
• physical attributes
• behavioral attributes
• context
• history
• etc
Data Types
• text and hypertext

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">


<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Welcome to FairmontNET</title>
</head>
<STYLE type="text/css">
.stdtext {font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 11px; color: #1F3D4E;}
.stdtext_wh {font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 11px; color: WHITE;}
</STYLE>

<body leftmargin="0" topmargin="0" marginwidth="0" marginheight="0" bgcolor="BLACK">


<TABLE cellpadding="0" cellspacing="0" width="100%" border="0">
<TR>
<TD width=50% background="/TFN/en/CDA/Images/common/labels/decorative_2px_blk.gif">&nbsp;</TD>
<TD><img src="/TFN/en/CDA/Images/common/labels/decorative.gif"></td>
<TD width=50% background="/TFN/en/CDA/Images/common/labels/decorative_2px_blk.gif">&nbsp;</TD>
</TR>
</TABLE>
<tr>
<td align="right" valign="middle"><IMG src="/TFN/en/CDA/Images/common/labels/centrino_logo_blk.gif"></td>
</tr>
</body>
</html>
Data Types
• email
Return-path <[email protected]>Received from relay2.EECS.Berkeley.EDU
(relay2.EECS.Berkeley.EDU [169.229.60.28]) by imap4.CS.Berkeley.EDU (iPlanet Messaging Server
5.2 HotFix 1.16 (built May 14 2003)) with ESMTP id <[email protected]>;
Tue, 08 Jun 2004 11:40:43 -0700 (PDT)Received from relay3.EECS.Berkeley.EDU (localhost
[127.0.0.1]) by relay2.EECS.Berkeley.EDU (8.12.10/8.9.3) with ESMTP id i58Ieg3N000927; Tue, 08
Jun 2004 11:40:43 -0700 (PDT)Received from redbirds (dhcp-168-35.EECS.Berkeley.EDU
[128.32.168.35]) by relay3.EECS.Berkeley.EDU (8.12.10/8.9.3) with ESMTP id i58IegFp007613;
Tue, 08 Jun 2004 11:40:42 -0700 (PDT)Date Tue, 08 Jun 2004 11:40:42 -0700From Robert Miller
<[email protected]>Subject RE: SLT headcount = 25In-reply-
to <[email protected]>To 'Randy Katz'
<[email protected]>Cc "'Glenda J. Smith'" <[email protected]>, 'Gert Lanckriet'
<[email protected]>Message-
id <[email protected]>MIME-version 1.0X-
MIMEOLE Produced By Microsoft MimeOLE V6.00.2800.1409X-Mailer Microsoft Office Outlook, Build
11.0.5510Content-type multipart/alternative; boundary="----
=_NextPart_000_0033_01C44D4D.6DD93AF0"Thread-
index AcRMtQRp+R26lVFaRiuz4BfImikTRAA0wf3Qthe headcount is now 32. ---------------------------
------------- Robert Miller, Administrative Specialist University of California, Berkeley Electronics
Research Lab 634 Soda Hall #1776 Berkeley, CA 94720-1776 Phone: 510-642-6037 fax: 510-
643-1289
Data Types
• protein sequences
Data Types
• sequences of Unix system calls
Data Types
• network layout: graph
Data Types
• images
Example: Spam Filter
• Input: email
• Output: spam/ham
• Setup:
• Get a large collection of example emails, each labeled “spam” or “ham”
• Note: someone has to hand label all this data
• Want to learn to predict labels of new, future emails
• Features: The attributes used to make the ham / spam decision
• Words: FREE!
• Text Patterns: $dd, CAPS
• Non-text: SenderInContacts
• …

Example emails shown on the slide:
“Dear Sir. First, I must solicit your confidence in this transaction, this is by virture of its nature as being utterly confidencial and top secret. …”
“TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT "REMOVE" IN THE SUBJECT. 99 MILLION EMAIL ADDRESSES FOR ONLY $99”
“Ok, I know this is blatantly OT but I'm beginning to go insane. Had an old Dell Dimension XPS sitting in the corner and decided to put it to use, I know it was working pre being stuck in the corner, but when I plugged it in, hit the power nothing happened.”
Example: Digit Recognition
• Input: images / pixel grids
• Output: a digit 0-9
• Setup:
• Get a large collection of example images, each labeled with a digit
• Note: someone has to hand label all this data
• Want to learn to predict labels of new, future digit images
• Features: The attributes used to make the digit decision
• Pixels: (6,8)=ON
• Shape Patterns: NumComponents, AspectRatio, NumLoops
• …
• Current state-of-the-art: Human-level performance ??
Other Examples of Real-World Classification Tasks
• Fraud detection (input: account activity, classes: fraud / no fraud)
• Web page spam detection (input: HTML/rendered page, classes: spam /
ham)
• Speech recognition and speaker recognition (input: waveform, classes:
phonemes or words)
• Medical diagnosis (input: symptoms, classes: diseases)
• Automatic essay grader (input: document, classes: grades)
• Customer service email routing and foldering
• Link prediction in social networks
• Catalytic activity in drug design
• … many many more

• Classification is an important commercial technology


Training and Validation
• Data: labeled instances, e.g. emails marked spam/ham
• Training set
• Validation set
• Test set
• Training (see the split sketch after this list)
• Estimate parameters on training set
• Tune hyperparameters on validation set
• Report results on test set
• Anything short of this yields over-optimistic claims
• Evaluation
• Many different metrics
• Ideally, the criteria used to train the classifier should be closely related to those used to evaluate the classifier
• Statistical issues
• Want a classifier which does well on test data
• Overfitting: fitting the training data very closely, but not generalizing well
• Error bars: want realistic (conservative) estimates of accuracy
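A minimal sketch of such a three-way split, assuming scikit-learn is available; the 60/20/20 ratios are illustrative, not prescribed by the lecture.

# Train / validation / test split sketch (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First hold out 20% of the data as the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Then split the remainder into training and validation (60% / 20% of the full data).
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

# Parameters are estimated on X_train, hyperparameters are tuned on X_val,
# and results are reported once on X_test.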
Some State-of-the-art Classifiers
• Support vector machine
• Decision Trees
• Artificial neural networks
• Bayesian classifiers
• Nearest neighbor
• …
Intuitive Picture of the Problem

[Figure: data points from two classes, Class1 and Class2]
Some Issues
• There may be a simple separator (e.g., a straight line in 2D or a
hyperplane in general) or there may not
• There may be “noise” of various kinds
• There may be “overlap”
• One should not be deceived by one’s low-dimensional geometrical
intuition
• Some classifiers explicitly represent separators (e.g., straight lines),
while for other classifiers the separation is done implicitly
• Some classifiers just make a decision as to which class an object is in;
others estimate class probabilities
Methods
I) Instance-based methods:
1) Nearest neighbor
II) Probabilistic models:
1) Naïve Bayes
2) Logistic Regression
III) Linear Models:
1) Perceptron
2) Support Vector Machine
IV) Decision Models:
1) Decision Trees
2) Boosted Decision Trees
3) Random Forest
Naive Bayes Classifier
Classifier
• Given a data point x, what is the probability of x belonging to some
class c?
• Directly model this statement as a conditional probability: p(c|x).
• Example:
• if there are 3 classes c₁, c₂, c₃, and x consists of 2 features x₁, x₂, the result of a classifier could
be something like p(c₁|x₁, x₂)=0.3, p(c₂|x₁, x₂)=0.5 and p(c₃|x₁, x₂)=0.2.
• If we care for a single label as the output, we would choose the one with the highest
probability, i.e. c₂ with a probability of 50% here.
The Naive Bayes Classifier
• Naive Bayes Classifier
• The naive Bayes classifier tries to compute the probabilities p(c|x) for all classes directly.

• max p(c|x) returns the maximum probability, while argmax p(c|x) returns the class c with that
highest probability; the classifier outputs argmax_c p(c|x).
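A tiny illustrative sketch of this decision rule, using the hard-coded posteriors from the three-class example above:

# Illustrative only: posteriors p(c|x) taken from the example above.
posteriors = {"c1": 0.3, "c2": 0.5, "c3": 0.2}

best_class = max(posteriors, key=posteriors.get)  # argmax_c p(c|x)
best_prob = posteriors[best_class]                # max_c p(c|x)
print(best_class, best_prob)                      # -> c2 0.5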
Bayes Theorem
• Estimates conditional probability.
• Conditional probability is the likelihood of an outcome occurring, based on a
previous outcome occurring in similar circumstances.

P(A|B) = P(A) P(B|A) / P(B)
Bayes Theorem
• P(A|B) = P(A) P(B|A) / P(B) = P(A∩B) / P(B)

P(A) = The probability of A occurring
P(B) = The probability of B occurring
P(A|B) = The probability of A given B
P(B|A) = The probability of B given A
P(A∩B) = The probability of both A and B occurring


Bayes Theorem: P(A|B) = P(A) P(B|A) / P(B)
• Ex. Cancer diagnosis
• Binary classification problem
• Assume P(Cancer)=0.008
• Assume there is a blood test whose result is either + or -
• The test result is + with a probability of 98% in cancer patients
• The test result is - with a probability of 97% in cancer-free people
• A person takes the test and the result is +; the questions are:
• What is the probability that the person has cancer?
• What is the probability that the person does not have cancer?
• What is the diagnosis?
• P(cancer)=0.008
• P(notcancer)=0.992
• P(+|cancer)=0.98
• P(-|cancer)=0.02
• P(-|notcancer)=0.97
• P(+|notcancer)=0.03
• P(cancer|+)=?
• P(notcancer|+)=?

• P(cancer|+) = P(+|cancer)*P(cancer)/P(+) = 0.98*0.008/P(+) = 0.0078/P(+)
• P(notcancer|+) = P(+|notcancer)*P(notcancer)/P(+) = 0.03*0.992/P(+) = 0.0298/P(+)
• P(cancer|+) < P(notcancer|+) ⇒ not cancer
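The same computation in a few lines of Python; all numbers come from the slide, and P(+) is obtained by marginalizing over the two classes:

# Bayes rule for the cancer-test example (numbers from the slide).
p_cancer = 0.008
p_not_cancer = 0.992
p_pos_given_cancer = 0.98
p_pos_given_not_cancer = 0.03

# Marginal probability of a positive test.
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_not_cancer * p_not_cancer

p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
p_not_cancer_given_pos = p_pos_given_not_cancer * p_not_cancer / p_pos

print(round(p_cancer_given_pos, 3))      # -> 0.209
print(round(p_not_cancer_given_pos, 3))  # -> 0.791, so the diagnosis is: not cancer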
Bayes Theorem
• Ex: Imagine 100 people at a party. You are told how many of them wear pink or not, and how many are men or not, and get these numbers:
• From these numbers, we can find the probabilities P(Man), P(Pink), etc.

• P(Man)=0.4
• P(Pink)=0.25
• P(Pink|Man)=0.125

• What is the probability that a person wearing pink is a man, P(Man|Pink) = ?


• Even if we do not have access to the complete table, using only three numbers, we can
estimate it!
• P(Man|Pink) = P(Pink|Man) * P(Man) / P(Pink)
• If we still have the raw data, we can calculate it directly.
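Plugging in the three numbers above: P(Man|Pink) = 0.125 * 0.4 / 0.25 = 0.2, so a person wearing pink is a man with probability 20%.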
Python Example on Classification
• iris_classification_example.py
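The script itself is not reproduced here; a minimal sketch of what such an iris classification example might look like, assuming scikit-learn's Gaussian naive Bayes (the actual iris_classification_example.py may differ):

# A guess at the shape of the example script, using Gaussian naive Bayes.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = GaussianNB()
model.fit(X_train, y_train)           # estimate class priors and per-feature Gaussians
y_pred = model.predict(X_test)        # argmax_c p(c|x) for each test sample

print("Accuracy:", accuracy_score(y_test, y_pred))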
K - Nearest Neighbor (KNN) Classifier
• The KNN algorithm is one of the simplest classification algorithms
• non-parametric
• it does not make any assumptions on the underlying data distribution
• Its purpose is to use a dataset in which the data points are
separated into several classes to predict the classification of a new
sample point.
• KNN Algorithm is based on feature similarity
• How closely out-of-sample features resemble our training set
determines how we classify a given data point
1-Nearest Neighbor
3-Nearest Neighbor
Classification steps
1. Training phase: a model is constructed from the training instances.
• classification algorithm finds relationships between predictors and targets
• relationships are summarized in a model
2. Testing phase: test the model on a test sample whose class
labels are known but not used for training the model
3. Usage phase: use the model for classification on new data
whose class labels are unknown
K-Nearest Neighbor
• Features
• All instances correspond to points in an n-dimensional Euclidean space
• Classification is delayed till a new instance arrives
• Classification done by comparing feature vectors of the different points
• Target function may be discrete or real-valued
K-Nearest Neighbor
• An arbitrary instance is represented by (a1(x), a2(x), a3(x), ..., an(x)), where ai(x) denotes the i-th feature of x
• Euclidean distance between two instances:
d(xi, xj) = sqrt( Σ_{r=1..n} ( ar(xi) − ar(xj) )² )
• Continuous-valued target function: predict the mean value of the k nearest training examples
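A small from-scratch sketch of both cases described above (majority vote for a discrete target, mean of the neighbours for a continuous one), assuming NumPy; function and variable names are illustrative:

# From-scratch k-NN prediction; X_train has shape (m, n), y_train has shape (m,).
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3, discrete=True):
    # Euclidean distance d(xi, x_new) = sqrt(sum_r (ar(xi) - ar(x_new))^2)
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]   # indices of the k closest training examples
    if discrete:
        # Discrete target: majority vote among the k nearest neighbours.
        return Counter(y_train[nearest]).most_common(1)[0][0]
    # Continuous target: mean value of the k nearest training examples.
    return y_train[nearest].mean()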
What to determine in KNN?
•K
• K-nearest neighbours uses the local neighborhood to obtain a prediction
• The K memorized examples most similar to the one that is being classified are retrieved

• Distance metric
• A distance function is needed to compare the similarity of examples
KNN - features
• If the ranges of the features differ, features with
bigger values will dominate the decision
• In general feature values are normalized prior to
distance calculation
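For example, a min-max rescaling of each feature to [0, 1] before computing distances; this is one common choice (z-score standardization is another), not the only option:

# Min-max normalization of each feature column to [0, 1], assuming NumPy.
import numpy as np

def min_max_normalize(X_train, X_test):
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi - lo == 0, 1.0, hi - lo)   # avoid division by zero for constant features
    # The test set is rescaled with the training-set statistics.
    return (X_train - lo) / span, (X_test - lo) / span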
Python Example on Classification
• iris_classification_example.py
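Again only a guess at what the script might contain, this time with scikit-learn's KNeighborsClassifier; k=3 is an arbitrary choice:

# A k-NN version of the iris example (sketch, not the actual course script).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)             # k-NN just memorizes the training set
print("Accuracy:", accuracy_score(y_test, knn.predict(X_test)))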
Homework
• Measure the performance of the classifiers (NB and KNN).
• Compare the two methods.
References
• Michael I. Jordan, University of California, Berkeley, Lecture Notes.
