

Introduction to Information Retrieval
Hinrich Schütze and Christina Lioma
Lecture 15-1: Support Vector Machines


Overview

Support Vector Machines

Issues in the classification of text documents


Outline

Support Vector Machines

Issues in the classification of text documents


Today's class
Intensive machine-learning research in the last two decades to improve classifier effectiveness
New generation of state-of-the-art classifiers: support vector machines (SVMs), boosted decision trees, regularized logistic regression, neural networks, and random forests
Applications to IR problems, particularly text classification

SVMs: a kind of large-margin classifier
A vector-space-based machine-learning method aiming to find a decision boundary between two classes that is maximally far from any point in the training data (possibly discounting some points as outliers or noise)

Support Vector Machines


2-class training data
decision boundary: a linear separator
criterion: being maximally far away from any data point determines the classifier margin
the linear separator's position is defined by the support vectors

Why maximise the margin?


Points near the decision surface represent uncertain classification decisions (the classifier could go either way, 50/50).
A classifier with a large margin makes no low-certainty classification decisions.
This gives a classification safety margin with respect to slight errors in measurement or document variation.

Why maximise the margin?


SVM classifier: a large margin around the decision boundary
Compare to a bare decision hyperplane: placing a fat separator between the classes leaves fewer choices of where it can be put
→ decreased memory capacity
→ increased ability to correctly generalize to test data


Let's formalise an SVM with algebra

Hyperplane
An n-dimensional generalisation of a plane (a point in 1-D space, a line in 2-D space, an ordinary plane in 3-D space).

Decision hyperplane (previously seen, page 278)
Can be defined by:
an intercept term b
a normal vector w (the weight vector), which is perpendicular to the hyperplane
All points x on the hyperplane satisfy:
    w^T x = −b    (1)

Let's formalise an SVM with algebra

Preliminaries
Consider a binary classification problem:
the x_i are the input vectors
the y_i are the labels
The x_i define a space of labelled points called the input space.
For SVMs, the two data classes are always named +1 and −1, and the intercept term is always explicitly represented as b.
The linear classifier is then:
    f(x) = sign(w^T x + b)    (2)
A value of −1 indicates one class, and a value of +1 the other class.
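To make equation (2) concrete, here is a minimal sketch of the linear decision rule in plain NumPy (not part of the original slides); the weight vector and intercept are the values from the walkthrough example at the end of the deck, and the two test points are the training points used there.

```python
import numpy as np

def svm_predict(w: np.ndarray, b: float, x: np.ndarray) -> int:
    """Linear SVM decision rule f(x) = sign(w^T x + b), returning +1 or -1."""
    score = np.dot(w, x) + b
    return 1 if score >= 0 else -1

# Values taken from the walkthrough example (slide 27): boundary x1 + 2*x2 - 5.5 = 0.
w = np.array([1.0, 2.0])
b = -5.5
print(svm_predict(w, b, np.array([2.0, 3.0])))  # +1 side of the boundary
print(svm_predict(w, b, np.array([1.0, 1.0])))  # -1 side of the boundary
```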


Functional Margin
We are confident in the classification of a point if it is far away from the decision boundary.

Functional margin
The functional margin of the i-th example x_i w.r.t. the hyperplane is defined as y_i (w^T x_i + b).
The functional margin of a data set w.r.t. a decision surface is twice the functional margin of the point in the data set with minimal functional margin.
The factor of 2 comes from measuring across the whole width of the margin.
But we can increase the functional margin simply by scaling w and b, so we need to place some constraint on the size of the w vector.


Geometric margin
Geometric margin of the classifier: the maximum width of the band that can be drawn separating the support vectors of the two classes.
The geometric margin of a point x with label y is:
    r = y (w^T x + b) / |w|    (3)
The geometric margin is clearly invariant to scaling of the parameters: if we replace w by 5w and b by 5b, the geometric margin is unchanged, because it is inherently normalized by the length of w.
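A small NumPy sketch of equation (3) that also illustrates this scale invariance; the hyperplane and the labelled point are illustrative values only (they happen to match the walkthrough example at the end of the deck).

```python
import numpy as np

def geometric_margin(w, b, x, y):
    """Geometric margin r = y * (w^T x + b) / ||w|| of a labelled point (x, y)."""
    return y * (np.dot(w, x) + b) / np.linalg.norm(w)

w, b = np.array([1.0, 2.0]), -5.5   # hypothetical hyperplane parameters
x, y = np.array([2.0, 3.0]), +1     # hypothetical labelled point

print(geometric_margin(w, b, x, y))          # some margin value r
print(geometric_margin(5 * w, 5 * b, x, y))  # same r: the scaling of w and b cancels out
```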


Linear SVM Mathematically

Assume canonical distance
Assume that all data points are at least distance 1 from the hyperplane; then:
    y_i (w^T x_i + b) ≥ 1    (4)
Since each example's distance from the hyperplane is r_i = y_i (w^T x_i + b) / |w|, the geometric margin is ρ = 2 / |w|.
We want to maximize this geometric margin.
That is, we want to find w and b such that:
    ρ = 2 / |w| is maximized
    for all (x_i, y_i): y_i (w^T x_i + b) ≥ 1


Linear SVM Mathematically (cont.)

Maximizing 2 / |w| is the same as minimizing |w| / 2. This gives the final standard formulation of an SVM as a minimization problem:

Find w and b such that:
    (1/2) w^T w is minimized (because |w| = √(w^T w)), and
    for all {(x_i, y_i)}: y_i (w^T x_i + b) ≥ 1

We are now optimizing a quadratic function subject to linear constraints. Quadratic optimization problems are standard mathematical optimization problems, and many algorithms exist for solving them (e.g. quadratic programming libraries).
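In practice one rarely writes the quadratic program by hand. A minimal sketch using scikit-learn (assuming it is installed), where a very large C approximates the hard-margin formulation above; the tiny two-class data set is made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable toy data (two features per "document vector").
X = np.array([[1.0, 1.0], [0.5, 1.5], [2.0, 3.0], [3.0, 2.5]])
y = np.array([-1, -1, 1, 1])

# kernel='linear' gives a linear separator; a huge C approximates a hard margin.
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

print("w =", clf.coef_[0])         # learned weight vector
print("b =", clf.intercept_[0])    # learned intercept term
print("support vectors:\n", clf.support_vectors_)
```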

Recapitulation
We start with a training data set.
The data set defines the best separating hyperplane.
We feed the data through a quadratic optimization procedure to find this plane.
Given a new point x to classify, the classification function f(x) = sign(w^T x + b) computes the projection of the point onto the hyperplane normal.
The sign of this function determines the class to assign to the point.
If the point is within the margin of the classifier, the classifier can return "don't know" rather than one of the two classes.
The value of f(x) may also be transformed into a probability of classification.
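One common way to turn the value of f(x) into a probability is Platt scaling, which scikit-learn exposes through probability=True; a minimal sketch on a made-up toy data set (with so little data the calibration is rough, so treat the numbers as illustrative only).

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable toy data (10 two-dimensional "document vectors").
X = np.array([[1.0, 1.0], [0.5, 1.5], [1.2, 0.8], [0.8, 1.2], [1.5, 0.5],
              [2.0, 3.0], [3.0, 2.5], [2.5, 3.5], [3.5, 3.0], [2.2, 2.8]])
y = np.array([-1, -1, -1, -1, -1, 1, 1, 1, 1, 1])

# probability=True fits a sigmoid (Platt scaling) on top of the SVM scores.
clf = SVC(kernel='linear', C=10.0, probability=True)
clf.fit(X, y)

x_new = np.array([[1.8, 2.0]])
print(clf.decision_function(x_new))  # signed score f(x); its sign gives the class
print(clf.predict_proba(x_new))      # estimated class probabilities
```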

Soft margin classification
What happens if the data is not linearly separable?
Standard approach: allow the fat decision margin to make a few mistakes
some points (outliers, noisy examples) are inside or on the wrong side of the margin
Pay a cost for each misclassified example, depending on how far it is from meeting the margin requirement.
Slack variable ξ_i: a non-zero value for ξ_i allows x_i to not meet the margin requirement, at a cost proportional to the value of ξ_i.
Optimisation problem: trading off how fat it can make the margin vs. how many points have to be moved around to allow this margin.
The sum of the ξ_i gives an upper bound on the number of training errors.
Soft-margin SVMs minimize training error traded off against margin.
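In scikit-learn this trade-off is controlled by the regularization parameter C (roughly: a small C tolerates more slack and a wider margin, a large C penalizes margin violations heavily); a minimal sketch on a made-up, non-separable toy data set.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up data with one deliberately "noisy" point, so the classes are not separable.
X = np.array([[1.0, 1.0], [1.5, 0.5], [0.5, 1.5], [2.5, 3.0], [3.0, 2.5], [1.2, 1.0]])
y = np.array([-1, -1, -1, 1, 1, 1])   # last point sits inside the -1 region

for C in (0.1, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # Larger C: fewer margin violations allowed, narrower margin; smaller C: softer margin.
    print(C, clf.n_support_, clf.score(X, y))
```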


Multiclass support vector machines


SVMs are inherently two-class classifiers.
Most common technique in practice: build |C| one-versus-rest classifiers (commonly referred to as one-versus-all or OVA classification), and choose the class which classifies the test data with the greatest margin.
Another strategy: build a set of one-versus-one classifiers, and choose the class that is selected by the most classifiers. While this involves building |C|(|C| − 1)/2 classifiers, the time for training may actually decrease, since the training data set for each classifier is much smaller.
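A minimal sketch of both strategies using scikit-learn's meta-estimators (assuming scikit-learn is available); the three-class toy data is made up for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

# Made-up three-class toy data.
X = np.array([[0.0, 0.1], [0.2, 0.0], [2.0, 2.1], [2.2, 1.9], [4.0, 0.1], [4.1, 0.2]])
y = np.array([0, 0, 1, 1, 2, 2])

ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)  # |C| classifiers, one per class
ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)   # |C|(|C|-1)/2 pairwise classifiers

x_new = np.array([[2.1, 2.0]])
print(ovr.predict(x_new), ovo.predict(x_new))
```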


Multiclass support vector machines


Better alternative: structural SVMs
A generalization of classification where the classes are not just a set of independent, categorical labels, but may be arbitrary structured objects with relationships defined between them.
We will look at this more closely with respect to IR ranking next time.


Outline

Support Vector Machines

Issues in the classification of text documents


Text classification
Many commercial applications
"There is no question concerning the commercial value of being able to classify documents automatically by content. There are myriad potential applications of such a capability for corporate Intranets, government departments, and Internet publishers."

Often there are greater performance gains from exploiting domain-specific text features than from changing from one machine learning method to another.
"Understanding the data is one of the keys to successful categorization, yet this is an area in which most categorization tool vendors are extremely weak. Many of the 'one size fits all' tools on the market have not been tested on a wide range of content types."

Choosing what kind of classifier to use


When building a text classifier, the first question is: how much training data is currently available?
Practical challenge: creating or obtaining enough training data. Hundreds or thousands of examples from each class are required to produce a high-performance classifier, and many real-world contexts involve large sets of categories.

None?
Very little?
Quite a lot?
A huge amount, growing every day?

If you have no labeled training data


Use hand-written rules

Example
IF (wheat OR grain) AND NOT (whole OR bread) THEN c = grain
In practice, rules get a lot bigger than this, and can be phrased using more sophisticated query languages than just Boolean expressions, including the use of numeric scores. With careful crafting, the accuracy of such rules can become very high (precision in the high 90s%, recall in the high 80s%). Nevertheless, the amount of work to create such well-tuned rules is very large. A reasonable estimate is 2 days per class, and extra time has to go into maintenance of rules, as the content of documents in classes drifts over time.
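As a rough illustration only (not from the original slides), the example rule above could be applied in code along these lines, with a deliberately naive whitespace tokenisation.

```python
def grain_rule(doc: str) -> bool:
    """Hand-written rule: IF (wheat OR grain) AND NOT (whole OR bread) THEN c = grain."""
    tokens = set(doc.lower().split())
    return bool(tokens & {"wheat", "grain"}) and not (tokens & {"whole", "bread"})

print(grain_rule("wheat prices rose sharply"))     # True  -> class grain
print(grain_rule("recipe for whole wheat bread"))  # False -> not grain
```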

If you have fairly little data and you are going to train a supervised classifier
Work out how to get more labeled data as quickly as you can.
Best way: insert yourself into a process where humans will be willing to label data for you as part of their natural tasks.

Example
Often humans will sort or route email for their own purposes, and these actions give information about classes.

Active Learning
A system is built which decides which documents a human should label.
Usually these are the ones on which a classifier is uncertain of the correct classification.
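A minimal sketch of that uncertainty-based selection step (assuming scikit-learn): train on the small labelled seed set, then ask a human to label the pool documents whose scores lie closest to the decision boundary. The seed set and pool below are made up for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Made-up small labelled seed set and a larger unlabelled pool.
X_seed = np.array([[0.0, 1.0], [0.2, 0.8], [1.0, 0.0], [0.9, 0.1]])
y_seed = np.array([0, 0, 1, 1])
X_pool = np.random.RandomState(0).rand(20, 2)

clf = LinearSVC().fit(X_seed, y_seed)

# The most "uncertain" pool documents are those with scores closest to 0.
scores = np.abs(clf.decision_function(X_pool))
ask_human = np.argsort(scores)[:3]   # indices of the 3 documents to send for labelling
print(ask_human)
```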

If you have labeled data


Reasonable amount of labeled data
Use everything that we have presented about text classification.
Preferably a hybrid approach (overlay a Boolean classifier).

Huge amount of labeled data
Choice of classifier probably has little effect on your results.
Choose the classifier based on the scalability of training or runtime efficiency.
Rule of thumb: each doubling of the training data size produces a linear increase in classifier performance, but with very large amounts of data, the improvement becomes sub-linear.

Large and difficult category taxonomies


With a small number of well-separated categories, many classification algorithms are likely to work well. But often there is a very large number of very similar categories.

Example
Web directories (e.g. the Yahoo! Directory, which consists of over 200,000 categories, or the Open Directory Project), library classification schemes (Dewey Decimal or Library of Congress), and the classification schemes used in legal or medical applications.

Accurate classification over large sets of closely related classes is inherently difficult.

Recap

SVMs: main idea, maximum margin (soft margin briefly), binary classification (multiclass briefly)
Issues in text classification: training data availability, taxonomies in practice


Resources

Chapter 15 of IIR
Resources at http://ifnlp.org/ir


Walkthrough example: building an SVM over the data set shown in the figure (a negative example at (1, 1) and a positive example at (2, 3))

Working geometrically:
The maximum-margin weight vector will be parallel to the shortest line connecting points of the two classes, that is, the line between (1, 1) and (2, 3), giving a weight vector of (1, 2).
The optimal decision surface is orthogonal to that line and intersects it at the halfway point. Therefore, it passes through (1.5, 2).
So the SVM decision boundary is given by: x1 + 2x2 − 5.5 = 0
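A quick numeric check of this geometric construction in plain NumPy (the two points come from the slide; everything else follows from them).

```python
import numpy as np

neg, pos = np.array([1.0, 1.0]), np.array([2.0, 3.0])
w = pos - neg                       # direction of the shortest connecting line: (1, 2)
midpoint = (neg + pos) / 2          # (1.5, 2), where the boundary crosses that line

b = -np.dot(w, midpoint)            # choose b so that w^T x + b = 0 at the midpoint
print(w, b)                                      # [1. 2.] -5.5  ->  x1 + 2*x2 - 5.5 = 0
print(np.dot(w, neg) + b, np.dot(w, pos) + b)    # -2.5 and +2.5: opposite sides, equidistant
```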

Walkthrough example: building an SVM over the data set shown in the figure

Working algebraically:
With the constraint sign(y_i (w^T x_i + b)) ≥ 1, we seek to minimize |w|.
We know that the solution is w = (a, 2a) for some a. So:
    a + 2a + b = −1, 2a + 6a + b = 1
Hence, a = 2/5 and b = −11/5.
So the optimal hyperplane is given by w = (2/5, 4/5) and b = −11/5.
The margin is 2 / |w| = 2 / (2/√5) = √5.
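And a small numeric check of the algebraic solution (values exactly as derived above).

```python
import numpy as np

a, b = 2/5, -11/5
w = np.array([a, 2 * a])            # w = (2/5, 4/5)

neg, pos = np.array([1.0, 1.0]), np.array([2.0, 3.0])
print(np.dot(w, neg) + b)           # -1.0: the negative support vector sits on the margin
print(np.dot(w, pos) + b)           # +1.0: the positive support vector sits on the margin
print(2 / np.linalg.norm(w))        # 2.236... = sqrt(5), the geometric margin
```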
