Introduction To: Information Retrieval
Hinrich Schütze and Christina Lioma
Lecture 15-1: Support Vector Machines
Overview
Outline
Today's class
Intensive machine-learning research in the last two
decades to improve classifier effectiveness
New generation of state-of-the-art classifiers: support vector
machines (SVMs), boosted decision trees, regularized logistic
regression, neural networks, and random forests
Applications to IR problems, particularly text classification
Functional Margin
We are confident in the classification of a point if it is far away
from the decision boundary.
Functional margin of an example $\vec{x}_i$ with label $y_i \in \{-1, +1\}$: $y_i(\vec{w}^T\vec{x}_i + b)$
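As a minimal sketch, assuming the standard definition of the functional margin of an example $\vec{x}_i$ with label $y_i \in \{-1, +1\}$ as $y_i(\vec{w}^T\vec{x}_i + b)$, the Python below computes it for single points and takes the smallest value over a data set. The function names, the hyperplane and the toy points are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def functional_margin(w: Sequence[float], b: float,
                      x: Sequence[float], y: int) -> float:
    """Return y * (w^T x + b); positive iff x lies on the side of its own class."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def min_functional_margin(w: Sequence[float], b: float,
                          points: Sequence[Sequence[float]],
                          labels: Sequence[int]) -> float:
    """Smallest functional margin over all labeled points (hypothetical helper)."""
    return min(functional_margin(w, b, x, y) for x, y in zip(points, labels))


# Hypothetical hyperplane x1 = 1, i.e. w = (1, 0), b = -1.
w, b = (1.0, 0.0), -1.0
points = [(0.0, 0.0), (3.0, 1.0)]
labels = [-1, 1]

print(functional_margin(w, b, (0.0, 0.0), -1))      # 1.0
print(functional_margin(w, b, (3.0, 1.0), 1))       # 2.0: farther away, more confident
print(functional_margin(w, b, (0.5, 0.0), 1))       # -0.5: wrong side of the boundary
print(min_functional_margin(w, b, points, labels))  # 1.0
```

A larger positive value means the point is farther from the boundary on the correct side, which is the notion of confidence described above; a negative value signals a misclassified point.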
Geometric margin
Geometric margin of the classifier: maximum width of the band
that can be drawn separating the support vectors of the two
classes.
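The distance of a correctly classified point $\vec{x}$ with label $y$ from the decision boundary is $r = y\,\frac{\vec{w}^T\vec{x} + b}{|\vec{w}|}$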
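The geometric margin is invariant to scaling of the parameters: if we replace $\vec{w}$ by $5\vec{w}$
and $b$ by $5b$, the geometric margin is unchanged, because it is normalized by the length of $\vec{w}$,
whereas the functional margin becomes five times as large.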
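The scaling argument can be checked numerically. This sketch assumes the point-level geometric margin $r = y(\vec{w}^T\vec{x} + b)/|\vec{w}|$; the hyperplane and the point are made-up values chosen so that the arithmetic is exact.

```python
import math
from typing import Sequence


def functional_margin(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """y * (w^T x + b)."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def geometric_margin(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """Functional margin divided by the Euclidean length of w."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return functional_margin(w, b, x, y) / norm


w, b = (3.0, 4.0), -5.0         # hypothetical hyperplane, |w| = 5
x, y = (3.0, 4.0), 1            # hypothetical positive example

w5 = tuple(5 * wi for wi in w)  # replace w by 5w ...
b5 = 5 * b                      # ... and b by 5b

print(functional_margin(w, b, x, y), functional_margin(w5, b5, x, y))  # 20.0 100.0
print(geometric_margin(w, b, x, y), geometric_margin(w5, b5, x, y))    # 4.0 4.0
```

Because the functional margin can be inflated at will by rescaling, it cannot be maximized directly; the geometric margin is the quantity that is meaningful to maximize.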
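Find $\vec{w}$ and $b$ such that $\rho = 2/|\vec{w}|$ is maximized,
and for all $(\vec{x}_i, y_i)$ in the training set: $y_i(\vec{w}^T\vec{x}_i + b) \ge 1$
Equivalently: minimize $\frac{1}{2}\vec{w}^T\vec{w}$ subject to the same constraints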
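To make the constraints concrete, this sketch checks whether a candidate $(\vec{w}, b)$ satisfies $y_i(\vec{w}^T\vec{x}_i + b) \ge 1$ on every training example and, among the feasible candidates, keeps the one with the largest $\rho = 2/|\vec{w}|$. It only compares a hand-picked list of hypothetical candidates on a two-point toy set; it does not solve the quadratic program, which in practice is handed to a QP solver.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def satisfies_constraints(w: Vector, b: float,
                          points: Sequence[Vector], labels: Sequence[int]) -> bool:
    """True iff y_i * (w^T x_i + b) >= 1 for every training example."""
    return all(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1
               for x, y in zip(points, labels))


def margin_width(w: Vector) -> float:
    """rho = 2 / |w|."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical training set: one point per class, 2 units apart.
points: List[Vector] = [(0.0, 0.0), (2.0, 0.0)]
labels = [-1, 1]

candidates = [((1.0, 0.0), -1.0),   # feasible, rho = 2
              ((2.0, 0.0), -2.0),   # feasible, rho = 1
              ((0.5, 0.0), -0.5)]   # violates the constraints

feasible = [(w, b) for w, b in candidates
            if satisfies_constraints(w, b, points, labels)]
best_w, best_b = max(feasible, key=lambda wb: margin_width(wb[0]))
print(best_w, best_b, margin_width(best_w))  # (1.0, 0.0) -1.0 2.0
```

The infeasible candidate would have the widest band ($\rho = 4$); the constraints are what stop the optimization from shrinking $\vec{w}$ indefinitely.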
Recapitulation
We start with a training data set
The data set defines the best separating hyperplane
We feed the data through a quadratic optimization procedure
to find this plane
Given a new point $\vec{x}$ to classify, the classification function
$f(\vec{x}) = \vec{w}^T\vec{x} + b$ computes the projection of the point onto the
hyperplane normal.
The sign of this function determines the class to assign to the
point.
If the point is within the margin of the classifier, the classifier
can return "don't know" rather than one of the two classes.
The value of $\vec{w}^T\vec{x} + b$ may also be transformed into a probability
of classification.
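The recap can be turned into a small prediction routine. This is a sketch under assumptions: the hyperplane is in canonical scaling (support vectors satisfy $y_i(\vec{w}^T\vec{x}_i + b) = 1$), so the margin band is $|\vec{w}^T\vec{x} + b| < 1$; "don't know" is returned as `None`; and the probability uses a logistic (sigmoid) transform in the style of Platt scaling, whose parameters `a` and `b_prob` would normally be fitted on held-out data and are made up here.

```python
import math
from typing import Optional, Sequence


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Classification function f(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w: Sequence[float], b: float, x: Sequence[float],
             abstain_in_margin: bool = True) -> Optional[int]:
    """Return +1 or -1 from the sign of f(x), or None ("don't know")
    when the point falls strictly inside the margin band |f(x)| < 1."""
    s = score(w, b, x)
    if abstain_in_margin and abs(s) < 1.0:
        return None
    return 1 if s >= 0 else -1  # a score of exactly 0 goes to the positive class


def probability_positive(s: float, a: float = -2.0, b_prob: float = 0.0) -> float:
    """Sigmoid 1 / (1 + exp(a*s + b_prob)); a and b_prob are hypothetical
    values, not fitted parameters."""
    return 1.0 / (1.0 + math.exp(a * s + b_prob))


w, b = (1.0, 0.0), -1.0                               # hypothetical hyperplane x1 = 1
print(classify(w, b, (3.0, 0.0)))                     # 1
print(classify(w, b, (-1.0, 0.0)))                    # -1
print(classify(w, b, (1.5, 0.0)))                     # None: inside the margin
print(probability_positive(score(w, b, (1.0, 0.0))))  # 0.5 on the boundary
```

With a negative `a`, larger scores map to probabilities closer to 1 and more negative scores to probabilities closer to 0.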
Outline
Text classification
Many commercial applications
There is no question concerning the commercial value of
being able to classify documents automatically by content.
There are myriad potential applications of such a capability
for corporate Intranets, government departments, and
Internet publishers.
How much labeled training data is available?
None?
Very little?
Quite a lot?
A huge amount, growing every day?
Example
IF (wheat OR grain) AND NOT (whole OR bread) THEN
c = grain
In practice, rules get a lot bigger than this, and can be phrased
using more sophisticated query languages than just Boolean
expressions, including the use of numeric scores. With careful
crafting, the accuracy of such rules can become very high (precision
in the high 90 percent range, recall in the high 80 percent range).
Nevertheless, the amount of work to create such well-tuned rules is
very large. A reasonable estimate is 2 days per class, and extra time
has to go into maintenance of rules, as the content of documents in
classes drifts over time.
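The wheat/grain rule from the example can be written in a few lines of Python. This is a minimal sketch assuming lowercase whole-word matching with no stemming (so "grains" would not match "grain"); the function names are hypothetical.

```python
import re
from typing import Optional, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphabetic word tokens of a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def grain_rule(text: str) -> Optional[str]:
    """IF (wheat OR grain) AND NOT (whole OR bread) THEN c = grain."""
    t = tokens(text)
    if (t & {"wheat", "grain"}) and not (t & {"whole", "bread"}):
        return "grain"
    return None  # the rule does not fire; no class assigned


print(grain_rule("Wheat exports rose sharply this quarter"))  # grain
print(grain_rule("A recipe for whole wheat bread"))           # None
print(grain_rule("Oil prices fell"))                          # None
```

Each refinement a rule writer makes (more terms, stemming, numeric thresholds) adds a condition like these, which is why hand-built rule sets grow large and need ongoing maintenance.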
Often humans will sort or route email for their own purposes, and
these actions give information about classes.
Active Learning
A system is built which decides which documents a human should
label.
Usually these are the ones on which a classifier is uncertain of the
correct classification.
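One common way to implement this selection step is uncertainty sampling: score the unlabeled documents with the current classifier and send the ones whose score $\vec{w}^T\vec{x} + b$ is closest to 0 (nearest the decision boundary) to a human. The sketch assumes such scores are already computed; the document IDs and score values are made up.

```python
from typing import Dict, List


def select_for_labeling(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k document IDs with the smallest |score|, i.e. the
    documents the classifier is least certain about."""
    return sorted(scores, key=lambda doc_id: abs(scores[doc_id]))[:k]


# Hypothetical classifier scores w^T x + b for unlabeled documents.
scores = {"d1": 2.3, "d2": -0.1, "d3": 0.4, "d4": -1.7}
print(select_for_labeling(scores, k=2))  # ['d2', 'd3']
```

In a typical loop, the newly labeled documents are added to the training set, the classifier is retrained, and the selection is repeated.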
Recap
Resources
Chapter 15 of IIR
Resources at http://ifnlp.org/ir
$y = x_1 + 2x_2 - 5.5$
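The decision boundary $x_1 + 2x_2 - 5.5 = 0$ can be checked numerically. The sketch assumes the small training set it appears to come from: $(1,1)$ and $(2,0)$ in class $-1$ and $(2,3)$ in class $+1$ (an assumption, since the points are not listed here). It checks that the boundary classifies all three points correctly, rescales $(\vec{w}, b)$ so that the smallest functional margin is 1, and then reports the margin $\rho = 2/|\vec{w}|$ and the support vectors.

```python
import math
from typing import Sequence

# Assumed toy training set (not given on this slide).
points = [(1.0, 1.0), (2.0, 0.0), (2.0, 3.0)]
labels = [-1, -1, 1]


def f(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Decision function w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


w, b = (1.0, 2.0), -5.5  # the boundary x1 + 2*x2 - 5.5 = 0

# Does every point land on the side of its own class?
correct = all((1 if f(w, b, x) > 0 else -1) == y for x, y in zip(points, labels))

# Rescale so the closest points have functional margin exactly 1 (canonical form).
m = min(y * f(w, b, x) for x, y in zip(points, labels))  # 2.5
w_c = tuple(wi / m for wi in w)                           # (0.4, 0.8)
b_c = b / m                                               # -2.2

rho = 2.0 / math.hypot(*w_c)                              # sqrt(5), about 2.236
support = [x for x, y in zip(points, labels)
           if math.isclose(y * f(w_c, b_c, x), 1.0)]
print(correct, w_c, b_c, rho, support)
```

Under these assumed points, $(1,1)$ and $(2,3)$ come out as the support vectors, while $(2,0)$ lies farther from the boundary and does not affect it.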