Introduction to Information Retrieval

Vector Space Classification

Chris Manning, Pandu Nayak and Prabhakar Raghavan
Ch. 14

Vector Space Classification: Topics to Cover

• Vector Space Classification
• Rocchio Classification
• k Nearest Neighbor (kNN) Classification

Classification Using Vector Spaces

• In vector space classification, the training set corresponds to a labeled set of points (equivalently, vectors)
• Premise 1: Documents in the same class form a contiguous region of space
• Premise 2: Documents from different classes don't overlap (much)
• Learning a classifier: build surfaces to delineate classes in the space
Sec.14.1

Documents in a Vector Space

[Figure: documents in a vector space, labeled by class: Government, Science, Arts]
Sec.14.1

Test Document of what class?

[Figure: the Government, Science and Arts classes with an unlabeled test document]
Sec.14.1

Test Document = Government

Is this similarity hypothesis true in general?

[Figure: the Government, Science and Arts classes; the test document is assigned to Government]

Our focus: how to find good separators

Sec.14.2

Definition of centroid

$$\vec{\mu}(c) = \frac{1}{|D_c|} \sum_{d \in D_c} \vec{v}(d)$$

• Where D_c is the set of all documents that belong to class c and v(d) is the vector space representation of d.
• Note that the centroid will in general not be a unit vector even when the inputs are unit vectors.
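
The remark about unit vectors can be checked numerically. The following is a minimal sketch, assuming the centroid is the component-wise mean of a class's document vectors; the helper names `centroid` and `norm` and the two toy vectors are illustrative, not part of the slides.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length vectors (assumed centroid definition)."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

# Two unit-length document vectors pointing in different directions (toy data).
docs = [[1.0, 0.0], [0.0, 1.0]]
mu = centroid(docs)

print(mu)        # [0.5, 0.5]
print(norm(mu))  # about 0.707, shorter than 1
```

Averaging vectors that point in different directions shrinks the result, which is why a centroid may need to be re-normalized if it is compared against unit-length documents.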

Sec.14.2

Rocchio classification
• Rocchio forms a simple representative for each class: the centroid/prototype
• Classification: nearest prototype/centroid
• It does not guarantee that classifications are consistent with the given training data

Rocchio - Pseudo code
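
A minimal Python sketch of Rocchio training and classification follows. It is an illustration under stated assumptions, not the slide's own listing: the function names `train_rocchio` and `apply_rocchio`, the toy two-dimensional vectors, and the use of Euclidean distance to find the nearest centroid are all assumptions made for this example.

```python
import math
from collections import defaultdict

def train_rocchio(labeled_docs):
    """Compute one centroid per class from (vector, label) pairs."""
    by_class = defaultdict(list)
    for vector, label in labeled_docs:
        by_class[label].append(vector)
    # Centroid = component-wise mean of the class's vectors.
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in by_class.items()
    }

def apply_rocchio(centroids, vector):
    """Return the class whose centroid is nearest (Euclidean distance assumed)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))

# Toy training set (hypothetical data).
training = [
    ([1.0, 0.0], "government"),
    ([0.9, 0.1], "government"),
    ([0.0, 1.0], "science"),
    ([0.1, 0.9], "science"),
]
centroids = train_rocchio(training)
print(apply_rocchio(centroids, [0.8, 0.3]))  # government
```

Training touches each document once and classification compares the test vector with one centroid per class, which is why both steps are cheap.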

Sec.14.2

Rocchio classification
• Little used outside text classification
• It has been used quite effectively for text classification
• But in general worse than Naïve Bayes
• Again, cheap to train and cheap to apply to test documents

Sec.14.3

k Nearest Neighbor Classification

• kNN = k Nearest Neighbor
• To classify a document d:
  - Define the k-neighborhood as the k nearest neighbors of d
  - Pick the majority class label in the k-neighborhood

Sec.14.3

Example: k=6 (6NN)

P(science | d)?

[Figure: a test document d and its 6 nearest neighbors among the Government, Science and Arts classes]
Sec.14.3

Nearest-Neighbor Learning
• Learning: just store the labeled training examples D
• Testing instance x (under 1NN):
  - Compute similarity between x and all examples in D.
  - Assign x the category of the most similar example in D.
• Also called:
  - Case-based learning
  - Memory-based learning
  - Lazy learning
• Rationale of kNN: contiguity hypothesis
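
The 1NN procedure can be sketched in a few lines. This is a minimal illustration that assumes cosine similarity as the similarity measure (the slides do not fix one); the names `cosine` and `classify_1nn` and the three stored examples are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def classify_1nn(training, x):
    """Return the label of the stored example most similar to x."""
    _, label = max(training, key=lambda example: cosine(example[0], x))
    return label

# "Learning" is just storing the labeled examples D (toy data).
D = [
    ([3.0, 0.0, 1.0], "government"),
    ([0.0, 2.0, 1.0], "science"),
    ([0.0, 1.0, 3.0], "arts"),
]
print(classify_1nn(D, [0.0, 3.0, 1.0]))  # science
```

All the work happens at test time: every query is compared with every stored example, which is what "lazy learning" refers to.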

Sec.14.3

k Nearest Neighbor
• Using only the closest example (1NN) is subject to errors due to:
  - A single atypical example.
  - Noise (i.e., an error) in the category label of a single training example.
• More robust: find the k nearest examples and return the majority category of these k
• k is typically odd to avoid ties; 3 and 5 are most common
KNN - Pseudo code
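
A minimal Python sketch of kNN classification follows. It is an illustration, not the slide's own listing: the function name `knn_classify`, the toy data, and the use of Euclidean distance are assumptions. Besides the majority label it returns the fraction of the k neighbors in each class, which is one simple way to estimate a quantity such as P(science | d).

```python
import math
from collections import Counter

def knn_classify(training, x, k):
    """Return (majority label, {label: fraction of the k nearest neighbors})."""
    # Sort stored examples by distance to x and keep the k closest.
    neighbors = sorted(training, key=lambda example: math.dist(example[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    fractions = {label: count / len(neighbors) for label, count in votes.items()}
    return votes.most_common(1)[0][0], fractions

# Toy training set (hypothetical data).
training = [
    ([1.0, 1.0], "science"),
    ([1.0, 2.0], "science"),
    ([2.0, 1.0], "government"),
    ([5.0, 5.0], "arts"),
    ([6.0, 5.0], "arts"),
]
label, fractions = knn_classify(training, [1.2, 1.4], k=3)
print(label)      # science
print(fractions)  # science gets 2 of the 3 votes, government 1
```

With k = 3 the two science neighbors outvote the single government neighbor; the arts documents are too far away to take part in the vote.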

Sec.14.3

kNN decision boundaries

Boundaries are in principle arbitrary surfaces, but usually polyhedra

[Figure: kNN decision boundaries between the Government, Science and Arts classes]

kNN gives locally defined decision boundaries between classes: far away points do not influence each classification decision (unlike in Naïve Bayes, Rocchio, etc.)
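
The contrast with Rocchio can be made concrete with a small experiment. The sketch below is illustrative only: the helper names `rocchio` and `knn`, the toy points, and the choice of Euclidean distance are assumptions. Adding a single far-away training point to class A moves that class's centroid enough to flip the Rocchio decision, while the 3NN decision for the same test point is unchanged.

```python
import math
from collections import Counter

def rocchio(training, x):
    """Nearest-centroid label (Euclidean distance assumed)."""
    groups = {}
    for vector, label in training:
        groups.setdefault(label, []).append(vector)
    centroids = {c: [sum(col) / len(vs) for col in zip(*vs)] for c, vs in groups.items()}
    return min(centroids, key=lambda c: math.dist(centroids[c], x))

def knn(training, x, k=3):
    """Majority label among the k nearest training points."""
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy data: two compact classes and a test point nearer to class A.
base = [
    ([0.0, 0.0], "A"), ([1.0, 0.0], "A"), ([0.0, 1.0], "A"),
    ([4.0, 4.0], "B"), ([5.0, 4.0], "B"), ([4.0, 5.0], "B"),
]
x = [1.8, 1.8]
with_outlier = base + [([-20.0, -20.0], "A")]  # one far-away class-A point

print(rocchio(base, x), knn(base, x))                  # A A
print(rocchio(with_outlier, x), knn(with_outlier, x))  # B A
```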
Sec.14.3

kNN: Discussion
• No feature selection necessary
• No training necessary
• Scales well with large number of classes
  - Don't need to train n classifiers for n classes
• Classes can influence each other
  - Small changes to one class can have ripple effect
• May be expensive at test time
• In most cases it's more accurate than NB or Rocchio
