Advanced IR Clustering Techniques

Uploaded by

rm23082001

Classification Methods & Cluster Hypothesis
Information Retrieval CC4151
Classification Methods

• In the context of information retrieval, a classification is required for a purpose.
• The purpose may be to group the documents in such a way that retrieval will be faster, or alternatively to construct a thesaurus automatically.
• There are two main areas of application of classification methods in IR:
(1) keyword clustering;
(2) document clustering.
Clustering and Cluster Hypothesis

• Clustering is used in information retrieval systems to enhance the efficiency and effectiveness of the retrieval process. It is achieved by partitioning the documents in a collection into classes such that documents that are associated with each other are assigned to the same cluster.
• The cluster hypothesis is an assumption about the nature of the data, and it takes various forms. In information retrieval, it states that documents that are clustered together "behave similarly with respect to relevance to information needs".
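As a rough illustration of partitioning a collection so that associated documents share a cluster, the sketch below uses bag-of-words vectors, cosine similarity, and a single-pass ("leader") assignment. The slides do not name an algorithm, so the tokenization, the `threshold` value, the function names, and the four toy documents are all assumptions made for this example.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def single_pass_cluster(docs, threshold=0.3):
    """Put each document in the first cluster whose leader is similar enough,
    otherwise start a new cluster with the document as its leader."""
    clusters = []  # list of (leader_vector, [document indices])
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        for leader, members in clusters:
            if cosine(vec, leader) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]

# Hypothetical toy collection (not from the slides).
docs = [
    "jaguar car engine speed",
    "jaguar cat jungle prey",
    "car engine repair speed",
    "jungle cat hunting prey",
]
clusters = single_pass_cluster(docs)
print(clusters)  # [[0, 2], [1, 3]]
```

Under the cluster hypothesis, a query about cars would be expected to find its relevant documents mostly inside the first cluster rather than spread evenly across both.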
Applications of Clustering
Application               What is clustered?        Benefit
Search result clustering  search results            more effective information presentation to user
Scatter-Gather            (subsets of) collection   alternative user interface: "search without typing"
Collection clustering     collection                effective information presentation for exploratory browsing
Language modeling         collection                increased precision and/or recall
Cluster-based retrieval   collection                higher efficiency: faster search
Search Result Clustering
• By search results we mean the documents that were returned in response to a query.
• The default presentation of search results in information retrieval is a simple list.
• Users scan the list from top to bottom until they have found the information they are looking for. Search result clustering instead groups the search results, so that similar documents appear together.
• It is often easier to scan a few coherent groups than many individual documents.
• This is particularly useful if a search term has different word senses.
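To make the word-sense case concrete, here is a minimal sketch that groups a ranked result list for an ambiguous query and labels each group. The Jaccard overlap measure, the `threshold`, the labeling rule, and the four snippets are assumptions for illustration, not a method taken from the slides.

```python
from collections import Counter

def terms(snippet, query):
    """Snippet terms with the query terms removed (every result contains them)."""
    return set(snippet.lower().split()) - set(query.lower().split())

def jaccard(a, b):
    """Overlap between two term sets, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_results(results, query, threshold=0.2):
    """Greedily add each result to the most similar existing group, keeping rank order."""
    groups = []
    for i, snippet in enumerate(results):
        t = terms(snippet, query)
        best, best_sim = None, threshold
        for g in groups:
            group_terms = set().union(*(terms(results[j], query) for j in g))
            sim = jaccard(t, group_terms)
            if sim >= best_sim:
                best, best_sim = g, sim
        if best is None:
            groups.append([i])
        else:
            best.append(i)
    return groups

def label(group, results, query):
    """Most frequent non-query term in the group (alphabetical order breaks ties)."""
    counts = Counter(t for j in group for t in terms(results[j], query))
    return max(sorted(counts), key=counts.get)

# Hypothetical ranked results for the ambiguous query "jaguar".
query = "jaguar"
results = [
    "jaguar car dealer price",
    "jaguar cat habitat rainforest",
    "jaguar car review price",
    "jaguar cat diet habitat",
]
groups = cluster_results(results, query)
labels = [label(g, results, query) for g in groups]
for name, g in zip(labels, groups):
    print(name, [results[j] for j in g])
```

With this toy input the flat list is presented as two groups, labeled "car" and "cat", so a user interested in the animal can skip the car results in one step.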
Scatter-Gather

• Scatter-Gather clusters the whole collection to get groups of documents that the user can select or gather.
• The selected groups are merged and the resulting set is clustered again. This process is repeated until a cluster of interest is found.
• Example: A collection of New York Times news stories is clustered ("scattered") into eight clusters. The user manually gathers three of these into a smaller collection, "International Stories", and performs another scattering operation. This process repeats until a small cluster with relevant documents is found (e.g., Trinidad).
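The scatter, gather, and re-scatter loop can be sketched as follows. This is only an outline of the interaction: the `cluster` and `choose` callbacks, the stopping rule `max_size`, and the tagged toy documents are hypothetical, and the toy `cluster_by_topic` function is a stand-in for a real clustering algorithm.

```python
from collections import defaultdict

def scatter_gather(docs, cluster, choose, max_size=2, max_rounds=10):
    """Alternate scatter (cluster the working set) and gather (merge chosen clusters)
    until the working set is small enough or nothing is chosen."""
    current = list(docs)
    for round_no in range(max_rounds):
        if len(current) <= max_size:
            break
        clusters = cluster(current, round_no)                # scatter
        picked = choose(clusters)                            # user selects clusters
        merged = [d for name in picked for d in clusters[name]]  # gather
        if not merged:
            break
        current = merged
    return current

def cluster_by_topic(docs, round_no):
    """Toy stand-in for clustering: group by a topic tag that gets finer each round."""
    groups = defaultdict(list)
    for d in docs:
        depth = min(round_no, len(d["topics"]) - 1)
        groups[d["topics"][depth]].append(d)
    return dict(groups)

def simulated_user(clusters, interests=("international", "caribbean")):
    """Stand-in for the manual gather step: pick clusters whose label is of interest."""
    return [name for name in clusters if name in interests]

# Hypothetical news stories with hand-assigned topic tags.
stories = [
    {"title": "Trinidad election", "topics": ["international", "caribbean"]},
    {"title": "Trinidad oil exports", "topics": ["international", "caribbean"]},
    {"title": "Germany trade talks", "topics": ["international", "europe"]},
    {"title": "City budget vote", "topics": ["domestic", "politics"]},
    {"title": "Baseball finals", "topics": ["sports", "baseball"]},
]
found = scatter_gather(stories, cluster_by_topic, simulated_user)
print([d["title"] for d in found])  # ['Trinidad election', 'Trinidad oil exports']
```

In a real system the `choose` step is the user clicking on cluster summaries, which is why the approach is described as searching without typing.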
Collection clustering

• Clustered collections (as the term is used in document databases) store documents ordered by the clustered index key value.
• Clustered collections have the following benefits compared to non-clustered collections:
    • Faster queries without needing a secondary index, such as queries with range scans and equality comparisons on the clustered index key.
    • A lower storage size, which improves performance for queries and bulk inserts.
    • Additional performance improvements for inserts, updates, deletes, and queries.
Language Modelling

• A common suggestion to users for coming up with good queries is to think of words that would likely appear in a relevant document, and to use those words as the query. The language modelling approach to IR directly models that idea: a document is a good match to a query if the document model is likely to generate the query, which will in turn happen if the document contains the query words often. This approach thus provides a different realization of some of the basic ideas for document ranking.
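The idea that a document matches a query when its model is likely to generate that query can be sketched with a unigram model. The slides do not specify a smoothing method, so the Jelinek-Mercer mixing with the collection model, the weight `lam`, the function name, and the toy collection below are assumptions.

```python
from collections import Counter

def query_likelihood(query, doc, collection, lam=0.5):
    """P(query | document model) for a unigram model.
    Each term probability mixes the document estimate with the collection
    estimate (an assumed smoothing choice) so unseen terms do not zero the score."""
    doc_tf = Counter(doc.lower().split())
    doc_len = sum(doc_tf.values())
    coll_tf = Counter(w for d in collection for w in d.lower().split())
    coll_len = sum(coll_tf.values())
    score = 1.0
    for term in query.lower().split():
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_coll = coll_tf[term] / coll_len if coll_len else 0.0
        score *= lam * p_doc + (1 - lam) * p_coll
    return score

# Hypothetical three-document collection.
collection = [
    "clustering groups similar documents together",
    "language models rank documents by query likelihood",
    "query likelihood scores a query against each document",
]
query = "query likelihood"
ranked = sorted(
    range(len(collection)),
    key=lambda i: query_likelihood(query, collection[i], collection),
    reverse=True,
)
print(ranked)  # [2, 1, 0]
```

The third document ranks first because it contains "query" twice, which matches the statement that a document scores well when it contains the query words often.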
Example: Finite Automata
Cluster-based Retrieval

• Cluster-based information retrieval is one of the information retrieval (IR) tools that organize web documents, extract their features, and categorize them according to their similarity.
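One way cluster-based retrieval can make search faster is to compare the query against one representative per cluster and then rank only the documents of the best-matching cluster. The sketch below assumes the clusters are already given and uses summed term-frequency vectors as centroids; the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def tf(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def centroid(texts):
    """Sum of member vectors; cosine ignores scale, so no division by cluster size is needed."""
    total = Counter()
    for t in texts:
        total.update(tf(t))
    return total

def cluster_based_search(query, clusters):
    """Pick the cluster whose centroid is closest to the query, then rank only its documents."""
    q = tf(query)
    best = max(clusters, key=lambda name: cosine(q, centroid(clusters[name])))
    ranked = sorted(clusters[best], key=lambda d: cosine(q, tf(d)), reverse=True)
    return best, ranked

# Hypothetical pre-clustered collection.
clusters = {
    "cars": ["jaguar car engine speed", "car engine repair"],
    "animals": ["jaguar cat jungle prey", "jungle cat hunting"],
}
best, ranked = cluster_based_search("jungle cat", clusters)
print(best, ranked)  # animals ['jungle cat hunting', 'jaguar cat jungle prey']
```

The saving comes from scoring two centroids plus two documents instead of every document; with many large clusters the gap grows, at the cost of missing relevant documents that sit in other clusters.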
