Part I IR VTU M Tech SSE

This document provides an overview of information retrieval. It discusses the motivation for information retrieval, including representing, storing, organizing and accessing information items based on a user's information need. The document outlines the basic concepts of information retrieval, including the logical view of documents and document representation. It then discusses the history of information retrieval from the 1960s to the present, including the development of early text retrieval systems, large document databases, search engines and the impact of the world wide web. The document concludes by describing the typical retrieval process that involves operations on the text, query, indexing and searching to retrieve relevant documents.

Uploaded by

NatarajanS

Information Retrieval

M Tech(Software Engineering) Third Semester Elective

Course Instructor : Dr S.Natarajan

Professor and Key Resource Person
Department of Information Science and Engineering
PES Institute of Technology
Bengaluru
Information Retrieval
Information Retrieval – PART I
Introduction-
Motivation
Basic Concepts
Past, Present and the Future
The Retrieval Process
Motivation
• IR: representation, storage, organization of, and access to information items
• Focus is on the user information need
• User information need:
– Find all docs containing information on college tennis teams which: (1) are maintained by a USA university and (2) participate in the NCAA tournament.
• Emphasis is on the retrieval of information (not data)
Motivation
• Data retrieval
– which docs contain a set of keywords?
– well-defined semantics
– a single erroneous object implies failure!
• Information retrieval
– information about a subject or topic
– semantics is frequently loose
– small errors are tolerated
• IR system:
– interpret contents of information items
– generate a ranking which reflects relevance
– notion of relevance is most important
Motivation
• IR at the center of the stage
• IR in the last 20 years:
– classification and categorization
– systems and languages
– user interfaces and visualization
• Still, the area was seen as of narrow interest
• Advent of the Web changed this perception once and for all
– universal repository of knowledge
– free (low cost) universal access
– no central editorial board
– many problems though: IR seen as key to finding the solutions!
Information Retrieval – UNIT I
INTRODUCTION,RETRIEVAL STRATEGIES –I:
Introduction-
Motivation
Basic Concepts
Past, Present and the Future
The Retrieval Process
Basic Concepts
• The User Task
[Figure: the user works with the database through either retrieval or browsing.]
• Retrieval
– information or data
– purposeful
• Browsing
– glancing around
– F1; cars, Le Mans, France, tourism
Basic Concepts
• Logical view of the documents
[Figure: logical view of a document, moving from docs with structure and full text to index terms through text operations: structure recognition, accents and spacing, stopwords, noun groups, stemming, manual indexing.]
• Document representation viewed as a continuum: logical view of docs might shift
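The shift from a full-text view to an index-term view can be sketched as a short chain of text operations. This is only a minimal illustration under stated assumptions: the stopword list is a toy one, `naive_stem` is a crude suffix stripper rather than a real stemming algorithm, and all function names are hypothetical.

```python
import re
import unicodedata

# Toy stopword list (assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "are"}


def strip_accents(text: str) -> str:
    """Remove accents by dropping combining marks after NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(word: str) -> str:
    """Crude suffix stripping, for illustration only (not a real stemmer)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(full_text: str) -> list[str]:
    """Move from the full-text view to an index-term view of a document."""
    normalized = strip_accents(full_text).lower()
    tokens = re.findall(r"[a-z0-9]+", normalized)
    kept = [t for t in tokens if t not in STOPWORDS]
    return [naive_stem(t) for t in kept]


print(index_terms("The cafés in France are hosting college teams"))
# ['cafe', 'france', 'host', 'college', 'team']
```

Each step discards some information from the original text, which is why the logical view is better seen as a continuum than as a single fixed representation.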
History of IR

• 1960-70’s:
– Initial exploration of text retrieval systems for
“small” corpora of scientific abstracts, and law
and business documents.
– Development of the basic Boolean and vector-
space models of retrieval.
– Prof. Salton and his students at Cornell
University are the leading researchers in the
area.

IR History Continued

• 1980’s:
– Large document database systems, many run by
companies:
• Lexis-Nexis
• Dialog
• MEDLINE

IR History Continued

• 1990’s:
– Searching FTPable documents on the Internet
• Archie
• WAIS
– Searching the World Wide Web
• Lycos
• Yahoo
• Altavista

IR History Continued

• 1990’s continued:
– Organized Competitions
• NIST TREC
– Recommender Systems
• Ringo
• Amazon
• NetPerceptions
– Automated Text Categorization & Clustering

Recent IR History

• 2000’s
– Link analysis for Web Search
• Google
– Automated Information Extraction
• Whizbang
• Fetch
• Burning Glass
– Question Answering
• TREC Q/A track

Recent IR History

• 2000’s continued:
– Multimedia IR
• Image
• Video
• Audio and music
– Cross-Language IR
• DARPA Tides
– Document Summarization

The Seven Ages of Information Retrieval
• Vannevar Bush's 1945 article set a goal of fast access to the contents of the world's libraries which looks like it will be achieved by 2010, sixty-five years later.
• Bush’s Prediction
Modern History
• The “information overload” problem is much older than you may think
• Origins in period immediately after World War II
– Tremendous scientific progress during the war
– Rapid growth in amount of scientific publications available
• The “Memex Machine”
– Conceived by Vannevar Bush, President Roosevelt's science advisor
– Outlined in 1945 Atlantic Monthly article titled “As We May Think”
– Foreshadows the development of hypertext (the Web) and information retrieval systems
The Memex Machine
Historical aspects
• “As We May Think”, by Vannevar Bush
– Article was originally published in 1945.
– He imagined that machines would read in visual form
– His assertion that logic is suitable for mechanical computation is not yet appreciated
– Documents are accessible and viewable from Bush's memex system
– Documents may exist on many media: text, pictures, audio.
– The memex can keep the “trail” of documents you read while you follow your curiosity. (Basically, it's a persistent history of URLs as you surf the web.)
– You can create associations between documents
– You can enter original material
• Most have been implemented as of 2005

IR Childhood (1945-1955)
• Ideas conceived
– Information explosion after World War II
– Possibility of information processing machine
• Memex
– The hardware seems mostly out of date.
– A user inserting 5000 pages per day into a personal repository, and it taking hundreds of years to fill it up.
– The software goals have not been achieved.
The Schoolboy (1960s)
• Many, many experiments
• Use of Precision and Recall
• Use of relevance feedback
Adulthood (1970s)
• The invention of
– word processing systems
– time-sharing systems
• The beginning of information industry
– OCLC (Online Computer Library Center)
– DIALOG
– BRS (Bibliographic Retrieval Services)

Maturity (1980s)
Mid-Life Crisis (1990s)
• Internet put IR to the test.
• Better understanding of the limits of IR.
• Large scale evaluations
• Digital Libraries projects
Predictions
• Fulfillment (2000s)
• Retirement (2010)
The Retrieval Process
[Figure: the retrieval process. Components: User Interface, Text Operations, Query Operations, Indexing, Searching, Ranking, DB Manager Module, Index (inverted file), Text Database. Labeled flows: user need, text, logical view, user feedback, query, retrieved docs, ranked docs.]
Other Related Slides – not part of the book
Information Retrieval
(IR)
• The indexing and retrieval of textual
documents.
• Searching for pages on the World Wide
Web is the most recent “killer app.”
• Concerned firstly with retrieving relevant
documents to a query.
• Concerned secondly with retrieving from
large sets of documents efficiently.

Typical IR Task

• Given:
– A corpus of textual natural-language
documents.
– A user query in the form of a textual string.
• Find:
– A ranked set of documents that are relevant to
the query.

IR System
[Figure: a document corpus and a query string are the inputs to the IR system, which outputs ranked documents (1. Doc1, 2. Doc2, 3. Doc3, ...).]
Relevance

• Relevance is a subjective judgment and may include:
– Being on the proper subject.
– Being timely (recent information).
– Being authoritative (from a trusted source).
– Satisfying the goals of the user and his/her
intended use of the information (information
need).

Keyword Search

• Simplest notion of relevance is that the query string appears verbatim in the document.
• Slightly less strict notion is that the words in the query appear frequently in the document, in any order (bag of words).
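The two notions of relevance can be contrasted in a few lines. This is a minimal sketch, assuming a simple lowercase alphanumeric tokenizer; the function names and the sample sentence are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def verbatim_match(query: str, document: str) -> bool:
    """Strictest notion: the query string appears verbatim in the document."""
    return query.lower() in document.lower()


def bag_of_words_score(query: str, document: str) -> int:
    """Looser notion: count occurrences of the query words, ignoring order."""
    counts = Counter(tokenize(document))
    return sum(counts[word] for word in set(tokenize(query)))


doc = "Tennis teams from each college met; the college tennis final was close."
print(verbatim_match("college tennis", doc))      # True
print(verbatim_match("tennis college", doc))      # False
print(bag_of_words_score("tennis college", doc))  # 4
```

Reordering the query words breaks the verbatim match but leaves the bag-of-words score unchanged, which is the sense in which the second notion is less strict.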

Problems with Keywords

• May not retrieve relevant documents that include synonymous terms.
– “restaurant” vs. “café”
– “PRC” vs. “China”
• May retrieve irrelevant documents that
include ambiguous terms.
– “bat” (baseball vs. mammal)
– “Apple” (company vs. fruit)
– “bit” (unit of data vs. act of eating)
Beyond Keywords

• We will cover the basics of keyword-based IR, but…
• We will focus on extensions and recent
developments that go beyond keywords.
• We will cover the basics of building an
efficient IR system, but…
• We will focus on basic capabilities and
algorithms rather than systems issues that
allow scaling to industrial size databases.
Intelligent IR

• Taking into account the meaning of the words used.
• Taking into account the order of words in
the query.
• Adapting to the user based on direct or
indirect feedback.
• Taking into account the authority of the
source.

IR System Architecture
[Figure: IR system architecture. Components: User Interface, Text Operations, Query Operations, Indexing, Searching, Ranking, Database Manager, Index (inverted file), Text Database. Labeled flows: user need, text, logical view, user feedback, query, retrieved docs, ranked docs.]
IR System Components
• Text Operations form index words (tokens).
– Stopword removal
– Stemming
• Indexing constructs an inverted index of
word to document pointers.
• Searching retrieves documents that contain a
given query token from the inverted index.
• Ranking scores all retrieved documents
according to a relevance metric.
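The four components above can be sketched end to end on a toy corpus. This is a minimal illustration, not a definitive design: the stopword list, the corpus, and the function names are assumptions, stemming is omitted for brevity, and the "relevance metric" is simply the summed count of query tokens in each document.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "the", "of", "to", "in"}  # toy list (assumption)


def text_operations(text: str) -> list[str]:
    """Form index tokens: lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_inverted_index(corpus: dict[str, str]) -> dict[str, Counter]:
    """Indexing: map each token to the documents containing it, with counts."""
    index: dict[str, Counter] = defaultdict(Counter)
    for doc_id, text in corpus.items():
        for token in text_operations(text):
            index[token][doc_id] += 1
    return index


def search_and_rank(query: str, index: dict[str, Counter]) -> list[tuple[str, int]]:
    """Searching and ranking: score docs by summed counts of the query tokens."""
    scores: Counter = Counter()
    for token in text_operations(query):
        scores.update(index.get(token, Counter()))
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


corpus = {
    "d1": "The bat flew out of the cave",
    "d2": "A baseball bat and a baseball glove",
    "d3": "Apple released a new laptop",
}
index = build_inverted_index(corpus)
print(search_and_rank("baseball bat", index))  # [('d2', 3), ('d1', 1)]
```

Note that d1 is still retrieved for "baseball bat" even though it concerns the mammal, which is the ambiguity problem with keywords described earlier.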

IR System Components (continued)
• User Interface manages interaction with the
user:
– Query input and document output.
– Relevance feedback.
– Visualization of results.
• Query Operations transform the query to
improve retrieval:
– Query expansion using a thesaurus.
– Query transformation using relevance feedback.
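Query expansion using a thesaurus can be sketched as a simple lookup. The thesaurus below is a toy dictionary assumed for illustration (its entries echo the synonym examples from the "Problems with Keywords" slide), and `expand_query` is a hypothetical name rather than part of any particular system.

```python
# Toy thesaurus (assumption); a real system would use a curated resource.
THESAURUS = {
    "restaurant": {"cafe", "diner"},
    "china": {"prc"},
}


def expand_query(query: str, thesaurus: dict[str, set[str]]) -> list[str]:
    """Return the query terms followed by their synonyms, without duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        # Sorted so the output order is deterministic.
        for synonym in sorted(thesaurus.get(term, set())):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


print(expand_query("restaurant China", THESAURUS))
# ['restaurant', 'china', 'cafe', 'diner', 'prc']
```

The expanded term list can then be passed to the search step, so documents that use only a synonym are no longer missed.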

Web Search

• Application of IR to HTML documents on the World Wide Web.
• Differences:
– Must assemble document corpus by spidering
the web.
– Can exploit the structural layout information
in HTML (XML).
– Documents change uncontrollably.
– Can exploit the link structure of the web.

Web Search System
[Figure: a web spider assembles the document corpus; the corpus and a query string are the inputs to the IR system, which outputs ranked documents (1. Page1, 2. Page2, 3. Page3, ...).]
Other IR-Related Tasks

• Automated document categorization
• Information filtering (spam filtering)
• Information routing
• Automated document clustering
• Recommending information or products
• Information extraction
• Information integration
• Question answering
Related Areas

• Database Management
• Library and Information Science
• Artificial Intelligence
• Natural Language Processing
• Machine Learning

Database Management

• Focused on structured data stored in relational tables rather than free-form text.
• Focused on efficient processing of well-
defined queries in a formal language (SQL).
• Clearer semantics for both data and queries.
• Recent move towards semi-structured data
(XML) brings it closer to IR.

Library and Information Science

• Focused on the human user aspects of information retrieval (human-computer interaction, user interface, visualization).
• Concerned with effective categorization of
human knowledge.
• Concerned with citation analysis and
bibliometrics (structure of information).
• Recent work on digital libraries brings it
closer to CS & IR.
Artificial Intelligence

• Focused on the representation of knowledge, reasoning, and intelligent action.
• Formalisms for representing knowledge and
queries:
– First-order Predicate Logic
– Bayesian Networks
• Recent work on web ontologies and
intelligent information agents brings it
closer to IR.
Natural Language Processing

• Focused on the syntactic, semantic, and pragmatic analysis of natural language text and discourse.
• Ability to analyze syntax (phrase structure)
and semantics could allow retrieval based
on meaning rather than keywords.

Natural Language Processing:
IR Directions
• Methods for determining the sense of an
ambiguous word based on context (word
sense disambiguation).
• Methods for identifying specific pieces of
information in a document (information
extraction).
• Methods for answering specific NL
questions from document corpora.

Machine Learning

• Focused on the development of computational systems that improve their performance with experience.
• Automated classification of examples
based on learning concepts from labeled
training examples (supervised learning).
• Automated methods for clustering
unlabeled examples into meaningful
groups (unsupervised learning).
Machine Learning:
IR Directions

• Text Categorization
– Automatic hierarchical classification (Yahoo).
– Adaptive filtering/routing/recommending.
– Automated spam filtering.
• Text Clustering
– Clustering of IR query results.
– Automatic formation of hierarchies (Yahoo).
• Learning for Information Extraction
• Text Mining
IR research
[Figure: areas of IR research arranged around the IR System and the User: system prototyping, interface, retrieval algorithms, interaction, contents, user satisfaction, evaluation.]
Top Ten Research Issues
10. Relevance Feedback.
9. Information Extraction.
8. Multimedia Retrieval.
7. Effective Retrieval.
6. Routing and Filtering.
Top Ten Research Issues
5. Interfaces and Browsing.
4. “Magic” (Vocabulary Mapping).
3. Efficient, Flexible Indexing and
Retrieval.
2. Distributed IR.
1. Integrated Solutions.
• A new Industry – Content Management
Introduction to Information Retrieval
[Figure: unstructured (text) vs. structured (database) data in 1996]
[Figure: unstructured (text) vs. structured (database) data in 2009]
Definitions
• An Information Retrieval (IR) system attempts to find relevant documents to respond to a user’s request.
• The real problem boils down to matching the language of the query to the language of the document.
What is Information?
• What do you think?
• There is no “correct” definition
• Cookie Monster’s definition:
– “news or facts about something”
• Different approaches:
– Philosophy
– Psychology
– Linguistics
– Electrical engineering
– Physics
– Computer science
– Information science
Dictionary says…
• Oxford English Dictionary
– information: informing, telling; thing told, knowledge, items of knowledge, news
– knowledge: knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known
• Random House Dictionary
– information: knowledge communicated or received concerning a particular fact or circumstance; news
Intuitive Notions
• Information must
– Be something, although the exact nature (substance, energy, or abstract concept) is not clear;
– Be “new”: repetition of previously received messages is not informative
– Be “true”: false or counterfactual information is “mis-information”
– Be “about” something

Robert M. Losee. (1997) A Discipline Independent Definition of Information. Journal of the American Society for Information Science, 48(3), 254-269.
Three Views of Information
• Information as process
• Information as communication
• Information as message transmission and reception
One View
• Information = characteristics of the output of a process
– Tells us something about the process and the input
[Figure: Input → Process → Output]
• Information-generating processes do not occur in isolation
[Figure: Input → Process1 → Process2 → … → Output]

Ibid.
Where’s the human?
• If a tree falls in the forest, and no one is around to hear it, is information transmitted?
• In the “information as process” view: Yes, but that’s not very interesting to us
• We’re concerned about information for human consumption
– Transmission of information from one person to another
– Recording of information
– Reconstruction of stored information
Another View
• Information science is characterized by “the deliberate (purposeful) structure of the message by the sender in order to affect the image structure of the recipient”
– This implies that the sender has knowledge of the recipient's structure
• Text = “a collection of signs purposefully structured by a sender with the intention of changing image-structure of a recipient”
• Information = “the structure of any text which is capable of changing the image-structure of a recipient”

Nicholas J. Belkin and Stephen E. Robertson. (1976) Information Science and the Phenomenon of Information. Journal of the American Society for Information Science, 27(4), 197-204.
Transfer of Information
• Communication = transmission of information
[Figure: information is encoded by the sender and decoded by the recipient over paired channels: thoughts to thoughts (telepathy?), words to words (writing), sounds to sounds (speech).]
Information Hierarchy
[Figure: hierarchy from Data up through Information and Knowledge to Wisdom; higher levels are more refined and abstract.]
• Simply matching on words is a very brittle approach.
• One word can have a zillion different semantic meanings
– Consider: Take
– “take a place at the table”
– “take money to the bank”
– “take a picture”
– “take a lot of time”
– “take drugs”
How IR Differs from the Rest of CS
What is different about IR from the rest of computer science?
Most algorithms in computer science have a “right” answer.
Consider the two problems:
– Sort the following ten integers
– Find the highest integer
Now consider:
– Find the document most relevant to “hippos in the zoo”
Measuring Effectiveness
• An algorithm is deemed incorrect if it does not produce the “right” answer.
• A heuristic tries to guess something close to the right answer. Heuristics are measured on “how close” they come to a right answer.
• IR techniques are essentially heuristics because we do not know the right answer.
• So we have to measure how close to the right answer we can come.
DOCUMENT RETRIEVAL
Document Routing
[Figure: incoming documents pass through a document routing system, which uses predetermined queries or user profiles to deliver them to User 1, User 2, User 3, and User 4.]

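Result Set: Relevant, Retrieved, Relevant and Retrieved
[Figure: Venn diagram of the Relevant and Retrieved sets; their intersection is the set of relevant documents that were retrieved.]

$$\text{Precision} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Retrieved}|}$$

$$\text{Recall} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Relevant}|}$$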
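The two ratios can be computed directly from sets of document ids. This is a minimal sketch: the function name and the document ids are hypothetical, and returning 0.0 for an empty set is a convention assumed here, not something the slides specify.

```python
def precision_recall(relevant: set[str], retrieved: set[str]) -> tuple[float, float]:
    """Precision = |relevant & retrieved| / |retrieved|; recall divides by |relevant|."""
    hits = len(relevant & retrieved)
    # Guard against division by zero (assumed convention: report 0.0).
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids, chosen only for illustration.
relevant = {"d2", "d5", "d9"}
retrieved = {"d1", "d2", "d3", "d5"}
print(precision_recall(relevant, retrieved))  # (0.5, 0.6666666666666666)
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were retrieved (recall about 0.67).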
Precision and Two points of Recall
[Figure: precision (vertical axis) against recall (horizontal axis), with points at (0.5, 0.5) and (1.0, 0.4).]
Answer set in order of similarity coefficient: d1, d2, d3, d4, d5, d6, d7, d8, d9, d10
Relevant documents: d2, d5
• d2 at rank 2: 50% recall
• d5 at rank 5: 100% recall

Precision at 50% recall = 1/2 = 50%

Precision at 100% recall = 2/5 = 40%
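The two points can be reproduced by walking down the ranked answer set and recording precision each time a relevant document appears. This is a minimal sketch using the slide's ranking and relevant set; the function name is hypothetical.

```python
def precision_at_recall_points(ranking: list[str], relevant: set[str]) -> list[tuple[float, float]]:
    """Walk down the ranking; emit (recall, precision) at each relevant document."""
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points


ranking = [f"d{i}" for i in range(1, 11)]  # d1 ... d10, in order of similarity
relevant = {"d2", "d5"}
print(precision_at_recall_points(ranking, relevant))  # [(0.5, 0.5), (1.0, 0.4)]
```

The first relevant document (d2) is found at rank 2, giving precision 1/2 at 50% recall; the second (d5) is found at rank 5, giving precision 2/5 at 100% recall.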
