Chapter 1

The document provides an overview of Information Retrieval (IR), including its definitions, processes, and challenges. It discusses the components and functions of Information Retrieval Systems (IRS) and emphasizes the importance of user-centered approaches in effectively retrieving information. Key issues such as organizing and retrieving information, as well as the iterative nature of the IR process, are also highlighted.

Uploaded by

bellhermon


Information Retrieval

Chapter 1
Information Storage and
Retrieval (ISR): Basic concepts

1
Sub Topics
 Definition, Foundation, theories and principles
 The (information) retrieval process
 Factors affecting effective retrieval
 Challenges in IR
 Information retrieval system: components,
structures and functions
 Database retrieval Vs. information retrieval

2
Definition, Basic Foundation,
theories and principles

What are the key foundational concepts regarding IR?

3
Information Retrieval Systems?
 Document (Web page) retrieval in response to a query
 Quite effective (at some things)
 Commercially successful (some of them)
 But what goes on behind the scenes?
 How do they work?
 What happens beyond the Web?
 Web search systems
 Lycos, Excite, Yahoo, Google, Live, Northern Light, Teoma, HotBot, Baidu, …
4
Web Search System
[Figure: Web search system. Labels: Web, spider, document corpus, query string, IR system, ranked documents (1. Page1, 2. Page2, 3. Page3, ...).]
5
Information Retrieval - Definition
 Is an important sub-discipline of Information Science/Computer Science that is concerned with developing theories and methods of access to information
 Focus is on helping users find information that matches their information need (User-Centered View)
 Is a branch of applied Computer Science that focuses on the representation, storage, organization of, and access to information items (System-Centered View)
6
Cont…
 A good formal definition of information retrieval is given in Baeza-Yates & Ribeiro-Neto (1999, p. 1):
“Information retrieval deals with representation, storage, organization of, and access to information items. The organization and access of information items should provide the user with easy access to the information in which he is interested.”

7
Cont…
 The definition incorporates all important features of a good information retrieval system
 Representation
 Storage
 Organization
 Access
 Evaluation
 Documents (information items): usually text, but possibly also image, audio, video, etc.

8
IR from different perspectives

 Conceptually,
 IR is used to cover all related problems in finding needed information
 Historically,
 information retrieval is about document retrieval, emphasizing the document as the basic unit
 Technically,
 information retrieval refers to (text) string manipulation, indexing, matching, querying, etc.

9
Information Retrieval
 Can be structured for ease of discussion as
 Text IR
 Discusses the classic problem of searching a collection of documents for useful information
 Focus is on document images that are predominantly text (rather than pictures)
 These are called textual images and are amenable to automatic extraction of key words

10
Cont…
 Multimedia IR
 Discusses how to index document images and other binary data by extracting features from their content, and how to search them efficiently
 Human-computer interaction (HCI) for IR
 Discusses current trends in IR towards improved user interfaces and better data visualization tools
 Applications of IR
 Covers modern applications of IR (such as the Web, bibliographic systems, and digital libraries)

11
Entities in IRS
 Two important entities
 Information need: to be represented by search statements (query)
 Information items (documents): to be represented by index terms or any other form of representation, such as a summary
 Thus the process in an IRS is matching these two abstractions

12
Key Issues in IR
 Organizing
 How to describe information resources or information-bearing objects so that they may be effectively used by those who need them
 Retrieving
 How to find the appropriate information resources or information-bearing objects for someone's (or your own) needs
 Build a system that retrieves documents that users are likely to find relevant to their queries
 This set of assumptions underlies the field of IR

13
IR is an Iterative Process – Basic theory
[Figure: information life cycle. Labels recoverable from the figure: Creation, Authoring, Modifying, Using, Creating, Organizing, Indexing, Retention/Mining, Accessing, Storing, Filtering, Retrieval, Distribution, Networking, Searching, Utilization, Disposition, Discard; activity levels: Active, Semi-Active, Inactive.]
14
Implementation

 Thus, in order to address the above key issues, the implementation is to develop an information system
 Retrieval system
 IR deals with very large sets of documents
 Requires a high degree of robustness and efficiency
 Domain independence and multilinguality
 IR usually deals with NL (natural language) text, which is not always well structured and could be semantically ambiguous

15
Cont…
 IR considers NL text mainly from a lexical view
 Identifying possible word forms
 Elimination of stop words (e.g., the, of, zu, ...)
 Stemming (e.g., supporting, supported → support)
 Selection of index terms
 Term weighting
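
The lexical steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop list and the suffix list are hypothetical and far smaller than anything a real system would use, and the stemmer is naive suffix stripping, not a published stemming algorithm.

```python
import re

# Hypothetical, deliberately tiny stop list; real systems use larger ones.
STOP_WORDS = {"the", "of", "a", "an", "and", "is", "in", "to"}

# Hypothetical suffix list for a naive stemmer (not the Porter algorithm).
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word forms."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Tokenize, drop stop words, then stem what is left."""
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]
```

For example, `index_terms("Supporting the supported support of users")` maps all three variants of "support" to the same term, which is the point of stemming.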

16
The Retrieval Process

What does the basic retrieval process look like?

17
Cont…
[Figure: search engine overview. Labels: user, interface, queries, search engine, index, spider, Web pages.]

18
The Retrieval Process
[Figure: the retrieval process. Labels: Web browser, Web search engine, user interface, user need, text, text operations, logical view, query operations, indexing, DB manager module, user feedback, query, inverted file, searching, index, retrieved docs, text database, ranking, ranked docs.]
19
Factors Affecting Effective Retrieval

The effective retrieval of relevant information is directly affected by two things
 The user task
 The logical view of the documents adopted by the retrieval system

20
The User Task
[Figure: the user task. Labels: retrieval, database, browsing/surfing.]

21
Cont…
 The user task: the user task might be one of retrieval or browsing
 Retrieval
 Of information or data
 Information need (retrieval goal) is focused and crystallized; the search is purposeful; often the user is sophisticated
 Browsing/surfing
 Information need (retrieval goal) is vague and imprecise
 Glancing around; often the user is naive
 Both are initiated by the user

22
Logical view of documents

 The logical view of documents

 Full text
 Any point in between full text and index terms
 Set of index terms

23
Document Processing Steps

From “Modern IR” textbook

Cont…

 Documents in a collection are frequently represented through a set of index terms or keywords
 An index term is a keyword (or group of related words) which has some meaning of its own (and usually has the semantics of a noun)
 In its more general form, an index term is simply any word which appears in the text of a document collection
 It is simply a word whose semantics helps in remembering the document's main theme
 How to generate index terms? (next chapter)

25
Cont…
 Key words might be extracted directly from the
text of the document or
 Keywords might be specified by a human expert
(this is frequently done in the information
science arena)
 No matter whether these representative
keywords are derived automatically or generated
by a specialist, they provide a logical view of a
document (concise logical view)

26
Cont…
 Modern computers make it possible to represent a document by its full set of words
 In this case, we say that the retrieval system adopts a full text logical view (or representation) of the documents
 With very large collections, however, modern computers might have to reduce the set of representative keywords
 This can be accomplished through the following standard steps

27
Cont...
 Standard steps
 Recognizing document structures (titles, sections,
paragraphs, etc.)
 Break into tokens
 Usually space and punctuation delimited
 Special issues with some languages
 The elimination of stopwords (such as articles
and connectives)

28
Cont…
 Conflation: the use of stemming/morphological analysis
 Purpose: overcome the variants of word forms by reducing all words with the same root to a single form (i.e., reducing distinct words to their common grammatical root)
 Most IR systems perform stemming on both text and query
 The identification of noun groups (which eliminates adjectives, adverbs, and verbs)
 Other further operations can also be performed
 Store in inverted index
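
The final step, storing terms in an inverted index, can be sketched as follows. This is a minimal illustration, assuming the documents have already been tokenized, stopped and stemmed; the toy collection and the function name `build_inverted_index` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of documents containing it.

    `docs` maps a document ID to a list of already processed index terms.
    """
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical toy collection, already tokenized, stopped and stemmed.
docs = {
    1: ["inform", "retriev", "system"],
    2: ["databas", "system"],
    3: ["inform", "need"],
}
index = build_inverted_index(docs)
```

Here `index["system"]` lists documents 1 and 2, so a query on that term never has to scan document 3.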
29
Cont…
 Such text operations reduce the complexity of the document representation and allow moving the logical view from that of a full text to that of index terms
 Index: a list of important keywords from the documents
 The full text is the most complete logical view of a document, but its usage usually implies higher computational costs

30
Cont…
 Given a set of index terms for a document, we notice that not all the terms are equally useful for describing the document contents
 There are index terms that are simply vaguer than others
 Deciding on the importance of a term for summarizing the contents of a document is not a trivial issue
 Despite this difficulty, there are properties of an index term that are useful for evaluating its potential to describe the document contents

31
Cont…
 Examples of such properties
 A word which appears in each of the one hundred thousand documents is completely useless as an index term because it does not tell us anything about which documents the user might be interested in
 A word which appears in just five documents is quite useful because it narrows down considerably the space of documents which might be of interest to the user
 Thus, distinct index terms have varying relevance when used to describe document contents
 This effect is captured through the assignment of numerical weights to each index term of a document
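
One common way to turn this observation into a numerical weight is inverse document frequency (IDF). The slides do not name a weighting scheme at this point, so IDF is an assumption here, shown only to make the two examples above concrete.

```python
import math


def idf(num_docs, doc_freq):
    """Inverse document frequency: log(N / n_i).

    num_docs is the collection size N; doc_freq is the number of
    documents n_i that contain the term.
    """
    return math.log(num_docs / doc_freq)


# The two cases from the text, with N = 100,000 documents.
everywhere = idf(100_000, 100_000)  # term occurs in every document
rare = idf(100_000, 5)              # term occurs in just five documents
```

A term that occurs in every document gets weight zero, while the term found in only five documents gets a weight of roughly 9.9 (natural logarithm of 20,000).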
32
Challenges in IR
Why is IR a Difficult Problem?

33
Why is IR a Difficult Problem?

 The size of the web is doubling every year:
 50 million pages in November 1995, 320 million pages in December 1997, 800 million pages in February 1999, 1 billion pages in 2000, and growing every day
 Huge amount of data (e.g., WWW) dictates efficiency, effectiveness and user-friendliness
 Thus, any IR system needs the capability of large-scale data processing. The use of indexes and various representations is required

34
Cont…
 Unstructured data: difficult to capture
semantics in documents. Compare:
 “select * from Employee where Salary > 100,000”
 “retrieve all news items about corporate
takeover”
 Why is the second query more difficult to answer?
The following query is even more difficult:
 “retrieve all news items about corporate
takeover involving an internet company”

35
Cont…
 Documents have unrestricted domains
 It is hard to predefine or pre-categorize the subject domains of documents
 A particular subject is related to several major topics, including linguistics, psychology, cybernetics, communications, information system design, engineering and technology, networking, computer science, mathematics, economics, management science, education, …

36
Cont…
 Diversified user base: expert to casual users
 The users of information retrieval systems include
 Research scientists (who seek articles related to particular experiments)
 Engineers (who try to determine whether a patent covering some new idea has previously been obtained)
 Attorneys (who search for legal precedents)
 Buyers in general (who try to obtain new product information)

37
Cont…
 Information retrieval users
 Have a wide variety of information needs (interests) and exhibit many different backgrounds
 May be led by many different reasons to use the retrieval facilities
 As a result, they require a variety of services and end products
 In other words, the same system may feel clumsy to an expert user yet be difficult for a casual user to use
 A system may return information that is too general to be useful for an expert in the subject but too narrow for a general user

38
Cont…
 Distributed and interlinked (e.g., Hypertext and WWW)
 Where to start a search? In a centralized database you have only one (or a few) database(s) to search; here you do not.
 How are the pieces of information related?
 Efficiency vs. effectiveness.
 With a limited amount of resources, one can only improve efficiency and effectiveness to a certain degree. Moreover, improving efficiency often means degrading effectiveness, and vice versa.
39
Information Retrieval System:
components, structures and
functions
How do we characterize IRS?

40
What is a system?
 Is a set of interrelated components interacting together to achieve an objective.
 Has basic characteristics like:
 Input, output, environment, boundary, objectives, components, interaction, interface
 Can be living or non-living
 What is “systems thinking”?
 Do you agree with this? “A system is bigger than the sum of its components”

41
Systems thinking
 Is a mindset, or way of thinking, that views the world (everything in the world) as a system.
 It emphasizes the interaction that keeps the system alive.
 Benefits
 Identification of a system leads to abstraction
 From abstraction you can think about the essential characteristics of a specific system
 Abstraction allows the analyst to gain insights into a specific system, to question assumptions, provide documentation and manipulate the system without disrupting the real situation

42
Cont..
 Different types of Information systems
 IRS

 DBMS

 MIS

 DSS

 ESS

43
IRS
 Is a system that is capable of storage, retrieval, and maintenance of information items
 The process of an IR system is to match two abstractions
 Index terms/keywords abstracted from information items
 Queries abstracted from the user's information needs

[Figure: Need ↔ Docs, matching the two sides]

44
Cont…
 The purpose of an IRS is to capture wanted items (information) and to filter out unwanted information
 Present results in a format that helps the user determine relevant items
 Arbitrary (physical) order
 Relevance order

45
Basic functions of an IRS
 Analysis of documents and organization of information (creation of a document database)
 Analysis of users' queries and preparation of a strategy to search the database
 Actual searching or matching of users' queries with the database
 Retrieval of items that fully or partially match the search statement

46
A crawler: Basics of crawlers
 Definition:
A Web crawler is a computer program that browses the
World Wide Web in a methodical, automated manner.
 Utilities:
 Gather pages from the Web.
 Support a search engine, perform data mining and so on.
 Object:
 Text, video, image and so on.
 Link structure.

(section B) 47
Q: How does a search
engine know that all
these pages contain
the query terms?
A: Because all of
those pages have
been crawled

48
Many names
 Crawler
 Spider
 Robot (or bot)
 Web agent
 Wanderer, worm, …
 And famous instances: googlebot, scooter, slurp,
msnbot, …

49
Crawler: basic idea
[Figure: the crawl begins from a set of starting pages (seeds).]

50
Features of a crawler
 Must provide:
 Robustness: spider traps
 Infinitely deep directory structures: http://foo.com/bar/foo/bar/foo/...
 Pages filled with a large number of characters
 Politeness: which pages can be crawled, and which cannot
 Robots exclusion protocol: robots.txt
 http://blog.sohu.com/robots.txt
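
The politeness check can be sketched with Python's standard `urllib.robotparser` module. The robots.txt content, the `example-bot` agent name and the `example.com` URLs below are hypothetical; the rules are parsed from a string so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed offline for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, user_agent="example-bot"):
    """Return True if the parsed rules allow this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

A polite crawler would call something like `may_crawl` before every fetch and skip any URL for which it returns `False`.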

51
Motivation for crawlers
 Support
 universal search engines (Google, Yahoo,
MSN/Windows Live, Ask, etc.)
 Vertical (specialized) search engines, e.g. news,
shopping, papers, recipes, reviews, etc.
 Business intelligence: keep track of potential
competitors, partners
 Monitor Web sites of interest
 Evil: harvest emails for spamming, phishing…
 … Can you think of some others?…

52
A crawler-based search engine
[Figure: architecture of a crawler-based search engine. Labels: Web, page repository, googlebot, text & link analysis, query, hits, text index, PageRank, ranker.]

53
Two most widely used search designs
Graph traversal (BFS or DFS?)
 Breadth First Search
 Implemented with QUEUE (FIFO)
 Finds pages along shortest paths
 If we start with “good” pages, this keeps us close; maybe other good stuff…
 Depth First Search
 Implemented with STACK (LIFO)
 Wander away (“lost in cyberspace”)
54
Implementation issues
 Don't want to fetch the same page twice!
 Keep lookup table (hash) of visited pages
 The frontier grows very fast!
 May need to prioritize for large crawls
 Fetcher must be robust!
 Don't crash if download fails
 Timeout mechanism
 Determine file type to skip unwanted files
 Can try using extensions, but not reliable
 Can issue 'HEAD' HTTP commands to get Content-Type headers, but overhead of extra Internet requests

55
More implementation issues

 Fetching
 Get only the first 10-100 KB per page
 Take care to detect and break redirection
loops
 Soft fail for timeout, server not
responding, file not found, and other
errors
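
A fetcher along these lines might look like the sketch below. It is a minimal illustration, not a production crawler: the function name `fetch_prefix`, the 100 KB cap and the 10 second timeout are assumptions, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.request

MAX_BYTES = 100 * 1024  # upper end of the 10-100 KB range above


def fetch_prefix(url, opener=urllib.request.urlopen, timeout=10):
    """Return at most MAX_BYTES of the page body, or None on failure."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read(MAX_BYTES)
    except (OSError, ValueError):
        # Soft fail: timeouts, unreachable servers and HTTP errors are
        # OSError subclasses; malformed URLs raise ValueError. Returning
        # None lets the crawl continue with the next page.
        return None
```

Returning `None` instead of raising keeps one bad page from stopping the whole crawl; the caller simply skips pages that come back empty.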

56
Two basic subsystems of an IR system
 Next to crawlers
 The two subsystems of an IR system:
 Searching: an online process of finding relevant documents in the index list that match the user's query
 Indexing: an offline process of organizing documents using keywords extracted from the collection
 Indexing is used to speed up access to desired information from the document collection as per the user's query
57
Cont…
 Indexing and searching are inextricably connected
 You cannot search what was not first indexed in some manner or other
 Documents are indexed in order to make them searchable
 There are many ways to do indexing
 To index, one needs to select an indexing approach
 There are many indexing approaches, including inverted file, sequential file, suffix tree, signature file, etc.
 Even taking every word in a document is an indexing approach
 Knowing searching is knowing indexing
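
To make the dependence of searching on indexing concrete, here is a minimal sketch of a conjunctive (AND) search over an inverted file. The tiny index and the function name `search` are hypothetical, and the query terms are assumed to have gone through the same text operations as the documents.

```python
def search(index, query_terms):
    """Return IDs of documents that contain every query term."""
    postings = [set(index.get(term, ())) for term in query_terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical inverted index: term -> IDs of documents containing it.
index = {
    "inform": [1, 3],
    "retriev": [1],
    "system": [1, 2],
}
```

A term that was never indexed has no postings, so any query that includes it returns nothing: what was not indexed cannot be searched.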

58
Indexing Subsystem

Documents
→ Assign document identifier (yields text and document IDs)
→ Tokenize (yields tokens)
→ Stop list (yields non-stoplist tokens)
→ Stemming & Normalize (yields stemmed terms)
→ Term weighting (yields terms with weights)
→ Index
59
Searching Subsystem

Query
→ Parse query (yields query tokens)
→ Stop list (yields non-stoplist tokens)
→ Stemming & Normalize (yields stemmed terms)
→ Query term weighting (yields query terms)
→ Similarity measure, comparing query terms with index terms from the Index (yields relevant document set)
→ Ranking (yields ranked document set)
60
Structure of an IR System
[Figure: structure of an information storage and retrieval system, adapted from Soergel, p. 19. Search line: interest profiles and queries → translation (formulating the query in terms of descriptors) → storage of profiles → Store 1: profiles/search requests. Storage line: documents and data → translation (indexing, descriptive and subject) → storage of documents → Store 2: document representations. Both translations follow the "rules of the game" = rules for subject indexing + thesaurus (which consists of lead-in vocabulary and indexing language). Comparison/matching of the two stores, followed by ranking, yields potentially relevant documents.]
61
Database Systems Vs Information
Retrieval Systems

Are they the same or not? Is there any overlap?

(section C) 62
DBMS vs IRS

 An IRS is one of the different types of information systems
 But it has more similarities with a DBMS than differences
 Accordingly, it is logical to compare and contrast these two systems

63
Cont…
 On the Information/data
 DBMS: structured data (often homogeneous records),
semantic unambiguity
 IR systems: unstructured (free text), ambiguity
 On the answers/results
 DBMS:
 Records (tuples) , Perfect precision and recall, each
item is relevant (no ranking) , Well defined results
 IR systems
 Documents, Imperfect precision and recall, each
item has specific relevance (ranking), fuzzy results
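
Precision and recall can be computed directly from the set of retrieved items and the set of relevant items. The sketch below uses the standard definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant); the function name is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

When an exact-match query returns exactly the qualifying records, both values are 1.0. For an IR result such as documents 1 to 4 retrieved when documents 2, 4 and 5 are relevant, precision is 2/4 and recall is 2/3.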

64
Cont…

 On their relationship
 Systems complement each other
 On their history
 DB grew out of files and traditional business systems
 IR grew out of library science and the need to categorize/group/access books/articles

65
Cont…

                       Data retrieval      Information retrieval
 Content              Data                Information
 Data object          Table               Document
 Matching             Exact match         Partial match, best match
 Items wanted         Matching            Relevant
 Query language       SQL (artificial)    Natural
 Query specification  Complete            Incomplete
 Organization         Highly structured   Less structured
 Classification       Monothetic          Polythetic
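
The "Matching" row can be illustrated with a small sketch that contrasts exact match on structured records with best match on free text. All records, documents and function names below are hypothetical, and counting shared words is only a stand-in for a real similarity measure.

```python
# Hypothetical structured records, as a DBMS table would hold them.
employees = [
    {"name": "Abebe", "salary": 120_000},
    {"name": "Sara", "salary": 90_000},
]

# Hypothetical free-text documents, as an IR system would hold them.
documents = {
    "d1": "corporate takeover of an internet company",
    "d2": "internet company reports record growth",
    "d3": "local football results",
}


def exact_match(records, min_salary):
    """Data retrieval: a record either satisfies the condition or not."""
    return [r["name"] for r in records if r["salary"] > min_salary]


def best_match(docs, query):
    """Information retrieval: rank documents by number of shared words."""
    q = set(query.lower().split())
    scores = {d: len(q & set(text.split())) for d, text in docs.items()}
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [d for d in ranked if scores[d] > 0]
```

The exact-match query returns an unordered set of qualifying records, while the best-match query returns partially matching documents in order of decreasing score.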

66
Cont…

 Data retrieval
 records contain a set of keywords
 Well-defined semantics
 a single erroneous object implies failure!
 Information retrieval
 information about a subject or topic
 semantics is frequently loose
 small errors are tolerated

67
Cont…
 IR system:
 interpret contents of information items
 generate a ranking which reflects relevance
 notion of relevance is most important
 Information retrieval is much more difficult than data
retrieval

68
Thank you
