IR Unit-1 - Updated

Uploaded by

Danish Nevrekar
WELCOME

Department Optional Course -4


Subject: Information Retrieval.
Course Code: CSDC7023

Mr. S. G. Shaikh
Asst. Professor
Dept. Of Computer Engg,
AIKTC, New Panvel
[email protected]
Cell. +91 9960726716

Mr. S. G. Shaikh, Assistant Professor, Department of Computer Engineering ,AIKTC, New Panvel
Subject: Information Retrieval CSDC7023
Prerequisite: Data structures and algorithms
Course Objectives:
The course aims to enable students:
1 To learn the fundamentals of Information Retrieval
2 To analyze various Information retrieval modeling techniques
3 To understand query processing and its applications
4 To explore the various indexing and scoring techniques
5 To assess the various evaluation methods
6 To analyze various information retrieval techniques for real-world applications.

Course Outcomes:
Learner will be able to: -
1 Describe and analyze the concepts and challenges of information retrieval systems.
2 Design the various modeling techniques for information retrieval systems.
3 Implement the query structure and various query operations.
4 Analyze the indexing and scoring operations in information retrieval systems.
5 Perform the evaluation of information retrieval systems.
6 Analyze various information retrieval techniques for real-world applications.

Unit No-01: Introduction to Information Retrieval

Syllabus:-
Introduction to Information Retrieval, Basic Concepts, Information Versus Data,
Trends and research issues in information retrieval. The retrieval process,
Information retrieval in the library, web and digital libraries.

• Reference Books Used

• T1: Modern Information Retrieval, Baeza-Yates, R. and Ribeiro-Neto, B., 1999, ACM Press
• T2: Introduction to Modern Information Retrieval, G. G. Chowdhury, Neal-Schuman
• R1: Storage Network Management and Retrieval, Vaishali Khairnar

Introduction to Information Retrieval

 Information retrieval (IR) deals with the representation, storage,
organization of, and access to information items.
 The representation and organization of the information items should
provide users with easy access to the information in which they are
interested.
 Information retrieval (IR) is a field that has been developing in parallel
with database systems for many years.
 Unlike the field of database systems, which has targeted query and
transaction processing of structured data, information retrieval is
concerned with the organization and retrieval of information from a large
number of text-based documents.

 Since information retrieval and database systems each handle different
kinds of data, some database system problems are usually not present in
information retrieval systems, such as concurrency control, recovery,
transaction management, and update.
 Conversely, some common information retrieval problems are usually not
encountered in traditional database systems, such as unstructured
documents, approximate search based on keywords, and the notion of
relevance.
 Because of the abundance of text data, information retrieval has found
many applications.
 There exist many information retrieval systems, including online library
catalog systems, online records management systems, and the more
recently developed Web search engines.

 A typical information retrieval problem is to locate relevant documents in a
document collection based on a user's query, which is often a few keywords
describing an information need, although it can also be an example of a
relevant document.
 This setting is most suitable when a user has an ad hoc (i.e., short-term)
information need, such as finding information to buy a used car. When a user
has a long-term information need (e.g., a researcher's interests), a retrieval
system can also take the initiative to "push" any newly arrived information
item to the user if the item is judged relevant to the user's information need.

There are two basic measures for assessing the quality of text retrieval, which
are as follows −

Precision − This is the percentage of retrieved documents that are actually
relevant to the query (i.e., "correct" responses). It is formally defined as
$$\text{precision} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Retrieved}\}|}$$


Recall − This is the percentage of documents relevant to the query that
were actually retrieved. It is formally defined as
$$\text{recall} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Relevant}\}|}$$

An information retrieval system often needs to trade off recall for
precision or vice versa. One commonly used trade-off measure is the F-score,
which is defined as the harmonic mean of recall and precision −
$$\text{F-score} = \frac{\text{recall} \times \text{precision}}{(\text{recall} + \text{precision})/2}$$
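
The three measures can be checked on a small example. The sketch below is only an illustration: the function name precision_recall_f and the document ids are hypothetical, and a score is taken to be 0 when its denominator would be 0.

```python
def precision_recall_f(relevant, retrieved):
    """Return (precision, recall, f_score) for two collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # |{Relevant} ∩ {Retrieved}|
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean: (recall * precision) / ((recall + precision) / 2)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical example: 4 relevant documents, 5 retrieved, 3 in common.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d3", "d7", "d9"}
p, r, f = precision_recall_f(relevant, retrieved)
print(f"precision={p:.2f} recall={r:.2f} f-score={f:.2f}")
```

Here 3 of the 5 retrieved documents are relevant (precision 0.6) and 3 of the 4 relevant documents are retrieved (recall 0.75), so the F-score lies between the two, closer to the smaller value.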

 An information retrieval system searches a collection of natural language
documents with the goal of retrieving exactly the set of documents that
matches a user's question.
 Such systems have their origin in library systems.
 These systems assist users in finding the information they require, but they do
not attempt to deduce or generate answers.
 They report the existence and location of documents that might contain
the required information.
 The documents that satisfy the user's requirement are called relevant
documents. A perfect IR system would retrieve only relevant
documents.

Basics of IR Systems

 From the diagram above, it is clear that a user who needs information has
to formulate a request in the form of a query in natural language. The
IR system then responds by retrieving the relevant documents that contain
the required information.

The step-by-step procedure of these systems is as follows:


 Indexing the collection of documents.
 Transforming the query in the same way as the document content is
represented.
 Comparing the description of each document with that of the query.
 Listing the results in order of relevancy.

Retrieval Systems consist of mainly two processes:


 Indexing
 Matching

Indexing
It is the process of selecting terms to represent a text.

Indexing involves:
Tokenization of string
Removing frequent words
Stemming
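
The three indexing steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the stopword list is a tiny hand-picked sample, and the suffix rules are toy rules, not the Porter stemmer or any other published algorithm.

```python
import re

# Assumed, deliberately tiny stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are"}
# Toy suffix rules for illustration only.
SUFFIXES = ("ing", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Tokenize, drop frequent words, then stem what remains."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]


terms = index_terms("The indexing of documents is the first step in retrieving documents")
print(terms)
```

A crude stemmer like this one produces stems such as "retriev" that are not dictionary words; that is acceptable for indexing as long as queries are processed in the same way.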

Two common retrieval models built on index terms:

Boolean Model
Vector space model

Matching

It is the process of computing a measure of similarity between two text
representations.

The relevance of a document is computed based on the following parameters:

1. TF: It stands for Term Frequency, the number of times a given term
appears in a document, normalized here by the length of that document.

TF (i, j) = (count of ith term in jth document)/(total terms in jth document)

2. IDF: It stands for Inverse Document Frequency, a measure of the
general importance of the term across the collection. The ratio is usually
log-scaled.

IDF (i) = log((total no. of documents)/(no. of documents containing ith term))

3. TF-IDF Score (i, j) = TF (i, j) * IDF (i)
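
A small worked example may help with these formulas. The sketch below assumes a hypothetical three-document collection that is already tokenized, and it takes IDF with a natural logarithm, which is the common convention but an assumption about the intended formula.

```python
import math
from collections import Counter

# Hypothetical, already tokenized collection.
docs = {
    "d1": ["information", "retrieval", "system"],
    "d2": ["database", "system", "query"],
    "d3": ["information", "need", "query", "query"],
}


def tf(term, tokens):
    """Count of the term in the document divided by the document length."""
    return Counter(tokens)[term] / len(tokens)


def idf(term, docs):
    """log(total documents / documents containing the term); 0 if the term is unseen."""
    df = sum(1 for tokens in docs.values() if term in tokens)
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term, doc_id, docs):
    return tf(term, docs[doc_id]) * idf(term, docs)


for doc_id in docs:
    print(doc_id, round(tf_idf("query", doc_id, docs), 4))
```

The term "query" gets its highest score in d3, where it makes up half of the tokens, and a score of 0 in d1, where it does not occur. A term that appeared in every document would have IDF log(1) = 0 and would not help to distinguish documents.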

The effective retrieval of relevant information is directly affected both by
the user task and by the logical view of the documents adopted by the
retrieval system.

The User Task

 The user of a retrieval system has to translate his information need into a
query in the language provided by the system.
 With an information retrieval system, this normally implies specifying a set
of words which convey the semantics of the information need.
 With a data retrieval system, a query expression (such as, for instance,
a regular expression) is used to convey the constraints that must be
satisfied by objects in the answer set.
 In both cases, we say that the user searches for useful information
executing a retrieval task.

 Consider now a user who has an interest which is either poorly defined or
which is inherently broad.
 For instance, the user might be interested in documents about car racing in
general. In this situation, the user might use an interactive interface to simply look
around in the collection for documents related to car racing.
 For instance, he might find interesting documents about Formula 1 racing, about
car manufacturers, or about the '24 Hours of Le Mans'. Furthermore, while reading
about the '24 Hours of Le Mans', he might turn his attention to a document which
provides directions to Le Mans and, from there, to documents which cover tourism
in France. In this situation, we say that the user is browsing the documents in the
collection, not searching.
 It is still a process of retrieving information, but one whose main objectives are not
clearly defined in the beginning and whose purpose might change during the
interaction with the system.


Figure: Interaction of the user with the retrieval system through distinct tasks.

Logical View of the Documents
 Due to historical reasons, documents in a collection are frequently represented
through a set of index terms or keywords. Such keywords might be extracted directly from
the text of the document or might be specified by a human subject (as frequently done in the
information sciences arena).
 No matter whether these representative keywords are derived automatically or generated by a
specialist, they provide a logical view of the document.
 Modern computers are making it possible to represent a document by its full set of words. In
this case, we say that the retrieval system adopts a full text logical view (or representation) of
the documents.
 With very large collections, however, even modern computers might have to reduce the set of
representative keywords.
 This can be accomplished through the elimination of stopwords (such as articles and
connectives), the use of stemming (which reduces distinct words to their common
grammatical root), and the identification of noun groups (which eliminates adjectives,
adverbs, and verbs).

Figure: Logical view of a document: from full text to a set of index terms.

Difference between Information Retrieval and Data Retrieval

Issues in Information Retrieval
Indexing is the most vital part of any Information
Retrieval System.
It is the process in which the documents required by
the users are transformed into searchable data
structures.
 Indexing can also be viewed as a process of
extraction rather than analysis of particular content.
It is a core function of the IR process, since
it is the first step in IR and enables efficient
information retrieval.

In this process, first, document surrogates are
created to represent each document.
Second, the original documents are analyzed, both
simply (identifying meta-information, e.g., author, title,
subject) and in depth (linguistic analysis of content).
 Indexes are the data structures that are used to
make the search faster.
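
One widely used index structure is the inverted index, which maps each term to the documents that contain it. The slides do not name a specific structure, so the choice of an inverted index here is an assumption, and the documents and function names are hypothetical. The sketch also shows why such an index speeds up search: a query only touches the entries for its own terms instead of scanning every document.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical collection.
docs = {
    1: "information retrieval systems",
    2: "database systems",
    3: "information need",
}
index = build_inverted_index(docs)
print(search_all(index, "information systems"))
```

In a real system the terms would first pass through tokenization, stopword removal, and stemming before being added to the index.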

1. Document and Query Indexing –

The main goal of document and query indexing is to
find important meanings and create an internal
representation.
The factors to be considered are accuracy in
representing semantics, exhaustiveness, and ease of
manipulation by a computer.

2. Query Evaluation –
The issue here is how, in the retrieval model, a document is represented
with the selected keywords and how document and query
representations are compared to calculate a score.
Information Retrieval (IR) deals with issues like uncertainty
and vagueness in information systems.
 Uncertainty:
The available representation typically does not reflect the true
semantics of objects such as images, videos, etc.
 Vagueness:
The information that the user requires lacks clarity and is only
vaguely expressed in a query, feedback, or user action.
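
One common way to compare a query representation with document representations is cosine similarity in the vector space model. The sketch below is only one possible scoring scheme: it assumes raw term counts as weights, whitespace tokenization, and a hypothetical three-document collection.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Score every document against the query and sort by descending score."""
    q = Counter(query.lower().split())
    scores = {
        doc_id: cosine(q, Counter(text.lower().split()))
        for doc_id, text in docs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical collection.
docs = {
    "d1": "car racing news",
    "d2": "formula one car racing car",
    "d3": "tourism in france",
}
ranking = rank("car racing", docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Because the score is graded rather than yes or no, the system can return partially matching documents in ranked order, which is one practical response to vague queries.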

Trends in Information Retrieval

In this section we review a few concepts that are being considered in more recent research
work in information retrieval.
1. Faceted Search
Faceted Search is a technique that allows for integrated search and navigation experience
by allowing users to explore by filtering available information. This search technique is
used often in ecommerce Websites and applications enabling users to navigate a multi-
dimensional information space. Facets are generally used for handling three or more
dimensions of classification. This allows the faceted classification scheme to classify an
object in various ways based on different taxonomical criteria. For example, a Web page
may be classified in various ways: by content (air-lines, music, news, ...); by use (sales,
information, registration, ...); by location; by language used (HTML, XML, ...) and in
other ways or facets. Hence, the object can be classified in multiple ways based on
multiple taxonomies.


A facet defines properties or characteristics of a class of objects.


The properties should be mutually exclusive and exhaustive. For example, a
collection of art objects might be classified using an artist facet (name of
artist), an era facet (when the art was created), a type facet (painting,
sculpture, mural, ...), a country of origin facet, a media facet (oil, watercolor,
stone, metal, mixed media, ...), a collection facet (where the art resides), and
so on.
Faceted search uses faceted classification that enables a user to navigate
information along multiple paths corresponding to different orderings of the
facets. This contrasts with traditional taxonomies in which the hierarchy of
categories is fixed and unchanging. University of California, Berkeley’s Flamenco
project is one of the earlier examples of a faceted search system.
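
The art collection example can be sketched as a filter over facet values. This is a minimal illustration, not how Flamenco or any particular system is implemented; the records, facet names, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical records; each key other than "title" is a facet.
artworks = [
    {"title": "Work A", "type": "painting", "media": "oil", "era": "19th century"},
    {"title": "Work B", "type": "painting", "media": "watercolor", "era": "20th century"},
    {"title": "Work C", "type": "sculpture", "media": "stone", "era": "19th century"},
    {"title": "Work D", "type": "sculpture", "media": "metal", "era": "20th century"},
]


def filter_by_facets(items, **selected):
    """Keep the items whose facet values match every selected facet."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of one facet."""
    return Counter(item[facet] for item in items)


paintings = filter_by_facets(artworks, type="painting")
print(facet_counts(paintings, "media"))
print(filter_by_facets(artworks, era="19th century", type="sculpture"))
```

Selecting the era first and the type second gives the same result as the reverse order, which is what lets a user navigate along multiple paths instead of one fixed hierarchy.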


2. Social Search
The traditional view of Web navigation and browsing assumes that a single
user is searching for information. This view contrasts with previous research
by library scientists who studied users’ information seeking habits. This research
demonstrated that additional individuals may be valuable information resources
during information search by a single user. More recently, research indicates
that there is often direct user cooperation during Web-based information
search. Some studies report that significant segments of the user population
are engaged in explicit collaboration on joint search tasks on the Web.
Active collaboration by multiple parties also occurs in certain cases (for example,
enterprise settings); at other times, and perhaps for a majority of searches, users
often interact with others remotely, asynchronously, and even involuntarily and
implicitly.

Socially enabled online information search (social search) is a new
phenomenon facilitated by recent Web technologies.
Collaborative social search involves different forms of active involvement in
search-related activities, such as co-located search, remote collaboration on
search tasks, use of social networks for search, use of expertise networks,
social data mining or collective intelligence to improve the search
process, and even social interactions to facilitate information seeking and sense
making. This social search activity may be done synchronously, asynchronously,
co-located, or in remote shared workspaces.


Social psychologists have experimentally validated that social
discussions facilitate cognitive performance. People in social groups can
provide solutions (answers to questions), pointers to databases or to other people
(meta-knowledge), validation and legitimization of ideas, and can serve as
memory aids and help with problem reformulation. Guided participation is a
process in which people co-construct knowledge in concert with peers in
their community. Information seeking is mostly a solitary activity on the Web
today. Some recent work on collaborative search reports several interesting
findings and the potential of this technology for better information access.


3. Conversational Search
Conversational Search (CS) is an interactive and collaborative information finding
interaction. The participants engage in a conversation and perform a social search
activity that is aided by intelligent agents. The collaborative search activity helps the
agent learn about conversations through interactions and feedback from participants. It uses a
semantic retrieval model with natural language understanding to provide the users with
faster and more relevant search results. It moves search from being a solitary activity to being a
more participatory activity for the user. The search agent performs multiple tasks of
finding relevant information and connecting the users together; participants provide
feedback to the agent during the conversations that allows the agent to perform better.

New Trends in IR
Artificial Intelligence
 AI focuses on finding a logical, mathematical way to
represent knowledge.
 The computer can be programmed with this mathematical
model to assist in decision making, information retrieval,
and analysis.
 Then, when a query is asked, the computer follows the rules
for a response.
 AI has many facets, including robotics, expert systems, and
voice recognition and simulation. Search engines incorporate
some of the fascinating trends in AI.

Probabilistic Logic
 Will it rain today? What is the probability that my car needs an oil change?
Or, what is the chance of getting an A on my history test?
 There are many questions like these that cannot be answered with a
plain yes or no. Uncertainty reigns.
 To make decisions that account for such doubt, a branch of logic was
defined to study probability.
 Since the 16th and 17th centuries, probability theory has been used to
explain chance. Such questions rely on factual information, such as history,
coupled with probability.
 In information retrieval, the same applies. By setting up a formula, an
algorithm, that places values on words, their interrelationships, proximity,
and their frequency, the computer can be used to help locate relevant
sites. By computing these terms together, the search engine can produce a
relevancy ranking that is then displayed to the user.

Query by example
 Query-by-example (QBE) is the concept of providing the search engine
with an example for which to search. Using this example, the system returns
other like documents.
 For example, I want a book about gorillas, published in 1984, that has a
green cover.
 I have set up an example of what I am looking for using all my
qualifications. Search engines use the technique to set up queries to find
similar pages or files.
 The search is reinitiated using the example as the new source for the
query. This interactive searching gives the user more control over the
search process.
 Users can find more documents like the one selected. The results returned
are then more focused because of the qualified terms.
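
Finding "more documents like the one selected" can be sketched with a simple set-overlap measure. The sketch below assumes Jaccard similarity over word sets, which is only one of many possible similarity measures; the book descriptions and the function name more_like_this are hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection of two term sets divided by the size of their union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def more_like_this(example_id, docs, top_k=2):
    """Rank all other documents by word overlap with the selected example."""
    example_terms = docs[example_id].lower().split()
    scores = [
        (doc_id, jaccard(example_terms, text.lower().split()))
        for doc_id, text in docs.items()
        if doc_id != example_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical book descriptions.
docs = {
    "b1": "gorillas in the mountain forest",
    "b2": "mountain gorillas and their forest habitat",
    "b3": "oil painting techniques",
    "b4": "birds of the forest",
}
similar = more_like_this("b1", docs)
print(similar)
```

The selected document acts as the query, so the user never has to choose keywords explicitly.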

Query Expansion

 Once a search has been completed, the query often needs to be
enhanced or changed.
 A library patron who comes to the desk asks one question, but usually
there is some additional information need.
 The purpose of the librarian is to elicit that actual request.
 The quest of the information scientist is to discover how the computer
can assist in evoking that query and its modifications.
 Newer search engines give the user more control over the query
by providing a means to resubmit the search with any changes.

Natural Language Processing
 Natural Language Processing is the act and science of getting computers to
understand natural language.
 It is a part of artificial intelligence. (Case.) Computers process language
not only by exact match using keywords.
 NLP involves using a set of concepts to sort out the interrelationships of
words. The computer breaks apart the sentence into its semantic parts:
nouns, verbs, adjectives, etc., and then it creates links.
 Since language can be ambiguous, vague, or metaphorical, NLP seeks to
compute the relationships between words, giving each a correlate to the
words around it.
 Put into a formula, the computer then makes assumptions based on its
logic. Although similar to a keyword search, the search engine allows a
user to make the query as if asking a librarian.

• Concept-based searching
 Using the idea of a thesaurus, a search engine can expand upon the
keyword that a user may input.
 In this manner, users do not have to know the exact words to use to
retrieve relevant documents.
 And, instead of reinstituting the search based on "confidence" or
"weighting," the search engine automatically includes the like terms.
• Search Engines
 A survey of the Search Engines available from Netscape's Net Search will
help in explaining some of the techniques discussed.
 By conducting a search for current trends in information retrieval,
differences can be seen in the structure and techniques of each engine.
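
The thesaurus idea behind concept-based searching can be sketched as automatic query expansion. The thesaurus entries and the function name expand_query below are hypothetical; a real engine would draw on a much larger, curated thesaurus.

```python
# Hypothetical thesaurus: each keyword maps to a set of like terms.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "film": {"movie"},
}


def expand_query(query, thesaurus=THESAURUS):
    """Return the query terms followed by their like terms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        # sorted() gives the like terms a stable, repeatable order.
        for candidate in [term, *sorted(thesaurus.get(term, ()))]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("used car"))
```

With this expansion, a document that mentions only "automobile" can still match a query for "used car", so the user does not need to know the exact words used in the collection.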

The Process of Information Retrieval


 The working of the information retrieval process is explained below.
 The process starts when a user enters a query into the system through
a graphical interface.
 These user-defined queries are statements of the needed information, for
example, queries fired by users in search engines.
 In IR, a single query does not match exactly one data object; instead, it
matches several data objects in the collection, from which the most
relevant documents are taken into consideration for further evaluation.
 The relevant documents are ranked to find the documents most related
to the given query.


 This ranking of results is the key difference between database searching and
information retrieval.
 The query is then sent to the core of the system, which has access to
the content management module that is directly linked with the back end,
i.e., the large collections of data objects.
 Once the results R are generated by the core system, they are returned to the
user through a graphical user interface.
 The process repeats and the results are refined until the user is satisfied
with what they were actually looking for.


1. Document Parsing
Documents come from different sources and in different combinations of
languages, formats, and character sets; a single document may even contain
more than one language, e.g., a Spanish email that has some parts in French.
Document parsing therefore deals with the overall document structure. In this phase, the
document is broken down into discrete components. The preprocessing phase
creates unit documents, for example, one document representing the email and
another representing an additional specific part.
2. Lexical Analysis
In lexical analysis, tokenization is the process of breaking a stream of text into words,
phrases, symbols, or other meaningful elements called tokens. These
tokens are then sent on to part-of-speech tagging.
Typically, tokenization occurs at the word level.

3. Stemming and Lemmatization
In English, we often use different forms of a word to form grammatically
correct sentences, e.g., go, going, goes. Stemming is the process of stripping
affixes so that the root of the word remains.
Lemmatization usually refers to doing this properly, using a vocabulary and
morphological analysis of words, aiming to remove inflectional endings only.
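
The contrast between the two can be seen on the forms of "go". The sketch below is a toy illustration: the suffix rules are not a published stemming algorithm, and the lemma table is a hypothetical four-entry vocabulary standing in for a real dictionary with morphological analysis.

```python
# Hypothetical vocabulary mapping inflected forms to their base form.
LEMMAS = {"going": "go", "goes": "go", "went": "go", "gone": "go"}


def stem(word):
    """Rule-based: strip the first matching suffix, with no vocabulary lookup."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Vocabulary-based: look the word up, falling back to the word itself."""
    return LEMMAS.get(word, word)


for word in ("going", "goes", "went"):
    print(word, stem(word), lemmatize(word))
```

Both approaches reduce "going" and "goes" to "go", but only the vocabulary-based lookup handles the irregular form "went", which suffix stripping leaves unchanged.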

Information Retrieval in the Library
Libraries were among the first institutions to adopt IR systems for retrieving
information.
Usually, systems to be used in libraries were initially developed by academic
institutions and later by commercial vendors.
In the first generation, such systems consisted basically of an automation of
previous technologies (such as card catalogs) and basically allowed searches
based on author name and title.
In the second generation, increased search functionality was added which
allowed searching by subject headings, by keywords, and some more complex
query facilities.
In the third generation, which is currently being deployed, the focus is on
improved graphical interfaces, electronic forms, hypertext features, and open
system architectures.
The Web and Digital Libraries

If we consider the search engines on the Web today, we conclude that they continue to use
indexes which are very similar to those used by librarians a century ago. What has changed
then? Three dramatic and fundamental changes have occurred due to the advances in
modern computer technology and the boom of the Web.
 First, it became a lot cheaper to have access to various sources of information. This
allows reaching a wider audience than ever possible before.
 Second, the advances in all kinds of digital communication provided greater access
to networks. This implies that the information source is available even if distantly
located and that the access can be done quickly (frequently, in a few seconds).
 Third, the freedom to post whatever information someone judges useful has greatly
contributed to the popularity of the Web. For the first time in history, many people
have free access to a large publishing medium.


Fundamentally, low cost, greater access, and publishing freedom have allowed
people to use the Web (and modern digital libraries) as a highly interactive medium.
Such interactivity allows people to exchange messages, photos, documents,
software, videos, and to 'chat' in a convenient and low cost fashion. Further, people
can do it at the time of their preference (for instance, you can buy a book late at
night) which further improves the convenience of the service. Thus, high interactivity
is the fundamental and current shift in the communication paradigm.


In the future, three main questions need to be addressed. First, despite the
high interactivity, people still find it difficult (if not impossible) to retrieve
information relevant to their information needs. Thus, in the dynamic world of
the Web and of large digital libraries, which techniques will allow retrieval of higher
quality? Second, with the ever increasing demand for access, quick response
is becoming more and more a pressing factor. Thus, which techniques will yield
faster indexes and smaller query response times? Third, the quality of the
retrieval task is greatly affected by the user interaction with the system. Thus,
how will a better understanding of the user behavior affect the design and
deployment of new information retrieval strategies?

End of Unit-1
