Intro To Information Retrieval
Introduction to Information
Retrieval
_Index_
• Introduction to Information Retrieval.
• Data Retrieval and Information Retrieval.
• Text Mining and its Relation to Information Retrieval.
• Block Diagram of an Information Retrieval System.
• Automatic Text Analysis: Luhn's Ideas.
• Conflation Algorithm in Information Retrieval.
Introduction to Information Retrieval.
• Information Retrieval (IR) is the process of obtaining relevant information
from a collection of information sources. This involves analyzing,
indexing, and retrieving data in a way that is both accurate and efficient.
• In today's digital age, the ability to quickly and efficiently retrieve
information is more important than ever. Whether it's searching for a
restaurant recommendation or finding a research paper, we rely on
information retrieval systems every day without even realizing it.
Data Retrieval and Information Retrieval.
• Data retrieval is the process of accessing data from a database or other
storage device. It involves searching for specific data based on certain
criteria, such as keywords or metadata.
• On the other hand, information retrieval is the process of accessing and
retrieving relevant information from a large collection of unstructured
data, such as text documents. Unlike data retrieval, which returns the items
that exactly match the criteria, information retrieval must interpret the
content of the data to judge which items are relevant to the user's need.
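The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy records, the field names, and the functions `data_retrieval` and `information_retrieval` are hypothetical, and counting shared query terms is a stand-in for a real relevance measure.

```python
# Hypothetical toy collection; records and field names are made up for illustration.
RECORDS = [
    {"id": 1, "year": 2020, "text": "cheap italian restaurant near the station"},
    {"id": 2, "year": 2021, "text": "research paper on text retrieval"},
    {"id": 3, "year": 2021, "text": "retrieval of restaurant reviews from text"},
]


def data_retrieval(records, year):
    """Exact match on a structured field: a record either matches or it does not."""
    return [r["id"] for r in records if r["year"] == year]


def information_retrieval(records, query):
    """Partial match on free text: score each record by shared query terms."""
    query_terms = set(query.lower().split())
    scored = []
    for r in records:
        score = len(query_terms & set(r["text"].lower().split()))
        if score > 0:
            scored.append((r["id"], score))
    # Best match first; ties are broken by record id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


print(data_retrieval(RECORDS, 2021))
print(information_retrieval(RECORDS, "restaurant retrieval"))
```

The first call returns an unordered set of exact matches, while the second returns every partially matching record together with a score that orders them.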
Difference between data & information retrieval
Data & text mining
• Data mining: the extraction of interesting information or patterns from
data in large databases.
• Text mining: the procedure of synthesizing information by analyzing
relations, patterns, and rules among textual data.
Text Mining and its Relation to Information Retrieval.
• The relationship between text mining and information retrieval is that text
mining techniques can be used to improve the effectiveness of information
retrieval systems. For example, text mining can be used to automatically
categorize documents, identify key topics, and extract important
keywords. These techniques can then be used to improve the accuracy and
relevance of search results in information retrieval systems.
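As one possible illustration of keyword extraction, the sketch below scores terms with TF-IDF weighting. The text above does not name a method, so TF-IDF is an assumption chosen for the example, as are the tiny stopword list, the sample documents, and the function names.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "for"}

DOCS = [
    "the index of the search engine stores the index terms",
    "search results in the search engine",
    "clustering of documents and clustering of topics",
]


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def top_keywords(docs, k=2):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    result = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result


for doc, keywords in zip(DOCS, top_keywords(DOCS)):
    print(keywords, "<-", doc)
```

Keywords extracted this way could be stored as index terms, so that a search matches on the terms that best characterize each document.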
Analyzing Text Mining
Procedure
• Text summarization
• Text categorization
• Text clustering
[Figure: Overview of text mining techniques. Document collection → Retrieve & preprocess documents → Text categorization, clustering, summarization → Management information system (MIS) → Knowledge]
Block Diagram of an Information Retrieval
System.
• Acquisition: In this step, documents and other objects are selected from various web resources that contain text-based documents. The required data is collected by web crawlers and stored in the database.
• Representation: This step consists of indexing, which uses free-text terms and controlled vocabulary and can be done with manual or automatic techniques. Examples: abstracting, which involves summarizing, and bibliographic description, which records author, title, sources, data, and metadata.
• File Organization: There are two basic file organization methods. Sequential: records are stored document by document. Inverted: records are stored term by term, with a list of records under each term. A combination of both can also be used.
• Query: An IR process starts when a user enters a query into the system. Queries are formal statements of
information needs, for example, search strings in web search engines. In information retrieval, a query does not
uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with
different degrees of relevancy.
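The file organization and query steps above can be sketched as follows. This is a minimal illustration, not a description of any particular system: the sample documents, the names `build_inverted_index` and `search`, and the scoring rule (number of matched query terms) are assumptions made for the example.

```python
from collections import defaultdict

# Sequential organization: the collection stored document by document.
DOCS = {
    "d1": "information retrieval systems index documents",
    "d2": "web crawlers collect documents from the web",
    "d3": "an inverted index lists documents under each term",
}


def build_inverted_index(docs):
    """Inverted organization: for each term, the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return (doc_id, score) pairs, where score counts the matched query terms."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Several documents may match; order them by degree of match.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


INDEX = build_inverted_index(DOCS)
print(sorted(INDEX["documents"]))
print(search(INDEX, "inverted index documents"))
```

The query does not identify a single document: all three documents match, each with a different score.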
Automatic Text Analysis: Luhn's Ideas.
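• Luhn's most significant idea was that the frequency with which a word
occurs in a document is a useful measure of its significance. Words that
are too common or too rare are excluded by upper and lower frequency
cut-offs, and the remaining words serve as 'key words'. Luhn also
developed Key Word in Context (KWIC) indexing, an early automated
keyword indexing method.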
• Automatic text analysis focuses on analyzing the structure and content of
text documents to improve their retrieval and classification.
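A minimal sketch of frequency-based keyword selection in the spirit of Luhn's work: count how often each word occurs in a document and keep the words whose frequency falls between a lower and an upper cut-off. The cut-off values, the sample text, and the function name `significant_words` are assumptions chosen for illustration.

```python
from collections import Counter


def significant_words(text, lower=2, upper=3):
    """Keep words whose frequency lies between the two cut-offs (inclusive)."""
    counts = Counter(text.lower().split())
    return sorted(w for w, c in counts.items() if lower <= c <= upper)


# Frequencies in this text: "the" 5, "index" 3, "query" 2, "term" 1.
TEXT = "the index the query the index the term the query index"
print(significant_words(TEXT))
```

With these cut-offs, the very frequent word "the" and the rare word "term" are both discarded, leaving the mid-frequency words as candidate keywords.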
Conflation Algorithm in Information Retrieval