Intro To Information Retrieval

This document provides an introduction to information retrieval. It discusses how information retrieval involves analyzing, indexing, and retrieving relevant data from large collections of unstructured information. It also describes how text mining techniques can be used to improve information retrieval systems by automatically categorizing documents, identifying topics, and extracting keywords. Additionally, it outlines some common components of an information retrieval system, including acquisition, representation, file organization, and querying.

Uploaded by

Nilam Honmane


Unit I

Introduction to Information Retrieval
_Index_
• Introduction to Information Retrieval.
• Data Retrieval and Information Retrieval.
• Text Mining and its Relation to Information Retrieval.
• Block Diagram of an Information Retrieval System.
• Automatic Text Analysis: Luhn's Ideas.
• Conflation Algorithm in Information Retrieval.
Introduction to Information Retrieval.
• Information Retrieval (IR) is the process of obtaining relevant information
from a collection of information sources. This involves analyzing,
indexing, and retrieving data in a way that is both accurate and efficient.
• In today's digital age, the ability to quickly and efficiently retrieve
information is more important than ever. Whether it's searching for a
restaurant recommendation or finding a research paper, we rely on
information retrieval systems every day without even realizing it.
Data Retrieval and Information Retrieval.
• Data retrieval is the process of accessing data from a database or other
storage device. It involves searching for specific data based on certain
criteria, such as keywords or metadata.
• Information retrieval, on the other hand, is the process of finding
relevant information in a large collection of unstructured data, such as
text documents. Unlike data retrieval, which returns exactly the items that
satisfy the criteria, information retrieval must judge how relevant each
item is to the user's need, so results are usually ranked rather than exact.
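
The contrast above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the documents, and the function names data_retrieval and information_retrieval are hypothetical, and counting shared words is a deliberately crude stand-in for a real relevance score.

```python
# Hypothetical toy data; none of these records come from the slides.
records = {
    101: {"title": "Pasta House", "city": "Pune"},
    102: {"title": "Curry Corner", "city": "Mumbai"},
}

documents = {
    "d1": "cheap pasta restaurant near the station",
    "d2": "research paper on pasta drying",
    "d3": "best restaurant for family dinner",
}


def data_retrieval(city):
    """Exact-match lookup: a record either satisfies the criterion or it does not."""
    return [key for key, rec in records.items() if rec["city"] == city]


def information_retrieval(query):
    """Rank documents by how many query terms they share (a crude relevance score)."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap > 0:
            scored.append((doc_id, overlap))
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    print(data_retrieval("Pune"))
    print(information_retrieval("pasta restaurant"))
```

The data retrieval call returns only records that satisfy the condition, while the information retrieval call returns several partially matching documents in order of their score.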
Difference between data & information retrieval
Data & text mining

• Data mining: the extraction of interesting information or patterns from
data in large databases.
• Text mining: the process of synthesizing information by analyzing
relations, patterns, and rules among textual data.
Text Mining and its Relation to Information Retrieval.

• Text mining is a process of analyzing large amounts of unstructured text
data in order to extract useful information. This process involves various
techniques such as natural language processing, machine learning, and
statistical analysis. The goal of text mining is to turn unstructured data
into structured data that can be easily analyzed and used for
decision-making purposes.
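
One simple way to picture "turning unstructured data into structured data" is a term-count table, with one row per document and one column per word. The sketch below assumes a very basic tokenizer (lowercase alphabetic runs only); the function names tokenize and to_term_counts are hypothetical and chosen for this illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def to_term_counts(docs):
    """Return (vocabulary, rows); each row holds counts aligned with vocabulary."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    rows = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, rows


if __name__ == "__main__":
    vocab, rows = to_term_counts(["Text mining finds patterns", "Mining text data"])
    print(vocab)
    for row in rows:
        print(row)
```

Once the text is in this tabular form, ordinary statistical or machine learning tools can be applied to it.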
• The relationship between text mining and information retrieval is that text
mining techniques can be used to improve the effectiveness of information
retrieval systems. For example, text mining can be used to automatically
categorize documents, identify key topics, and extract important
keywords. These techniques can then be used to improve the accuracy and
relevance of search results in information retrieval systems.
Analyzing Text Mining Procedure
• Text summarization
• Text categorization
• Text clustering
[Figure: Overview of text mining techniques. Stages shown: document collection; retrieve & preprocess documents; text categorization, clustering, summarization; MIS (management information system); knowledge.]
Block Diagram of an Information Retrieval System.
• Acquisition: Documents and other text-based objects are selected from various web resources. The required
data is collected by web crawlers and stored in a database.
• Representation: Documents are indexed using free-text terms or a controlled vocabulary, with manual or
automatic techniques. Representation also covers abstracting (summarizing) and bibliographic description
(author, title, sources, data, and metadata).
• File Organization: There are two basic methods, which can also be combined. Sequential: records are stored
document by document. Inverted: records are stored term by term, with a list of records under each term.
• Query: An IR process starts when a user enters a query into the system. Queries are formal statements of
information needs, for example, search strings in web search engines. In information retrieval, a query does not
uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with
different degrees of relevance.
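
The file organization and query steps above can be sketched together. In this minimal example the plain dictionary of documents plays the role of the sequential file, and build_inverted_index derives the term-by-term file from it. The function names, the whitespace tokenization, and the "number of matched query terms" score are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return (doc_id, matched_term_count) pairs, best match first."""
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, []):
            scores[doc_id] += 1
    # More matched terms first; ties are broken by document id.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Sequential organization: one entry per document (hypothetical sample data).
    docs = {
        1: "web crawlers collect documents",
        2: "users enter search queries",
        3: "crawlers index web documents",
    }
    index = build_inverted_index(docs)
    print(index["web"])
    print(search(index, "web crawlers queries"))
```

Note that the query does not pick out a single document: several documents come back, each with a different score.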
Automatic Text Analysis: Luhn's Ideas.
• Luhn's most significant idea was that word frequency can measure word
significance. 'Key words' are terms that occur frequently in a document but
are not common words of the language; very frequent and very rare words are
dropped using upper and lower frequency cut-offs. This idea underpinned
early automated keyword indexing.
• Automatic text analysis focuses on analyzing the structure and content of
text documents to improve their retrieval and classification.
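
A frequency-based keyword picker in the spirit of Luhn's idea might look like the sketch below. It is not Luhn's original procedure: the short stop list, the function name significant_words, and the min_count and max_count cut-off parameters are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list; a real system would use a much longer one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "by"}


def significant_words(text, min_count=2, max_count=None):
    """Return words whose frequency lies between the lower and upper cut-offs."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    counts = Counter(words)
    kept = {
        word: count
        for word, count in counts.items()
        if count >= min_count and (max_count is None or count <= max_count)
    }
    # Most frequent first; alphabetical order breaks ties.
    return sorted(kept, key=lambda word: (-kept[word], word))


if __name__ == "__main__":
    sample = (
        "Indexing of documents is the core of retrieval. "
        "Retrieval systems rank documents, and indexing supports retrieval."
    )
    print(significant_words(sample))
```

Raising min_count removes rare words, and setting max_count removes words so frequent that they no longer discriminate between documents.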
Conflation Algorithm in Information Retrieval

• The Conflation Algorithm is an important part of Information Retrieval
that helps to address the issue of different word forms.
• For instance, if a query contains the word 'run', it should also retrieve
documents that contain the words 'running' or 'ran'.
• The Conflation Algorithm solves this problem by mapping all similar
words to a single term. This ensures that queries are more accurate and
relevant results are returned.
• To illustrate the relevance of the Conflation Algorithm, consider the
example of a search for the word 'car'.
• Without the algorithm, the search would only return documents containing
the exact word 'car'.
• However, with the Conflation Algorithm in place, the search would also
retrieve documents containing variant forms such as 'cars'. Matching
synonyms such as 'automobiles' needs a thesaurus or query expansion in
addition to conflation. Together these make searches more comprehensive
and effective.
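
Conflation can be sketched as a small function that maps word forms to a common term. The version below is a deliberately crude illustration, not a standard stemmer: the IRREGULAR table, the two suffix rules, and the names conflate and matches are assumptions, and the rules will mis-stem many real words.

```python
# Hypothetical lookup for irregular forms that suffix stripping cannot handle.
IRREGULAR = {"ran": "run"}


def conflate(word):
    """Map a word form to a single term using a lookup table and two suffix rules."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Undo consonant doubling, e.g. "runn" -> "run".
        if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def matches(query_word, document):
    """True if any word in the document conflates to the same term as the query word."""
    target = conflate(query_word)
    return any(conflate(token) == target for token in document.lower().split())


if __name__ == "__main__":
    print(conflate("running"), conflate("ran"), conflate("cars"))
    print(matches("run", "she ran home"))
    print(matches("car", "used cars for sale"))
```

In this sketch 'running', 'ran', and 'run' all map to 'run', and 'cars' maps to 'car'. A synonym such as 'automobiles' is not matched to 'car' here, since that would require a thesaurus rather than word-form rules.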
