Unit 5

Unit 5 data modeling

Uploaded by

Amanraj Somawar

Bhilai Institute of Technology, Durg

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Natural Language Processing

UNIT 5: Information retrieval and lexical resources

Prepared By
Dr. Shikha Pandey
UNIT – V (CO5)
Information retrieval and lexical resources: Information Retrieval: Design features of
Information Retrieval Systems, Classical, Non-classical, Alternative Models of
Information Retrieval, Evaluation
Lexical Resources: WordNet, FrameNet, Stemmers, POS Tagger

What is an Information Retrieval System?

An Information Retrieval (IR) system is a software-based framework designed to efficiently and
effectively retrieve relevant information from a collection of data or documents in response to user
queries.

These systems are integral to various applications, such as search engines, recommendation
systems, document management systems, and chatbots. The primary goal of an IR system is to
bridge the gap between the user’s information needs and the available data by providing timely
and accurate results.

Unlike simple keyword-based searches, modern IR systems employ advanced techniques from Natural
Language Processing (NLP), machine learning, and data mining to understand user intent, context, and
the semantics of queries and documents. This enables them to retrieve documents that not only match
the exact keywords but also answer the user’s query.

Critical features of the IR System

Understanding the critical features of an IR system is essential for effective data searching, retrieval
accuracy, and relevance ranking. Knowledge in this area enhances system usability, improves user
experience, and supports efficient decision-making, making it a vital skill for professionals in data-driven
fields.

• Indexing: It creates an organized structure that maps terms (words or phrases) to the documents
  in which they appear. This structure allows for efficient lookup and retrieval of records based on
  specific terms.

• Query Processing: The system analyses and processes user queries to identify the most relevant
  terms and concepts. This often involves techniques to handle synonymy (different words with the
  same meaning) and polysemy (a word with multiple meanings).

• Relevance Ranking: Documents retrieved from the index are ranked based on their perceived
  relevance to the user’s query. Various ranking algorithms, such as TF-IDF (Term Frequency-Inverse
  Document Frequency) and BM25, are used to determine the order in which documents
  are presented to the user.

• User Interaction and Feedback: Some IR systems learn from user interactions to improve their
  performance over time. For instance, if a user clicks on a particular search result, the system
  might learn that similar results are likely to be relevant.

• Information Presentation: The retrieved documents are typically presented to the user with
  additional information, such as document snippets, titles, and links, to help users quickly assess
  the relevance of each result.

• Query Expansion: This technique automatically enhances user queries with additional terms
  related to the original query. By accounting for different ways of expressing the same idea, it can
  help retrieve more relevant results.

Objectives of Information Retrieval System:

The objectives of the IR system are centred around providing efficient and accurate access to relevant
information from a vast collection of data or documents. These objectives go beyond simple keyword
matching and focus on enhancing the user’s experience by delivering meaningful and contextually
appropriate results. The primary goals of an IR system include:

Relevance
The foremost objective of an IR system is to retrieve information directly relevant to the user’s query. This
means the system should look beyond exact keyword matches, understand the user’s intent, and provide
documents that address the user’s information needs.

Relevance ensures that users receive the most pertinent information, which enhances their overall
satisfaction. By focusing on relevance, IR systems can significantly improve the quality of the search
results, making it easier for users to find the information they need quickly and efficiently.

Efficiency
IR systems aim to retrieve relevant documents quickly, even from large datasets. Speed and efficiency are
critical to providing a satisfactory user experience, especially when users expect rapid responses to their
queries.

An efficient IR system processes vast amounts of data in real-time, ensuring users do not experience delays.
This efficiency is achieved through advanced algorithms and optimised data structures that enable the
system to search and retrieve information rapidly, enhancing the overall user experience.

Ranking
Once relevant documents are retrieved, the IR system ranks them in order of perceived relevance. This
ranking helps users prioritise their focus on the most relevant documents and saves them time by not having
to sift through irrelevant results.

By presenting the most pertinent information first, the system lets users quickly find what they are looking
for. Ranking involves sophisticated algorithms that consider keyword frequency, document popularity, and user
preferences, ensuring that the most helpful information appears at the top of the search results.

Accuracy
IR systems strive to minimise false positives (irrelevant documents retrieved) and false negatives (relevant
documents not retrieved). Accurate retrieval ensures that users receive trustworthy and appropriate
information.

An accurate IR system meticulously evaluates the relevance of documents, reducing the chances of
irrelevant details appearing in the search results. This accuracy is crucial for maintaining the credibility and
reliability of the IR system, as users depend on it to provide precise and valuable information.

Contextual Understanding
Beyond literal keyword matching, IR systems aim to comprehend the context and semantics of both user
queries and document content. This allows the system to provide results that align with the user’s intended
meaning.

Contextual understanding involves analysing the relationships between words and phrases within the query
and documents, ensuring that the search results are relevant and contextually appropriate. This deep
understanding of language nuances significantly enhances the accuracy and relevance of the information
retrieved.

User Interaction
Many modern IR systems incorporate user interactions and feedback to improve future retrieval results. By
learning from user behaviour and preferences, the system becomes better at refining its results over time.

User interaction allows the IR system to adapt to individual user needs, making the search process more
personalised and effective. Feedback mechanisms such as clicks, ratings, and comments provide valuable
insights into user preferences, enabling the system to improve and continuously deliver more accurate and
relevant search results.

Personalisation
In some cases, IR systems personalise results based on user profiles, preferences, and historical interactions.
This ensures that users receive information most relevant to their needs. Personalisation involves tailoring
the search results to match each user’s unique interests and requirements.

By considering factors such as search history, demographic information, and individual preferences, the IR
system can deliver a more customised and satisfying search experience, increasing user engagement and
satisfaction.

Diversity of Results
While relevance is crucial, IR systems also aim to provide diverse results. This prevents the system from
returning multiple highly similar documents and instead offers a well-rounded view of the topic.

Diversity ensures that users are exposed to various perspectives and information sources, enriching their
understanding of the subject matter. By incorporating diverse results, the IR system can cater to user needs
and preferences, providing a more comprehensive and balanced search experience.

Adaptability
IR systems need to adapt to changes in data and user behaviour. As new documents are added and user
preferences evolve, the system should continue to provide accurate and relevant results.

Adaptability involves continuously updating the system’s algorithms and data structures to accommodate
new information and changing user behaviours. This ensures that the IR system remains effective and
reliable over time, consistently delivering high-quality search results regardless of the dynamic nature of
the data and user expectations.

Supporting Complex Queries

The system should handle complex queries involving multiple concepts, logical operators, and facets. It
should understand and interpret these queries accurately to provide meaningful results. Supporting complex
queries requires sophisticated algorithms capable of parsing and processing intricate search expressions.

By accurately interpreting and addressing complex queries, the IR system can meet users’ diverse and
specific information needs, ensuring that even the most detailed and nuanced queries yield accurate and
relevant results. This capability enhances the system’s utility and versatility, making it a valuable tool for
users with varied and complex search requirements.

Work Process of Information Retrieval

The Information Retrieval (IR) process involves a series of steps that collectively aim to retrieve relevant
information from a collection of data or documents based on user queries. This process goes beyond simple
keyword matching and employs various techniques to understand user intent, index documents, and rank
their relevance.

Here’s a step-by-step breakdown of the typical information retrieval process:

Data Collection and Preprocessing

First, we gather the documents or data on which the IR system will operate. This initial step involves collecting
vast amounts of raw data from various sources, such as databases, web pages, or text documents.

After gathering the data, we preprocess it by cleaning and tokenising it, breaking it into individual words
or phrases. This step also involves removing unnecessary elements like stopwords (common words like
“the” or “and”) and punctuation.

Optionally, we apply techniques like stemming or lemmatisation to reduce words to their root forms,
ensuring consistency and improving search accuracy.
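
The cleaning, tokenising, and stop-word steps above can be sketched in a few lines of Python. This is only a minimal illustration: the `STOP_WORDS` set and the `preprocess` function are hypothetical names, and the stop list is far shorter than one a real system would use.

```python
import re

# A tiny illustrative stop list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "the", "and", "of", "in", "for", "at", "to", "is"}

def preprocess(text):
    """Lowercase, tokenise on letters/digits, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The best budget smartphones, and the cheapest!"))
# → ['best', 'budget', 'smartphones', 'cheapest']
```

Punctuation disappears because the regular expression keeps only runs of letters and digits.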

Indexing

In the indexing phase, we create a data structure that maps terms (words or phrases) to the documents in
which they appear. This index allows for efficient lookup and retrieval of documents containing specific terms.
Data structures like inverted indexes facilitate fast and accurate retrieval.

The inverted index is particularly effective because it stores a list of documents for each term, making it
quick to find all documents containing a particular word or phrase. This step ensures the IR system can
quickly and accurately respond to user queries.

Query Processing

When a user submits a query, the system processes it to identify relevant terms and concepts. This step
involves analysing the query to understand the user’s intent and determine the most important words or
phrases. We also handle query expansion, adding additional terms related to the user’s query to enhance
retrieval accuracy.

For example, if a user searches for “cars,” we might also consider related terms like “automobiles” or
“vehicles.” Additionally, we address synonymy (different words with similar meanings) and polysemy
(words with multiple meanings) to ensure we capture the user’s intended meaning.
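
As a rough sketch of query expansion, the “cars” example can be written with a hand-made synonym table. The `SYNONYMS` dictionary and `expand_query` function are assumptions made for illustration; a real system might take related terms from a thesaurus or from query logs instead.

```python
# Hypothetical hand-written synonym table, used only for this example.
SYNONYMS = {
    "cars": ["automobiles", "vehicles"],
    "cheap": ["budget", "affordable"],
}

def expand_query(terms):
    """Return the original terms followed by any listed synonyms, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + SYNONYMS.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

print(expand_query(["cheap", "cars"]))
# → ['cheap', 'budget', 'affordable', 'cars', 'automobiles', 'vehicles']
```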

Relevance Ranking

After identifying the relevant documents, we calculate a relevance score for each document using ranking
algorithms such as TF-IDF (Term Frequency-Inverse Document Frequency) or BM25. These algorithms
consider various factors, such as the frequency of query terms in the document and the overall importance
of the terms.

Documents with higher relevance scores are ranked higher and presented to the user first. This ranking
process ensures that the most pertinent and useful documents appear at the top of the search results,
enhancing the user’s search experience.
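
To make the ranking step concrete, here is a small sketch of BM25 scoring over tokenised documents. The function name `bm25_scores` and the toy documents are made up for this example, and the parameter values `k1=1.5` and `b=0.75` are common defaults rather than values given in these notes.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document in `docs` against the tokenised `query`."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalised by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    ["budget", "smartphones", "review"],
    ["budget", "travel", "tips"],
    ["smartphones", "smartphones", "camera", "test"],
]
scores = bm25_scores(["budget", "smartphones"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # → [0, 2, 1]
```

The first document wins because it contains both query terms. The third outranks the second because it repeats a query term, even after the length normalisation penalises it for being longer.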

Presentation of Results

We then display the retrieved documents to the user in a user-friendly format. This presentation typically
includes document titles, text snippets matching the query, and links to the full documents.

Additional information, such as publication dates, authors, and metadata, helps users assess the relevance
of each result. By providing a clear and informative presentation, we help users quickly determine which
documents will most likely meet their needs and encourage further exploration of the search results.

User Interaction and Feedback

User interaction with the presented results provides valuable feedback for improving the IR system. By
observing user actions, such as clicks and the amount of time spent on each document, we gather insights
into the relevance of the retrieved documents.

This feedback loop allows us to refine the ranking algorithms and improve future retrieval results.
Incorporating user feedback is essential for adapting to users’ changing needs and preferences, ensuring the
IR system remains practical and relevant over time.

Iterative Querying

Users often refine their queries based on the initial results. They may modify keywords, add filters, or
change their search terms to narrow their search and improve the relevance of the retrieved documents.

Each iteration helps the user get closer to finding the information they need. This iterative querying process
is a critical component of the IR system, as it allows users to explore different aspects of their search topic
and progressively improve their search results.

Continuous Learning and Adaptation

Finally, the IR system must continuously learn and adapt to remain effective. We update the index and
ranking algorithms as new documents are added to the collection. We also adapt to changes in user
behaviour and preferences, ensuring the system remains accurate and relevant.

By continuously learning from user interactions and updating the system accordingly, we can maintain
high-quality search results and provide a better user experience.

The Information Retrieval process is dynamic and multifaceted. It aims to efficiently provide users with the
most relevant information. By following these steps, IR systems can effectively meet user needs and adapt
to the ever-changing information landscape.

Information Retrieval Example

Imagine you are searching for “best budget smartphones.” An Information Retrieval (IR) system processes
this query by identifying documents that contain the keywords “best,” “budget,” and “smartphones.” It
doesn’t stop there; the system goes further to understand the context and nuances of the search.

The IR system evaluates the relevance of the documents, ensuring that the articles it retrieves discuss
affordable smartphones with good features. This means it looks for content where the term “budget” is
associated with “smartphones” and where these devices are rated highly for their value.

Additionally, the IR system considers the user’s intent behind the search. It understands the user wants to
find the best options within a specific price range. As a result, it prioritises articles that compare different
budget smartphones, reviews that highlight their features, and lists that recommend top choices.

The IR system ensures a more satisfying and accurate search experience by aligning the search results with
the user’s intent. This example demonstrates how IR systems go beyond simple keyword matching,
employing sophisticated algorithms to deliver relevant and helpful information tailored to the user’s needs.

Information Retrieval and Information Extraction in AI

Information Retrieval (IR) and Information Extraction (IE) are two fundamental pillars of AI’s language
understanding capabilities. IR focuses on fetching relevant information from vast datasets. When users
enter a query, IR systems scan large data collections, such as documents, databases, and websites, to find
the most pertinent information.

This process involves indexing, ranking, and retrieving documents based on their relevance to the query.
Effective IR systems, like search engines, ensure users receive accurate and helpful information quickly,
enhancing their ability to find what they need from extensive data sources.

In contrast, Information Extraction identifies and extracts structured information from unstructured text. IE
systems analyse text to identify specific pieces of information, such as names, dates, locations, and
relationships. This structured data can then be organised into databases or knowledge graphs, significantly
contributing to AI’s knowledge base.

For instance, an IE system might process news articles to extract data about events, people involved, and
their connections, transforming raw text into actionable insights. This capability is crucial for tasks like
automated summarization, question answering, and content recommendation.

Together, IR and IE enable AI systems to understand and utilize human language more effectively, driving
advancements in natural language processing and contributing to the development of intelligent
applications.

Design features of Information Retrieval Systems

Designing effective Information Retrieval (IR) Systems involves various technical and usability aspects
to ensure users can find relevant information efficiently. Here are key design features of IR systems:

Inverted Index

An inverted index is an index data structure storing a mapping from content (content can be words or
numbers) to its locations in a document or a set of documents.

We can also say the inverted index is a hashmap-like data structure that directs the user from a word to a
document or a web page. The inverted index is also the primary data structure of most information retrieval
systems.

For every word, an inverted index lists all the documents that contain it, along with the frequency of its
occurrences in each document, which makes it easy to find the hits for a query word.

Types of inverted indexes: There are two main types, the record-level inverted index and the word-level
inverted index.

Record level inverted index contains a list of references to documents for each word.

Word level inverted index additionally contains the positions of each word within a document. This form
of the inverted index offers more functionality (for example, phrase and proximity searches) but needs
more processing power and space to be created.
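
The difference between the two index types can be sketched as follows. This is a minimal in-memory illustration with hypothetical names (`build_indexes`, `phrase_match`) and whitespace tokenisation; it is not how a production search engine stores its postings.

```python
from collections import defaultdict

def build_indexes(docs):
    """Build a record-level and a word-level index from {doc_id: text}."""
    record_level = defaultdict(set)                      # term -> {doc_id}
    word_level = defaultdict(lambda: defaultdict(list))  # term -> {doc_id: [positions]}
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            record_level[term].add(doc_id)
            word_level[term][doc_id].append(position)
    return record_level, word_level

def phrase_match(word_level, first, second):
    """Doc ids where `second` occurs immediately after `first`."""
    hits = set()
    for doc_id, positions in word_level.get(first, {}).items():
        following = set(word_level.get(second, {}).get(doc_id, []))
        if any(p + 1 in following for p in positions):
            hits.add(doc_id)
    return hits

docs = {1: "new home sales", 2: "home sales rise", 3: "sales of new cars"}
record_level, word_level = build_indexes(docs)
print(sorted(record_level["sales"]))                      # → [1, 2, 3]
print(dict(word_level["new"]))                            # → {1: [0], 3: [2]}
print(sorted(phrase_match(word_level, "home", "sales")))  # → [1, 2]
```

The phrase query is only possible with the word-level index, because the record-level index does not know where in a document each term occurs.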

Advantages of inverted index: The main utility of an inverted index is that it allows fast full-text searches,
at the cost of increased processing when a document is added to the database.

An inverted index is also easy to develop.

It is the most popular data structure in large-scale document retrieval systems, for example in search
engines.

Disadvantage of inverted index: An inverted index has a large storage overhead and high maintenance
costs for update, delete, and insert operations.

Stop Word Elimination

Stop words are high-frequency words that are deemed unlikely to be useful for searching inside the
documents of the information retrieval system.

Words in the corpus that carry little semantic weight are kept in a list called a stop list.

Example: Articles such as a, an, the, and prepositions such as in, of, for, at, are stop words.

Size reduction of the inverted index using a stop list: One main advantage of eliminating stop words is
that the size of the inverted index can be significantly reduced.

As per Zipf’s law, a stop list covering a few dozen words reduces the size of the inverted index by almost
half.

One disadvantage of stop word elimination is that it may sometimes remove a term that is useful for
searching.

Example: If we eliminate the letter A from Vitamin A, the term loses its meaning.

Stemming

Stemming is the heuristic process of reducing words to a base form by chopping off their endings.
It maps the morphological variants of a word to a common root or base form. Stemming programs are
commonly referred to as stemming algorithms or stemmers. Stemming is one of the important steps in
information retrieval systems like search engines.

For example, the words laughing, laughs, and laughed would all be stemmed to the root word laugh.

Usage of stemming in information retrieval system with an example: If we want to search for the word
chocolate in a collection of documents, we want to see all the documents that have information about the
word chocolate.

It may so happen that the words chocolates, chocolatey, and choco are present in many documents
instead of chocolate.

To relate these variants, we can stem them to their root word chocolate so that we can
retrieve all the documents containing this base word no matter how it is written across documents.

There are many standard tools for performing this reduction to a root word, such as the Porter
stemmer, the Snowball stemmer, and the Lancaster stemmer.
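
To show the idea of chopping off word endings, here is a deliberately simple suffix stripper. It is a toy written for these notes, not the Porter, Snowball, or Lancaster algorithm; the suffix list and the `toy_stem` name are assumptions.

```python
# Toy suffix-stripping rules, checked in order (longest first).
# This is NOT the Porter algorithm, only an illustration of the idea.
SUFFIXES = ["ing", "ed", "es", "s"]

def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least `min_stem_len` letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

print([toy_stem(w) for w in ["laughing", "laughs", "laughed", "laugh"]])
# → ['laugh', 'laugh', 'laugh', 'laugh']
```

Such a naive rule set breaks quickly: it turns chocolates into chocolat but leaves chocolate unchanged, so the two forms no longer match. Real stemmers carry many more rules and conditions to reduce mismatches of this kind.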

Classical, Non-classical, and Alternative Models of Information Retrieval (IR)

Information Retrieval (IR) models provide the theoretical foundation for how information
systems retrieve documents based on user queries. These models have evolved from classical
approaches to non-classical and alternative models to address various challenges and improve
retrieval effectiveness.

1. Classical Models of Information Retrieval

Classical models focus primarily on the mathematical representation of documents and queries,
matching them to retrieve relevant results.

a. Boolean Model

• Key Features:
  o Based on set theory.
  o Queries are expressed as Boolean expressions using AND, OR, and NOT operators.
  o Documents are either relevant or not relevant (binary retrieval).
• Advantages:
  o Simple to implement and understand.
  o Gives precise control over the query through the logical operators.
• Limitations:
  o Does not handle partial matching or ranking of results.
  o The query formulation can be complex for users.
  o Cannot retrieve documents based on term frequency or document relevance scores.
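
Because the Boolean model is based on set theory, its three operators map directly onto set operations. The sketch below uses a made-up three-document collection and a hypothetical `postings` helper.

```python
docs = {
    1: "cheap budget smartphones",
    2: "premium smartphones review",
    3: "budget laptops review",
}

# Term -> set of ids of documents containing the term.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

all_ids = set(docs)

def postings(term):
    """Set of documents containing `term` (empty if the term is unknown)."""
    return index.get(term, set())

# smartphones AND budget
print(sorted(postings("smartphones") & postings("budget")))              # → [1]
# smartphones OR laptops
print(sorted(postings("smartphones") | postings("laptops")))             # → [1, 2, 3]
# review AND NOT smartphones
print(sorted(postings("review") & (all_ids - postings("smartphones"))))  # → [3]
```

Every result is an unordered set: a document either satisfies the expression or it does not, which is why the model cannot rank.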

b. Vector Space Model (VSM)

• Key Features:
  o Represents documents and queries as vectors in a high-dimensional space.
  o Terms in documents are weighted, often using TF-IDF (Term Frequency-Inverse
    Document Frequency).
  o Relevance is determined by computing the cosine similarity between the query vector
    and document vectors.
• Advantages:
  o Supports partial matching and can rank documents by relevance.
  o Intuitive representation of document-query similarity.
• Limitations:
  o Assumes terms are independent (does not capture term correlations).
  o High computational cost due to vector manipulation.
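
A minimal sketch of the vector space model follows: documents and the query become sparse TF-IDF vectors (stored as dictionaries) and are compared by cosine similarity. The raw-count TF and `log(N/df)` IDF are one common weighting among several, and the function names are chosen for this example only.

```python
import math
from collections import Counter

def tfidf_vectors(docs, query):
    """Return TF-IDF vectors (as dicts) for each document and for the query."""
    n = len(docs)
    df = Counter(term for d in docs for term in set(d))
    idf = {term: math.log(n / df[term]) for term in df}

    def vec(tokens):
        tf = Counter(tokens)
        # Terms unseen in the collection have no IDF and are ignored.
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    return [vec(d) for d in docs], vec(query)

def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = [
    ["budget", "smartphones", "review"],
    ["budget", "travel", "tips"],
    ["gaming", "laptops", "review"],
]
doc_vecs, query_vec = tfidf_vectors(docs, ["budget", "smartphones"])
sims = [cosine(query_vec, d) for d in doc_vecs]
print([round(s, 3) for s in sims])
```

The first document scores highest, the second gets a small non-zero score for matching only “budget” (a partial match), and the third scores zero because it shares no term with the query.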

c. Probabilistic Model

• Key Features:
  o Assumes there is a probability that a document is relevant to a given query.
  o Models include the Binary Independence Model (BIM), which estimates the probability that
    a document is relevant given its terms and the query.
  o Each term contributes to the probability of relevance.
• Advantages:
  o The probabilistic model inherently ranks documents.
  o It can be updated with user feedback to improve retrieval over time.
• Limitations:
  o Requires prior knowledge of relevant documents (relevance judgments) to estimate the
    model’s probabilities, which is often not available.
  o Limited by the independence assumption between terms.
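
The following sketch shows how BIM scoring can work when no relevance judgments are available. It rests on two simplifying assumptions that are not stated in these notes: every query term is equally likely to appear in a relevant document (probability 0.5), and the chance of a term appearing in a non-relevant document is estimated from its document frequency. The name `bim_rsv` is hypothetical.

```python
import math

def bim_rsv(query, docs):
    """Retrieval status value per document under the Binary Independence Model,
    assuming no relevance judgments (p_t = 0.5, u_t estimated from df)."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]  # binary: presence/absence only
    scores = []
    for terms in doc_sets:
        rsv = 0.0
        # Only query terms present in the document contribute.
        for t in set(query) & terms:
            df = sum(1 for s in doc_sets if t in s)
            rsv += math.log((n - df + 0.5) / (df + 0.5))
        scores.append(rsv)
    return scores

docs = [
    ["budget", "smartphones"],
    ["budget", "laptops"],
    ["gaming", "laptops"],
    ["travel", "tips"],
    ["cooking", "tips"],
]
rsv = bim_rsv(["budget", "smartphones"], docs)
print([round(s, 3) for s in rsv])
```

Rarer terms receive larger weights, so the document containing both query terms ranks first. Under these estimates the weight of a term becomes negative once it occurs in more than half of the documents. With relevance feedback, the two probabilities would be re-estimated from the judged documents.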

2. Non-classical Models of Information Retrieval

Non-classical models emerged to address the limitations of classical models, especially their
limited ability to handle uncertainty and context.

a. Fuzzy Set Model

• Key Features:
  o Represents relevance in degrees rather than binary terms (as in Boolean models).
  o Each document has a degree of membership in the relevant set.
  o Supports fuzzy logic for query processing, allowing partial matches and linguistic
    uncertainty.
• Advantages:
  o Flexible and can handle ambiguous or imprecise queries.
  o Provides a continuous spectrum of relevance instead of rigid binary retrieval.
• Limitations:
  o More complex to implement.
  o Requires appropriate fuzzy membership functions, which may be difficult to define.
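
A small sketch of fuzzy query evaluation, using the standard min/max/complement operators as one common choice of fuzzy logic. The membership degrees in the `membership` table are invented for the example; in practice they would come from a membership function, which is the hard part noted above.

```python
# Hypothetical membership degrees in [0, 1]: how strongly each document
# belongs to the fuzzy set of documents "about" a term.
membership = {
    "d1": {"budget": 0.9, "smartphones": 0.8},
    "d2": {"budget": 0.2, "smartphones": 0.9},
    "d3": {"budget": 0.7, "smartphones": 0.0},
}

def mu(doc, term):
    """Degree of membership of `doc` in the fuzzy set for `term`."""
    return membership[doc].get(term, 0.0)

def fuzzy_and(a, b):  # intersection
    return min(a, b)

def fuzzy_or(a, b):   # union
    return max(a, b)

def fuzzy_not(a):     # complement
    return 1.0 - a

for doc in membership:
    both = fuzzy_and(mu(doc, "budget"), mu(doc, "smartphones"))
    either = fuzzy_or(mu(doc, "budget"), mu(doc, "smartphones"))
    print(doc, both, either)
# → d1 0.8 0.9
# → d2 0.2 0.9
# → d3 0.0 0.7
```

Unlike the Boolean model, the AND query still gives d2 a small score (0.2) instead of discarding it, so the results can be ranked.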

b. Extended Boolean Model

• Key Features:
  o A hybrid of the Boolean model and Vector Space Model.
  o Allows weighted terms in Boolean queries, providing a mechanism for partial matching.
  o Scores a document by its distance, in the space of term weights, from the point that would
    satisfy the query perfectly (the p-norm model), rather than by strict set membership.
• Advantages:
  o Balances the strictness of Boolean logic with the flexibility of vector space ranking.
• Limitations:
  o Still requires users to formulate complex queries.
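
The distance idea behind the extended Boolean model can be sketched with the p-norm formulas for unweighted queries. The function names are hypothetical, and the term weights are assumed to be already normalised to the range 0 to 1.

```python
def p_norm_or(weights, p=2.0):
    """Similarity to an OR query: distance from the all-zeros (worst) point."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)

def p_norm_and(weights, p=2.0):
    """Similarity to an AND query: 1 minus distance from the all-ones (ideal) point."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)

# Normalised weights in [0, 1] of two query terms in one hypothetical document:
# the first term is fully present, the second is absent.
weights = [1.0, 0.0]
print(p_norm_or(weights), p_norm_and(weights))
```

With p = 2 the document gets roughly 0.71 for the OR query and 0.29 for the AND query, where strict Boolean logic would give 1 and 0. With p = 1 both formulas give 0.5, which behaves like a simple average of weights, and as p grows the scores approach the strict Boolean result.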

c. Inference Network Model

• Key Features:
  o Uses a Bayesian network to model the relationships between terms, documents, and
    queries.
  o Documents and queries are represented as nodes in a network, and relevance is
    computed using probabilistic inference.
• Advantages:
  o Highly expressive model, capturing term relationships and uncertainty.
  o Adaptable to various retrieval tasks and user preferences.
• Limitations:
  o Computationally expensive due to the complexity of the network.
  o Difficult to implement and scale for large datasets.

3. Alternative Models of Information Retrieval

Alternative models explore approaches that go beyond traditional models, often incorporating
more advanced concepts from machine learning, linguistics, and cognitive science.

a. Language Models for IR

• Key Features:
  o Treats each document as a language model and computes the probability of generating
    a query from that document.
  o One popular method is the query likelihood model, where the goal is to rank
    documents by how likely they would generate the query.
• Advantages:
  o Provides a strong theoretical framework and allows for a natural incorporation of
    statistical techniques.
  o Can be extended (for example, with n-gram models) to model term dependencies and capture
    more complex patterns of term usage.
• Limitations:
  o Can suffer from data sparsity, where the model doesn’t have enough information to
    build reliable language models for certain documents.
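
The query likelihood model can be sketched with a unigram language model per document. To cope with the data sparsity mentioned above, this example mixes each document model with a collection-wide model (Jelinek-Mercer smoothing), which is a standard remedy but an addition to what these notes describe. The function name and the mixing weight `lam=0.5` are illustrative choices.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection_tf, collection_len, lam=0.5):
    """log P(query | doc) under a unigram model with Jelinek-Mercer smoothing."""
    tf = Counter(doc)
    log_p = 0.0
    for term in query:
        p_doc = tf[term] / len(doc)                     # document model
        p_coll = collection_tf[term] / collection_len   # collection model
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:  # term unseen in the whole collection
            return float("-inf")
        log_p += math.log(p)
    return log_p

docs = [
    ["budget", "smartphones", "review"],
    ["budget", "travel", "tips"],
    ["gaming", "laptops", "review"],
]
collection_tf = Counter(t for d in docs for t in d)
collection_len = sum(len(d) for d in docs)
scores = [
    query_log_likelihood(["budget", "smartphones"], d, collection_tf, collection_len)
    for d in docs
]
print(max(range(len(docs)), key=lambda i: scores[i]))  # → 0
```

Without smoothing, the second and third documents would get probability zero because they lack “smartphones”; with it they receive low but finite scores and can still be ranked.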

b. Latent Semantic Indexing (LSI)

• Key Features:
  o Uses singular value decomposition (SVD) to reduce the dimensionality of the term-document
    matrix, capturing the latent structure in the data.
  o Documents and queries are mapped to a latent semantic space, where synonyms and
    related terms are grouped together.
• Advantages:
  o Improves recall by retrieving documents with similar meanings even if different words
    are used (synonym handling).
  o Reduces noise by focusing on the key latent concepts.
• Limitations:
  o High computational cost due to SVD.
  o It may struggle with new terms not captured in the original matrix.

c. Neural Models and Deep Learning

• Key Features:
  o Leverages deep learning techniques (e.g., word embeddings, neural networks) to model
    semantic relationships between terms, documents, and queries.
  o Techniques like BERT (Bidirectional Encoder Representations from Transformers) are
    used to understand context and relationships between words.
• Advantages:
  o Excellent at handling complex, natural language queries and understanding context.
  o Can learn from large datasets and improve with more data.
• Limitations:
  o Requires significant computational resources.
  o Needs large amounts of data for training, making it less practical for smaller datasets.

Evaluation of IR Models

Evaluation of IR models is crucial to assess their effectiveness and usability in real-world
scenarios.

Key Evaluation Metrics:

• Precision: The proportion of retrieved documents that are relevant.
• Recall: The proportion of all relevant documents in the collection that are retrieved.
• F1-Score: Harmonic mean of precision and recall, providing a balance between the two.
• Mean Average Precision (MAP): The mean, over a set of queries, of each query’s average precision,
  providing a single overall assessment of ranking quality.
• Discounted Cumulative Gain (DCG): Measures the usefulness of a ranked result list by discounting
  each document’s relevance according to its position, emphasizing the importance of ranking relevant
  documents higher.
• User Satisfaction: Subjective but vital metric, based on user feedback regarding the relevance of
  results, speed, and ease of use.
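
The objective metrics above are easy to compute once the relevant documents are known. The sketch below uses invented document ids and a log base 2 discount for DCG, which is the usual formulation; MAP is simply the mean of `average_precision` over several queries.

```python
import math

def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for one query."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which a relevant document appears."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def dcg(gains):
    """Discounted cumulative gain for graded relevance values in rank order."""
    return sum(g / math.log2(k + 1) for k, g in enumerate(gains, start=1))

ranked = ["d3", "d1", "d7", "d2"]     # system output, best first
relevant = {"d1", "d2", "d5"}         # ground-truth relevant documents
print(precision_recall_f1(ranked, relevant))
print(average_precision(ranked, relevant))
print(dcg([0, 1, 0, 1]))
```

Here two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67). Average precision is lower, about 0.33, because the relevant documents appear at ranks 2 and 4 rather than at the top and one was never retrieved.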

1 page
DocScanner 20 Nov 2024 7-29 PM
No ratings yet
DocScanner 20 Nov 2024 7-29 PM
1 page
DocScanner 20 Nov 2024 7-30 PM
No ratings yet
DocScanner 20 Nov 2024 7-30 PM
1 page
Send - Unit-4 Notes - Entrepreneurship & EDP
No ratings yet
Send - Unit-4 Notes - Entrepreneurship & EDP
26 pages
Copyright Infringement
100% (1)
Copyright Infringement
43 pages
PL - 1 - OnCODEs (English)
No ratings yet
PL - 1 - OnCODEs (English)
5 pages
Unit 6 Homework Week November 26
No ratings yet
Unit 6 Homework Week November 26
4 pages
FTSearch Method
No ratings yet
FTSearch Method
280 pages
Signal and System
No ratings yet
Signal and System
16 pages
Software Engineering V.imp Question + PYQs Paper (Edushine Classes)
No ratings yet
Software Engineering V.imp Question + PYQs Paper (Edushine Classes)
11 pages
Sample Code
100% (1)
Sample Code
3 pages
600-FKM (FYP1-LB-Rev.0) - FYP LogBook
0% (1)
600-FKM (FYP1-LB-Rev.0) - FYP LogBook
8 pages
CS304 - Object Oriented Programming Final Term Short Question & Answers (Lecture 19 To 45)
100% (1)
CS304 - Object Oriented Programming Final Term Short Question & Answers (Lecture 19 To 45)
10 pages
A Review of Optimization Approach To Power Flow Tracing in A Deregulated Power System
No ratings yet
A Review of Optimization Approach To Power Flow Tracing in A Deregulated Power System
14 pages
Ora 01157 Cannot Identifylock Data File
No ratings yet
Ora 01157 Cannot Identifylock Data File
6 pages
Introduction
100% (1)
Introduction
81 pages
Punjab Project Supervisor: Panel Card Form
No ratings yet
Punjab Project Supervisor: Panel Card Form
3 pages
I & C Maintenance Manual
100% (1)
I & C Maintenance Manual
111 pages
Motorola Mt2000 RSS Manual
100% (1)
Motorola Mt2000 RSS Manual
304 pages
Ansible Cheat Sheet
No ratings yet
Ansible Cheat Sheet
8 pages
Mil PPT 2
No ratings yet
Mil PPT 2
21 pages
Bom Udoo Neo
No ratings yet
Bom Udoo Neo
6 pages
At91sam9263-Ek File Browser v1
No ratings yet
At91sam9263-Ek File Browser v1
4 pages
Multiprocessor System Architecture
No ratings yet
Multiprocessor System Architecture
11 pages
Rman
No ratings yet
Rman
3 pages
Mounting Cdrom Unix
No ratings yet
Mounting Cdrom Unix
7 pages
M2-Mutual Exclusion in Synchronization
No ratings yet
M2-Mutual Exclusion in Synchronization
4 pages
Masterchef Tcnicas de Pastelera Profesional Spanish Edition by Mariana Sebess B00ieh62jo
0% (1)
Masterchef Tcnicas de Pastelera Profesional Spanish Edition by Mariana Sebess B00ieh62jo
5 pages
Important Questions: Subject: Web Design Class: - BCA-4nd Sem
100% (3)
Important Questions: Subject: Web Design Class: - BCA-4nd Sem
2 pages
Manajemen Stratejik (Umar Said)
No ratings yet
Manajemen Stratejik (Umar Said)
70 pages
Hpc301 User Manual
No ratings yet
Hpc301 User Manual
25 pages
University of Mumbai B.E. Electronics Engineering
No ratings yet
University of Mumbai B.E. Electronics Engineering
2 pages

