What Is Information Retrieval (IR)
In Information Retrieval (IR), similarity functions are used to compare queries with
documents or to compare documents with each other. These functions help rank documents
by how relevant they are to a given query. Choosing the right similarity function depends
on how queries and documents are represented.
1. Cosine Similarity
● In Vector Space Model (VSM), both queries and documents are represented as
vectors in a high-dimensional space.
● Each dimension represents a unique term from the vocabulary.
● The Cosine Similarity is used to measure how similar a document vector is to a query
vector.
● It is calculated using the cosine of the angle between the two vectors: cos(θ) = (Q · D) / (‖Q‖ ‖D‖).
Example:
After converting two documents into vectors based on word frequency, cosine similarity
measures the angle between them. The smaller the angle, the more similar the documents are.
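As a minimal sketch in plain Python (the two example texts are made up), cosine similarity over word-frequency vectors can be computed like this:

```python
import math
from collections import Counter

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between the word-frequency vectors of two texts."""
    vec_a, vec_b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    # Dot product over the shared vocabulary only (other terms contribute 0).
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(cosine_similarity("information retrieval systems",
                        "information retrieval models"))  # ≈ 0.67
```

Identical texts score 1.0, texts with no shared words score 0.0.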
2. Jaccard Similarity
● Measures similarity between two sets (e.g., sets of words in two documents).
● Formula: J(A, B) = |A ∩ B| / |A ∪ B|
● where A and B are the sets of words in the two documents.
Example:
If A = {"information", "retrieval", "systems"} and B = {"information", "retrieval",
"models"}, then J(A, B) = 2/4 = 0.5.
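A minimal sketch of set-based Jaccard similarity over the word sets of two texts:

```python
def jaccard_similarity(doc_a: str, doc_b: str) -> float:
    """|A intersection B| / |A union B| over the word sets of two texts."""
    a, b = set(doc_a.lower().split()), set(doc_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

print(jaccard_similarity("information retrieval systems",
                         "information retrieval models"))  # 0.5
```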
3. TF-IDF Similarity
● TF-IDF helps determine how important a word is in a document relative to the entire
collection.
● Formula: TF-IDF(t, d) = TF(t, d) × IDF(t), where IDF(t) = log(N / DF(t)) and N is the
total number of documents.
● TF (Term Frequency) → How often a term appears in a document.
● IDF (Inverse Document Frequency) → How unique a term is across all documents.
Example:
If the word "retrieval" appears 10 times in one document but occurs in only 2 out of 1,000
documents, it will have a high TF-IDF score, making it important for ranking.
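A minimal TF-IDF sketch using one common IDF variant, log(N / DF); the toy corpus is illustrative:

```python
import math
from collections import Counter

def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """TF-IDF of `term` in `doc`, with IDF = log(N / document frequency)."""
    tf = Counter(doc)[term]                   # raw term frequency in the document
    df = sum(1 for d in corpus if term in d)  # number of documents containing the term
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

docs = [["ai", "retrieval", "ai"], ["search", "engines"], ["retrieval", "models"]]
# "ai" is frequent in docs[0] and rare in the corpus, so it scores highly there.
print(tf_idf("ai", docs[0], docs))
```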
4. Word Embedding Similarity
● Unlike simple word matching, word embeddings represent words as dense vectors in
a continuous space.
● Similarity is computed by averaging word embeddings in a document and comparing
them.
● Used in deep learning-based retrieval models.
Example:
"Car" and "Vehicle" might have similar embeddings because they often appear in similar
contexts.
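The idea can be sketched with tiny hand-made embeddings (real systems learn these with models such as Word2Vec or GloVe; the 3-d vectors below are purely illustrative):

```python
import math

# Toy, hand-made 3-d embeddings; real embeddings are learned, not hand-written.
EMBEDDINGS = {
    "car":     [0.9, 0.1, 0.0],
    "vehicle": [0.8, 0.2, 0.0],
    "banana":  [0.0, 0.1, 0.9],
}

def doc_vector(words: list[str]) -> list[float]:
    """Average the embeddings of the words in a document."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(cosine(doc_vector(["car"]), doc_vector(["vehicle"])))  # close to 1
print(cosine(doc_vector(["car"]), doc_vector(["banana"])))   # much lower
```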
Query Expansion
Query expansion improves search performance by enhancing the user's original query with
additional relevant terms, so that more relevant results are retrieved.
Local Analysis
Local analysis examines the terms within a set of retrieved documents (rather than relying
on external knowledge bases) to better understand the user's intent, which improves search
relevance.
1. Tokenization
○ Splitting text into individual words or phrases.
○ Example:
■ Query: "Best programming languages for AI"
■ Tokens: ["Best", "programming", "languages", "for",
"AI"]
2. Stopword Removal
○ Removing common words (e.g., the, is, and, or) that do not impact search
results.
○ Example:
■ Before: ["Best", "programming", "languages", "for",
"AI"]
■ After: ["Best", "programming", "languages", "AI"]
3. Stemming & Lemmatization
○ Stemming: Reduces words to their root form (e.g., running → run).
○ Lemmatization: Converts words to their base dictionary form (e.g., better →
good).
4. Term Frequency (TF) Calculation
○ Determines how often a term appears in a document.
○ Example:
■ Document 1: "AI is transforming industries. AI is important."
■ TF of “AI” = 2 (appears twice).
5. Term Importance Weighting
○ Words that appear frequently in many documents are less important.
○ Example: TF-IDF assigns higher importance to unique words.
6. Semantic Analysis
○ Finds contextually similar words.
○ Example: "car" and "vehicle" are semantically related.
7. Query Expansion Using Local Analysis
○ Based on these steps, related terms (synonyms, similar words) are added to
improve search relevance.
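The steps above can be sketched in plain Python (the stopword list and documents are illustrative; stemming, lemmatization, and semantic analysis are omitted for brevity):

```python
from collections import Counter

STOPWORDS = {"the", "is", "and", "or", "for", "a", "an"}

def analyze(text: str) -> list[str]:
    """Steps 1-2: tokenize, lowercase, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

def term_frequencies(retrieved_docs: list[str]) -> Counter:
    """Step 4: count terms across the retrieved documents."""
    counts = Counter()
    for doc in retrieved_docs:
        counts.update(analyze(doc))
    return counts

def expand(query: str, retrieved_docs: list[str], k: int = 2) -> list[str]:
    """Step 7: add the k most frequent new terms from the retrieved set."""
    tokens = analyze(query)
    frequent = [t for t, _ in term_frequencies(retrieved_docs).most_common()
                if t not in tokens]
    return tokens + frequent[:k]

docs = ["python is a programming language",
        "ai uses programming and machine learning"]
print(expand("best programming languages for ai", docs))
```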
Global Analysis
Unlike local analysis, this method uses external knowledge sources such as thesauri
(e.g., WordNet), dictionaries, and ontologies.
Document Normalization
Document normalization is the process of standardizing text data to improve the efficiency,
accuracy, and reliability of information retrieval (IR) and text processing.
When dealing with large document collections, inconsistencies can arise from differences in
capitalization, punctuation, spelling variants, whitespace, and formatting. Common
normalization steps include:
1. Lowercasing
○ Converts all text to lowercase for uniformity.
○ Example: "Machine Learning" → "machine learning"
○ Benefit: Ensures that searches are case-insensitive.
2. Stopword Removal
○ Removes common words that do not add much meaning to the search (e.g.,
"the", "is", "and").
○ Example: "the big brown fox" → "big brown fox"
○ Benefit: Reduces storage and processing time.
3. Stemming and Lemmatization
○ Stemming reduces words to their base form by chopping off suffixes.
■ Example: "running", "runs", "ran" → "run"
○ Lemmatization converts words into their dictionary form.
■ Example: "better" → "good"
○ Benefit: Treats different word variations as the same term.
4. Punctuation & Special Character Removal
○ Eliminates unnecessary symbols.
○ Example: "Hello, world!" → "Hello world"
○ Benefit: Prevents indexing of irrelevant characters.
5. Deduplication
○ Removes duplicate documents from the database.
○ Example: Two identical news articles are merged into one.
6. Whitespace and Formatting Standardization
○ Ensures consistent spacing and formatting.
○ Example:
■ "Machine   Learning" → "Machine Learning" (extra spaces collapsed)
■ "Hello\nWorld" → "Hello World"
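Several of the steps above can be combined in one small normalization function (this sketch also lowercases, so the outputs differ in case from the examples above; the stopword list is illustrative):

```python
import re

STOPWORDS = {"the", "is", "and", "a"}

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, drop stopwords, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # punctuation & special characters
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)              # also standardizes whitespace

print(normalize("The  big, brown fox!\nHello, World."))  # "big brown fox hello world"
```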
Multi-field Retrieval
Multi-field retrieval is an advanced search approach where documents are indexed and
retrieved based on multiple attributes (or fields) rather than just plain text content.
Documents often contain rich metadata that helps improve search precision. Instead of
searching only in the main text, multi-field retrieval considers other relevant fields like:
● Title
● Author
● Date
● Categories
● Keywords
● Abstract
● References
For example, a research paper search engine can rank results differently depending on
whether the query terms appear in the title, the abstract, or the body text.
Example:
Imagine you are searching for research papers on "deep learning in healthcare". A paper
with that phrase in its title can be ranked above a paper that only mentions it in the body.
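A minimal sketch of field-weighted scoring; the documents, weights, and simple substring matching are all assumptions made for illustration:

```python
# Title matches count more than abstract or body matches (weights are arbitrary).
FIELD_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}

def field_score(doc: dict[str, str], query: str) -> float:
    """Weighted count of query terms found in each field (substring match)."""
    terms = query.lower().split()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = doc.get(field, "").lower()
        score += weight * sum(1 for t in terms if t in text)
    return score

papers = [
    {"title": "Deep learning in healthcare", "abstract": "A survey.", "body": "..."},
    {"title": "Graph databases", "abstract": "Mentions deep learning.",
     "body": "healthcare data"},
]
ranked = sorted(papers, key=lambda d: field_score(d, "deep learning healthcare"),
                reverse=True)
print(ranked[0]["title"])  # the title match ranks first
```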
Retrieval evaluation assesses how well an Information Retrieval (IR) system performs when
fetching relevant documents based on user queries.
A good retrieval system should retrieve relevant documents, rank them appropriately, and
respond quickly.
Without proper evaluation, search engines may:
❌ Rank irrelevant results higher
❌ Miss out on important documents
❌ Take too long to return results
❌ Fail to understand user intent properly
Thus, evaluating retrieval performance helps identify weaknesses and improve the system.
● Formula: Recall = (relevant documents retrieved) / (total relevant documents)
● Example:
○ If the total relevant documents in the dataset are {D1, D2, D3, D4, D5, D6,
D7} and the system retrieves 3 of them,
○ Recall = 3/7 ≈ 0.43 (43%)
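The recall example (and the closely related precision measure) can be checked with a few lines of Python; the retrieved set below is made up:

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / retrieved; Recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"D1", "D2", "D3", "D4", "D5", "D6", "D7"}
retrieved = {"D1", "D3", "D7", "D9"}  # 3 relevant hits out of 4 returned
p, r = precision_recall(retrieved, relevant)
print(round(p, 2), round(r, 2))  # 0.75 0.43
```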
● Formula: NDCG = DCG / IDCG, where DCG = Σ relᵢ / log₂(i + 1), summed over rank
positions i
● DCG discounts results based on their rank, and IDCG is the best possible DCG score.
● Example:
○ If D1 is highly relevant but ranked at position 5, its impact is lower than if it
were ranked at 1.
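A minimal NDCG sketch using the log₂(i + 1) discount (the relevance lists below are made up):

```python
import math

def dcg(relevances: list[float]) -> float:
    """DCG = sum of rel_i / log2(i + 1), with ranks i starting at 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances: list[float]) -> float:
    """Normalize by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

print(ndcg([3, 2, 0, 1]))  # near-ideal ranking, so close to 1
print(ndcg([0, 1, 2, 3]))  # the best results ranked last, so noticeably lower
```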
While IR systems improve access to vast amounts of information, they also come with
challenges and limitations.
1. Information Overload
● Issue: The volume of available documents keeps growing, making it hard to surface
the few that are truly relevant to a query.
2. Precision-Recall Trade-off
● Issue:
○ High Precision (accurate results) → Misses some relevant documents.
○ High Recall (retrieves more documents) → Includes irrelevant results.
● Example:
○ A medical search engine retrieving only the top 5 articles (high precision)
might miss some relevant studies (low recall).
● Solution: Use hybrid ranking models (e.g., TF-IDF + Neural Networks) to balance
precision and recall.
3. Vocabulary Mismatch
● Issue: Users and documents may use different words for the same concept, so
relevant documents are missed.
● Solution:
✅ Query expansion (e.g., "COVID treatment" → "coronavirus therapy")
✅ Semantic analysis (e.g., using Word2Vec for similar words)
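The query-expansion fix can be sketched with a hand-made synonym table (an assumption for illustration; real systems use thesauri or embedding neighbors such as Word2Vec's most-similar words):

```python
# Hand-made synonym table; real systems derive these automatically.
SYNONYMS = {
    "covid": ["coronavirus"],
    "treatment": ["therapy"],
}

def expand_query(query: str) -> list[str]:
    """Add known synonyms of each query term to bridge vocabulary mismatch."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
    return expanded

print(expand_query("COVID treatment"))
# ['covid', 'coronavirus', 'treatment', 'therapy']
```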
4. Context Sensitivity
● Issue: The same query can mean different things in different contexts (e.g., "Apple"
the company vs. the fruit).
● Solution:
✅ User intent detection using machine learning models
✅ Personalized search (if a user recently searched for iPhones, prioritize Apple Inc.)
5. Bias in Search Results
● Issue: Ranking models trained on skewed data can systematically favor some
sources or viewpoints.
● Solution:
✅ Diverse training datasets
✅ Bias-aware algorithms
6. Scalability and Speed
● Issue: Searching very large collections quickly requires heavy indexing and
computation.
● Solution:
✅ Efficient indexing (e.g., Inverted Index, Elasticsearch)
✅ Parallel computing (e.g., Hadoop, Spark)
7. Difficulty in Evaluation
● Issue: Relevance is subjective, so it is hard to measure how well a system truly
serves its users.
● Solution:
✅ User-based relevance feedback
✅ Customizable ranking models
8. Privacy Concerns
● Issue: Search systems collect query and click data that can expose sensitive user
information.
● Solution:
✅ End-to-end encryption
✅ User consent for data storage
● Example:
○ Amazon shows:
■ "Only 5 left in stock" → Encourages quick buying.
■ ❌ "Out of stock" → Prevents false expectations.
✅ Benefit: Reduces cart abandonment and improves customer experience.
1.3. Personalized Product Recommendations
● Optimized search algorithms ensure customers find the right products quickly.
● Why? Poor search = Frustrated customers = Fewer purchases.
● Techniques used:
○ Autocomplete: Predicts what the user is typing.
○ Spell correction: "iphne" → "iPhone"
○ Synonym matching: "sofa" = "couch"
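These three techniques can be sketched with the standard library (the product catalog and synonym table are made up; real engines use far richer models):

```python
import difflib

PRODUCTS = ["iphone", "ipad", "sofa", "speaker"]
SYNONYMS = {"couch": "sofa"}

def autocomplete(prefix: str, catalog: list[str]) -> list[str]:
    """Suggest catalog entries that start with what the user has typed."""
    prefix = prefix.lower()
    return [p for p in catalog if p.startswith(prefix)]

def correct_spelling(word: str, catalog: list[str]) -> str:
    """Map a misspelled word to the closest catalog entry (stdlib difflib)."""
    matches = difflib.get_close_matches(word.lower(), catalog, n=1)
    return matches[0] if matches else word

def match(query: str, catalog: list[str]) -> str:
    """Resolve synonyms first, then fall back to spell correction."""
    q = SYNONYMS.get(query.lower(), query.lower())
    return q if q in catalog else correct_spelling(q, catalog)

print(autocomplete("ip", PRODUCTS))  # ['iphone', 'ipad']
print(match("iphne", PRODUCTS))      # 'iphone'
print(match("couch", PRODUCTS))      # 'sofa'
```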
● Faster internet speeds, better mobile devices, and secure payments have boosted
online shopping.
● Example:
○ 5G networks enable faster mobile shopping.
○ Google Pay & PayPal ensure safe transactions.