
N-Gram Data Structure in Information Retrieval Systems

Presented by Navaneeth

N-gram models are foundational data structures in information retrieval systems, used to analyze and represent sequences of text or speech. They capture contiguous sequences of n items (usually words or characters) from a given text corpus, enabling systems to predict and understand language patterns. In information retrieval, n-grams support tasks such as query prediction, spelling correction, and document indexing by providing contextual relationships within the data.

by Navaneeth Indarapu
Understanding N-Grams: Definition and Types

Unigrams
Single-word sequences. They represent the simplest form of n-grams and capture individual lexical units without context.

Bigrams
Pairs of consecutive words. Bigrams capture short-range dependencies such as common phrases and word collocations.

Trigrams and Higher
Sequences of three or more words. They provide richer contextual information and enable modeling of more complex language structures.
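To make the distinction concrete, here is a minimal Python sketch extracting all three types from one sentence; the sentence and the simple whitespace tokenization are assumptions for illustration, not taken from the deck:

tokens = "users search large document collections".split()

# Unigrams: single tokens, no surrounding context
unigrams = [(t,) for t in tokens]                     # ('users',), ('search',), ...

# Bigrams: consecutive pairs, capturing short-range dependencies
bigrams = list(zip(tokens, tokens[1:]))               # ('users', 'search'), ...

# Trigrams: consecutive triples, giving richer context
trigrams = list(zip(tokens, tokens[1:], tokens[2:]))  # ('users', 'search', 'large'), ...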
Construction of N-Gram Models

Tokenization
Split the raw text into tokens such as words or characters, forming the basic units for n-gram extraction.

N-Gram Extraction
Generate sequences of n contiguous tokens to form the n-grams, capturing structural patterns.

Frequency Counting
Count occurrences of each n-gram, which quantifies their relevance and importance within the corpus.
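The three steps map directly onto a few lines of code. Below is a minimal sketch in Python, assuming lowercase whitespace tokenization and a toy in-memory corpus (both illustrative choices, not specified in the deck):

from collections import Counter

def tokenize(text):
    # Step 1: Tokenization - split raw text into word tokens
    return text.lower().split()

def extract_ngrams(tokens, n):
    # Step 2: N-Gram Extraction - slide a window of n contiguous tokens
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

corpus = "the quick brown fox jumps over the lazy dog"  # toy corpus
bigram_counts = Counter(extract_ngrams(tokenize(corpus), 2))

# Step 3: Frequency Counting - quantify each n-gram's prevalence
print(bigram_counts.most_common(3))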
Applications in Information Retrieval

Query Expansion
N-grams enhance user queries by suggesting relevant phrases based on common co-occurrences, improving search precision.

Spell Correction
By analyzing probable n-gram sequences, systems can detect and correct misspelled words to refine retrieval results.

Document Indexing
N-grams help index documents efficiently by capturing meaningful sequences, supporting fast and accurate content matching.
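As one concrete illustration of the spell-correction application, the sketch below matches a misspelled query term against a word list by the overlap of their character trigrams. The padding scheme, the tiny word list, and the use of Jaccard similarity are assumptions for illustration, not prescribed by the deck:

def char_ngrams(word, n=3):
    # Pad the word so prefixes and suffixes form their own n-grams
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def best_correction(query_term, dictionary):
    # Rank candidate terms by Jaccard overlap of character trigrams
    q = char_ngrams(query_term)
    return max(dictionary,
               key=lambda t: len(q & char_ngrams(t)) / len(q | char_ngrams(t)))

dictionary = ["retrieval", "retrieve", "reversal"]  # illustrative word list
print(best_correction("retreival", dictionary))     # -> 'retrieval'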
Advantages of N-Gram Data Structures

Language Context Capture
They provide valuable contextual cues beyond single words, enabling better linguistic representation.

Simplicity and Efficiency
N-gram models are straightforward to implement and fast to compute, even on large datasets.

Versatility
Applicable across various languages and tasks in natural language processing and information retrieval.
Challenges and Limitations

Sparsity Problem
Higher-order n-grams often suffer from data sparsity, making it difficult to estimate probabilities accurately.

Limited Long-Range Context
N-grams capture fixed-length sequences and may fail to model dependencies spanning beyond the chosen n.

Storage Overhead
Storing and managing large n-gram datasets can be resource-intensive for extensive corpora.
Techniques to Mitigate Challenges

Smoothing Methods
Techniques like Laplace and Kneser-Ney smoothing help allocate probabilities to unseen n-grams, addressing sparsity.

Backoff Models
These models back off to lower-order n-grams when higher-order statistics are unreliable, improving robustness.

Pruning
Remove rare and less informative n-grams to reduce storage needs and improve processing speed.
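To show how two of these techniques look in practice, here is a minimal sketch of Laplace (add-one) smoothing for bigram probabilities, plus a simple backoff to unigram estimates when a bigram is unseen. The toy corpus and the backoff weight alpha are assumptions for illustration, and Kneser-Ney smoothing is more involved and omitted here:

from collections import Counter

tokens = "the cat sat on the mat".split()        # illustrative toy corpus
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
V = len(unigram_counts)                          # vocabulary size

def laplace_bigram_prob(w_prev, w):
    # P(w | w_prev) = (count(w_prev w) + 1) / (count(w_prev) + V)
    # Adding 1 to every count gives unseen bigrams a small nonzero probability.
    return (bigram_counts[(w_prev, w)] + 1) / (unigram_counts[w_prev] + V)

def backoff_prob(w_prev, w, alpha=0.4):
    # Use the bigram estimate when observed; otherwise back off to a
    # weighted unigram estimate (alpha is an assumed discount weight).
    # Assumes w_prev occurs at least once in the corpus.
    if bigram_counts[(w_prev, w)] > 0:
        return bigram_counts[(w_prev, w)] / unigram_counts[w_prev]
    return alpha * unigram_counts[w] / len(tokens)

print(laplace_bigram_prob("the", "cat"))  # smoothed estimate, never zero
print(backoff_prob("cat", "mat"))         # unseen bigram -> unigram backoff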
Summary and Future Perspectives

Core Role of N-Grams
Despite the emergence of advanced models, n-grams remain crucial for understanding language patterns in information retrieval.

Integration with Modern ML
They complement machine learning methods by providing structured input features and baseline heuristics.

Ongoing Research
Exploration continues on hybrid models combining n-grams with neural embeddings for enhanced retrieval performance.
