Chapter 2
Retrieval Model
• A retrieval model specifies the details of:
– Document representation
– Query representation
– Retrieval function
• For example, in Document 1 the term game occurs two times. The
total number of terms in the document is 10, so the normalized term
frequency of game is 2 / 10 = 0.2.
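The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the function name `normalized_tf` and the placeholder document are assumptions, chosen only so that game occurs twice among 10 terms.

```python
def normalized_tf(term, tokens):
    """Return the count of `term` divided by the document length."""
    if not tokens:
        return 0.0  # avoid division by zero for an empty document
    return tokens.count(term) / len(tokens)


# Hypothetical 10-term document in which "game" occurs twice.
doc1 = ["game"] * 2 + ["filler"] * 8
print(normalized_tf("game", doc1))  # 0.2
```

Dividing by the document length keeps long documents from scoring higher simply because they contain more words.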
1) = 1 + 1.098726209 = 2.098726209
IDF for terms occurring in all the documents
• For each term in the query, multiply its normalized term frequency by
its IDF for each document.
• In Document1, for the term life, the normalized term frequency is 0.1
and its IDF is 1.405507153, so TF * IDF = 0.1 * 1.405507153 = 0.1405507153.
• The next slide gives the TF * IDF calculations for life and learning in
all the documents.
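The multiplication for the term life in Document1 can be checked with a short Python sketch. The two input values are the ones quoted above; the function name `tf_idf` is an assumption made for illustration.

```python
def tf_idf(tf, idf):
    """Weight a normalized term frequency by the term's IDF."""
    return tf * idf


tf_life_doc1 = 0.1       # normalized TF of "life" in Document1 (from the text)
idf_life = 1.405507153   # IDF of "life" (from the text)

score = tf_idf(tf_life_doc1, idf_life)
print(round(score, 9))
```

The same function would be applied to every query term in every document to fill in the full TF * IDF table.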
Step 4: Vector Space Model - Cosine Similarity
• The query entered by the user can also be represented as a
vector.
• Calculate the TF*IDF for the query
Now calculate the cosine similarity of the query and Document1.
Cosine Similarity(Query, Document1) = Dot product(Query, Document1) /
(||Query|| * ||Document1||)
0.198768727354
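A minimal Python sketch of the cosine similarity formula follows. The vectors are hypothetical TF*IDF weights over two query terms, not the values from the slides, and the function name `cosine_similarity` is an assumption.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector shares no terms with anything
    return dot / (norm_u * norm_v)


# Hypothetical TF*IDF vectors over the query terms (life, learning).
query = [0.5, 0.5]
doc_a = [0.1, 0.1]   # same direction as the query
doc_b = [0.2, 0.0]   # contains only the first term

print(round(cosine_similarity(query, doc_a), 4))  # 1.0
print(round(cosine_similarity(query, doc_b), 4))  # 0.7071
```

Because the measure depends only on the angle between the vectors, a document whose term weights are proportional to the query's scores 1 regardless of its length.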
• The idea of stop word removal is simply to remove the words that occur commonly across
all the documents in the corpus.
• These words carry little significance in some NLP tasks, such as information retrieval and
classification, because they are not very discriminative.
• On the contrary, in some NLP applications stop word removal will have very little impact.
• Most of the time, the stop word list for a given language is a hand-curated list of
words that occur most commonly across corpora.
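Stop word removal can be sketched as a simple filter. The stop word list below is a tiny illustrative set, not a real curated list, and the names `STOP_WORDS` and `remove_stop_words` are assumptions.

```python
# Tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"the", "is", "a", "of", "in", "and", "to"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in stop_words]


print(remove_stop_words("The game of life is a game of learning"))
# ['game', 'life', 'game', 'learning']
```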
Stemming
• Stemming, also called suffix stripping, is a technique used to
reduce text dimensionality. Stemming is also a type of text
normalization that enables you to standardize some words into
specific expressions also called stems.
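As a rough illustration of suffix stripping, the sketch below removes a few common English suffixes. It is a deliberately naive stand-in, not the Porter stemmer or any library's algorithm; the suffix list, the `min_stem_len` parameter, and the name `naive_stem` are all assumptions.

```python
# Suffixes are tried in order; the first one that matches is stripped.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word, min_stem_len=3):
    """Strip one known suffix if at least min_stem_len characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ["learning", "learned", "learns", "games", "is"]:
    print(w, "->", naive_stem(w))
# learning -> learn
# learned -> learn
# learns -> learn
# games -> game
# is -> is
```

Mapping learning, learned, and learns to the single stem learn is what reduces the number of distinct terms, and therefore the dimensionality of the text representation.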
Language models are the cornerstone of Natural Language Processing (NLP) technology. We use
language models in our daily routines, often without realizing it. Here are some examples
of language models in use:
1. Speech Recognition
• Voice assistants such as Siri and Alexa are examples of how language models help machines in
processing speech audio.
2. Machine Translation
• Google Translate and Microsoft Translator are examples of how NLP models can help in
translating one language to another.
3. Sentiment Analysis
• Sentiment analysis identifies the sentiment behind a phrase. This use case of NLP models
appears in products that help businesses understand the opinions or attitudes a customer
expresses in text. HubSpot's Service Hub is an example of how language models can help
in sentiment analysis.
4. Text Suggestions
• Google services such as Gmail or Google Docs use language models to help users get text
suggestions while they compose an email or create long text documents, respectively.
5. Parsing Tools
• Parsing involves analyzing sentences or words that comply with syntax or grammar rules. Spell
checking tools are perfect examples of language modelling and parsing.