Answer Key-3
Chatbots perform three simple actions: understanding, acting, and answering. In the first phase, the chatbot analyzes
the user's message. Then, after interpreting what the user said, it takes action in
accordance with a set of algorithms. Finally, it chooses one of several suitable answers.
Alexa is essentially a chatbot. Amazon recently unveiled a new feature for iOS that allows users to make
requests to Alexa and view responses on the display.
Algorithms used by traditional chatbots include decision trees, recurrent neural networks, natural language
processing (NLP) techniques, and Naive Bayes.
Retrieval-based QA:
In retrieval-based QA, the system searches for predefined answers from a database
or a set of documents. When a question is asked, the system matches it with similar
questions or extracts relevant information from the database to provide an answer.
Example: A frequently asked questions (FAQ) chatbot that matches user questions to
predefined answers based on keywords or similarity.
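A minimal sketch of such an FAQ matcher, assuming a made-up FAQ dictionary and using TF-IDF vectors with cosine similarity to pick the closest stored question:

# Minimal sketch of a retrieval-based FAQ matcher using TF-IDF and cosine
# similarity (scikit-learn). The FAQ entries and the threshold are made up
# for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

faq = {
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
    "What are your support hours?": "Support is available 9am-5pm on weekdays.",
    "How can I cancel my order?": "Open your orders page and choose 'Cancel order'.",
}

questions = list(faq.keys())
vectorizer = TfidfVectorizer()
question_vectors = vectorizer.fit_transform(questions)

def answer(user_question, threshold=0.3):
    # Vectorize the user question and find the most similar stored question.
    query_vector = vectorizer.transform([user_question])
    scores = cosine_similarity(query_vector, question_vectors)[0]
    best = scores.argmax()
    if scores[best] < threshold:
        return "Sorry, I could not find a matching answer."
    return faq[questions[best]]

print(answer("how to reset password"))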
Cluster-based Summarization:
Example: Given a set of news articles about a sports event, the system clusters articles by teams
involved. It then generates summaries for each cluster, offering various viewpoints.
Graph-based Summarization:
Example: Documents are represented as nodes in a graph, with edges denoting similarity.
Sentences with high centrality in the graph are selected for the summary, ensuring coverage of
the main points.
Centroid-based Summarization:
Example: For a collection of product reviews, the centroid (average) of TF-IDF vectors is
calculated. Sentences closest to this centroid, representing common sentiments, are chosen for
the summary.
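A minimal sketch of centroid-based extractive summarization under these assumptions (toy review sentences, TF-IDF vectors, similarity to the centroid):

# Minimal sketch of centroid-based extractive summarization: sentences closest
# to the average TF-IDF vector of the document are selected. The example text
# is made up for illustration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "The battery life of this phone is excellent.",
    "Most reviewers praise the long battery life.",
    "The camera struggles in low light.",
    "Shipping was delayed by two days.",
]

vectorizer = TfidfVectorizer()
sentence_vectors = vectorizer.fit_transform(sentences)

# Centroid = average of all sentence vectors.
centroid = np.asarray(sentence_vectors.mean(axis=0))

# Rank sentences by similarity to the centroid and keep the top two.
scores = cosine_similarity(sentence_vectors, centroid).ravel()
top = scores.argsort()[::-1][:2]
summary = " ".join(sentences[i] for i in sorted(top))
print(summary)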
16 b ii) MACHINE TRANSLATION
• Machine translation, or MT, automatically translates text from one natural language into
another.
MT systems have evolved to handle linguistic challenges, including variations in phonetic typology
(sound patterns), recognizing and translating idioms (phrases with non-literal meanings), and
addressing linguistic anomalies or exceptions.
1. Statistical Machine Translation or SMT
• It aims to learn the correspondence between words in the source language and words in the
target language. A well-known example of this approach is Google Translate.
2. Rule based MT
• RBMT translates text on the basis of grammatical rules. However, RBMT requires extensive
post-editing, and its heavy reliance on dictionaries means that proficiency is achieved only
after a significant period.
3. Hybrid Machine Translation
• HMT, as the term indicates, is a mix of RBMT and SMT. It uses a translation memory,
making it considerably more effective in terms of quality.
4. Neural Machine Translation or NMT
• NMT is a type of machine translation that relies on neural network models (loosely inspired
by the human brain) to build statistical models for the purpose of translation.
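As a rough illustration of NMT in practice, the sketch below uses the Hugging Face transformers pipeline; the built-in English-to-French translation task downloads a default pretrained model on first use, and the input sentence is made up:

# Minimal sketch of using a pretrained neural machine translation model via the
# Hugging Face transformers library. The default model chosen by this pipeline
# task may vary; any pretrained translation model could be substituted.
from transformers import pipeline

# "translation_en_to_fr" is a built-in pipeline task.
translator = pipeline("translation_en_to_fr")
result = translator("Machine translation converts text from one language to another.")
print(result[0]["translation_text"])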
17 a. LSTM
LSTM (Long Short-Term Memory) is a type of RNN (Recurrent Neural Network) that can retain long-
term dependencies in sequential data.
LSTMs are able to process and analyze sequential data such as text, speech, and time series.
Example of LSTM Working:
• Here we have two sentences separated by a full stop.
• The first sentence is “Bob is a nice person,” and the second sentence is “Dan, on the other
hand, is evil.”
• It is very clear that in the first sentence we are talking about Bob, and as soon as we encounter
the full stop (.), we start talking about Dan.
• As we move from the first sentence to the second sentence, our network should realize that
we are no longer talking about Bob; our subject is now Dan.
• Here, the forget gate of the network allows it to forget the earlier subject (Bob) so that the new subject (Dan) can be stored.
Forget Gate
• In a cell of the LSTM neural network, the first step is to decide whether we should keep the
information from the previous time step or forget it.
Input Gate
• “Bob knows swimming. He told me over the phone that he had served in the navy for four long
years.”
• The first sentence tells us that Bob knows swimming, whereas the second sentence tells us that
he uses the phone and served in the navy for four years. The input gate decides which of this new information is important enough to add to the cell state.
Output Gate
• In the example sentence, only Bob can be described as brave; we cannot say the enemy is brave
or the country is brave. So, based on the current expectation, we have to supply a relevant word
to fill in the blank. That word is our output, and this is the function of the output gate.
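To make the three gates concrete, here is a minimal sketch of a single LSTM cell step in NumPy; the weight shapes and random values are illustrative assumptions, not a trained network:

# Minimal sketch of a single LSTM cell step in NumPy, showing how the forget,
# input, and output gates update the cell state and hidden state.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # Each gate gets its own slice of the stacked weight matrices.
    z = W @ x_t + U @ h_prev + b      # shape: (4 * hidden,)
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)                    # forget gate: what to discard from the cell state
    i = sigmoid(i)                    # input gate: what new information to add
    o = sigmoid(o)                    # output gate: what to expose as the hidden state
    g = np.tanh(g)                    # candidate cell content
    c_t = f * c_prev + i * g          # new cell state
    h_t = o * np.tanh(c_t)            # new hidden state
    return h_t, c_t

hidden, inputs = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hidden, inputs))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=inputs), h, c, W, U, b)
print(h)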
17 b i) SEQUENCE LABELING
• Sequence tagging (or sequence labeling) refers to a set of Natural Language Processing (NLP)
tasks that assign labels or tags to tokens or other units of text.
• Common applications of sequence labeling include:
– Named Entity Recognition (NER)
– Part-of-Speech Tagging (POS)
• Sequence labeling is used for a wide range of NLP tasks, such as:
– Part-of-Speech Tagging
• POS Tagging is a helper task for many tasks about NLP:
▪ Word Sense Disambiguation, Dependency Parsing.
• POS tagging is the process of labeling the parts of speech (such as nouns,
verbs, and adjectives) in a sentence.
– Named Entity Recognition
• Named entity recognition (NER) is the task of identifying and classifying
named entities (such as people, organizations, and locations) in text.
• As POS tagging, it can be used to extract information from a large corpus
of texts and help identify more quickly what is wanted.
– Chunking
• Chunking is a task of sequence labeling that involves dividing a sequence of
words into chunks or non-overlapping sub-sequences.
• These chunks are typically tagged with a label that indicates their type or role
in the sequence.
– Semantic Role Labeling.
• While traditional models are based on corpus statistics (Hidden Markov
Models, Maximum Entropy Markov Models, Conditional Random Field,
etc.), recent models are based on neural networks (Recurrent Neural
Networks, Long Short-Term Memory, BERT, etc.).
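As a rough illustration of POS tagging and NER as sequence-labeling tasks, the sketch below uses NLTK; the resource names required for download may differ slightly between NLTK versions, and the sentence is made up:

# Minimal sketch of two sequence-labeling tasks, POS tagging and NER, with NLTK.
# The resource downloads are required on first use (names vary by NLTK version).
import nltk

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("maxent_ne_chunker")
nltk.download("words")

sentence = "Barack Obama visited Microsoft headquarters in Redmond."
tokens = nltk.word_tokenize(sentence)

# Part-of-Speech tagging: one (token, tag) pair per token.
pos_tags = nltk.pos_tag(tokens)
print(pos_tags)

# Named Entity Recognition: chunks tagged tokens into entities such as PERSON
# and ORGANIZATION.
tree = nltk.ne_chunk(pos_tags)
print(tree)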
17 bii) WORD EMBEDDINGS
• Features: Anything that relates words to one another. E.g.: Age, Sports, Fitness, Employed,
etc. Each word vector has values corresponding to these features.
• Goal of Word Embeddings
– To reduce dimensionality
– To use a word to predict the words around it
– Interword semantics must be captured
How are Word Embeddings used?
• They are used as input to machine learning models.
Take the words —-> Give their numeric representation —-> Use in training or inference
• Word2Vec:
• In Word2Vec every word is assigned a vector. We start with either a random vector or one-
hot vector.
• One-hot vector: A representation where only one bit in a vector is 1. If there are 500 words in
the corpus, then the vector length will be 500. After assigning vectors to each word, we take a
window size and iterate through the entire corpus. While we do this, there are two neural
embedding methods which are used:
• Bag of words (BOW)
• A bag of words is one of the popular word embedding techniques of text where each value in
the vector would represent the count of words in a document/sentence. In other words, it
extracts features from the text. We also refer to it as vectorization.
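A minimal sketch of the bag-of-words representation with scikit-learn's CountVectorizer, using made-up sentences; each row counts vocabulary words in one sentence:

# Minimal sketch of a bag-of-words representation: each column counts how often
# a vocabulary word appears in a sentence. The example corpus is made up.
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())
print(counts.toarray())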
SKIP GRAM
• In this model, we try to make the central word closer to the neighboring words. It is the
complete opposite of the CBOW model. It is shown that this method produces more
meaningful embeddings.
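A minimal sketch of training skip-gram embeddings with gensim's Word2Vec (sg=1 selects skip-gram); the toy corpus and the small vector size are assumptions for illustration only:

# Minimal sketch of skip-gram training with gensim. sg=1 selects skip-gram;
# sg=0 would select CBOW instead.
from gensim.models import Word2Vec

sentences = [
    ["bob", "is", "a", "nice", "person"],
    ["dan", "on", "the", "other", "hand", "is", "evil"],
    ["bob", "knows", "swimming"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Each word now has a dense vector; similar words end up close together.
print(model.wv["bob"][:5])
print(model.wv.most_similar("bob", topn=3))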
SET A
11. RNN
RNNs (Recurrent Neural Networks) are a type of neural network designed to process sequential
data.
Types of RNN
1. One to One
2. One to Many
3. Many to One
4. Many to Many
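A minimal sketch of a many-to-one RNN in PyTorch (a sequence of token ids goes in, a single class score comes out, e.g. the sentiment of a sentence); the layer sizes and the classification head are illustrative assumptions:

# Minimal sketch of a many-to-one RNN: the final hidden state of the sequence
# is fed to a linear layer that produces one prediction per sequence.
import torch
import torch.nn as nn

class ManyToOneRNN(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=16, hidden_dim=32, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embed(token_ids)      # (batch, seq_len, embed_dim)
        _, hidden = self.rnn(embedded)        # hidden: (1, batch, hidden_dim)
        return self.out(hidden.squeeze(0))    # (batch, num_classes)

model = ManyToOneRNN()
tokens = torch.randint(0, 100, (1, 6))        # one sequence of 6 token ids
print(model(tokens).shape)                    # torch.Size([1, 2])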
• Semantic search offers results based on the user’s geographical context, the user’s past
search history, and user intent.
• Personalization uses the searcher’s previous searches and interactions to determine response
relevance and rank. Semantic search can also rerank results based on how other users have
interacted with the responses it has pulled. For example, when you type "restaurants" into
your search engine, it will produce results that are in your area.
• With a better understanding of user intent, semantic search can respond to a query like
"Creuset vs. Staub dutch ovens" with content that prioritizes product comparisons because
that is the user’s intent. Semantic search will recognize the intent behind “best Staub deals” or
"Creuset discounts" as intent to purchase and offer responses accordingly.
• Another example is predictive text. As you type a query into a search bar, it uses semantic
search to complete your query and suggest relevant search terms based on context, common
searches, and past search history.
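A minimal sketch of embedding-based semantic search; the sentence-transformers library, the model name, and the documents are assumptions used only for illustration:

# Minimal sketch of semantic search: encode query and documents as dense
# vectors, then rank documents by cosine similarity to the query.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Comparison of Staub and Le Creuset dutch ovens",
    "Best deals and discounts on cookware this week",
    "A beginner's guide to baking bread at home",
]
doc_embeddings = model.encode(documents, convert_to_tensor=True)

query = "Creuset vs. Staub dutch ovens"
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank documents by cosine similarity to the query embedding.
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
best = scores.argmax().item()
print(documents[best], float(scores[best]))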
Information fusion refers to the process of combining information from multiple sources, documents,
or modalities to create a unified and coherent representation of knowledge.
Techniques of Information Fusion
• Textual Data Integration
● Concatenation
● Weighted Summation
● Feature Concatenation
• Multimodal Fusion:
o Text-Image Fusion
o Text-Audio/Video Fusion
Need for Information Fusion in NLP
1. Multimodal Understanding
2. Redundancy Reduction
3. Context Enhancement
4. Sentiment Analysis
5. Machine Translation
Techniques of Information Fusion
• Temporal Fusion
• Knowledge Graphs and Ontologies
• Sentiment and Emotion Fusion
16.a. SUMMARIZATION
• Summarization attempts to reduce a section of text to a shorter form.
• It aims to either remove redundant or irrelevant information, or to draw attention immediately
to the most relevant part of a large document.
TYPES OF SUMMARIZATION
• Conceptual summary: Sentences that are typical of the document content, which can come from
different parts of the document. Use this type of summary to give a general idea of what the
document is about.
• Contextual summary: A conceptual summary, biased to include sentences that are particularly
relevant to the query terms. Use this type of summary to show the sections of the document that
are most relevant to the query.
• Quick summary: The first few sentences of the document. Use this type of summary to give a
brief introduction to the document.
Text Summarization
• In this approach we build algorithms or programs which will reduce the text size and create a
summary of our text data.
How does this text summarizer work?
• Trained via machine learning, a text summarizer uses the concept of abstractive summarization
to summarize a book, an article, or a research paper.
Auto Summarization
• Auto summarization is the process of generating a concise and coherent summary of a longer
document or set of documents automatically.
• There are several approaches to auto summarization, including
• Extractive Summarization
▪ This approach extracts key sentences and phrases from the text and joins them
to produce a compact, meaningful summary.
• Abstractive Summarization
▪ In this approach, algorithms use NLP to rewrite a long text as a shorter one.
The summary retains the original meaning but changes the structure of the
sentences.
Algorithms used for Extractive Summarization
1. TextRank
2. LexRank
3. LSA
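A minimal sketch of TextRank-style extractive summarization, assuming toy sentences and using a TF-IDF similarity graph with PageRank (networkx and scikit-learn):

# Minimal sketch of TextRank: build a sentence similarity graph, run PageRank,
# and keep the highest-ranked sentences. The toy sentences are made up.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "The home team won the championship after a dramatic final.",
    "Fans celebrated the championship victory across the city.",
    "The losing coach announced his resignation the next morning.",
    "Ticket prices for next season are expected to rise.",
]

# Similarity matrix between sentences becomes the weighted adjacency matrix.
tfidf = TfidfVectorizer().fit_transform(sentences)
similarity = cosine_similarity(tfidf)

graph = nx.from_numpy_array(similarity)
scores = nx.pagerank(graph)

# Keep the two most central sentences, in their original order.
ranked = sorted(scores, key=scores.get, reverse=True)[:2]
print(" ".join(sentences[i] for i in sorted(ranked)))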
Retrieval-based QA:
In retrieval-based QA, the system searches for predefined answers from a database
or a set of documents. When a question is asked, the system matches it with similar
questions or extracts relevant information from the database to provide an answer.
Example: A frequently asked questions (FAQ) chatbot that matches user questions to
predefined answers based on keywords or similarity.
Cluster-based Summarization:
Example: Given a set of news articles about a sports event, the system clusters articles by teams
involved. It then generates summaries for each cluster, offering various viewpoints.
Graph-based Summarization:
Example: Documents are represented as nodes in a graph, with edges denoting similarity.
Sentences with high centrality in the graph are selected for the summary, ensuring coverage of
the main points.
Centroid-based Summarization:
Example: For a collection of product reviews, the centroid (average) of TF-IDF vectors is
calculated. Sentences closest to this centroid, representing common sentiments, are chosen for
the summary.
Probabilistic modeling is a statistical approach (mathematical model) that incorporates the influence
of random events or actions to predict the likelihood of future outcomes.
Types of Probabilistic ML Model
• Generative Learning: In this approach, the goal is to learn the joint probability distribution
p(c, d), capturing the relationship between class labels (c) and the data (d).
• Discriminative Learning: Conversely, discriminative learning focuses on estimating the
conditional probability p(c|d), emphasizing the probability of class (c) given a specific data
point (d).
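A minimal sketch contrasting the two approaches on a made-up text classification task: Naive Bayes as a generative learner and logistic regression as a discriminative learner (scikit-learn):

# Minimal sketch: a generative model (Naive Bayes, models p(d | c) and p(c))
# versus a discriminative model (logistic regression, models p(c | d) directly).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

texts = ["great movie", "awful film", "loved it", "hated it", "great acting", "awful plot"]
labels = [1, 0, 1, 0, 1, 0]   # 1 = positive, 0 = negative

vectorizer = CountVectorizer().fit(texts)
features = vectorizer.transform(texts)

generative = MultinomialNB().fit(features, labels)
discriminative = LogisticRegression().fit(features, labels)

test = vectorizer.transform(["great plot"])
print(generative.predict_proba(test))
print(discriminative.predict_proba(test))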
Statistical NLP involves using statistical models and algorithms to process and understand human
language, making it a subset of machine learning techniques applied to language-related tasks.
There are two main steps for preparing data for the machine to understand.
1. Text annotation and formatting: Any NLP task starts with data preparation.
• In NLP tasks, this process is called building a corpus.
• Corpora (the plural of corpus) are collections of texts used for ML training. You can’t
simply feed the system your whole dataset of emails and expect it to understand what
you want from it. That’s why texts must be annotated, i.e., enriched with additional
meaning.
2. Model training and deployment.
• The prepared data is then fed to the algorithm for training.
Just like the rule-based approach requires linguistic knowledge to create rules, machine learning
methods are only as good as the quality of data and the accuracy of features created by data
scientists. This means that while ML is better at classification than rules, it falls short in two
directions:
• The complexity of feature engineering, which requires researchers to do massive
amounts of preparation, thus not achieving full automation with ML; and
• The curse of dimensionality, when the volumes of data needed grow exponentially
with the dimension of the model, thus creating data sparsity.
17b. SIMILARITY MEASURES
• Text similarity is a key concept in NLP, as it allows you to compare and analyze different
texts based on their content, structure, and meaning.
• Text similarity metrics can be broadly classified into two categories: lexical and semantic.
• Lexical similarity measures the degree of overlap or similarity between two texts based on
their words, characters, or n-grams.
• Semantic similarity measures the degree of similarity between two texts based on their
meaning, context, or concepts.
• Some examples of lexical similarity metrics are Euclidean distance, Jaccard index, and cosine
similarity.
• Some examples of semantic similarity metrics are WordNet, word embeddings, and
transformer models.
• The similarity measure is usually expressed as a numerical value, and these values indicate
the degree of similarity between data samples or objects.
1. High Similarity: A similarity measure of 1 typically means that the data objects are very
similar or nearly identical. In other words, a similarity score of 1 indicates a strong
resemblance or a high level of agreement between the objects being compared.
2. Low Similarity: A similarity measure of 0 typically means that the data objects are
dissimilar or have no shared characteristics. A score of 0 indicates that there is no
resemblance or agreement between the objects.
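A minimal sketch computing one lexical metric of each kind on a made-up pair of sentences: the Jaccard index over token sets and cosine similarity over TF-IDF vectors:

# Minimal sketch of two lexical similarity measures on a toy pair of texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

text_a = "the cat sat on the mat"
text_b = "the cat lay on the rug"

# Jaccard index: overlap of the two token sets divided by their union.
tokens_a, tokens_b = set(text_a.split()), set(text_b.split())
jaccard = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# Cosine similarity between TF-IDF vectors of the two texts.
tfidf = TfidfVectorizer().fit_transform([text_a, text_b])
cosine = cosine_similarity(tfidf[0], tfidf[1])[0, 0]

print(f"Jaccard: {jaccard:.2f}, Cosine: {cosine:.2f}")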
17bii) CBOW
• The continuous bag-of-words (CBOW) model is a neural network for natural language
processing tasks such as language translation and text classification. It is based on predicting
a target word given the context of the surrounding words. The CBOW model takes a window
of surrounding words as input and tries to predict the target word in the center of the window.
The CBOW model attempts to comprehend the context of the words around the target word in order to
predict it. Consider the phrase "It is a pleasant day." The model transforms this sentence into word
pairs (context words, target word). The user must configure the window size. If the window of context
words is 2 (one word on each side of the target), the word pairs would look like this: ([it, a], is),
([is, pleasant], a), ([a, day], pleasant). The model tries to predict the target word using these word
pairs while considering the context words.
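A minimal sketch of the CBOW setup, first generating (context, target) pairs consistent with the example above and then training a CBOW Word2Vec model with gensim (sg=0); the tiny corpus is for illustration only:

# Minimal sketch of CBOW: build (context, target) pairs with one word of
# context on each side, then train CBOW embeddings with gensim (sg=0).
from gensim.models import Word2Vec

sentence = ["it", "is", "a", "pleasant", "day"]

# Build pairs like ([it, a], is), ([is, pleasant], a), ([a, day], pleasant).
window = 1
pairs = []
for i in range(window, len(sentence) - window):
    context = sentence[i - window:i] + sentence[i + 1:i + 1 + window]
    pairs.append((context, sentence[i]))
print(pairs)

# Train CBOW embeddings on a (toy) corpus of such sentences.
model = Word2Vec([sentence], vector_size=20, window=window, min_count=1, sg=0)
print(model.wv["pleasant"][:5])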