Module 4
Information Content
Information content, also known as self-information, is a
fundamental concept in information theory that quantifies the
amount of surprise or unexpectedness associated with an event or
outcome. It represents the minimum number of bits needed to
encode or communicate the occurrence of the event.
Information(X) = -log2(P(X))
Where:
“P(X)” is the probability of event “X” occurring.
“log2” is the logarithm to base 2, so the information is measured in bits.
Key points:
Rare events (low P(X)) carry more information because they are more surprising.
A certain event (P(X) = 1) carries zero information.
The information of independent events adds together.
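As a quick, minimal sketch (not part of the original material), the formula can be evaluated directly in Python; the probabilities below are made-up examples:

    import math

    def information_content(p):
        """Self-information of an event with probability p, in bits."""
        return -math.log2(p)

    print(information_content(0.5))   # a fair coin flip: 1.0 bit
    print(information_content(0.01))  # a rare event: ~6.64 bits, far more surprising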
Entropy
Entropy is a fundamental concept in information theory,
thermodynamics, and probability theory. In the context of
information theory, entropy measures the average amount of
uncertainty or surprise associated with a random variable or
probability distribution. It provides a way to quantify the amount of
information required to describe or represent a set of outcomes.
In information theory, entropy is often denoted by “H” and is
calculated using the probabilities of the various outcomes of a
random variable. For a discrete random variable “X” with
probability distribution “P(X)”, the entropy “H(X)” is given by the
formula:

H(X) = -Σ P(x) · log2(P(x))

Where:
“P(x)” is the probability of outcome “x”.
“log2” is the logarithm to base 2, which measures entropy in bits.
“Σ” represents the sum taken over all possible values of “X”.
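A minimal Python sketch of the entropy formula, using invented example distributions:

    import math

    def entropy(probabilities):
        """Shannon entropy H(X) in bits of a discrete distribution."""
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit (the maximum for two outcomes)
    print(entropy([0.9, 0.1]))  # biased coin: ~0.47 bits (more predictable)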
Cross-Entropy
Cross-entropy is a concept used to compare two probability
distributions and quantify the difference between them. It is a
fundamental concept in information theory and is widely applied in
machine learning, particularly in tasks involving classification and
probabilistic modeling.
For a true distribution “P(X)” and a predicted distribution “Q(X)”, the cross-entropy “H(P, Q)” is given by the formula:

H(P, Q) = -Σ P(x) · log2(Q(x))

Where:
“P(x)” is the true probability of outcome “x”.
“Q(x)” is the predicted probability of outcome “x”.
“Σ” represents the sum taken over all possible values of “X”.
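A minimal Python sketch of the cross-entropy formula; the two distributions are made-up examples:

    import math

    def cross_entropy(p, q):
        """Cross-entropy H(P, Q) in bits; p is the true distribution,
        q is the approximating distribution."""
        return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

    p = [0.5, 0.5]  # true distribution
    q = [0.9, 0.1]  # mismatched estimate
    print(cross_entropy(p, p))  # equals H(P): 1.0 bit
    print(cross_entropy(p, q))  # ~1.74 bits; the mismatch inflates the value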
Now, let’s consider the predicted probabilities from your spam filter.
You’ve trained a machine learning model to predict the probability
that an email is spam. For each email, the model assigns a
probability distribution (predicted probabilities for spam and not
spam). You can calculate the cross-entropy between the predicted
distribution and the true distribution (actual label) for a set of
emails. A low cross-entropy indicates that your model’s predictions
are close to the actual labels, while a high cross-entropy indicates that its
predictions diverge from the true labels.
Application Steps:
1. Collect the model’s predicted spam probability for each email.
2. Pair each prediction with the email’s actual label (spam or not spam).
3. Compute the cross-entropy between the predicted and true distributions.
4. Average over all emails to score how well the model fits the data, as sketched below.
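These steps can be sketched in a few lines of Python. This is an illustrative toy, not a real spam filter: the labels and predicted probabilities are invented, and binary_cross_entropy is a hypothetical helper name:

    import math

    def binary_cross_entropy(y_true, y_pred):
        """Average cross-entropy (in bits) between actual labels
        (1 = spam, 0 = not spam) and predicted spam probabilities."""
        total = sum(-(y * math.log2(p) + (1 - y) * math.log2(1 - p))
                    for y, p in zip(y_true, y_pred))
        return total / len(y_true)

    labels      = [1, 0, 1, 0]          # actual labels (invented)
    predictions = [0.9, 0.2, 0.8, 0.1]  # model's spam probabilities (invented)
    print(binary_cross_entropy(labels, predictions))  # ~0.24 bits: a close fit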
Mutual Information
Mutual information measures how much knowing the value of one random variable reduces uncertainty about another. For two discrete random variables X and Y, it is given by:

I(X; Y) = Σ Σ P(x,y) · log2( P(x,y) / (P(x) · P(y)) )

where,
P(x,y) is the joint probability of X and Y.
P(x) and P(y) are the marginal probabilities of X and Y respectively.
The double sum is taken over all possible values of X and Y.
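A minimal Python sketch of the formula, computing the marginals from an invented joint probability table:

    import math

    def mutual_information(joint):
        """I(X; Y) in bits from a joint probability table where
        joint[x][y] = P(x, y); marginals are derived by summation."""
        px = [sum(row) for row in joint]
        py = [sum(col) for col in zip(*joint)]
        mi = 0.0
        for x, row in enumerate(joint):
            for y, pxy in enumerate(row):
                if pxy > 0:
                    mi += pxy * math.log2(pxy / (px[x] * py[y]))
        return mi

    print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # independent: 0.0
    print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # perfectly correlated: 1.0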
For example, Natural Language Generation (NLG) can be used after analysing customer
input (such as commands to voice assistants, queries to chatbots, calls to help centres
or feedback on survey forms) to respond in a personalised, easily understood way. This
is what makes human-like responses from voice assistants and chatbots possible.
It can also be used for transforming numerical data input and other complex data into
reports that we can easily understand. For example, NLG might be used to generate
financial reports or weather updates automatically.
Computational linguistics
The scientific understanding of written and spoken language from the perspective of
computer-based analysis. This involves breaking down written or spoken dialogue and
creating a system of understanding that computer software can use. It uses semantic
and grammatical frameworks to help create a language model system that computers
can utilise to accurately analyse our speech.
Text summarisation
An extractive approach takes a large body of text, pulls out sentences that are most
representative of key points, and combines them in a grammatically accurate way to
generate a summary of the larger text.
An abstractive approach creates novel text by identifying key concepts and then
generating new language that attempts to capture the key points of a larger body of text
intelligibly.
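As an illustration, here is a toy Python sketch of the extractive approach. The word-frequency scoring heuristic is one simple assumption among many possible; an abstractive approach would instead need a trained language model, so it is not sketched here:

    import re
    from collections import Counter

    def extractive_summary(text, n_sentences=2):
        """Score each sentence by the frequency of its words in the
        whole text, then return the top scorers in original order."""
        sentences = re.split(r'(?<=[.!?])\s+', text.strip())
        freq = Counter(re.findall(r'[a-z]+', text.lower()))
        scores = {i: sum(freq[w] for w in re.findall(r'[a-z]+', s.lower()))
                  for i, s in enumerate(sentences)}
        top = sorted(sorted(scores, key=scores.get, reverse=True)[:n_sentences])
        return ' '.join(sentences[i] for i in top)

    text = ("NLG turns data into text. NLG is used in chatbots. "
            "Summaries keep the key sentences. Chatbots use NLG daily.")
    print(extractive_summary(text))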
An NLG system typically works through the following stages.

Data analysis
First, data (both structured data like financial information and unstructured data like
transcribed call audio) must be analysed. The data is filtered, to make sure that the end
text that is generated is relevant to the user’s needs, whether it’s to answer a query or
generate a specific report. At this stage, your NLG tools will pick out the main topics in
your source data and the relationships between each topic.
Data understanding
Here is where Natural Language Processing (NLP), machine learning and a language model
come in. Your software identifies patterns in the data and, based on its algorithmic
training, is able to interpret what is being said and the context of those statements.
For numerical data or other non-textual data, your software spots the data it has been
taught to recognise and understands how it relates to actual text.
Document creation and structuring
At this stage, your NLG solution works to create data-driven narratives based on the
data being analysed and the result you've requested (a report, a chat response, etc.).
A document plan is then created.
Sentence aggregation
Sentences and parts of sentences that have been identified as relevant are put together
to summarise the information to be presented.
Grammatical structuring
Your software now generates the text itself, applying natural language grammar rules
so that the output reads the way we expect language to read.
Language presentation
Finally, the software will create the final output in whatever format the user has chosen.
As mentioned, this could be in the form of a report, a customer-directed email or a voice
assistant response.
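Putting the stages together, the whole process can be pictured as a small pipeline. The Python sketch below is purely schematic: the data, the trend logic and every step are invented placeholders for the real analysis, understanding, planning, aggregation, structuring and presentation stages described above:

    def generate_report(records):
        """Toy end-to-end NLG sketch; each step stands in for one
        of the stages described above."""
        # Data analysis: filter the source data down to what is relevant.
        relevant = [r for r in records if r["metric"] == "revenue"]
        # Data understanding: interpret what the numbers are saying.
        trend = "rose" if relevant[-1]["value"] > relevant[0]["value"] else "fell"
        # Document creation and structuring: build a document plan.
        plan = {"subject": "Revenue", "trend": trend,
                "start": relevant[0]["value"], "end": relevant[-1]["value"]}
        # Sentence aggregation and grammatical structuring: realise the plan as text.
        sentence = (f"{plan['subject']} {plan['trend']} from "
                    f"{plan['start']} to {plan['end']} over the period.")
        # Language presentation: return the output in the chosen format (plain text here).
        return sentence

    data = [{"metric": "revenue", "value": 100},
            {"metric": "revenue", "value": 140}]
    print(generate_report(data))  # Revenue rose from 100 to 140 over the period.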