A Beginners' Guide To Cross-Entropy in Machine Learning


What is Cross-Entropy?

Assume we have two distributions of data that need to be compared. Cross-entropy builds on the concept of entropy we have seen above: it is a measure of the difference between two probability distributions. Assume the first probability distribution (the actual one) is denoted by A and the second (the predicted one) is denoted by B.

Cross-entropy is the average number of bits required to encode an event drawn from distribution A when using a coding scheme optimized for distribution B. In machine learning, cross-entropy is used when algorithms are built to make predictions from a model: the model is trained by comparing the actual and predicted results.

Mathematically, we can represent cross-entropy as:

H(A, B) = − Σ_x p(x) log q(x)

In the above equation, x ranges over the possible values, p(x) is the probability of x under the actual (real-world) distribution A, and q(x) is the probability of x under the predicted distribution B. So, working with two distributions, how do we link cross-entropy to entropy? If the predicted and actual distributions are the same, then cross-entropy equals entropy.

In the real world, however, the predicted distribution differs from the actual one. This mismatch is referred to as divergence, because the predictions diverge from the actual values. As a result, cross-entropy is the sum of entropy and the KL divergence (a type of divergence): H(A, B) = H(A) + KL(A || B).
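A small numeric check of this decomposition, again assuming NumPy and using the same illustrative distributions as above:

import numpy as np

p = np.array([0.10, 0.40, 0.50])   # actual distribution A
q = np.array([0.80, 0.15, 0.05])   # predicted distribution B

cross_entropy = -np.sum(p * np.log(q))      # H(A, B)
entropy       = -np.sum(p * np.log(p))      # H(A)
kl_divergence =  np.sum(p * np.log(p / q))  # KL(A || B)

print(np.isclose(cross_entropy, entropy + kl_divergence))  # True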

Cross-Entropy as Loss Function


When optimizing classification models, cross-entropy is commonly employed as a loss function. Logistic regression and artificial neural networks, for example, can be used for such classification problems.

In classification, each example has a known class label with a probability of 1.0, while all other labels have a probability of 0.0. The model calculates the likelihood that a given example belongs to each class label. The difference between these two probability distributions can then be calculated using cross-entropy.
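As a sketch of how this comparison could look in code, assuming NumPy, one-hot labels, and predicted probabilities invented for illustration:

import numpy as np

# Mean cross-entropy between one-hot labels and predicted class probabilities.
def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1.0)          # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Three examples, three classes; each row of y_true is a one-hot label.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.3, 0.5]])

print(categorical_cross_entropy(y_true, y_pred))  # lower is better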

In classification, the target probability distribution P for an input assigns each class label a probability of either 0 or 1, interpreted as "impossible" or "certain". Because such a distribution contains no surprise (no low-probability events), it carries no information content and has zero entropy.

When dealing with a two-class problem, the probability is modelled as a Bernoulli distribution for the positive class. This means the model explicitly predicts the probability of class 1, while the probability of class 0 is given as 1 minus the predicted probability. More concretely:

Class 1 = predicted probability

Class 0 = 1 – predicted probability
We are often concerned with minimizing the model's cross-entropy across the entire training dataset. This is done by taking the average cross-entropy over all training examples.
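A minimal sketch of this average binary (Bernoulli) cross-entropy, assuming NumPy; the labels and predicted class-1 probabilities below are illustrative only:

import numpy as np

# Average of -[y*log(p) + (1-y)*log(1-p)] over all training examples.
def binary_cross_entropy(y_true, p_class1, eps=1e-12):
    p = np.clip(p_class1, eps, 1.0 - eps)       # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true   = np.array([1, 0, 1, 1, 0])            # known class labels
p_class1 = np.array([0.9, 0.2, 0.7, 0.6, 0.1])  # model's predicted P(class = 1)

print(binary_cross_entropy(y_true, p_class1))   # mean loss over the dataset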
