
Boltzmann Machine

• A Boltzmann machine is an unsupervised deep learning model in which every node is
connected to every other node. It is a type of recurrent neural network, and the nodes make
binary decisions with some level of bias.
• These machines are not deterministic deep learning models; they are stochastic, generative
deep learning models. They are representations of a system.
• A Boltzmann machine has two kinds of nodes:
• Visible nodes: nodes that can be, and are, measured.
• Hidden nodes: nodes that cannot be, or are not, measured.
• According to some experts, a Boltzmann machine can be called a stochastic Hopfield network
with hidden units. It is a network of units with an 'energy' defined for the overall network.
• Boltzmann machines seek to reach thermal equilibrium: essentially, they optimize the global
distribution of energy. But the temperature and energy of the system are analogues of the laws of
thermodynamics, not literal physical quantities.
• A Boltzmann machine comes with a learning algorithm that enables it to discover interesting
features in datasets composed of binary vectors. The learning algorithm tends to be slow in
networks that have many layers of feature detectors, but it can be made faster by learning one
layer of feature detectors at a time.
• They use stochastic binary units to reach probability-distribution equilibrium (to minimize
energy); a minimal sketch of this stochastic update follows this list. Multiple Boltzmann machines
can be made to collaborate to form far more sophisticated systems, such as deep belief networks.
• The Boltzmann machine is named after Ludwig Boltzmann, the Austrian physicist who came up
with the Boltzmann distribution. However, this type of network was first developed by Geoffrey
Hinton, together with Terrence Sejnowski.
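
As a concrete illustration of these stochastic binary units, here is a minimal numpy sketch of one Gibbs sweep over a small, fully connected Boltzmann machine. The weights, biases and temperature are illustrative toy values, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_step(s, W, b, T=1.0):
    """One stochastic sweep over all units of a fully connected
    Boltzmann machine. s: binary state vector, W: symmetric weights
    with zero diagonal, b: biases, T: 'temperature'."""
    s = s.copy()
    for i in rng.permutation(len(s)):
        delta_e = W[i] @ s + b[i]          # energy gap for turning unit i on
        p_on = 1.0 / (1.0 + np.exp(-delta_e / T))
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

# toy 4-unit machine with random symmetric weights, no self-connections
W = rng.normal(0.0, 0.5, (4, 4))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)
b = np.zeros(4)

state = rng.integers(0, 2, 4).astype(float)
for _ in range(100):                        # run toward thermal equilibrium
    state = gibbs_step(state, W, b)
print(state)
```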

Structure :
The Boltzmann machine consists of visible (input) nodes and hidden nodes.

In the figure above, the visible nodes are colored blue and the hidden nodes red. Every node
is connected to every other node via synapses (links), and every link carries a weight. The network
does not discriminate between hidden nodes and visible nodes; it treats all nodes the same. There
is no output layer, because we are not providing any target output value. (By contrast, an RBM is
bipartite: it has no intra-layer connections, only connections between the input and hidden layers.)

What is the Boltzmann distribution?


The Boltzmann distribution is a probability distribution that gives the probability of a system being
in a certain state as a function of that state's energy and the temperature of the system.
It was formulated by Ludwig Boltzmann in 1868 and is also known as the Gibbs distribution.

Working principle of BM:
We first need the concept of an Energy-Based Model to understand the Boltzmann machine.
Energy-Based Model (EBM): The Boltzmann probability p(x) of a state x is defined as
p(x) = exp{−E(x)/T} / ∑ₓ′ exp{−E(x′)/T}
where E(x) is the energy of state x and T is a free parameter (like temperature). The equation
describes the probability of finding the system in state x when it is in thermal equilibrium with a
heat bath at temperature T: the lower the energy, the higher the probability of observing that
state. Z = ∑ₓ′ exp{−E(x′)/T} is the partition function.
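
As a quick sanity check on this formula, the sketch below computes the Boltzmann probabilities for a handful of states directly from their (made-up) energies; note how raising T flattens the distribution.

```python
import numpy as np

def boltzmann_probs(energies, T=1.0):
    """p(x) = exp(-E(x)/T) / Z for a finite list of states."""
    weights = np.exp(-np.asarray(energies, dtype=float) / T)
    Z = weights.sum()                      # partition function
    return weights / Z

# lower energy -> higher probability
print(boltzmann_probs([1.0, 2.0, 3.0]))          # ~[0.665, 0.245, 0.090]
# higher temperature flattens the distribution
print(boltzmann_probs([1.0, 2.0, 3.0], T=10.0))  # ~[0.367, 0.332, 0.301]
```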

• In the figure above, system A has some gas molecules packed into a high-density region in one
corner, while system B contains the same number of molecules distributed uniformly throughout.
System B has uniform density, so per the Boltzmann distribution it is the more stable
configuration and has the higher probability of being observed.
• The energy function E(v, h) is defined as
E(v, h) = −∑ᵢ aᵢvᵢ − ∑ⱼ bⱼhⱼ − ∑ᵢ,ⱼ vᵢwᵢⱼhⱼ
where E(v, h) is the energy of the state, vᵢ is a visible (input) unit, hⱼ is a hidden unit, aᵢ and bⱼ are
the biases of vᵢ and hⱼ respectively, and wᵢⱼ is the element of the weight matrix associated with
vᵢ and hⱼ. (A short numpy rendering of this energy function appears after this list.)
• We now treat the energy of the EBM as a function of the weights of the BM. Once the system is
trained, a Restricted Boltzmann Machine always tries to find the lowest-energy state possible. If
we substitute the energy E(x) into p, we see that p is exponentially and inversely related to E(x),
which matches the Boltzmann machine concept: the lower the energy, the higher the probability,
and the higher the energy, the lower the probability.
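
The numpy rendering of the energy function promised above, with arbitrary toy values for v, h, a, b and W:

```python
import numpy as np

def rbm_energy(v, h, a, b, W):
    """E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i W_ij h_j"""
    return -(a @ v) - (b @ h) - (v @ W @ h)

rng = np.random.default_rng(0)
v = rng.integers(0, 2, 5).astype(float)   # 5 visible (input) units
h = rng.integers(0, 2, 3).astype(float)   # 3 hidden units
a, b = np.zeros(5), np.zeros(3)           # biases
W = rng.normal(0.0, 0.1, (5, 3))          # weight matrix
print(rbm_energy(v, h, a, b, W))          # scalar energy of this configuration
```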
What are Boltzmann machines used for?
• The main aim of a Boltzmann machine is to optimize the solution of a problem. To do this, it
optimizes the weights and quantities related to the specific problem assigned to it. This
technique is employed when the main aim is to create a mapping and to learn from the attributes
and target variables in the data. If you seek to identify an underlying structure or pattern within
the data, unsupervised learning methods for this model are regarded as more useful. Some of the
most widely used unsupervised learning methods are clustering, dimensionality reduction,
anomaly detection and the building of generative models.
• All of these techniques have a different objective of detecting patterns, such as identifying latent
groupings, finding irregularities in the data, or even generating new samples from the available
data. You can also stack these networks in layers to build deep neural networks that capture
highly complicated statistics. Restricted Boltzmann machines are widely used in imaging and
image processing as well, because they have the ability to model the continuous data that are
common in natural images. They are even used to solve complicated quantum-mechanical
many-particle problems and classical statistical-physics problems like the Ising and Potts classes
of models.

How does a Boltzmann machine work?

• Boltzmann machines are non-deterministic (stochastic) generative deep learning models that
have only two kinds of nodes, hidden and visible. They don't have any output nodes, and that is
what gives them their non-deterministic character: they learn patterns without the typical 1-or-0
target output through which patterns are usually learned and optimized using stochastic gradient
descent.
• A major difference is that, unlike other traditional networks (such as ANNs, CNNs and RNNs),
which have no connections between their input nodes, Boltzmann machines do have connections
among the input nodes. Every node is connected to every other node, irrespective of whether it is
an input or hidden node. This enables the nodes to share information among themselves and
self-generate subsequent data. You would only measure what is on the visible nodes, not what is
on the hidden nodes. After the input is provided, Boltzmann machines are able to capture all the
parameters, patterns and correlations in the data. It is because of this that they are known as
deep generative models and fall into the class of unsupervised deep learning.

What are the types of Boltzmann machines?

1. Restricted Boltzmann Machines (RBMs)


• To avoid overfitting, nodes are not connected to other nodes within the same layer. Visible
nodes are connected only to hidden nodes: there are no connections among input nodes within
the input layer and none among hidden nodes within the hidden layer. RBMs are used for pattern
recognition in multi-featured datasets and also in recommender systems. (A sketch of the RBM
conditional probabilities and reconstruction follows this example.)

• The RBM in the example above is designed as a recommender system. Here a dataset with
movies as features and viewers as rows goes through the training process. Genre A, Genre B,
Actor X, Award Y and Director Z are the preferences expressed by the viewers. We will build a
network that recommends which movie a new viewer with given parameters will prefer to see. To
build this recommender system we train the network: to reach good accuracy, the network must
repeatedly adjust the weights of the synapses between the nodes, and this is exactly where the
RBM model helps us.
• In the first step, the input data are fed into the network. The features of the dataset are five
movies: The Matrix, Fight Club, Forrest Gump, Titanic and The Departed. Viewers rated the
movies, and we have another dataset with parameters such as the genre of the movie, whether it
won an Oscar, and the name of the actor or director. When we combine the datasets and feed
them in as input, the network receives the values of the input nodes one row at a time. For the
first movie, The Matrix, the network checks whether any hidden nodes match it. No matching
values are found, because The Matrix is neither a drama nor an Oscar-winning movie, and neither
DiCaprio nor Tarantino worked on it. The second movie, Fight Club, does not have any data. The
third one, Forrest Gump, is a drama; Titanic is also a drama, so the drama hidden node learns
from Forrest Gump and Titanic. In the same way, DiCaprio matches Titanic, and Oscar matches
Forrest Gump and Titanic. The matched hidden nodes are colored yellow and the unmatched
hidden nodes red. The network now knows which input nodes are activated for which hidden
nodes. Then backward propagation happens: the RBM reconstructs the inputs based on the
hidden nodes. During training, if the reconstruction is incorrect, the weights are adjusted; then
reconstruction happens again, and again the weights are adjusted if it is incorrect. This process
continues until we achieve the best accuracy the network can reach.
• During this reconstruction, the network fills the vacant input nodes with data, which gives us the
recommendation of whether the new user will want to watch the movie or not. For example, the
second movie, Fight Club, will not be watched by a new viewer: this movie has none of the
parameters that viewers liked in the other movies, so the value 0 is placed in the Fight Club input
node. The last movie, The Departed, learns from the hidden nodes and matches drama, DiCaprio
and Oscar, so a new viewer will want to watch it: this movie shares parameters that viewers liked
in the other movies, so the value 1 is placed in The Departed's input node.
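
Here is the sketch promised above: the RBM's two conditional distributions, p(h|v) to activate hidden features from a viewer's row and p(v|h) to reconstruct (and fill in) that row. The rating matrix, layer sizes and untrained random weights are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b_h):
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij)"""
    p = sigmoid(b_h + v @ W)
    return p, (rng.random(p.shape) < p).astype(float)

def sample_visible(h, W, b_v):
    """p(v_i = 1 | h) = sigmoid(a_i + sum_j W_ij h_j)"""
    p = sigmoid(b_v + h @ W.T)
    return p, (rng.random(p.shape) < p).astype(float)

# hypothetical ratings: rows = viewers, columns = 5 movies (1 = liked)
V = np.array([[1, 0, 1, 1, 0],
              [0, 0, 1, 1, 1]], dtype=float)
W = rng.normal(0.0, 0.1, (5, 3))           # 5 visible x 3 hidden units
b_v, b_h = np.zeros(5), np.zeros(3)

p_h, h = sample_hidden(V[0], W, b_h)       # activate hidden features
p_v, v_recon = sample_visible(h, W, b_v)   # reconstruct the ratings row
print(p_v)   # reconstruction probabilities fill in the unrated movies
```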

Contrastive Divergence :
• This is the algorithm that allows an RBM to learn and update its weights. Plain gradient descent
will not work here, because the exact gradient of the log-likelihood involves an intractable sum
over all states of the network. The first step is called Gibbs sampling: we have an input vector v0
and the probability p(h0|v0), where h0 denotes the hidden values; then we use p(v1|h0) to find v1.
If the number of iterations is k, our reconstructed input vector is vk.

• To understand this, let's take a simple example with 5 input nodes and 5 hidden nodes. In the
first step, a hidden node is computed from all the input nodes, and in this way all the hidden
nodes are created. Then all the hidden nodes reconstruct the input nodes one by one. After its
value is updated, an input node is no longer the same as the previous input node, even before the
weights change. In this way all the input nodes change. But remember that each hidden node is
in turn based on the input nodes, so the hidden nodes will also change. This process continues
until the energy of the state is minimized, just as we learnt from the Energy-Based Model. The
weights of the RBM are tied to the energy of the EBM. If we draw a curve of the energy of the
state versus the epoch, we get a steadily decreasing curve.
• The further we go in contrastive divergence, the lower the energy of the state and the higher the
probability of that state becomes; as in the curve above, E at the 3rd step < E at the 2nd step <
E at the 1st step. We obtain the change of probability with respect to the weights as
∂log p(v)/∂wᵢⱼ = ⟨vᵢ⁰hⱼ⁰⟩ − ⟨vᵢ∞hⱼ∞⟩, where ⟨vᵢ⁰hⱼ⁰⟩ is the expectation in the initial state of the
system and ⟨vᵢ∞hⱼ∞⟩ the expectation in the final (equilibrium) state. The process is called Gibbs
sampling. We can drag the energy curve down via Gibbs sampling, which technically means
adjusting the weights while resampling our input values to reach the minimum-energy state.
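
Putting the pieces together, here is a minimal CD-1 weight update in numpy, under the usual simplifying choices (k = 1, with probabilities rather than binary samples in the negative phase). The data and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_v, b_h, lr=0.1):
    """One CD-1 step: Delta W ~ <v h>^0 - <v h>^1."""
    p_h0 = sigmoid(b_h + v0 @ W)                        # positive phase
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # sample hidden
    p_v1 = sigmoid(b_v + h0 @ W.T)                      # reconstruction
    p_h1 = sigmoid(b_h + p_v1 @ W)                      # negative phase
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b_v += lr * (v0 - p_v1)
    b_h += lr * (p_h0 - p_h1)
    return W, b_v, b_h

# toy ratings data: rows = viewers, columns = 5 movies
V = np.array([[1, 0, 1, 1, 0],
              [0, 0, 1, 1, 1]], dtype=float)
W = rng.normal(0.0, 0.1, (5, 3))
b_v, b_h = np.zeros(5), np.zeros(3)
for epoch in range(20):
    for v0 in V:
        W, b_v, b_h = cd1_update(v0, W, b_v, b_h)
```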

2. Deep Belief Networks (DBNs)


• The structure of a DBN is equivalent to a stack of RBMs. This is a complex and advanced type
of network.
• Multiple RBMs are stacked on top of each other, and training proceeds in two phases. If there
are 4 layers, we first train the network with the input data at the input nodes; the first hidden layer
is built from these data, and the second and third hidden layers are built consecutively, one after
another. When reconstruction starts at the top of the network, the second hidden layer is
reconstructed first, then the first hidden layer (since the second hidden layer is based on the
first), and finally the input layer is reconstructed so as to minimize the energy of the state, as per
the energy-based model. (A sketch of this greedy layer-wise stacking follows below.)
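
A sketch of the greedy layer-wise stacking described above: train one RBM, push the data through its hidden layer, then train the next RBM on those activations. The helper train_rbm is a hypothetical toy trainer, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=20, lr=0.1):
    """Greedy CD-1 training of one RBM layer (sketch)."""
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.1, (n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            p_h0 = sigmoid(b_h + v0 @ W)
            h0 = (rng.random(n_hidden) < p_h0).astype(float)
            p_v1 = sigmoid(b_v + h0 @ W.T)
            p_h1 = sigmoid(b_h + p_v1 @ W)
            W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
            b_v += lr * (v0 - p_v1)
            b_h += lr * (p_h0 - p_h1)
    return W, b_v, b_h

data = rng.integers(0, 2, (50, 8)).astype(float)
# layer 1: train on the raw data, then feed its hidden activations upward
W1, _, bh1 = train_rbm(data, n_hidden=6)
h1 = sigmoid(bh1 + data @ W1)
# layer 2: train the next RBM on layer 1's activations
W2, _, bh2 = train_rbm(h1, n_hidden=4)
```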

3. Deep Boltzmann Machines (DBMs)


• Deep Boltzmann Machines are very similar to Deep Belief Networks. The difference between
these two types of Boltzmann machine is that while the connections between layers in a DBN are
directed, in a DBM all of the connections between layers are undirected (and, as in a DBN, there
are no connections within a layer).
