
Computational Statistical Physics FS 2019

Exercise sheet 5 Lucas Böttcher

Exercise 1. Generating Ising configurations with the Restricted Boltzmann Machine
Goal: In this exercise we are going to learn a) what Restricted Boltzmann Machines
are, b) how they can be trained and c) how they can be used to generate Ising configu-
rations at a certain temperature.

Task 1: Read carefully through chapter 1.9 of the lecture notes and familiarize yourself with the
concepts of a neuron, the Hopfield Network and the Boltzmann Machine.

A Restricted Boltzmann Machine (RBM) is a neural network consisting of two layers of neurons
in which every neuron of one layer is connected with every neuron of the other layer (full
inter-layer connectivity). Neurons within the same layer are not connected (no intra-layer
connections). A schematic is presented in figure 1.

Figure 1: Schematic of a RBM with visible nodes v_1, ..., v_Nv (green), hidden nodes h_1, ..., h_Nh
(blue), and weights w_ij connecting the two layers.

One of the two layers is called the visible layer while the other one is called the hidden layer.
Interacting with the machine (input and output) can only occur via the visible layer. The hidden
layer is not directly accessible. Moreover, the neurons are binary, i.e., they can only take one of
two possible values, either 0 or 1.

Let’s call the number of visible nodes Nv and the number of hidden nodes Nh . Furthermore,
call the current value of the j-th node in the visible layer vj and the i-th node in the hidden
layer hi . With these definitions we are able to have a closer look at the dynamics of the system.
Given v = (v_1, ..., v_{N_v}), the value of the i-th node in the hidden layer is set to 1 with probability
\[
  p(h_i = 1 \mid v) = \sigma\left( \sum_{j=1}^{N_v} w_{ij} v_j + b_i \right),
\]
else it is set to 0. The coefficients wij are called weights and the coefficients bi are called biases
(of the hidden layer). σ(x) is the sigmoid function
\[
  \sigma(x) = \frac{1}{1 + e^{-x}},
\]
which maps any real number to the interval (0, 1). Similarly, given the values h = (h1 , .., hNh )
of the hidden layer the value of the j-th visible node is determined by
\[
  p(v_j = 1 \mid h) = \sigma\left( \sum_{i=1}^{N_h} w_{ji} h_i + a_j \right),
\]
where aj are the biases of the visible layer. Note that the weights are symmetric, i.e., wij = wji .
Due to these update rules the RBM is classified as a stochastic model.
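To make these update rules concrete, here is a minimal NumPy sketch of one stochastic back-and-forth step. The hidden-layer size, the array names W, a, b, and the helper names sample_hidden and sample_visible are assumptions made only for this illustration; they are not part of the provided code.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

Nv, Nh = 32 * 32, 64                       # layer sizes (Nh chosen arbitrarily here)
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.01, size=(Nh, Nv))   # weights w_ij (row i: hidden, column j: visible)
a = np.zeros(Nv)                           # visible biases a_j
b = np.zeros(Nh)                           # hidden biases b_i

def sample_hidden(v):
    # set h_i = 1 with probability sigma(sum_j w_ij v_j + b_i), else 0
    p_h = sigmoid(W @ v + b)
    return (rng.random(Nh) < p_h).astype(float)

def sample_visible(h):
    # set v_j = 1 with probability sigma(sum_i w_ji h_i + a_j), else 0
    p_v = sigmoid(W.T @ h + a)
    return (rng.random(Nv) < p_v).astype(float)

v = (rng.random(Nv) < 0.5).astype(float)   # a random visible configuration
h = sample_hidden(v)                       # one visible-to-hidden step
v = sample_visible(h)                      # and back to the visible layer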

Task 2: State and explain the differences between a Hopfield Network, a Boltzmann Machine
and a Restricted Boltzmann Machine.

In the following we are going to use the RBM to generate 2D Ising configurations with L = 32 at
a certain temperature T . Therefore, we choose the number of visible nodes to be Nv = 32 × 32.

Before samples can be drawn the machine has to be trained. By training we mean updating the
weights and biases according to our training data.1 This is done via contrastive divergence. The
update rule for the weights is given by
\[
  w_{ij} \rightarrow w_{ij} - \epsilon \left( \langle v_j h_i \rangle_{\mathrm{data}} - \langle v_j h_i \rangle^{k}_{\mathrm{model}} \right),
\]
where ε is a so-called learning rate. The expectation values are understood to be averages over
the whole set of training data.2 The quantity ⟨v_j h_i⟩_data is calculated by taking a vector v
from the training data and computing the corresponding vector h as described above. For the
quantity ⟨v_j h_i⟩^k_model one has to take a vector from the training data, compute the corresponding
vector h, compute the new v and perform k more back-and-forth operations. More information
about the contrastive divergence can be found here: https://arxiv.org/pdf/1803.08823.pdf
(p. 90 ff.).
For completeness, the update rules for the biases are given by
\[
  a_j \rightarrow a_j - \epsilon \left( \langle v_j \rangle_{\mathrm{data}} - \langle v_j \rangle^{k}_{\mathrm{model}} \right),
\]
\[
  b_i \rightarrow b_i - \epsilon \left( \langle h_i \rangle_{\mathrm{data}} - \langle h_i \rangle^{k}_{\mathrm{model}} \right).
\]
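Reusing the hypothetical helpers and definitions from the NumPy sketch above, a single CD-k update for one training vector could be sketched as follows. Using the probabilities p(h = 1|v) instead of sampled hidden values in the averages is a common choice, and averaging over a mini-batch instead of a single vector works the same way; the sign convention follows the update rules as written on this sheet.

def contrastive_divergence_step(v_data, W, a, b, eps=0.01, k=1):
    # "data" statistics from the training vector
    p_h_data = sigmoid(W @ v_data + b)
    pos = np.outer(p_h_data, v_data)              # plays the role of <v_j h_i>_data

    # "model" statistics: k back-and-forth steps starting from the training vector
    v = v_data.copy()
    for _ in range(k):
        h = (rng.random(Nh) < sigmoid(W @ v + b)).astype(float)
        v = (rng.random(Nv) < sigmoid(W.T @ h + a)).astype(float)
    p_h_model = sigmoid(W @ v + b)
    neg = np.outer(p_h_model, v)                  # plays the role of <v_j h_i>^k_model

    # apply the update rules stated above
    W -= eps * (pos - neg)
    a -= eps * (v_data - v)
    b -= eps * (p_h_data - p_h_model)
    return W, a, b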

For the following tasks we provide you with a Python project which is missing some functionality
that you have to implement. The main program is ”main.py” and is structured as follows: At
first, the training data is extracted and brought into the right shape (implementation found in
”ising_main.py”). Then, the RBM is set up and trained for one fixed temperature T (implemen-
tation found in ”my_RBM_tf2.py”). Finally, new Ising configurations are generated and stored
in an external file.

Task 3: Implement the function ”contr_divergence” in the class ”RBM” in the file ”my_RBM_tf2.py”
as described above.

1 We provide you with 5000 Ising configurations for six different temperatures. They are found in the file
”ising_data.hdf5”.
2 Note that this procedure is in general very slow because the averages are always computed over the whole
training data. Instead of performing this kind of optimization for the weights and biases it is beneficial to divide
the whole set of training data into a set of mini-batches and compute the averages only over these mini-batches.

Hint: You might find the following functions helpful:

• tensorflow.sigmoid

• tensorflow.add

• tensorflow.tensordot

• tensorflow.transpose

• tensorflow.reshape

Check out the tensorflow documentation (https://www.tensorflow.org/api_docs/python)
for more information.
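To illustrate how the hinted functions fit together, a single visible-to-hidden-and-back pass in TensorFlow 2 might look roughly like this. The tensor names and shapes are assumptions for this sketch and will differ from those used in the provided class.

import tensorflow as tf

Nv, Nh = 32 * 32, 64                                      # assumed layer sizes
W = tf.Variable(tf.random.normal((Nh, Nv), stddev=0.01))
a = tf.Variable(tf.zeros(Nv))                             # visible biases
b = tf.Variable(tf.zeros(Nh))                             # hidden biases
v = tf.cast(tf.random.uniform((Nv,)) < 0.5, tf.float32)   # random visible configuration

# p(h = 1 | v), built from the hinted functions
p_h = tf.sigmoid(tf.add(tf.tensordot(W, v, axes=1), b))
h = tf.cast(tf.random.uniform(tf.shape(p_h)) < p_h, tf.float32)
# reconstruction of the visible layer uses the transposed weights
p_v = tf.sigmoid(tf.add(tf.tensordot(tf.transpose(W), h, axes=1), a))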

Task 4: Use the training data stored in ”ising_data.hdf5” to find the optimal weights and biases
for your RBM. (Disclaimer: Training the machine may take quite a while depending on your
computer.)

Hint: In Python, one possibility to access the Ising training data is given by:

import h5py

ising_file = h5py.File('ising_data.hdf5', 'r')   # load the data file

# access the first Ising configuration for the first temperature
temperatures = list(ising_file.keys())
t1 = temperatures[0]
configurations = ising_file[t1]
x = configurations[0]

Once the machine is trained it can be used to generate new samples. This can be done in the
following way (a minimal code sketch follows the list):

1. Set the nodes in the visible layer to random values (either 0 or 1).

2. Let the machine evolve, i.e., go back and forth several times between the visible and hidden
layers.

3. Read out the nodes in the visible layer. This is the desired sample.
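A minimal sketch of this sampling procedure, reusing the hypothetical sample_hidden and sample_visible helpers from the earlier NumPy sketch (the number of back-and-forth steps is a free choice):

def generate_configuration(n_steps=1000):
    # 1. random initial values for the visible layer
    v = (rng.random(Nv) < 0.5).astype(float)
    # 2. let the machine evolve back and forth between the layers
    for _ in range(n_steps):
        v = sample_visible(sample_hidden(v))
    # 3. the visible layer now holds the generated Ising configuration
    return v.reshape(32, 32)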

Task 5: Use the RBM to obtain new Ising configurations and store these samples in a separate
file.

Task 6: Repeat Task 4 and Task 5 for at least two more temperatures.

Exercise 2. Classifying temperatures of Ising configurations with a feed-forward network
Goal: Here, we are going to learn about another kind of neural network. This time
we are not going to generate new Ising configurations but instead determine the tem-
peratures of given Ising configurations.

Consider a neural network made up of 4 layers as displayed in figure 2.

Figure 2: Schematic of a feed-forward network with two hidden layers: input nodes x_1, ..., x_Nv,
hidden nodes h^(1) and h^(2), and output nodes y_1, ..., y_4. Visible layers (green). Hidden layers (blue).

The two outer layers (used for input and output) are visible while the two inner layers are
hidden. In contrast to the RBM there is no back-and-forth flow of information between the
visible and hidden layers. Instead, information flows from the input to the output layer. That
is why this network is called a feed-forward network. Furthermore, we assume that the nodes
are not binary anymore but can take continuous values between 0 and 1 and that the dynamics
of the system is given by
\[
  h_k^{(1)} = \sigma\left( \sum_l w_{kl}^{(1)} x_l + b_k^{(1)} \right),
\]
\[
  h_j^{(2)} = \sigma\left( \sum_k w_{jk}^{(2)} h_k^{(1)} + b_j^{(2)} \right),
\]
\[
  y_i = \sigma\left( \sum_j w_{ij}^{(3)} h_j^{(2)} + b_i^{(3)} \right),
\]
where σ(x) is again the sigmoid function. Due to the fact that the values of the nodes are uniquely
defined (not set with a certain probability as in the RBM) the network is called deterministic.

Since the goal is to map an Ising configuration to its corresponding temperature the input layer
is chosen to have 32 × 32 nodes while the output layer consists of 4 nodes (one for every possible
temperature we would like to detect). Thus, the values of the nodes in the output layer can be
interpreted as probabilities that the system has a certain temperature.
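A minimal NumPy sketch of this forward pass is shown below; the two hidden-layer sizes and the random weight initialization are assumptions made only for illustration.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

Nv, Nh1, Nh2, Nout = 32 * 32, 100, 50, 4            # layer sizes (hidden sizes assumed)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0.0, 0.1, (Nh1, Nv)), np.zeros(Nh1)
W2, b2 = rng.normal(0.0, 0.1, (Nh2, Nh1)), np.zeros(Nh2)
W3, b3 = rng.normal(0.0, 0.1, (Nout, Nh2)), np.zeros(Nout)

def forward(x):
    h1 = sigmoid(W1 @ x + b1)      # first hidden layer
    h2 = sigmoid(W2 @ h1 + b2)     # second hidden layer
    return sigmoid(W3 @ h2 + b3)   # output: one value per candidate temperature

y = forward(rng.random(Nv))        # y[i] is the score for the i-th temperature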

Training the machine means again that the weights and biases have to be adjusted. This is done
in a way such that the so-called cost function (also loss function) is minimized. Given some
input i^(d) with expected output o^(d), the (mean-squared) cost of this single training example is
defined as
\[
  C^{(d)} = \sum_{i=1}^{4} \left( y_i^{(d)} - o_i^{(d)} \right)^2 .
\]

With this the total cost function C is defined as the average of all costs over the whole training
data set3:
\[
  C\left( w^{(1)}, b^{(1)}, w^{(2)}, b^{(2)}, w^{(3)}, b^{(3)} \right) = \frac{1}{N_{\mathrm{data}}} \sum_{d=1}^{N_{\mathrm{data}}} C^{(d)} .
\]
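Written out for a whole data set at once, with Y holding the network outputs and O the expected outputs as (N_data, 4) arrays (the one-hot encoding of the targets and the placeholder values are assumptions for this sketch):

import numpy as np

rng = np.random.default_rng(0)
Y = rng.random((5000, 4))                        # network outputs (placeholder values)
O = np.eye(4)[rng.integers(0, 4, 5000)]          # expected one-hot outputs (placeholder)

cost_per_example = np.sum((Y - O) ** 2, axis=1)  # C^(d) for every training example d
total_cost = cost_per_example.mean()             # C = (1/N_data) * sum_d C^(d)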

The most straightforward way to do the updates of the weights and biases is by using a steepest
descent method. However, such a method is usually slow because one has to average over all data
of the training set in every step. Therefore, similar to Exercise 1, we randomly divide the set
of training data into mini-batches and compute the gradient only for one of these mini-batches
in one step. This procedure is known as stochastic gradient descent. The update rule can be
stated in the following form:
\[
  \begin{pmatrix} w^{(1)} \\ b^{(1)} \\ w^{(2)} \\ b^{(2)} \\ w^{(3)} \\ b^{(3)} \end{pmatrix}
  \rightarrow
  \begin{pmatrix} w^{(1)} \\ b^{(1)} \\ w^{(2)} \\ b^{(2)} \\ w^{(3)} \\ b^{(3)} \end{pmatrix}
  - \epsilon
  \begin{pmatrix} \partial_{w^{(1)}} \\ \partial_{b^{(1)}} \\ \partial_{w^{(2)}} \\ \partial_{b^{(2)}} \\ \partial_{w^{(3)}} \\ \partial_{b^{(3)}} \end{pmatrix} C .
\]

The gradient ∇C of the cost function can be computed via backpropagation which is nothing
else but the chain rule.
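Schematically, a mini-batch training loop around such an update could look like this; compute_gradients stands for the backpropagation routine you derive in Task 1 and is passed in as an argument rather than provided here.

import numpy as np

rng = np.random.default_rng(0)

def sgd_train(params, data, targets, compute_gradients, eps=0.1, batch_size=50, n_epochs=10):
    # params: dict of weight and bias arrays; compute_gradients returns a dict
    # of gradients of C with the same keys, evaluated on the given mini-batch
    n = len(data)
    for _ in range(n_epochs):
        order = rng.permutation(n)                        # new random mini-batches every epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            grads = compute_gradients(params, data[batch], targets[batch])
            for key in params:
                params[key] -= eps * grads[key]           # steepest-descent step on this mini-batch
    return params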

Task 1: State and derive the analytical expressions for ∂C/∂w^(3)_{i,j} and ∂C/∂w^(2)_{i,j}.

Task 2: Build up the network and train your machine with the training data provided in ”ising_data.hdf5”.

Hint: You may use the Python skeleton ”measuring_temperature.py”.
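If you prefer a high-level starting point, the network described above can also be written compactly with tf.keras; the hidden-layer sizes and training settings below are assumptions, and the provided skeleton may be organized differently.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation='sigmoid', input_shape=(32 * 32,)),  # first hidden layer (size assumed)
    tf.keras.layers.Dense(50, activation='sigmoid'),                           # second hidden layer (size assumed)
    tf.keras.layers.Dense(4, activation='sigmoid'),                            # one output node per temperature
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss='mse')                                 # mean-squared cost as defined above
# model.fit(x_train, y_train, batch_size=50, epochs=10)   # x_train: (N, 1024), y_train: (N, 4)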

Task 3: Use the samples you generated in exercise 1 and determine the corresponding temper-
atures using the network from this exercise.

Task 4 (optional): In the end machine learning is about trial and error, i.e., finding the best
model to describe and successfully predict the kind of data you consider. Therefore, modify your
feed-forward network and see which modifications yield the best results. There are several things
you can change. To mention only a few:

• Number of hidden layers

• Number of nodes in the hidden layers

• Activation function/non-linearity (instead of the sigmoid function one can use the ReLU,
Softmax, etc.)

• Cost function (instead of the mean-squared cost function one can use the categorical cross-
entropy, etc.)

• ...

Feel free to try anything which comes to your mind!

3 Note that in our case N_data = 5000.
