Crash Course On TensorFlow!
!
Vincent Lepetit!
1!
TensorFlow!
2!
Why 'Tensor', Why 'Flow'?!
A tensor can be !
a scalar: 3, rank 0, shape []
a vector: [1., 2., 3.], rank 1, shape [3]
a matrix: [[1., 2., 3.], [4., 5., 6.]], rank 2, shape [2, 3]
their extension to more dimensions:
[[[1., 2., 3.]], [[7., 8., 9.]]], rank 3, shape [2, 1, 3]
!
Computations in TensorFlow are defined using a
graph of operations applied to tensors.!
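A minimal sketch (not from the slides) showing these ranks and shapes in code:
import tensorflow as tf
scalar = tf.constant(3.)                                 # rank 0, shape []
vector = tf.constant([1., 2., 3.])                       # rank 1, shape [3]
matrix = tf.constant([[1., 2., 3.], [4., 5., 6.]])       # rank 2, shape [2, 3]
tensor3 = tf.constant([[[1., 2., 3.]], [[7., 8., 9.]]])  # rank 3, shape [2, 1, 3]
print(tensor3.shape)                                     # (2, 1, 3)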
3!
First Full Example:!
Linear Regression!
(from the official documentation)!
4!
The Math Problem We Will Solve!
Linear regression:!
We want to fit a linear model to some data.!
!
Formally, this means that we want to
estimate the parameters W and b of the
model:!
y = W x + b
[Figure: training points (x_i, y_i) and the fitted line y = W x + b]
W and b are scalar. We will estimate them by minimizing:!
loss = sum_i (W x_i + b - y_i)^2
where the (x_i, y_i) are the training data.!
5!
Gradient Descent!
loss(W, b) = sum_i (W x_i + b - y_i)^2
Linear regression can be solved using linear algebra (at least when the
problem is small).!
!
Here we will use gradient descent instead, as it makes for a simple first example
with TensorFlow.!
6!
Gradient Descent!
loss(W, b) = sum_i (W x_i + b - y_i)^2
(Ŵ, b̂) = arg min_{W,b} loss(W, b)
The gradient of the loss is [∂loss/∂W, ∂loss/∂b], with:
∂loss/∂W = 2 sum_i x_i (W x_i + b - y_i)
∂loss/∂b = 2 sum_i (W x_i + b - y_i)
7!
8!
tf will stand for TensorFlow!
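That is, the code slide presumably begins with:
import tensorflow as tf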
9!
Our unknowns.!
!
They are tf.Variable
!
We need to provide their initial values and types.!
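The corresponding lines are not in the extracted text; judging from the variable-scope version shown later in the deck, they are presumably:
# The two unknowns of the model, with initial values and type:
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)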
10!
TensorFlow Graph Element !
Can be:!
• A tensor (tf.Tensor);!
• An operation: add, mul, etc. (tf.Operation);!
• A variable (tf.Variable, which is in fact made of a tf.Operation
(assign) and a tf.Tensor);!
• and other things.!
11!
Our unknowns.!
!
They are tf.Variable
!
We need to provide their initial values and types!
12!
The input.!
!
It is a tf.placeholder
13!
linear_model is a tf.Operation
14!
This is a tf.placeholder for the expected output.!
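The three annotated lines (input, model, expected output) are presumably:
x = tf.placeholder(tf.float32)      # input
linear_model = W * x + b            # the model, built from operations on tensors
y = tf.placeholder(tf.float32)      # expected output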
15!
The loss function:!
loss = sum_i (W x_i + b - y_i)^2
Note that we cannot write, for example:!
(linear_model - y) ** 2!
we have to write:!
tf.square(linear_model - y)
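Put together, the loss line presumably reads (a sketch following the official getting-started example this slide is based on):
# Sum of squared errors over all training points fed through the placeholders:
loss = tf.reduce_sum(tf.square(linear_model - y))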
16!
17!
Creates an optimizer object.!
!
It implements gradient descent.!
!
0.01 is the step size.
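Presumably:
optimizer = tf.train.GradientDescentOptimizer(0.01)   # 0.01 is the step size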
18!
Create an object that will be used to perform the
minimization.!
!
Still, no optimization is actually run.
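Presumably:
train = optimizer.minimize(loss)   # builds the update step; nothing is executed yet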
19!
20!
These are our training data: (1,0), (2, -1), (3, -2), (4, -3)!
!
They are regular Python arrays.
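In code:
x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]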
21!
init is a handle to the TensorFlow sub-graph that initializes all
the global variables.!
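A sketch of this step, assuming the standard TF 1.x calls:
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)   # W and b now hold their initial values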
22!
23!
Does 1000 steps of gradient descent.!
!
{x:x_train, y:y_train} is a regular Python dictionary that maps the placeholders
x and y to the x_train and y_train Python arrays.!
!
It associates each value xi with the corresponding expected value yi.!
!
sess.run(train, {x:x_train, y:y_train}) applies the
train handle to this data.!
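The loop itself is presumably:
for i in range(1000):
    sess.run(train, {x: x_train, y: y_train})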
24!
the tensorflow.Session.run function
and the TensorFlow graph!
v = session.run(
fetches,
feed_dict=None,
options=None,
run_metadata=None
)
fetches is a TensorFlow graph element (or a tuple, list, etc. of graph elements);!
!
feed_dict contains the input and expected data used to compute the values of
the elements in fetches;!
!
The return values are the values of the elements in fetches, given the data in
feed_dict
!
See example next slide:!
!
curr_W, curr_b, curr_loss = sess.run([W, b, loss],
{x:x_train, y:y_train})
!
!
25!
!
Evaluate W, b, and loss, given x in x_train and y in
y_train.
curr_W, curr_b, curr_loss = sess.run([W, b, loss],
{x:x_train, y:y_train})
26!
Remark:!
!
Because linear regression is very common, there is already
an object for it. See:!
!
tf.contrib.learn.LinearRegressor
28!
We can give better names to the graph's nodes:!
with tf.variable_scope("W"):
    W = tf.Variable([.3], dtype=tf.float32)
with tf.variable_scope("b"):
    b = tf.Variable([-.3], dtype=tf.float32)
with tf.variable_scope("input"):
    x = tf.placeholder(tf.float32)
with tf.variable_scope("output"):
    linear_model = W * x + b
    y = tf.placeholder(tf.float32)
29!
with tf.variable_scope("W"):
    W = tf.Variable([.3], dtype=tf.float32)
with tf.variable_scope("b"):
    b = tf.Variable([-.3], dtype=tf.float32)
with tf.variable_scope("input"):
    x = tf.placeholder(tf.float32)
with tf.variable_scope("output"):
    linear_model = W * x + b
    y = tf.placeholder(tf.float32)
…
30!
Second Example:!
Two-Layer Network!
31!
A Two-Layer Network!
We will train a two-layer network (x → FC → FC → y) to approximate a 2D function F(x):!
32!
33!
Loss function!
Training set: (x_train_i, y_train_i), i = 1, ..., Ns!
Hidden layer: h1 = ReLU(x W1 + b1)
Output layer: h2 = h1 W2 + b2
Loss = (1/Ns) sum_{i=1}^{Ns} (h2(x_train_i) - y_train_i)^2
34!
Generating Training Data!
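The slide's code is not in the extracted text. A minimal sketch, assuming for illustration F(x) = sin(x) on [0, 2π] and Ns = 1000 samples (the actual F and Ns used in the course may differ):
import numpy as np
Ns = 1000
x_train = np.random.uniform(0., 2. * np.pi, size=(Ns, 1)).astype(np.float32)
y_train = np.sin(x_train)   # stand-in for F(x); purely illustrative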
35!
Defining the Network!
Hidden layer: h1 = ReLU(x W1 + b1)
Output layer: h2 = h1 W2 + b2
Loss = (1/Ns) sum_{i=1}^{Ns} (h2(x_train_i) - y_train_i)^2
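The slide's code is not in the extracted text; a sketch assuming a 1D input, a 1D output, and a hidden layer of 100 units (all three sizes are assumptions):
n_hidden = 100                                     # assumed hidden-layer size
x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])
W1 = tf.Variable(tf.random_normal([1, n_hidden]))
b1 = tf.Variable(tf.zeros([n_hidden]))
W2 = tf.Variable(tf.random_normal([n_hidden, 1]))
b2 = tf.Variable(tf.zeros([1]))
h1 = tf.nn.relu(tf.matmul(x, W1) + b1)             # hidden layer
h2 = tf.matmul(h1, W2) + b2                        # output layer
loss = tf.reduce_mean(tf.square(h2 - y))           # mean squared error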
36!
Running the Optimization!
Note the generation of the random batch: it is done by keeping the first batch_size
elements of the permutation returned by np.random.permutation, as in the sketch below.!
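A sketch of the batch selection (variable names follow the sketches above; the learning rate and batch size are assumptions):
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
batch_size = 128                                   # assumed batch size
idx = np.random.permutation(Ns)[:batch_size]       # keep the first batch_size indices
sess.run(train_step, feed_dict={x: x_train[idx], y: y_train[idx]})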
37!
Visualizing the Predicted Function!
without using the run() function!
38!
visualize_2layers()!
h1 = ReLU(x W1 + b1)
h2 = h1 W2 + b2
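The body of visualize_2layers() is not in the extracted text; presumably it evaluates h2 with Tensor.eval() rather than sess.run(), roughly (the grid range is illustrative):
xs = np.linspace(0., 2. * np.pi, 200).reshape(-1, 1).astype(np.float32)
ys = h2.eval(feed_dict={x: xs}, session=sess)   # equivalent to sess.run(h2, {x: xs})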
39!
Third Example:!
Linear Classification on MNIST!
40!
Downloading the MNIST Dataset!
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)  # path is illustrative
im = mnist.train.images[0]   # first training image, as a 784-vector
im = im.reshape(28, 28)      # back to a 28x28 image
41!
Model!
y = softmax(xW + b)
with!
softmax(h)_i = exp(h_i) / sum_j exp(h_j)
42!
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
W = tf.Variable(tf.zeros([n_input, n_classes]))
b = tf.Variable(tf.zeros([n_classes]))
# Predicted output:
y_pred = tf.nn.softmax(tf.add(tf.matmul(x, W), b))
y = softmax(xW + b)
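The placeholders x and y_exp used here and on the next slides are not in the extracted text; presumably:
x = tf.placeholder(tf.float32, [None, n_input])         # input images (784-vectors)
y_exp = tf.placeholder(tf.float32, [None, n_classes])   # expected one-hot labels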
43!
Loss Function!
44!
L(y, y_expected) = - sum_i y_expected,i log(y_i)
# Loss function:
cross_entropy = tf.reduce_mean(
    -tf.reduce_sum(
        y_exp * tf.log(y_pred),
        reduction_indices=[1]
    )
)
45!
Training!
train_step = \
tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
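The training loop itself is presumably the standard MNIST one; a sketch (1000 iterations and a batch size of 100 are assumptions):
for _ in range(1000):
    batch_x, batch_y = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_x, y_exp: batch_y})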
46!
Testing!
correct_prediction = tf.equal(tf.argmax(y_pred,1), tf.argmax(y_exp,1))
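To turn this into an accuracy on the test set, as done later for the convolutional network:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_exp: mnist.test.labels}))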
47!
Visualizing the Model!
(after optimization)!
W_array = W.eval(sess)
#First column of W:
I = W_array.flatten()[0::10].reshape((28,28))
48!
Visualizing the Model!
During Optimization!
TensorFlow comes with TensorBoard, a program that displays, in a browser, data saved
with TensorFlow functions.!
!
To visualize the columns of W during optimization with TensorBoard, we need to:!
!
1. create a Tensor that contains these columns as an image. This will be done by our
function display_W;!
2. tell TensorFlow that this Tensor is an image and part of a 'summary': a 'summary'
is made of data useful for monitoring the optimization, meant to be read by
TensorBoard;!
3. during optimization, save the image of the columns using the FileWriter
object, and visualize the images with TensorBoard (see the sketch below).!
!
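A minimal sketch of these three steps (the reshape stands in for display_W, whose exact code is not in the extracted text; the names and the log directory are assumptions):
# 1. Build an image tensor from the 10 columns of W, stacked vertically:
W_image = tf.reshape(tf.transpose(W), [1, 10 * 28, 28, 1])
# 2. Declare it as an image summary:
W_summary = tf.summary.image("W_columns", W_image)
writer = tf.summary.FileWriter("./logs", sess.graph)
# 3. During optimization, every few iterations ('step' is the iteration counter):
writer.add_summary(sess.run(W_summary), global_step=step)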
49!
display_W( )
50!
Fourth Example:!
A Convolutional Neural Network!
51!
Loading the Data!
As before:!
import numpy as np
import tensorflow as tf
52!
Model!
h1 = [ReLU(f1,1 ∗ x), ..., ReLU(f1,n ∗ x)]
h2 = [pool(h1,1), ..., pool(h1,n)]
h3 = [ReLU(f3,1 ∗ h2,1), ..., ReLU(f3,n ∗ h2,n)]
h4 = [pool(h3,1), ..., pool(h3,n)]
h5 = ReLU(W5 h4 + b5)
y = W6 h5 + b6
53!
We Need to Convert the Input Vectors into Images!
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
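The conversion itself is presumably the standard reshape (x being the [None, 784] input placeholder):
im = tf.reshape(x, shape=[-1, 28, 28, 1])   # batch of 28x28 single-channel images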
54!
First Convolutional Layer!
h1 = [ReLU(f1,1 ∗ x), ..., ReLU(f1,n ∗ x)]
h2 = [pool(h1,1), ..., pool(h1,n)]
h3 = [ReLU(f3,1 ∗ h2,1), ..., ReLU(f3,n ∗ h2,n)]
h4 = [pool(h3,1), ..., pool(h3,n)]
h5 = ReLU(W5 h4 + b5)
y = W6 h5 + b6
# 32 convolutional 5x5 filters and biases on the first layer.
# Filters and biases are initialized
# using values drawn from a normal distribution:
F1 = tf.Variable(tf.random_normal([5, 5, 1, 32]))
b1 = tf.Variable(tf.random_normal([32]))
F1_im = tf.nn.conv2d(im, F1, strides=[1, 1, 1, 1],
                     padding='SAME')
h1 = tf.nn.relu( tf.nn.bias_add(F1_im, b1) )
55!
First Pooling Layer!
h1 = [ReLU(f1,1 ∗ x), ..., ReLU(f1,n ∗ x)]
h2 = [pool(h1,1), ..., pool(h1,n)]
h3 = [ReLU(f3,1 ∗ h2,1), ..., ReLU(f3,n ∗ h2,n)]
h4 = [pool(h3,1), ..., pool(h3,n)]
h5 = ReLU(W5 h4 + b5)
y = W6 h5 + b6
# Pooling on 2x2 regions:
h2 = tf.nn.max_pool(h1, ksize=[1, 2, 2, 1],
                    strides=[1, 2, 2, 1],
                    padding='SAME')
56!
Second Convolutional and Pooling
Layers!
h1 = [ReLU(f1,1 ∗ x), ..., ReLU(f1,n ∗ x)]
h2 = [pool(h1,1), ..., pool(h1,n)]
h3 = [ReLU(f3,1 ∗ h2,1), ..., ReLU(f3,n ∗ h2,n)]
h4 = [pool(h3,1), ..., pool(h3,n)]
h5 = ReLU(W5 h4 + b5)
y = W6 h5 + b6
# Second convolutional layer: 64 5x5x32 filters:
F3 = tf.Variable(tf.random_normal([5, 5, 32, 64]))
b3 = tf.Variable(tf.random_normal([64]))
F3_im = tf.nn.conv2d(h2, F3, strides=[1, 1, 1, 1],
padding='SAME')
h3 = tf.nn.relu( tf.nn.bias_add(F3_im, b3) )
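The second pooling layer (producing h4) is missing from the extracted text; it presumably mirrors the first one, which is also what the 7*7*64 reshape on the next slide requires:
h4 = tf.nn.max_pool(h3, ksize=[1, 2, 2, 1],
                    strides=[1, 2, 2, 1],
                    padding='SAME')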
57!
Two Fully Connected Layers!
h1 = [ReLU(f1,1 ∗ x), ..., ReLU(f1,n ∗ x)]
h2 = [pool(h1,1), ..., pool(h1,n)]
h3 = [ReLU(f3,1 ∗ h2,1), ..., ReLU(f3,n ∗ h2,n)]
h4 = [pool(h3,1), ..., pool(h3,n)]
h5 = ReLU(W5 h4 + b5)
y = W6 h5 + b6
#First fully connected layer, 1024 output:
h4_vect = tf.reshape(h4, [-1, 7*7*64])
W5 = tf.Variable(tf.random_normal([7*7*64, 1024]))
b5 = tf.Variable(tf.random_normal([1024]))
h5 = tf.nn.relu( tf.add(tf.matmul(h4_vect, W5), b5 ))
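The second fully connected layer (y = W6 h5 + b6) is missing from the extracted text; a sketch producing the class scores used as logits on the next slide:
W6 = tf.Variable(tf.random_normal([1024, n_classes]))
b6 = tf.Variable(tf.random_normal([n_classes]))
y_pred = tf.add(tf.matmul(h5, W6), b6)   # raw logits; softmax is applied inside the loss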
58!
Loss Function and Optimizer!
loss = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(
logits=y_pred, labels=y_exp
)
)
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
59!
Optimization!
step = 1
training_iters = 20000000
batch_size = 128
60!
Adding Evaluation (1)!
Before optimization, let's define:!
!
# Evaluate model
is_prediction_correct = tf.equal(tf.argmax(y_pred, 1),
tf.argmax(y_exp, 1))
accuracy = tf.reduce_mean(tf.cast(is_prediction_correct,
tf.float32))
!
!
61!
Adding Evaluation (2)!
Printing performance on the test set during optimization:!
!
while step * batch_size < training_iters:
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    sess.run(optimizer, feed_dict={x: batch_x,
                                   y_exp: batch_y})
    if step % display_step == 0:
        # Compute the accuracy on the test set
        acc = sess.run([accuracy],
                       feed_dict={x: mnist.test.images,
                                  y_exp: mnist.test.labels})
        print(acc)
    step += 1
62!
Adding Dropout (1)!
Dropout is not really useful here, but we will see how to add it to this simple
example:!
keep_prob = tf.placeholder(tf.float32)
h5 = tf.nn.dropout(h5, keep_prob)
keep_prob will be set to 0.5 for training, and 1.0 for actual evaluation.!
63!
Adding Dropout (2)!
# Keep training until we reach the maximum number of iterations
while step * batch_size < training_iters:
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    # Optimization step (dropout active):
    sess.run(optimizer,
             feed_dict={x: batch_x,
                        y_exp: batch_y,
                        keep_prob: 0.5})
    # Evaluation:
    if step % display_step == 0:
        # Compute the accuracy on the test set (dropout disabled)
        acc = sess.run([accuracy],
                       feed_dict={x: mnist.test.images,
                                  y_exp: mnist.test.labels,
                                  keep_prob: 1.0})
        print(acc)
    step += 1
64!