TensorFlow
Hanock Kwak
2017-08-24
Seoul National University
Preliminary
• Machine Learning
• Deep Learning
• Linear Algebra
• Python (numpy)
Throughout the Slides
• Run the following imports before trying our sample code.
import numpy as np
import tensorflow as tf
https://www.slideshare.net/JenAman/large-scale-deep-learning-with-tensorflow
Graphs in TensorFlow
• Computation is a dataflow graph.
• A variable is defined as a symbol.
a = tf.Variable(3)
b = tf.Variable(2)
c = tf.Variable(1)
x = a*b
y = x + c
[Figure: dataflow graph with nodes a and b feeding × to produce x, then x and c feeding + to produce y]
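• A minimal sketch (TensorFlow 1.x style) of running the graph above in a Session; the printed value simply follows from a=3, b=2, c=1.
sess = tf.Session()
sess.run(tf.global_variables_initializer())  # initialize a, b and c
print(sess.run(y))  # 7, since 3*2 + 1 = 7
sess.close()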
Device Placement
• A variable or operator can be pinned to a particular device.
# Pin a variable to CPU.
with tf.device("/cpu:0"):
    a = tf.Variable(3)
    b = tf.Variable(2)
    x = a*b
# Pin a variable to GPU.
with tf.device("/gpu:0"):
    c = tf.Variable(1)
    y = x + c
[Figure: the × node (a, b, x) placed on the CPU and the + node (c, y) placed on the GPU]
Distributed Systems of GPUs and CPUs
TensorFlow in Distributed Systems
http://download.tensorflow.org/paper/whitepaper2015.pdf
TensorFlow in Distributed Systems cont.
http://download.tensorflow.org/paper/whitepaper2015.pdf
Image Model Training Time
https://www.slideshare.net/JenAman/large-scale-deep-learning-with-tensorflow
Partial Flow
• TensorFlow can execute just a subgraph of the whole graph.
• We do not need "e" and "d" to compute "f" (see the sketch below).
http://download.tensorflow.org/paper/whitepaper2015.pdf
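• A minimal sketch of partial execution; the node names "d", "e", and "f" are hypothetical stand-ins for the nodes in the figure.
a = tf.constant(2.0)
b = tf.constant(3.0)
f = a * b       # needed to compute f
d = a + b       # not needed to compute f
e = tf.sin(d)   # not needed to compute f
with tf.Session() as sess:
    print(sess.run(f))  # 6.0 -- only the subgraph feeding f is executed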
Graph Optimizations
• Common Subexpression Elimination
• Controlling Data Communication and Memory Usage
• Asynchronous Kernels
• Optimized Libraries for Kernel Implementations
• BLAS, cuBLAS, and GPU libraries such as cuda-convnet and cuDNN
• Lossy Compression
• 32 → 16 → 32bit conversion
What is a Tensor?
Tensor
• A tensor is a multidimensional data array.
• Order 0: a scalar, e.g. 100
• Order 1: a vector, e.g. 5, 3, 7, … , 10
• Order 2: a matrix, e.g. [[1, 2], [3, 1]]
• Order 3: a 3-D array of numbers
Shape of Tensor
• The list of dimension sizes, one for each order.
• Shape = [4, 5, 2]
V = tf.Variable(tf.zeros([4, 5, 2]))
Reshape
• Reshapes the tensor; the total number of elements must stay the same (4*5*2 = 4*10 = 40).
V = tf.Variable(tf.zeros([4, 5, 2]))
W = tf.reshape(V, [4, 10])
Transpose
• Transposes tensors.
• Permutes the dimensions according to 'perm'.
a = np.arange(2*3*4)
x = tf.Variable(a)
x = tf.reshape(x, [2, 3, 4])
y1 = tf.transpose(x, perm=[0, 2, 1])  # swap the last two dimensions
y2 = tf.transpose(x, perm=[2, 0, 1])
y3 = tf.transpose(x, perm=[1, 2, 0])
print(y1.get_shape())  # (2, 4, 3)
print(y2.get_shape())  # (4, 2, 3)
print(y3.get_shape())  # (3, 4, 2)
Concatenation
• Concatenate two or more tensors.
# tensor t1 with shape [2, 3]
# tensor t2 with shape [2, 3]
t3 = tf.concat([t1, t2], 0) # ==> [4, 3]
t4 = tf.concat([t1, t2], 1) # ==> [2, 6]
Concatenation along axis 0 stacks the tensors vertically (more rows); along axis 1, horizontally (more columns).
Reduce Operations
• Computes an operation over elements across dimensions of a
tensor.
• tf.reduce_sum(…), tf.reduce_prod(…), tf.reduce_max(…), tf.reduce_min(…)
# 'x' is [[1, 1, 1]
# [1, 1, 1]]
tf.reduce_sum(x) # ==> 6
tf.reduce_sum(x, 0) # ==> [2, 2, 2]
tf.reduce_sum(x, 1) # ==> [3, 3]
tf.reduce_sum(x, 1, keep_dims=True) # ==> [[3], [3]]
tf.reduce_sum(x, [0, 1]) # ==> 6
Matrix Multiplication
• tf.matmul() performs matrix multiplication with two tensors of order 2.
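• A minimal sketch of tf.matmul() with two small constant matrices (the values are illustrative).
A = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # shape [2, 2]
B = tf.constant([[5.0], [6.0]])            # shape [2, 1]
C = tf.matmul(A, B)                        # shape [2, 1], values [[17.], [39.]]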
Gradients
• tf.gradients() builds ops that compute the derivative of one tensor with respect to another.
# Build a graph.
x = tf.placeholder(tf.float32, shape=())
y = x*x + tf.sin(x)
g = tf.gradients(y, x)  # 2*x + cos(x)
Assigning Values to Variables
• x.assign(v) creates an op that updates the variable when it is run.
# Build a graph.
x = tf.Variable(100)
assign_op = x.assign(x - 1)
# Run assign_op
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(assign_op))  # 99
print(sess.run(assign_op))  # 98
print(sess.run(assign_op))  # 97
Problems with Variables
• Sometimes we want to reuse the same set of variables.
• Every call to tf.Variable() creates a new variable.
• How can we reuse the same variable?
# define function
def f(x):
    b = tf.Variable(tf.random_normal([10], stddev=1.0))
    return x + b
…
y1 = f(x1)
y2 = f(x2)  # uses a different 'b' variable
Sharing Variables: tf.get_variable()
• The function tf.get_variable() gets an existing variable or creates a new one, instead of a direct call to tf.Variable.
# define function
def f(x):
    b = tf.get_variable('b', [10], initializer=tf.random_normal_initializer())
    return x + b
…
with tf.variable_scope("bias") as scope:
    y1 = f(x1)
    scope.reuse_variables()
    y2 = f(x2)  # reuses the same 'b' variable
How Does Variable Scope Work?
• Variable scope wraps variables with a namespace.
• Reusing variables is only valid within the scope.
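• A minimal sketch of how the scope name prefixes the variable name, assuming a fresh graph; the scope name "bias" follows the earlier example.
with tf.variable_scope("bias") as scope:
    b1 = tf.get_variable('b', [10])
    scope.reuse_variables()
    b2 = tf.get_variable('b', [10])
print(b1.name)   # bias/b:0
print(b1 is b2)  # True -- the same underlying variable is returned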
Caution: Name Duplication
• Calling tf.get_variable() twice with the same name while reuse is off raises an error.
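• A minimal sketch of the failure, assuming a fresh graph and the hypothetical scope name "dup".
with tf.variable_scope("dup"):
    w1 = tf.get_variable('w', [10])
    w2 = tf.get_variable('w', [10])  # raises ValueError: variable dup/w already exists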
[Figure: an original image and the result of a broadcasting addition]
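• The original figure is not recoverable; a minimal sketch of a broadcasting addition, with illustrative shapes.
image = tf.zeros([32, 32, 3])        # height x width x channels
bias = tf.constant([0.1, 0.2, 0.3])  # one value per channel, shape [3]
shifted = image + bias               # bias is broadcast over the 32x32 spatial dimensions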
Max Pooling
• Performs max pooling on the input.
• 'ksize'
• The size of the window for each dimension of the input tensor.
• For 2×2 pooling, ksize = [1, 2, 2, 1]
• 'strides' and 'padding' are the same as in tf.nn.conv2d().
• We can use a convolution with stride 2 instead of max pooling, without significant loss of performance.
• Check "Springenberg, J. T. et al., (2014)."
Max Pooling Example
• Example of 2×2 max pooling.
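• The example image is not recoverable; a minimal sketch of 2×2 max pooling with tf.nn.max_pool, using an illustrative input shape.
x = tf.placeholder(tf.float32, [None, 28, 28, 16])  # [batch, height, width, channels]
pooled = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
print(pooled.get_shape())  # (?, 14, 14, 16)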
Activation Functions
• TensorFlow provides most of the popular activation functions.
• tf.nn.relu, tf.nn.softmax, tf.nn.sigmoid, tf.nn.elu, ...
• Example of using the rectified linear (ReLU) function.
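• The slide's example is not recoverable; a minimal sketch with tf.nn.relu on illustrative values.
z = tf.constant([-2.0, -0.5, 0.0, 1.0, 3.0])
a = tf.nn.relu(z)  # ==> [0., 0., 0., 1., 3.]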
Fully Connected (Dense) Layer
• A fully connected (fc) layer can be implemented by calling the tf.matmul() function.
• y = tf.matmul(x, W)
• To compute an fc layer after a convolution operation, we need to reshape the 4-D tensor to a 2-D tensor.
• [batch_size, height, width, channel]
→ [batch_size, height*width*channel]
Fully Connected Layer Example
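• The slide's example is not recoverable; a minimal sketch of flattening a convolution output and applying an fc layer, with illustrative shapes and names.
conv_out = tf.placeholder(tf.float32, [None, 7, 7, 64])  # [batch_size, height, width, channel]
flat = tf.reshape(conv_out, [-1, 7*7*64])                # [batch_size, height*width*channel]
W = tf.get_variable('fc_w', [7*7*64, 10])
b = tf.get_variable('fc_b', [10], initializer=tf.zeros_initializer())
y = tf.matmul(flat, W) + b                               # [batch_size, 10]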
TF Layers: High-level API
• The TensorFlow layers module provides a high-level API that
makes it easy to construct a neural network.
• No explicit weight (filter) variable creation.
• Includes activation function in one API.
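• A minimal sketch with tf.layers, using illustrative shapes; the weight variables are created internally and the activation is passed as an argument.
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
conv = tf.layers.conv2d(x, filters=32, kernel_size=[5, 5], padding='same', activation=tf.nn.relu)
pool = tf.layers.max_pooling2d(conv, pool_size=[2, 2], strides=2)
flat = tf.reshape(pool, [-1, 14*14*32])
logits = tf.layers.dense(flat, units=10)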
Other High-level API
• TF Slim
• TF Learn
• Keras (with TensorFlow backend)
• Tensor2Tensor
Loss Functions
• TensorFlow provides various loss functions.
• tf.nn.softmax_cross_entropy_with_logits, tf.nn.l2_loss, ...
• TF Layers also provides similar functions starting with tf.losses.
• Example of tf.losses.softmax_cross_entropy.
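• The slide's example is not recoverable; a minimal sketch of tf.losses.softmax_cross_entropy, with illustrative names and shapes, feeding into the optimizer code below.
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])  # one-hot labels
logits = tf.layers.dense(x, 10)
loss = tf.losses.softmax_cross_entropy(onehot_labels=y, logits=logits)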
Optimizer
# optimizer
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(loss)
...
sess.run(train_op, {x: batch_x, y: batch_y})
Review of the Batch Normalization
• Normalizes the activations of the previous layer.
• Advantages
• Allows much higher learning
rates.
• Can be less careful about
initialization.
• Faster learning.
• Reduces the need for Dropout.
Batch Normalization
• tf.nn.batch_normalization() requires you to create and manage a bunch of variables yourself, and it supports neither moving statistics nor an inference mode.
• Use tf.layers.batch_normalization()
• Set training=False in inference mode.
• It keeps moving statistics of the mean and variance.
• 'momentum' determines the forget (decay) rate of the moving statistics.
tf.layers.batch_normalization
• The ops in 'update_ops' must be run to update the batch-normalization statistics.
• In inference mode, the values are normalized by the moving statistics.
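• A minimal sketch of the usual training-time pattern, with illustrative shapes and names: run the ops in UPDATE_OPS together with the train op so the moving statistics get updated.
x = tf.placeholder(tf.float32, [None, 100])
y = tf.placeholder(tf.float32, [None, 10])   # one-hot labels
is_training = tf.placeholder(tf.bool)
h = tf.layers.batch_normalization(x, momentum=0.99, training=is_training)
logits = tf.layers.dense(tf.nn.relu(h), 10)
loss = tf.losses.softmax_cross_entropy(onehot_labels=y, logits=logits)
# The moving mean/variance are updated by ops in the UPDATE_OPS collection.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)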
Residual Connection
• A Residual Network is a neural network architecture which
solves the problem of vanishing gradients.
• Residual connection: y = f(x) + x
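• A minimal sketch of a residual connection around a single dense layer, with illustrative sizes.
x = tf.placeholder(tf.float32, [None, 64])
h = tf.layers.dense(x, 64, activation=tf.nn.relu)  # f(x)
y = h + x                                          # y = f(x) + x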
Transposed Convolution (Deconvolution)
• The need for transposed convolutions generally arises from
the desire to use a transformation going in the opposite
direction of a normal convolution.
• tf.layers.conv2d_transpose()
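• A minimal sketch of upsampling a feature map with tf.layers.conv2d_transpose, using illustrative shapes.
x = tf.placeholder(tf.float32, [None, 14, 14, 32])
up = tf.layers.conv2d_transpose(x, filters=16, kernel_size=[3, 3], strides=[2, 2], padding='same')
print(up.get_shape())  # (?, 28, 28, 16)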
Load Pre-trained Models
• Pre-trained weights for popular network architectures are available in TF Slim.
• https://github.com/tensorflow/models/tree/master/slim
• Inception V1-V4
• Inception-ResNet-v2
• ResNet 50/101/152
• VGG 16/19
• MobileNet
Thank You
References
• https://www.tensorflow.org
• https://www.slideshare.net/JenAman/large-scale-deep-learning-with-tensorflow
• https://www.slideshare.net/AndrewBabiy2/tensorflow-example-for-ai-ukraine2016
• http://download.tensorflow.org/paper/whitepaper2015.pdf