
1

INTRODUCTION TO DEEP LEARNING


WITH TENSORFLOW

Fabien Baradel
PhD Candidate

fabienbaradel.github.io
@fabienbaradel
[email protected]
DEEP LEARNING? 2

TENSORFLOW?
TENSORFLOW BACKGROUND 3

• 6 months internship at Xerox Research Center Europe:


- « Unsupervised Domain Adaptation »
- Image recognition & sentiment text classification
- Machine Learning for Services team
- Grenoble, France

• PhD Candidate at LIRIS - INSA Lyon since October 2016:


- « Deep Learning for human understanding: gestures, poses, activities »
- Imagine team
- Supervisors: Christian Wolf & Julien Mille
- Working with videos
DEEP LEARNING
FOR HUMAN UNDERSTANDING 4

• Action recognition => Classification


• Sequence Learning
• Supervised Learning
DEEP LEARNING
FOR HUMAN UNDERSTANDING 5

• Microsoft Kinect - Xbox


DEEP LEARNING
FOR HUMAN UNDERSTANDING 6

Microsoft Kinect v2:
• 3D joint locations
• 25 joints

Video = sequence of frames
Skeleton alone is not enough (e.g. looking at a book vs. looking at a smartphone)
SELF-DRIVING CARS 7

Tesla

• https://www.youtube.com/watch?v=CxanE_W46ts
MATERIALS 8

Virtual Machine: USB key


• Python 2.7
• Tensorflow
• Ubuntu 16.04
• Datasets

Seminar slides & Code corrections:


https://fabienbaradel.github.io
9
Introduction
• Basics of Tensorflow
• Machine Learning: analytic solution vs.
gradient descent
Supervised Learning (image recognition)
• Neural networks reminder
• Convolution Networks
• Going deeper with ConvNets
Unsupervised Learning
• Autoencoder
• Generative Adversarial Network
Sequence modelling
• RNN, LSTM
• Word2vec
Reinforcement Learning
• Deep Q-learning
• Frozen Lake
10

INTRODUCTION
WHAT IS TENSORFLOW? 11
• A Python library
• pip install tensorflow
• Developed by Google
• Open-source
• Library for numerical computation using data flow graphs
• CPU and GPU
• Research & Industry
PRINCIPLE 12
« HELLO WORLD » 13

• INTRODUCTION EXERCISES

• Difference between constant/variable


and placeholder

• Constant = a fixed Variable

• With placeholder you need to feed


data to your graph during your
session

• Tensorflow workflow:
• Draw your graph
• Feed data
• … and optimize
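
A minimal sketch of the constant/Variable/placeholder distinction, assuming TensorFlow 1.x as used in the seminar (the variable names and values are illustrative):

```python
import tensorflow as tf

# Constants: values baked into the graph.
a = tf.constant(2.0)
b = tf.constant(3.0)

# A placeholder: an empty node that must be fed with data at session time.
x = tf.placeholder(tf.float32)

# Drawing the graph: nothing is computed yet.
y = a * x + b

with tf.Session() as sess:
    # Feed data into the placeholder and run the graph.
    print(sess.run(y, feed_dict={x: 4.0}))  # prints 11.0
```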
« BASIC MATHS OPERATIONS » 14

• Open « math_ops.py »
• Same thing with integers
• Mathematical operations done using the Tensorflow library only (no numpy or other libraries)
• Draw the schema of the code

WITH CONSTANTS WITH PLACEHOLDERS
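
A sketch of the same basic operations done with TensorFlow ops only, once with constants and once with placeholders (TensorFlow 1.x assumed; the operand values are illustrative):

```python
import tensorflow as tf

# With constants: the integer operands are fixed in the graph.
a = tf.constant(7, dtype=tf.int32)
b = tf.constant(3, dtype=tf.int32)
add_c = tf.add(a, b)
mul_c = tf.multiply(a, b)

# With placeholders: the operands are fed at session time.
x = tf.placeholder(tf.int32)
y = tf.placeholder(tf.int32)
add_p = tf.add(x, y)
mul_p = tf.multiply(x, y)

with tf.Session() as sess:
    print(sess.run([add_c, mul_c]))                          # [10, 21]
    print(sess.run([add_p, mul_p], feed_dict={x: 7, y: 3}))  # [10, 21]
```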


ANALYTIC SOLUTION IN ML 15

Linear regression: $y = X\beta + \epsilon$

Least Squares solution:

$\hat{\beta} = \arg\min_{\beta} \|X\beta - y\|^2$

$\hat{\beta} = (X^T X)^{-1} X^T y$

Could solve the same problem by solving the optimization problem using gradient descent
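
A sketch of the closed-form least-squares solution computed with TensorFlow ops (TensorFlow 1.x assumed; the toy data and the true coefficients are made up for illustration):

```python
import numpy as np
import tensorflow as tf

# Toy data: y = X*beta + noise, with an arbitrary beta = [2.0, -1.0]
X_np = np.random.randn(100, 2).astype(np.float32)
beta_true = np.array([[2.0], [-1.0]], dtype=np.float32)
y_np = X_np.dot(beta_true) + 0.1 * np.random.randn(100, 1).astype(np.float32)

X = tf.constant(X_np)
y = tf.constant(y_np)

# Closed form: beta_hat = (X^T X)^{-1} X^T y
XtX = tf.matmul(X, X, transpose_a=True)
Xty = tf.matmul(X, y, transpose_a=True)
beta_hat = tf.matmul(tf.matrix_inverse(XtX), Xty)

with tf.Session() as sess:
    print(sess.run(beta_hat))  # close to beta_true
```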
GRADIENT DESCENT 16

• Goal: minimizing a function

• Random initialization of the parameters
• At time t, the gradient gives the slope of the function
• Iterative process
• Update the parameters in the opposite direction of the gradient, scaled by a learning rate
• Repeat until convergence
BATCH STOCHASTIC GRADIENT DESCENT 17

Linear regression: $y = X\beta + \epsilon$

Minimize a loss function:
$J(\beta) = \sum_{i=1}^{N} (X_i\beta - y_i)^2$
$\hat{\beta} = \arg\min_{\beta} J(\beta)$

• Initialize $\hat{\beta}_0$ randomly
• Choose a learning rate $\eta$
• for t in range(training_step):
• Compute the loss
$J(\hat{\beta}_t) = \sum_{i=1}^{N} (X_i\hat{\beta}_t - y_i)^2$
• Update the parameters
$\hat{\beta}_{t+1} = \hat{\beta}_t - \eta \nabla J(\hat{\beta}_t)$

https://jalammar.github.io/visual-interactive-guide-basics-neural-networks/#train-your-dragon
MINI-BATCH SGD 18

SGD = stochastic gradient descent

• Initialize $\hat{\beta}_0$ randomly
• Choose a learning rate $\eta$
• Choose a batch size n
• for t in range(training_step):
• Pick a random sample $S_t^n$ of size n from the training data
• Compute the loss function
$J(\hat{\beta}_t) = \sum_{i \in S_t^n} (X_i\hat{\beta}_t - y_i)^2$
• Update the parameters
$\hat{\beta}_{t+1} = \hat{\beta}_t - \eta \nabla J(\hat{\beta}_t)$

• Initialization is important!
• Learning rate too!
• Need a validation set to avoid overfitting
• Neural nets are always trained with mini-batch SGD!
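
As a pointer for the SGD/linear_regression_exo.py exercise, a minimal mini-batch SGD sketch in TensorFlow 1.x (the toy data, batch size, learning rate and step count are illustrative assumptions, not the exercise's actual values):

```python
import numpy as np
import tensorflow as tf

# Toy data: y = X*beta + noise, with an arbitrary true beta
X_np = np.random.randn(1000, 2).astype(np.float32)
beta_true = np.array([[2.0], [-1.0]], dtype=np.float32)
y_np = X_np.dot(beta_true) + 0.1 * np.random.randn(1000, 1).astype(np.float32)

X = tf.placeholder(tf.float32, [None, 2])
y = tf.placeholder(tf.float32, [None, 1])
beta = tf.Variable(tf.random_normal([2, 1]))          # random initialization of beta_0

y_hat = tf.matmul(X, beta)
loss = tf.reduce_sum(tf.square(y_hat - y))            # J(beta) on the mini-batch
train_op = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

batch_size = 32
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for t in range(500):
        idx = np.random.choice(len(X_np), batch_size)  # pick a random mini-batch S_t
        sess.run(train_op, feed_dict={X: X_np[idx], y: y_np[idx]})
    print(sess.run(beta))                              # should be close to beta_true
```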
EXERCISES 19

• Go to the Github repo and complete the codes:


✴ SGD/linear_regression_exo.py
✴ SGD/binary_classif_exo.py

import ipdb; ipdb.set_trace()

http://playground.tensorflow.org/
https://wookayin.github.io/TensorflowKR-2016-talk-debugging/
20

NEURAL NETWORKS
MNIST DATASET 21

• Handwritten digits
• 60,000 training images and 10,000 test images
• 28x28 grayscale images
• matrix of size 28x28 with values between 0 and 255
• data preprocessing = rescaling to [0,1]
MULTINOMIAL LOGISTIC REGRESSION ON MNIST:
CREATE THE GRAPH 22

[Figure: the 28x28 image is vectorized into a 784-dimensional input and multiplied by the learnable parameter $\hat{W}$ to produce 10 logits; a softmax turns the logits into 10 class predictions, which are compared to the label.]

Compute the cross-entropy:
$J(\hat{W}) = - y \times \log(\hat{y})$

WHY LOG?
http://colah.github.io/posts/2015-09-Visual-Information/
MULTINOMIAL LOGISTIC REGRESSION ON MNIST:
CREATE THE GRAPH 23

[Figure: the same graph at step t=0 — a mini-batch of images is vectorized, multiplied by $\hat{W}_t$, passed through the softmax, and the predictions are compared to the labels.]

Compute the cross-entropy:
$J(\hat{W}_t) = - \sum_{i \in S_t} y_i \times \log(\hat{y}_i)$

$\hat{W}_t$ is updated by SGD:
$\hat{W}_{t+1} = \hat{W}_t - \eta \nabla J(\hat{W}_t)$

MULTINOMIAL LOGISTIC REGRESSION ON MNIST:
FEED DATA 24

[Figure: the same graph at step t=1, with a new mini-batch and the updated parameter $\hat{W}_t$; the cross-entropy and the SGD update are recomputed as above.]

MULTINOMIAL LOGISTIC REGRESSION ON MNIST:
FEED DATA 25

[Figure: the same graph at step t=2.]
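
A sketch of this graph in TensorFlow 1.x, following the classic MNIST softmax tutorial (the data path "MNIST_data/", the learning rate, the batch size and the number of steps are assumptions for illustration):

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])      # vectorized 28x28 images
y = tf.placeholder(tf.float32, [None, 10])       # one-hot labels

W = tf.Variable(tf.zeros([784, 10]))             # learnable parameter
b = tf.Variable(tf.zeros([10]))

logits = tf.matmul(x, W) + b
y_hat = tf.nn.softmax(logits)

# Cross-entropy J(W) = -sum y * log(y_hat), averaged over the mini-batch
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(y_hat), axis=1))
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for t in range(1000):
        batch_x, batch_y = mnist.train.next_batch(100)   # feed mini-batches step by step
        sess.run(train_op, feed_dict={x: batch_x, y: batch_y})
    correct = tf.equal(tf.argmax(y_hat, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
```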
NEURAL NETWORKS 26

[Figure: an input is fed to an inference function with parameters $\theta$, producing an output vector $(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_{10})$ that is compared to the one-hot label through an error function.]

• Minimize your error on a training set
• Find the best inference function parameters
• Difference between neural nets and deep nets: only in the inference function

$\hat{y} = f(\theta, x)$
$J(\theta) = \mathrm{error}(\hat{y}, y)$ given $\theta$
$\hat{\theta} = \arg\min_{\theta} J(\theta)$

And train it using mini-batch SGD!
NEURAL NETWORKS IN TENSORFLOW:
GENERAL GRAPH 27

[Figure: the input and the label are placeholders; the inference function with parameters $\theta$ produces the output vector $(\hat{y}_1, \ldots, \hat{y}_n)$, which is compared to the label by the error function.]

inference function => $\hat{y} = f(\theta, x)$
loss function => $J(\theta) = \mathrm{error}(\hat{y}, y)$ given $\theta$
optimization problem => $\hat{\theta} = \arg\min_{\theta} J(\theta)$

And train it using mini-batch SGD in a Tensorflow session!
NEURAL NETWORKS IN TENSORFLOW:
GENERAL TRAINING 28

[Figure: at step t=0, a mini-batch of inputs and one-hot labels is fed into the two placeholders; the inference function with parameters $\hat{\theta}_0$ produces the output vectors, which are compared to the labels by the error function.]

feed mini-batch of data step by step

And train it using mini-batch SGD in a Tensorflow session!

NEURAL NETWORKS IN TENSORFLOW 29

[Figure: the same graph at step t=1, with updated parameters $\hat{\theta}_1$ and a new mini-batch.]

feed mini-batch of data step by step

And train it using mini-batch SGD in a Tensorflow session!

NEURAL NETWORKS IN TENSORFLOW 30

[Figure: the same graph at step t=2, with parameters $\hat{\theta}_2$.]

feed mini-batch of data step by step

And train it using mini-batch SGD in a Tensorflow session!
BACKPROPAGATION 31

• Forward Activation: predict the output
• Compute the loss
• Backward Error: correct the parameters

[Figure, built up over slides 31-35: the input X goes through $f_\theta$ in a forward pass to produce $\hat{y}$; the error between $\hat{y}$ and y is computed; the error is then backpropagated over the network using the derivative functions to update the parameters.]

https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b#.l8cz02hlu
EXERCISES 36

• Go here: https://github.com/fabienbaradel/Tensorflow-tutorials
• And do the softmax and multilayer perceptron exercises
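
For orientation before the multilayer perceptron exercise, a sketch of a one-hidden-layer inference function in TensorFlow 1.x (the hidden size of 256, the ReLU non-linearity and the learning rate are illustrative choices, not taken from the slides):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

# Hidden layer: the inference function is the only thing that changes vs. softmax regression
W1 = tf.Variable(tf.truncated_normal([784, 256], stddev=0.1))
b1 = tf.Variable(tf.zeros([256]))
h1 = tf.nn.relu(tf.matmul(x, W1) + b1)

# Output layer
W2 = tf.Variable(tf.truncated_normal([256, 10], stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(h1, W2) + b2

# Numerically stable cross-entropy computed on the logits
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
```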
37

CONVOLUTIONAL NETWORKS
CONVNET 38

« Convolutional neural networks »

• Created by Yann LeCun (90’s)

• Well-known since the 2000s

• Big acceleration with GPUs

• Computer vision

• NLP

• Artificial Intelligence

• Convolution & Pooling

ConvNets are usually evaluated on ImageNet (5 million images, 1000 classes)


CONVOLUTION 39
CONVOLUTION 40

• Finding information in subpart of the image

• Local spatial correlation

• Mimic the biological process

• Less parameters than fully-connected layer

Example: convolution on a 5x5 matrix (1 filter of size 3x3 and stride=1)

Input: (5,5,1) Output: (nb_filter,3,3)

http://dl.heeere.com/convolution3/
POOLING 41

• Sampling over a matrix

• Dimension reduction

• Reduce number of parameters of further layers

• No learnable parameters!

Example: pooling over a 20x20 matrix (filter=10x10 and stride=10)


CONVNETS 42

AlexNet (2012)

80.1 %

GoogleNet Inception v3 (2015)

93.4 %
CONVNETS 43
EXERCISES 44

• Complete the exercises:


✴ One Conv + Max Pool
✴ LeNet

HAVE A LOOK AT TF.SLIM TO MAKE YOUR LIFE EASIER
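
A sketch of one convolution plus max pooling for the exercise, in TensorFlow 1.x, first with the low-level ops and then with the tf.slim shorthand (the filter count, kernel size and pooling size are illustrative choices):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])   # MNIST images kept as 3D tensors

# One convolution (32 filters of size 3x3, stride 1) followed by 2x2 max pooling
W = tf.Variable(tf.truncated_normal([3, 3, 1, 32], stddev=0.1))
b = tf.Variable(tf.zeros([32]))
conv = tf.nn.relu(tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') + b)
pool = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# The same two layers with tf.slim (TF 1.x contrib), much shorter:
import tensorflow.contrib.slim as slim
net = slim.conv2d(x, 32, [3, 3])
net = slim.max_pool2d(net, [2, 2])
```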
45

WHY DOES CONVOLUTION WORK?


FEATURE MAPS 46

Layer 1: ~ Gabor filters


FEATURE MAPS 47
FEATURE MAPS 48
FILTERS 49

https://www.youtube.com/watch?v=AgkfIQ4IGaM
FINE-TUNING 50

FROZEN FINETUNED

• Filters after the first convolutional layer are generic (Gabor filters)
• The deeper you go in the network, the more task-specific your filters become
FINE-TUNING 51

Pretrained Inception v3 on ImageNet

DIMENSION REDUCTION: each image is mapped to a feature vector $v = (v_1, \ldots, v_{2048})$
EXERCISE 52

• Wanna win $150’000? YES YOU CAN!


• Go to the Github repo and do the
« Classification from DeepFeatures » exercise
• And submit your .csv on Kaggle (and cross your fingers)

WHAT IS YOUR SCORE?


Want to add convolution?

Reshape your vector to a 3D matrix…
e.g. 2048 = 16*16*8
53

AUTOENCODER
NEURAL NETWORK LEARNING 54

Supervised learning: X → f → Y
‣ y are given!

Unsupervised learning: X → f → Z → g → X
‣ y is no longer needed
AUTOENCODER 55

X → f (encoder) → Z (latent) → g (decoder) → X

• Learning a compact data representation


• Encode input to smaller latent space
• Decode from the latent space to the input
• Predict input from input
• Loss function = mean square error
• f and g are neural networks
• SGD as usual
AUTOENCODER 56

X → f (encoder) → Z (latent) → g (decoder) → X

If f and g are linear with no hidden layer,
=> the solution is an approximation of PCA
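
A sketch of the autoencoder exercise in TensorFlow 1.x (the latent size of 32 and the ReLU/sigmoid non-linearities are illustrative choices; the slides only require f and g to be neural networks and the loss to be the mean square error):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])

# Encoder f: 784 -> 32 (latent space smaller than the input)
W_enc = tf.Variable(tf.truncated_normal([784, 32], stddev=0.1))
b_enc = tf.Variable(tf.zeros([32]))
z = tf.nn.relu(tf.matmul(x, W_enc) + b_enc)

# Decoder g: 32 -> 784 (reconstruct the input from the latent code)
W_dec = tf.Variable(tf.truncated_normal([32, 784], stddev=0.1))
b_dec = tf.Variable(tf.zeros([784]))
x_hat = tf.nn.sigmoid(tf.matmul(z, W_dec) + b_dec)

# Loss = mean square error between the input and its reconstruction
loss = tf.reduce_mean(tf.square(x_hat - x))
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
```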
57

GENERATIVE MODELS
GENERATIVE MOMENT MATCHING NETWORKS 58

[Figure: an autoencoder X → f (encoder) → Z (latent) → g (decoder) → X′, trained with an RMSE reconstruction loss.]

GENERATIVE MOMENT MATCHING NETWORKS 59

[Figure: a generated latent code Z_generated is sampled from N(0,1).]

GENERATIVE MOMENT MATCHING NETWORKS 60

[Figure: the MMD (maximum mean discrepancy) is computed between the latent codes Z = f(X) produced by the encoder and the generated codes sampled from N(0,1).]

GENERATIVE MOMENT MATCHING NETWORKS 61

[Figure: generated latent codes sampled from N(0,1) are passed through the decoder g to produce generated samples X_generated.]

GENERATIVE MOMENT MATCHING NETWORKS 62

[Figure: the full pipeline — encode X to the latent Z with f, sample Z_generated from N(0,1), and decode with g to obtain X_generated.]
GENERATIVE ADVERSARIAL NETWORKS 63

Intuition

[Figure: a Generator produces fake money.]

GENERATIVE ADVERSARIAL NETWORKS 64

Intuition

[Figure: the Generator produces fake money; the Discriminator receives both fake and real money and must decide for each: FAKE OR REAL?]
GENERATIVE ADVERSARIAL NETWORKS 65

noise Z → Generator G → fake image → Discriminator D → Y: real or not?

• G and D are neural networks
• Find a G that minimizes the accuracy of the best D
• Alternate the optimization of G and D (D is also fed real images)

http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow/
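
A sketch of the alternate optimization in TensorFlow 1.x (the layer sizes, the noise dimension of 100 and the use of Adam are assumptions for illustration, not taken from the slides):

```python
import tensorflow as tf

def generator(z):
    with tf.variable_scope("G"):
        h = tf.layers.dense(z, 128, activation=tf.nn.relu)
        return tf.layers.dense(h, 784, activation=tf.nn.sigmoid)   # fake image

def discriminator(x, reuse=False):
    with tf.variable_scope("D", reuse=reuse):
        h = tf.layers.dense(x, 128, activation=tf.nn.relu)
        return tf.layers.dense(h, 1)                               # logit: real or fake?

z = tf.placeholder(tf.float32, [None, 100])
x_real = tf.placeholder(tf.float32, [None, 784])

x_fake = generator(z)
d_real = discriminator(x_real)
d_fake = discriminator(x_fake, reuse=True)

# D: classify real as 1 and fake as 0. G: fool D into calling fakes real.
d_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=d_real, labels=tf.ones_like(d_real)) +
    tf.nn.sigmoid_cross_entropy_with_logits(logits=d_fake, labels=tf.zeros_like(d_fake)))
g_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=d_fake, labels=tf.ones_like(d_fake)))

d_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="D")
g_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="G")

# Alternate optimization: run d_step and g_step in turn inside the training loop
d_step = tf.train.AdamOptimizer(1e-4).minimize(d_loss, var_list=d_vars)
g_step = tf.train.AdamOptimizer(1e-4).minimize(g_loss, var_list=g_vars)
```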
GAN: EXAMPLES 66
GAN: EXAMPLES 67
GAN: EXAMPLES 68

Ongoing topic…
EXERCISE 69

• Complete the exercises:


✴ « Autoencoder_exo »
✴ « Conv-Deconv Autoencoder_exo »
✴ And GMMN if you are fast enough!
70

SEQUENCE MODELING
WHAT ABOUT SEQUENCE? 71

Image = static: (almost) solved
Video = sequence of images: not solved at all…
WHAT ABOUT SEQUENCE? 72

Sequence to sequence:
Machine Translation
RECURRENT NEURAL NETWORK 73

• Imagine X as a time series : (x1, x2, …, xn)


• h is the hidden state of the RNN
• Initialized at (1,1,…,1) at t=0
• And h is modified after each timestep

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
RNN AND CLASSIFICATION 74

H_0 Initialize randomly

RNN
RNN AND CLASSIFICATION 75

H_0

X_1 RNN H_1


RNN AND CLASSIFICATION 76

H_0

X_1 H_1

X_2 RNN H_2


RNN AND CLASSIFICATION 77

H_0

X_1 RNN H_1

H_N-1

H_N
X_n RNN
RNN AND CLASSIFICATION 78

H_0

X_1 RNN H_1

H_N-1

H_N classif
X_n RNN Y
WORD2VEC 79

How to represent a word as a vector? TF-IDF?

=> Learning word embeddings

Italy = (5.12, 7.21, ..., 0.78) $\in \mathbb{R}^{100}$

Beautiful word2vec relationships:

king − man + woman = queen
Tokyo − Japan + France = Paris
best − good + strong = strongest

And of course some mistakes:

England − London + Baghdad = Mosul (instead of Iraq)
RNN ON MNIST 82

[Figure, built up over slides 82-86: a 28x28 MNIST image is read row by row.]

[28,28] = sequence of 28 vectors of size 28
RNN ON MNIST 87

• Complete the exo « rnn_exo »
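
A sketch of an RNN classifier on MNIST in TensorFlow 1.x, reading each image as a sequence of 28 row-vectors of size 28 (the LSTM cell, the hidden size of 128 and the Adam optimizer are illustrative choices; the exercise may use a different cell or setup):

```python
import tensorflow as tf

# Each 28x28 image is read as a sequence of 28 row-vectors of size 28
x = tf.placeholder(tf.float32, [None, 28, 28])
y = tf.placeholder(tf.float32, [None, 10])

cell = tf.contrib.rnn.BasicLSTMCell(num_units=128)
outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

# Classify from the last hidden state H_N
last = outputs[:, -1, :]
W = tf.Variable(tf.truncated_normal([128, 10], stddev=0.1))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(last, W) + b

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```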


MACHINE LEARNING 88
REINFORCEMENT LEARNING 89

RL = Reinforcement Learning
RL: FEW EXAMPLES 90

https://www.youtube.com/watch?v=V1eYniJ0Rnk
RL IN A FINITE STATE SPACE 91

[Figure: at time t, the agent in state $s_t$ takes action $a_t$, receives reward $r_{t+1}$ and moves to state $s_{t+1}$.]

$s_t, s_{t+1} \in S$
$a_t \in A(s_t)$
$t = 0, 1, 2, \ldots$
FROZEN LAKE EXAMPLE 92

S F F F        S = starting point, safe
F H F H        F = frozen surface, safe
F F F H        H = hole, fall to your doom
H F F G        G = goal, where the frisbee is located

• Possible actions:
- Up
- Down
- Left
- Right

Ice is slippery: you won’t always move in the direction you intend
RL: DEFINITIONS 93

The agent learns to assign values to state-action pairs

Discounted return:
$R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \ldots + \gamma^{T-t-1} r_T$
where $\gamma \in [0, 1]$ is the discount rate

Action-value function for policy $\pi$:
$Q^{\pi}(s, a) = E_{\pi}\{R_t \mid s_t = s, a_t = a\}$

« How good is an action for the future, given a certain state? »
FORMULATION OF THE Q-FUNCTION 94

Q-values arranged as a matrix (rows = states, columns = actions Up, Down, Left, Right):

$Q^{\pi} = \begin{pmatrix} Q(0,\mathrm{Up}) & Q(0,\mathrm{Down}) & Q(0,\mathrm{Left}) & Q(0,\mathrm{Right}) \\ & & \ldots & \\ Q(s_t,\mathrm{Up}) & Q(s_t,\mathrm{Down}) & Q(s_t,\mathrm{Left}) & Q(s_t,\mathrm{Right}) \\ & & \ldots & \\ Q(15,\mathrm{Up}) & Q(15,\mathrm{Down}) & Q(15,\mathrm{Left}) & Q(15,\mathrm{Right}) \end{pmatrix}$

Optimal value function unrolled recursively:

$Q^*(s_t, a_t) = E_{s_{t+1}}\{r_{t+1} + \gamma \max_{a_{t+1}} Q^*(s_{t+1}, a_{t+1})\}$

where $Q^*$ is the optimal action-value function, $r_{t+1}$ the immediate reward, $\gamma$ the discount factor, and $\max_{a_{t+1}} Q^*(s_{t+1}, a_{t+1})$ the best possible value of the next state.

Express the action-value function by a neural network with parameters $\theta$:

$Q(s, a, \theta) \approx Q^*(s, a)$

But what is our target vector to compute the loss function???
DEEP Q-LEARNING 95

Q value function:

$Q_{\theta_t} = \begin{pmatrix} Q(0,\mathrm{Up},\theta_t) & Q(0,\mathrm{Down},\theta_t) & Q(0,\mathrm{Left},\theta_t) & Q(0,\mathrm{Right},\theta_t) \\ Q(1,\mathrm{Up},\theta_t) & Q(1,\mathrm{Down},\theta_t) & Q(1,\mathrm{Left},\theta_t) & Q(1,\mathrm{Right},\theta_t) \\ \ldots & \ldots & \ldots & \ldots \\ Q(15,\mathrm{Up},\theta_t) & Q(15,\mathrm{Down},\theta_t) & Q(15,\mathrm{Left},\theta_t) & Q(15,\mathrm{Right},\theta_t) \end{pmatrix}$

Loss function:

$J(\theta_t) = \sum \big( Q(s_t, a_t, \theta_t) - (r_{t+1} + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}, \theta_t)) \big)^2$

where $Q(s_t, a_t, \theta_t)$ is the value at this state, and the Q-target $r_{t+1} + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}, \theta_t)$ is the immediate reward plus the best possible value at the next state.

Train it using SGD!
NN Q-FUNCTION FOR FROZEN LAKE 96

[Figure: the state S (from 0 to 15) goes through a linear layer W producing the four Q-values $Q_{up}$, $Q_{down}$, $Q_{left}$, $Q_{right}$; the action A is chosen by argmax.]
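
A sketch of this Q-network trained with the Q-target loss, assuming the OpenAI Gym FrozenLake-v0 environment is available (the one-hot encoding of the state, the exploration rate, learning rate, discount factor and episode count are illustrative; the exercise's own environment and helpers may differ):

```python
import numpy as np
import tensorflow as tf
import gym   # assumes OpenAI Gym is installed

env = gym.make('FrozenLake-v0')

# Q-network: one-hot state (16) -> Q-values for the 4 actions (a single linear layer)
s = tf.placeholder(tf.float32, [1, 16])
W = tf.Variable(tf.random_uniform([16, 4], 0, 0.01))
Q = tf.matmul(s, W)
a_best = tf.argmax(Q, 1)

# Q-target placeholder and squared loss J(theta)
Q_target = tf.placeholder(tf.float32, [1, 4])
loss = tf.reduce_sum(tf.square(Q_target - Q))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

gamma = 0.99
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for episode in range(2000):
        state = env.reset()
        done = False
        while not done:
            one_hot = np.eye(16)[state:state + 1]
            action, q = sess.run([a_best, Q], feed_dict={s: one_hot})
            if np.random.rand() < 0.1:                   # epsilon-greedy exploration
                action[0] = env.action_space.sample()
            next_state, reward, done, _ = env.step(action[0])
            # Bellman target: r + gamma * max_a' Q(s', a')
            q_next = sess.run(Q, feed_dict={s: np.eye(16)[next_state:next_state + 1]})
            target = q
            target[0, action[0]] = reward + gamma * np.max(q_next)
            sess.run(train_op, feed_dict={s: one_hot, Q_target: target})
            state = next_state
```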
DEEP Q-FUNCTION FOR REAL GAMES 97

ConvNets … here we go again!


EXERCISE 98

• Complete the exo « q_learning_frozen_lake_exo »

Or go back to the fish classification if you want ;)


99

WHAT ABOUT YOUR FIRST EXPERIENCE WITH


TENSORFLOW?
