
Deep Learning

Kairit Sirts
Lecture at TUT, 19.12.2016
Outline

• What can be done with deep learning?
• Deep learning demystified
• How can you get started with deep learning?
Why deep learning?
(Figure: performance comparison of deep learning, gradient boosting, random forest, and a linear model.)

http://www.infoworld.com/article/3003315/big-data/deep-learning-a-brief-guide-for-practical-problem-solvers.html
What can be done with deep learning?
Handwritten digit recognition

MNIST benchmark dataset
The best reported error rate is 0.21%
Street view number recognition

• Obtained from house numbers in Google Street View images
• Best error rate is 1.69%
Image classification
10 objects
6000 labeled instances for each object
Best accuracy so far 96.53%

Image classification

20 superclasses
100 fine-grained classes
600 labeled images per class
Best classification accuracy 75.72%
Detecting doodles

https://quickdraw.withgoogle.com
There are other simple and fun AI experiments launched by Google:
https://aiexperiments.withgoogle.com
Image captioning

Image captioning – not so great results

Automatic colorization of images

http://richzhang.github.io/colorization/resources/images/teaser3.jpg
Automatic colorization of images – failure cases
DeepDream

https://deepdreamgenerator.com
Word embeddings

http://metaoptimize.s3.amazonaws.com/cw-embeddings-ACL2010/embeddings-mostcommon.EMBEDDING_SIZE=50.png
Word embeddings
(Figure: 2D projection of word embeddings, showing clusters of months, weekdays, and numbers.)
Word embeddings

• W(man) − W(woman) ≈ W(king) − W(queen)
• W(walking) − W(walked) ≈ W(swimming) − W(swam)
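To see this vector arithmetic in practice, here is a minimal Python sketch using the gensim library; the vector file name is a placeholder, not something from the slides:

from gensim.models import KeyedVectors

# Load pretrained word vectors (file name is hypothetical).
vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# king − man + woman ≈ ?
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# "queen" is expected to rank at or near the top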
Automatic text generation – pseudo Shakespeare

http://karpathy.github.io/2015/05/21/rnn-effectiveness
Machine translation

• Google Translate app

Learning to play Atari Arcade games

https://www.youtube.com/watch?v=cjpEIotvwFY
AlphaGo

https://www.youtube.com/watch?v=PQCrX1sQSzY
Other tasks tackled with deep neural networks

• Speech recognition
• Various tasks in robotics
• Log analysis/risk detection
• Recommendation systems
• Motion detection from videos
• Business and Economics analytics
• Etc.
Deep learning demystified
How does deep learning work?
• Biological neuron vs. artificial neuron
http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-1/7

• Biological neural network vs. artificial neural network
https://www.eeweb.com/blog/rob_riemen/deep-machine-learning-and-the-google-brain
http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-1/7
What happens inside a neuron?

$$z = x_1 w_1 + x_2 w_2 + \dots + x_n w_n = \sum_{i=1}^{n} x_i w_i$$

Output: $h = f(z)$
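As a concrete illustration, a minimal Python sketch of this computation (the variable names are mine, not from the slides):

import math

def neuron(x, w, f):
    # Weighted sum of the inputs, followed by the activation function.
    z = sum(xi * wi for xi, wi in zip(x, w))
    return f(z)

# Example with a sigmoid activation:
print(neuron([1.0, 0.5], [0.3, -0.2], lambda z: 1 / (1 + math.exp(-z))))  # ~0.55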
Activation function

$$f(z) = \begin{cases} 1 & \text{if } z \ge \text{th} \\ 0 & \text{if } z < \text{th} \end{cases} \qquad f(z) = \frac{1}{1 + e^{-z}} \qquad f(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \qquad f(z) = \max(0, z)$$

(from left to right: threshold, sigmoid, tanh, ReLU)

https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/neural_networks.html
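The same four activation functions, written out as a small Python sketch:

import math

def threshold(z, th=0.0):
    return 1.0 if z >= th else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

def relu(z):
    return max(0.0, z)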
Single neuron logic gates

• Threshold activation function

https://blog.abhranil.net/2015/03/03/training-neural-networks-with-genetic-algorithms/
XOR gate
• Cannot be done with a single neuron
• A hidden layer is necessary

x1  x2  OR = 𝕀(x1·1 + x2·1 > 0.5)  NOT AND = 𝕀(x1·(−1) + x2·(−1) > −1.5)  AND = 𝕀(OR·1 + NAND·1 > 1.5)
0   0   𝕀(0·1 + 0·1 > 0.5) = 0     𝕀(0·(−1) + 0·(−1) > −1.5) = 1          𝕀(0·1 + 1·1 > 1.5) = 0
0   1   𝕀(0·1 + 1·1 > 0.5) = 1     𝕀(0·(−1) + 1·(−1) > −1.5) = 1          𝕀(1·1 + 1·1 > 1.5) = 1
1   0   𝕀(1·1 + 0·1 > 0.5) = 1     𝕀(1·(−1) + 0·(−1) > −1.5) = 1          𝕀(1·1 + 1·1 > 1.5) = 1
1   1   𝕀(1·1 + 1·1 > 0.5) = 1     𝕀(1·(−1) + 1·(−1) > −1.5) = 0          𝕀(1·1 + 0·1 > 1.5) = 0

(The AND column takes the OR and NOT AND outputs as its inputs; its result is XOR(x1, x2).)

https://blog.abhranil.net/2015/03/03/training-neural-networks-with-genetic-algorithms/
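The same construction as a runnable Python sketch, using the weights and thresholds from the table:

def gate(x1, x2, w1, w2, th):
    # A single threshold neuron: outputs 1 iff the weighted sum exceeds th.
    return 1 if x1 * w1 + x2 * w2 > th else 0

def xor(x1, x2):
    o = gate(x1, x2, 1, 1, 0.5)       # OR
    n = gate(x1, x2, -1, -1, -1.5)    # NOT AND
    return gate(o, n, 1, 1, 1.5)      # AND of the two hidden units

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))  # prints 0, 1, 1, 0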
How to assign weights?

8×9 + 9×9 + 9×9 + 9×4 = 270 weights

http://neuralnetworksanddeeplearning.com/
Backpropagation

• Standard and efficient method for training neural networks
• The general idea:
  • Compute the error with a forward pass
  • Propagate the error back and change the weights so that the error becomes smaller

ERROR → ERROR′, with ERROR′ < ERROR
Diversion to calculus – the derivative

• $y' = f'(x)$
• The derivative is the slope of the tangent line
• It is the rate of change when moving in the direction of steepest ascent
Derivatives

• When $f'(x) = 0$, the point is a local or global maximum or minimum, or a saddle point
• When $f'(x) > 0$, the function is increasing
• When $f'(x) < 0$, the function is decreasing
Gradients
• Generalization of the derivative to multivariate functions
• The gradient is a vector pointing in the direction of steepest ascent
• $\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$
• $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are partial derivatives: the derivative with respect to one variable while treating all others as constants
Gradients and backpropagation

• Backpropagation is used to compute the gradients with respect to all parameters in a neural network.
• The gradients are then used in the general method of gradient descent for minimizing functions.
• We want to minimize the cost function that measures the error made by the neural network.
• In order to do that, we need to move in the direction of steepest descent given by the gradients.
Gradient descent
• An iterative algorithm
• Start with initial parameter values $\theta^0$
• Update the parameters iteratively until convergence:
  $$\theta^{t+1} := \theta^{t} - \alpha \nabla f(\theta^{t})$$
• $\alpha$ is the learning rate; it controls the step size
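A minimal Python sketch of this loop, minimizing a toy quadratic (the function and values are my own illustration):

def gradient_descent(grad_f, theta0, alpha=0.1, n_steps=100):
    # Repeatedly step against the gradient.
    theta = theta0
    for _ in range(n_steps):
        theta = theta - alpha * grad_f(theta)
    return theta

# Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
print(gradient_descent(lambda t: 2 * (t - 3), theta0=0.0))  # converges to ~3.0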
Deep learning demystified
How does backpropagation work?
Backpropagation explained

• Example from: https://mattmazur.com/2015/03/17/
• 2 inputs
• 1 hidden layer with 2 neurons
• Bias terms in both the hidden and the output layer
• 2 outputs
Initial configuration

• Training values
• Initial weights: $w_1, \dots, w_8$
• Initial biases: $b_1, b_2$
Forward pass – first hidden unit

Forward pass – first hidden unit

Forward pass – second hidden unit

Forward pass – first output unit

Forward pass – second output unit

Forward pass – error of the first output

Forward pass – output error

Backwards pass
• Consider $w_5$
• How much does a change in $w_5$ affect the total error?
• Apply the chain rule:
  $$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}$$
Chain rule
• Formula for computing the derivative of a composition of two or more functions
• $F(x) \equiv f(g(x)) \equiv (f \circ g)(x)$ – composition of the functions $f$ and $g$
• $F'(x) = f'(g(x))\,g'(x)$

• Example: $F(x) = e^{3x}$, where $g(x) = 3x$ and $f(g(x)) = e^{g(x)} = e^{3x}$
• $F'(x) = f'(g(x))\,g'(x) = (e^{g(x)})'\,g'(x) = e^{g(x)}(3x)' = e^{3x} \cdot 3 = 3e^{3x}$
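A quick numeric sanity check of this example in Python (my own illustration):

import math

F = lambda x: math.exp(3 * x)        # the composite function
dF = lambda x: 3 * math.exp(3 * x)   # its derivative by the chain rule

x, eps = 0.7, 1e-6
numeric = (F(x + eps) - F(x - eps)) / (2 * eps)  # central finite difference
print(numeric, dF(x))  # the two values agree closely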
How much does error change wrt the output?

How much does output change wrt its net input?

Derivative of the sigmoid function

$$f(z) = \frac{1}{1 + e^{-z}}$$

$$f'(z) = f(z)\,(1 - f(z))$$
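This identity follows in a few lines (a derivation not shown on the slide):

$$f'(z) = \frac{d}{dz}\left(1 + e^{-z}\right)^{-1} = \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = f(z)\,(1 - f(z))$$

since $\frac{e^{-z}}{1 + e^{-z}} = 1 - \frac{1}{1 + e^{-z}} = 1 - f(z)$.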
How much does net input change wrt $w_5$?
Putting it all together

This is known as the delta rule

• The delta rule is the gradient-descent update rule for the weights of the inputs to neurons in a single-layer neural network
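A sketch of the delta rule for a single sigmoid unit in Python (names are mine; the squared-error cost $\frac{1}{2}(target - out)^2$ is the one used in this example):

def delta_rule_update(w, x, target, out, alpha=0.5):
    # Gradient of E = 0.5 * (target - out)^2 with respect to weight w_i,
    # where out = sigmoid(sum_i w_i * x_i), is (out - target) * out * (1 - out) * x_i.
    delta = (out - target) * out * (1 - out)
    return [wi - alpha * delta * xi for wi, xi in zip(w, x)]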
Apply the delta rule to the output layer weights
Update the weights with gradient descent
• Set the learning rate $\alpha = 0.5$ and apply the update $\theta^{t+1} := \theta^{t} - \alpha \nabla f(\theta^{t})$
Backpropagation to hidden layer

• Continue the backwards pass to calculate new values for $w_1, w_2, w_3$ and $w_4$
BP through hidden layer

• $out_{h1}$ affects both $o_1$ and $o_2$, so its derivative must take both into account:
  $$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$$
BP through hidden layer
• Consider one of the two terms:
  $$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial out_{h1}}$$
• The first term can be calculated from values computed before
• The second term is just $w_5$
BP through hidden layer
• Plug the values in
• Compute the same quantity for $o_2$
• Compute the total $\frac{\partial E_{total}}{\partial out_{h1}}$
BP through hidden layer
• Next we need $\frac{\partial out_{h1}}{\partial net_{h1}}$ and $\frac{\partial net_{h1}}{\partial w}$ for each weight $w$
• Compute the partial derivative with respect to a weight
BP through hidden layer
• Putting it together

• We can now update $w_1$
BP through hidden layer
• Compute the partial derivatives in the same way for $w_2$, $w_3$ and $w_4$
• Update $w_2$, $w_3$ and $w_4$
After first update with backpropagation

Did the error decrease?

• The old error was 0.298371109
• Improvement: 0.007343335

• After 10000 updates the error will be ca. 0.000035085
• The generated outputs will then be 0.015912196 for the 0.01 target and 0.984065734 for the 0.99 target
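For reference, a compact NumPy sketch of the full walkthrough. The inputs, targets, and initial weights are not printed on these slides; the values below are the ones used in Matt Mazur's post (biases are kept fixed, as there), so treat this as an illustration of that example:

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Values quoted from https://mattmazur.com/2015/03/17/
x = np.array([0.05, 0.10])        # inputs
t = np.array([0.01, 0.99])        # targets
W1 = np.array([[0.15, 0.20],      # w1, w2 (input -> hidden)
               [0.25, 0.30]])     # w3, w4
W2 = np.array([[0.40, 0.45],      # w5, w6 (hidden -> output)
               [0.50, 0.55]])     # w7, w8
b1, b2, alpha = 0.35, 0.60, 0.5

for _ in range(10000):
    h = sigmoid(W1 @ x + b1)                  # forward pass: hidden layer
    o = sigmoid(W2 @ h + b2)                  # forward pass: output layer
    delta_o = (o - t) * o * (1 - o)           # delta rule at the output layer
    delta_h = (W2.T @ delta_o) * h * (1 - h)  # chain rule through the hidden layer
    W2 -= alpha * np.outer(delta_o, h)        # gradient descent updates
    W1 -= alpha * np.outer(delta_h, x)

print(0.5 * np.sum((t - o) ** 2))  # total error, ca. 0.000035
print(o)                           # ca. [0.0159, 0.9841]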
In conclusion
• Neural networks consist of artificial neurons organized into layers and connected to each other with learnable weights.
• Backpropagation with gradient descent is the standard method for training neural networks.
• Backpropagation can be used to compute the gradients of a neural network, regardless of the depth of the network.
• Of course, there are other important tricks and tips, but this is the basis for understanding neural networks and deep learning.
Common neural network architectures
Feed-forward network

• Simplest type of neural network
• Connections between units do not form cycles
• Information always moves in one direction; it never goes backwards

https://upload.wikimedia.org/wikipedia/en/5/54/Feed_forward_neural_net.gif
Recurrent neural network

• Connections between units form cycles
• They possess internal memory – they "remember" past inputs
• Suitable for modeling sequential/temporal data, such as text and language data
Convolutional neural networks

• Convolutional layers have neurons arranged in 3 dimensions
• Especially suitable for processing image data

http://parse.ele.tue.nl/education/cluster2
Autoencoders
• The output layer attempts to reconstruct the input
• Used for unsupervised feature learning
• The hidden layer typically has fewer neurons, thus performing data compression
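A minimal sketch of such an autoencoder in Keras (the layer sizes are my own example, not from the slide):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=784))  # encoder: compress 784 -> 32
model.add(Dense(784, activation='sigmoid'))             # decoder: reconstruct the input
model.compile(optimizer='adam', loss='binary_crossentropy')

# Train with the input as its own target, e.g.:
# model.fit(X, X, epochs=10, batch_size=128)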
Getting started with neural networks
Courses and tutorials
• https://www.coursera.org/learn/machine-learning
  • Introductory course on machine learning; provides the necessary background
• https://www.coursera.org/learn/neural-networks
  • Course on neural networks; assumes knowledge of machine learning
• http://ufldl.stanford.edu/tutorial/
  • Tutorial on deep learning that also covers some simpler machine learning
• http://cs231n.stanford.edu/
  • Course on convolutional neural networks
• https://www.udacity.com/course/deep-learning--ud730
  • Course on deep learning

• There are many others … just google …
Books

• http://www.deeplearningbook.org/
• Deep Learning: A Practitioner's Approach – not released yet
• Fundamentals of Deep Learning – not released yet
• See more at:
  • http://machinelearningmastery.com/deep-learning-books/
Low-level libraries
• Theano – http://deeplearning.net/software/theano/
• TensorFlow – https://www.tensorflow.org/get_started/
  • Both are Python-based
  • Automatic differentiation
  • Can use CUDA for computing on a GPU

• Torch – http://torch.ch/
  • Based on Lua
  • Modular pieces that are easy to combine
  • Lots of pretrained models

• See more: https://deeplearning4j.org/compare-dl4j-torch7-pylearn
Higher-level libraries
• Keras – https://keras.io/
  • Runs on top of Theano and TensorFlow
  • Based on Python
  • Modular
  • Supports both convolutional and recurrent networks
  • Supports arbitrary connectivity
  • Runs on both CPU and GPU
Keras – example code
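The original code listing is not preserved in this text version; below is a minimal sketch of a small Keras model in the same spirit (the layer sizes and data names are placeholders):

from keras.models import Sequential
from keras.layers import Dense

# A small feed-forward classifier: 100 input features, 10 classes.
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

# X_train and y_train stand in for your own data:
# model.fit(X_train, y_train, epochs=20, batch_size=32)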

What else?
• Take the Machine Learning course in the spring semester
• Use neural networks for your thesis work
• Potential supervisors in UT:
  • Kairit Sirts (problems involving natural language)
  • Mark Fishel (machine translation)
  • Raul Vicente (computational neuroscience)
  • Ilya Kuzovkin (computational neuroscience)

• Potential supervisors in TUT:
  • Juhan Ernits
  • Tanel Alumäe (speech data)
  • There are possibly others
In conclusion – deep learning
• Can be used to solve very complex problems
• Based on artificial neural networks with many hidden layers
• Each artificial neuron is a simple computational unit
• Neural networks are trained with the gradient descent algorithm
• The backpropagation algorithm is used to compute the gradients with respect to the tunable parameters
• There are many tutorials and online courses about deep learning
• There are various software libraries that make it relatively easy to get started with deep learning
