Introduction to Deep Learning
Hoang-Quynh Le (PhD), VNU-UET
Outline
Introduction to Deep Learning
○ What is Deep Learning
○ Why is it useful
Neural Networks
○ Neural Networks: Perceptron, MLP
○ Gradient Descent, Activation Functions
○ Multi-layer NN: Forward- and Backward Propagation
Typical architectures
○ Convolutional neural networks (CNN)
○ Recurrent neural networks (RNN)
○ Attention mechanism
Data representation
Deep learning frameworks
2
1.
Introduction
to Deep Learning
3
AlphaGo
https://www.youtube.com/watch?v=8tq1C8spV_g
4
Tesla X
5
Emotion Detection
• Anger
• Disgust
• Fear
• Happiness
• Neutral
• Sadness
• Surprise
Check out https://faceinmotion.preferred.ai
https://www.freecodecamp.org/news/facial-emotion-recognition-develop-a-c-n-n-and-break-into-kaggle-top-10-f618c024faa7/
6
Google Translation
https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html
7
Google Assistant
8
YouTube
9
GPT-3 Applications
https://www.youtube.com/watch?v=_x9AwxfjxvE
10
Machine Learning
Machine learning is a field of computer
science that gives computers the ability to
learn without being explicitly programmed
11
Machine Learning Basics (1)
12
Machine Learning Basics (2)
Machine learning is a field of computer science that gives computers the ability to learn without
being explicitly programmed
Training: Labeled Data → Machine Learning algorithm → Learned model
Prediction: Labeled Data → Learned model → Prediction
13
Types of Learning
Supervised: Learning with a labeled training set
Example: email classification with already labeled emails
[Figure: Classification, Regression, Clustering]
14
Traditional Machine Learning
Traditional ML methods work well because of human-designed
representations and input features
ML becomes just optimizing weights to best make a final prediction
15
○ Traditional rule-based approach
○ Machine learning: traditional feature-based machine learning
○ Deep learning
16
Image from https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
What is Deep Learning?
A subfield of machine learning focused on learning representations of data. Exceptionally
effective at learning patterns.
Deep learning algorithms attempt to learn (multiple levels of) representation by
using a hierarchy of multiple layers
https://www.xenonstack.com/blog/static/public/uploads/media/machine-learning-vs-deep-learning.png
18
Traditional ML: Extract Hand-Crafted Features → Trainable Classifier (e.g. SVM, Random Forest) → Output (e.g. Outdoor: Yes or No)
Deep Learning: Low-Level Features → Mid-Level Features → High-Level Features → Trainable Classifier → Output (e.g. outdoor, indoor)
19
“Deep Learning doesn’t do different things, it does things differently.”
20
Machine Learning vs. Deep Learning
https://lerablog.org/technology/ai-artificial-intelligence-vs-machine-learning-vs-deep-learning/
21
Why is DL useful?
o Manually designed features are often over-specified, incomplete and take a
long time to design and validate
o Learned features are easy to adapt, fast to learn
o Deep learning provides a very flexible, (almost?) universal, learnable
framework for representing world, visual and linguistic information.
o Can learn both unsupervised and supervised
o Effective end-to-end joint system learning
o Utilize large amounts of training data
[Figure: performance vs. size of data — deep learning algorithms vs. traditional ML algorithms]
22
How does DL learn features?
[Figure: classify an image as Indoor or Outdoor — answer: Indoor]
24
Image classification
https://towardsdatascience.com/what-the-hell-is-perceptron-626217814f53
26
2.
Neural Networks
27
Neurons in the Brain (1)
28
Neurons in the Brain (2)
29
Artificial Neural Network
An Artificial Neural Network is an information processing paradigm
that is inspired by the biological nervous systems, such as the
human brain’s information processing mechanism.
[Figure: inputs x1–x4 → hidden units a1(1)–a4(1) → a1(2) → output Y]
31
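A minimal numpy sketch (not from the slides) of the small network in the figure above — 4 inputs, a 4-unit hidden layer and a single output — with sigmoid activations assumed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)            # inputs x1..x4

W1 = rng.normal(size=(4, 4))      # weights: inputs -> hidden a1(1)..a4(1)
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))      # weights: hidden -> a1(2)
b2 = np.zeros(1)

a1 = sigmoid(W1 @ x + b1)         # hidden activations
y = sigmoid(W2 @ a1 + b2)         # output Y
print(y)
```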
Artificial Perceptron (2)
32
Perceptron Training Rule
33
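A minimal sketch of the perceptron training rule w ← w + η(t − o)·x on toy data (the AND function; the data and names are illustrative, not from the slides):

```python
import numpy as np

# Toy data: logical AND, with a constant 1 appended to each input as a bias term
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)   # targets

w = np.zeros(3)       # weights (last entry acts as the bias)
eta = 0.1             # learning rate

for epoch in range(20):
    for x_i, t_i in zip(X, t):
        o = 1.0 if w @ x_i > 0 else 0.0    # perceptron output (step activation)
        w += eta * (t_i - o) * x_i         # perceptron training rule

print(w)  # learned weights that separate AND
```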
Loss function
• The quantity to be minimized (optimized) during training
• the only thing the network cares about
• there might also be other metrics you care about
• Common tasks have “standard” loss functions:
• mean squared error for regression
• binary cross-entropy for two-class classification
• categorical cross-entropy for multi-class classification
• etc.
• https://lossfunctions.tumblr.com/
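Minimal numpy versions of the “standard” losses listed above (a sketch, not the slides’ code):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, for regression."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p, eps=1e-12):
    """Binary cross-entropy, for two-class classification (p = predicted P(class 1))."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    """Categorical cross-entropy, for multi-class classification."""
    return -np.mean(np.sum(y_onehot * np.log(np.clip(probs, eps, 1.0)), axis=1))
```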
Optimizer
36
Gradient Descent (2)
37
Gradient Descent (3)
38
Gradient Descent (4)
39
Gradient Descent (5)
https://machinelearningcoban.com/2017/01/12/gradientdescent/ 40
Gradient Descent (6)
https://machinelearningcoban.com/2017/01/12/gradientdescent/ 41
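A minimal gradient-descent sketch on a one-dimensional quadratic loss (illustrative only, not taken from the linked post):

```python
# Minimize L(w) = (w - 3)^2 with plain gradient descent
def grad(w):
    return 2 * (w - 3)          # dL/dw

w = -5.0                        # starting point
lr = 0.1                        # learning rate
for step in range(100):
    w -= lr * grad(w)           # w <- w - lr * dL/dw

print(w)                        # converges towards the minimum at w = 3
```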
Batch, Mini-Batch, Iterative Training
42
Iteration, Epoch
43
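A schematic training loop showing how mini-batches, iterations and epochs relate; `compute_gradient` and `update` are hypothetical placeholders, not real API calls:

```python
import numpy as np

def train(X, y, batch_size=32, num_epochs=10):
    """X, y: numpy arrays of examples and labels."""
    n = len(X)
    for epoch in range(num_epochs):            # one epoch = one full pass over the data
        order = np.random.permutation(n)       # reshuffle the data each epoch
        for start in range(0, n, batch_size):  # one iteration = one mini-batch update
            idx = order[start:start + batch_size]
            X_batch, y_batch = X[idx], y[idx]
            # grads = compute_gradient(model, X_batch, y_batch)  # hypothetical helper
            # update(model, grads)                                # hypothetical helper
```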
Neural Networks with Activation Functions
The purpose of the activation function is to introduce non-linearity into the network
44
Activation: Sigmoid
http://adilmoujahid.com/images/activation.png
45
Activation: Tanh
http://adilmoujahid.com/images/activation.png
46
Activation: ReLU
http://adilmoujahid.com/images/activation.png
47
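The three activations above, written out in numpy (a minimal sketch):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes values into (-1, 1), zero-centred

def relu(z):
    return np.maximum(0.0, z)         # 0 for negative inputs, identity otherwise

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z), tanh(z), relu(z), sep="\n")
```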
Multi Layer Perceptron
48
Multi-layer Neural Networks with Sigmoid
[Figure: Layer 1 = input layer, Layers 2–3 = hidden layers, Layer 4 = output layer]
49
Forward Propagation
Notations
○ Input vector at level l
○ Weight matrix of level l
[Figure: forward propagation from Layer 1 (input layer) through the hidden layers to Layer 4 (output layer)]
50
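With these notations, the forward pass is usually written as follows (a standard formulation; the slides’ exact symbols may differ):

```latex
a^{(1)} = x, \qquad a^{(l+1)} = f\left( W^{(l)} a^{(l)} + b^{(l)} \right), \quad l = 1, \dots, L-1
```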
Back Propagation
○ Error at level l
○ Error at the last level (L = 4)
[Figure: back-propagation from Layer 4 (output layer) through the hidden layers to Layer 1 (input layer)]
51
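A compact numpy sketch of forward and backward propagation for a stack of sigmoid layers, using the usual rules for the error at the last level and at level l (a generic implementation under a squared-error loss, not the slides’ code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(x, y, Ws, bs):
    """Ws[l], bs[l] map layer l to layer l+1. Returns per-layer gradients."""
    # Forward pass: store activations layer by layer
    a = [x]
    for W, b in zip(Ws, bs):
        a.append(sigmoid(W @ a[-1] + b))

    # Backward pass: error at the last level (squared-error loss, sigmoid output),
    # then propagate the error back through the hidden layers
    delta = (a[-1] - y) * a[-1] * (1 - a[-1])               # delta at the output layer
    grads_W, grads_b = [], []
    for l in reversed(range(len(Ws))):
        grads_W.insert(0, np.outer(delta, a[l]))            # dL/dW at level l
        grads_b.insert(0, delta)                            # dL/db at level l
        if l > 0:
            delta = (Ws[l].T @ delta) * a[l] * (1 - a[l])   # error at level l
    return grads_W, grads_b
```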
3.
Typical Architectures
52
Number of Parameters
[Figure: inputs x1–x4 → hidden units a1(1)–a4(1) → a1(2) → softmax output Y]
21 = 4 × 4 + 4 + 1
53
If the input is an Image?
[Figure: a 400 × 400 × 3 image flattened into inputs x1 … x480000, fully connected to hidden units a1(1) … a480000(1), then a1(2) → Y]
Number of Parameters
480,000 × 480,000 + 480,000 + 1 ≈ 230 billion !!!
480,000 × 1,000 + 1,000 + 1 ≈ 480 million (with a 1,000-unit hidden layer) !!!
54
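The counts above as a quick calculation (assuming the “+1” is a single output bias; illustrative only):

```python
def param_count(n_in, n_hidden):
    # input->hidden weights, hidden->output weights, plus one output bias
    return n_in * n_hidden + n_hidden + 1

print(param_count(4, 4))                              # 21, as on the earlier slide
print(param_count(400 * 400 * 3, 400 * 400 * 3))      # ~230 billion
print(param_count(400 * 400 * 3, 1000))               # ~480 million
```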
Convolution Layers
Inspired by the neurophysiological experiments conducted by Hubel and Wiesel (1962).
Filter:
0  1  0
1 -4  1
0  1  0
[Figure: a 20 × 20 grid of pixel intensities (Input Image) and the resulting Convolved Image]
55
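A direct (unoptimized) 2-D convolution sketch applying the filter above; the explicit loops are for illustration only, real frameworks use vectorized kernels:

```python
import numpy as np

kernel = np.array([[0,  1, 0],
                   [1, -4, 1],
                   [0,  1, 0]], dtype=float)   # the filter shown on the slide

def conv2d(image, kernel):
    """Valid convolution (no padding, stride 1), written with explicit loops."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(20, 20)       # stand-in for the 20x20 intensity grid
print(conv2d(image, kernel).shape)   # (18, 18)
```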
Convolution Layers
[Figure: a 4 × 4 input image (values a–p) convolved with two 2 × 2 filters (Filter 1: w1–w4, Filter 2: w5–w8), producing the Layer 1 and Layer 2 feature maps (h1, h2, …)]
In CNNs, hidden units are only connected to a local receptive field.
57
Pooling
Max pooling: reports the maximum output within a rectangular neighborhood.
Average pooling: reports the average output of a rectangular neighborhood.
MaxPool with a 2 × 2 filter and stride of 2:

Input              Output
1 3 5 3
4 2 3 1     →      4 5
3 1 1 3            3 4
0 1 0 4
58
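A sketch reproducing the max-pooling example above in numpy:

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = patch.max()   # report the maximum in each neighborhood
    return out

x = np.array([[1, 3, 5, 3],
              [4, 2, 3, 1],
              [3, 1, 1, 3],
              [0, 1, 0, 4]])
print(max_pool(x))   # [[4. 5.] [3. 4.]], matching the slide
```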
Convolutional Neural Networks (1)
[Figure: feature extraction architecture — stacked convolution layers (64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512 filters) with max-pooling, followed by fully connected layers producing an output vector over room classes: Living Room, Bed Room, Kitchen, Bathroom]
59
Convolutional Neural Networks (2)
Output: Binary, Multinomial, Continuous, Count
Input: Fixed size, can use padding to make all images same size.
Architecture: Choice is ad hoc
○ requires experimentation.
Optimization: Backward propagation
○ Hyperparameters for a very deep model can be estimated properly only if you have billions of images.
• Use an architecture and trained hyperparameters from other papers (ImageNet models, Microsoft/Google APIs, etc.)
Computing Power: Buy a GPU!!
60
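The deck lists Keras among the frameworks; below is a hedged Keras sketch of a small CNN in the spirit of the architecture slide (the layer sizes, 224 × 224 × 3 input and the four room classes are illustrative assumptions, not the exact model):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(224, 224, 3)),                   # fixed-size (resized/padded) images
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(256, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(512, activation="relu"),                # fully connected layers
    layers.Dense(4, activation="softmax"),               # e.g. living room / bedroom / kitchen / bathroom
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```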
Automatic Colorization of Black and White Images
61
Optimizing Images
64
Recurrent Neural
Networks (RNN)
65
Why RNN?
The limitations of convolutional neural networks:
They take fixed-length vectors as input and produce fixed-length vectors as output.
They allow only a fixed number of computational steps.
We need to model the data with temporal or sequential structures and
varying length of inputs and outputs
e.g.:
This movie is ridiculously good.
This movie is very slow in the beginning but picks up pace later on and
has some great action sequences and comedy scenes.
66
Modeling Sequences
Image Captioning: image → “A person riding a motorbike on dirt road”
Machine Translation: “Happy Diwali” → “शुभ दीपावली”
67
What is RNN?
Recurrent neural networks are connectionist models with the ability to selectively
pass information across sequence steps, while processing sequential data one
element at a time.
This allows a memory of the previous inputs to persist in the model’s internal state and
influence the outcome.
[Figure: INPUT x(t) → Hidden Layer → OUTPUT h(t), with a delay feeding h(t−1) back into the hidden layer]
68
RNN (rolled over time)
h(t) = f( W_h · h(t−1) + W_x · x(t) )
69
RNN (rolled over time)
70
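A minimal numpy sketch of the recurrence above, h(t) = f(W_h·h(t−1) + W_x·x(t)), unrolled over a short sequence (the sizes are illustrative):

```python
import numpy as np

hidden, n_in, T = 8, 5, 10
rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(hidden, hidden))   # recurrent weights
W_x = rng.normal(scale=0.1, size=(hidden, n_in))     # input weights
xs = rng.normal(size=(T, n_in))                      # a length-T input sequence

h = np.zeros(hidden)
for t in range(T):
    h = np.tanh(W_h @ h + W_x @ xs[t])   # h(t) = f(W_h h(t-1) + W_x x(t))
print(h)
```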
The Vanishing Gradient Problem
RNNs use backpropagation.
Backpropagation uses the chain rule.
○ The chain rule multiplies derivatives
If these derivatives are between 0 and 1 the product vanishes as the
chain gets longer.
○ or the product explodes if the derivatives are greater than 1.
Sigmoid activation function in RNN leads to this problem.
ReLU, in theory, avoids this problem, but not in practice.
71
Problem with Vanishing or Exploding Gradients
They don’t allow us to learn long-term dependencies.
○ Param is a hard worker.
VS.
○ Param, student of Yong, is a hard worker.
BAD!!!!
Misguided!!!!
Unacceptable!!!!
72
Long Short-Term Memory
73
LSTM (1)
[Figure: LSTM cell — the forget gate f1 and input gate i1 are computed from h0 and x1 through weight matrices w and an activation f(); together with the candidate u1 they update the cell state: c1 = f1 · c0 + i1 · u1]
74
LSTM (2)
[Figure: the same LSTM cell, highlighting the weight matrices w and activations f() that produce the forget gate f1 and input gate i1 from h0 and x1]
75
LSTM (3)
[Figure: the same LSTM cell, highlighting how the candidate u1 is computed from h0 and x1 through a weight matrix w and activation f()]
76
LSTM (4)
[Figure: the same LSTM cell with the output gate o1 added — the new hidden state h1 is produced from the updated cell state c1 and o1]
77
LSTM (5)
[Figure: two LSTM cells unrolled over time — inputs x1, x2 and states (c0, h0) → (c1, h1) → (c2, h2), each step using forget (f), input (i), candidate (u) and output (o) gates]
78
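A minimal numpy sketch of the standard LSTM step shown in the figures above (one common parameterization that concatenates h0 and x1; the slides’ exact weight layout may differ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x1, h0, c0, W_f, W_i, W_u, W_o, b_f, b_i, b_u, b_o):
    z = np.concatenate([h0, x1])
    f1 = sigmoid(W_f @ z + b_f)          # forget gate
    i1 = sigmoid(W_i @ z + b_i)          # input gate
    u1 = np.tanh(W_u @ z + b_u)          # candidate cell update
    c1 = f1 * c0 + i1 * u1               # new cell state
    o1 = sigmoid(W_o @ z + b_o)          # output gate
    h1 = o1 * np.tanh(c1)                # new hidden state
    return h1, c1

# Usage with hidden size 4 and input size 3 (illustrative shapes)
H, D = 4, 3
rng = np.random.default_rng(0)
Ws = [rng.normal(scale=0.1, size=(H, H + D)) for _ in range(4)]
bs = [np.zeros(H) for _ in range(4)]
h1, c1 = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), *Ws, *bs)
```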
Attention
Mechanism
79
Attention Mechanism (1)
80
Attention Mechanism (2)
81
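The slides do not spell out the formula here; a common instance is scaled dot-product attention, which also underlies the Transformer architecture mentioned later. A minimal numpy sketch with made-up shapes:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # attention weights sum to 1 per query
    return weights @ V                   # weighted sum of the values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 8)), rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
print(attention(Q, K, V).shape)          # (2, 8)
```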
4.
Data representation
82
Machines learn better by using a deep understanding of data.
83
○ Traditional rule-based approach
○ Machine learning: traditional feature-based machine learning
84
Transformer architecture
86
Word Embedding vs Language Models
87
Pre-trained language models
88
Large language models
89
5.
Deep Learning
Framework
90
Deep learning frameworks
• Python/numpy or R interfaces
• instead of C, C++, CUDA or HIP
• Open source

Deep learning frameworks (2)
High-level interfaces: Lasagne, Keras, TF Estimator, torch.nn, Gluon
Frameworks: Theano, TensorFlow, CNTK, PyTorch, MXNet, Caffe
94
Thanks!
Any questions?
[email protected]
95