
Advanced Deep Learning

Dr. Arshad Iqbal


Assistant Professor,
Administrative stuff
• Grade
- Homework + Project + Attendance 15%
- Quizzes 10%
- Midterm 25%
- Final / Project 50%

• Reference
- Stanford CS231n (http://cs231n.stanford.edu/)
- Stanford CS224d (http://cs224d.stanford.edu/)
- Deep Learning Book (http://www.deeplearningbook.org/)
AI is having real-world impact
▪ Public imagination
▪ Text assistants
AI is having real-world impact
▪ Public imagination
▪ Text assistants
▪ Image generation
AI is having real-world impact
▪ Public imagination
▪ Economy
▪ AI market estimated at 454 billion USD globally

https://www.precedenceresearch.com/artificial-intelligence-market
AI is having real-world impact
▪ Public imagination
▪ Economy
▪ Politics
AI is having real-world impact
▪ Public imagination
▪ Economy
▪ Politics
▪ Law

Bloomberg Law, 2023


AI is having real-world impact
▪ Public imagination
▪ Economy
▪ Politics
▪ Law
▪ Labor

The Economist, 2021; New York Times, 2023; MarketWatch, 2023
AI is having real-world impact
▪ Public imagination
▪ Economy
▪ Politics
▪ Law
▪ Labor
▪ Sciences

Nature, 2022
AI is having real-world impact
▪ Public imagination
▪ Economy
▪ Politics
▪ Law
▪ Labor
▪ Sciences

Wired, 2022
AI is having real-world impact
▪ Public imagination
▪ Economy
▪ Politics
▪ Law
▪ Labor
▪ Sciences
▪ Education
Forbes, 2023
Deep learning hype in the media
• New York Times (2012)
- Google Brain project identifying cats from YouTube videos without any labels
Deep learning hype in the media
• MIT Technology Review
- One of the top 10 most promising breakthrough technologies
Recent impacts
• Real industry impacts!
Speech Recognition / Image Recognition

• TIMIT Phone Error Rate (PER) • Top-5 Error on ImageNet

[Chart: TIMIT PER, 1990–2014, dropping sharply after the start of using deep learning]

- Deep learning is used for Apple Siri, Google Voice Search, Samsung S-Voice, etc.
Recent impacts
• Some more
Face Identification / Machine Translation
• Tell whether two faces are the same or not • Real-time translation

[Chart: face identification error (%) — Microsoft, Facebook, CUHK, Google, Human; example pairs labeled Same / Different]
Recent impacts
• Even more…
Image Captioning / Visual QA system
• Generate captions on images • Answer a question about an image

[https://www.srijan.net/how-we-do/] [Baidu & UCLA, 2015]


Recent impacts
• Getting somewhat scary… ☺
Playing video games / Teaching robots

• Playing Atari video games • Robot folding laundry

[Mnih et al., Human-level control through deep reinforcement learning, 2015]
[Levine et al., End-to-end training of deep visuomotor policies, 2015]
Recent impacts
• Getting somewhat scary… ☺
Deep Art / AlphaGo

• Learning painting styles • Everyone knows the story

[Silver et al., 2016]
http://deepart.io
Supervised learning
• Teach computers with many (input, output) pairs

[Figures from Fei-Fei Li]


Supervised learning
• Examples
– Image classification

“Cat”
Supervised learning
• Examples
– Speech recognition

“How are you.”


Supervised learning
• Examples
– Object detection / Scene labeling (self-driving cars, robots)
Supervised learning
• Some on-going research
– Activity classification based on radar micro-Doppler
Supervised learning
• Going back to the image classification example

[Figures from Fei-Fei Li]


Conventional approach for
supervised learning
• How can we teach a computer what cat is?

[Figures from Fei-Fei Li]


Conventional approach for
supervised learning
• Devise hand-crafted features (e.g., SIFT, HOG, …)
- Requires some domain knowledge

[Figures from Fei-Fei Li]


Conventional approach for
supervised learning
• Apply a classifier on extracted features
→→Separation of feature extractors and a classifier

[Figures from Fei-Fei Li]


Problem of hand-crafted features
• What about following cat?
Problem of hand-crafted features
• What about following cat?
- Need to devise different features
Problem of hand-crafted features
• What about following cats?
- Not easy to devise correct features for all cats!
Deep learning takes a different route
• Main idea : Learn features (“representations”) from data too
→→Then, everything can be learned end-to-end!
• Inspired by the human brain, use deep neural networks
→→May learn hierarchical representation of data

[Human’s visual cortex] [Deep neural network]


An (artificial) neuron
• Also known as a “perceptron”
- Inner product + non-linearity
- Weights are the parameters to learn
y = f( Σ_i w_i x_i + b )

e.g., sigmoid: f(x) = 1 / (1 + e^(-x))
ReLU: f(x) = max(0, x)
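A minimal NumPy sketch of a single neuron (not from the slides; the weights and inputs below are made-up illustrative values):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def neuron(x, w, b, f=sigmoid):
    # inner product of inputs and weights, plus bias, passed through a non-linearity
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])
w = np.array([0.1, 0.4, -0.2])
b = 0.05
print(neuron(x, w, b, f=sigmoid))
print(neuron(x, w, b, f=relu))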
Deep neural networks
• Multiple (e.g., 5~20) layers of multiple neurons
- “Weights” updated using stochastic gradient descent
Forward Pass: compute prediction
Backward Pass: error back-propagation (chain rule)
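A toy sketch of one forward/backward pass with an SGD weight update (illustrative only: one hidden layer, sigmoid activations, squared error; sizes and data are made up):

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(4, 3)), np.zeros(4)   # hidden layer
W2, b2 = rng.normal(scale=0.1, size=(1, 4)), np.zeros(1)   # output layer
lr = 0.1
x = np.array([0.2, -0.5, 1.0])   # made-up training example
y = np.array([1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(100):
    # forward pass: compute the prediction
    h = sigmoid(W1 @ x + b1)
    y_hat = sigmoid(W2 @ h + b2)
    # backward pass: propagate the error with the chain rule
    d_out = (y_hat - y) * y_hat * (1 - y_hat)   # gradient at the output pre-activation
    d_hid = (W2.T @ d_out) * h * (1 - h)        # gradient at the hidden pre-activation
    # stochastic gradient descent update of the weights
    W2 -= lr * np.outer(d_out, h); b2 -= lr * d_out
    W1 -= lr * np.outer(d_hid, x); b1 -= lr * d_hid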
Deep neural networks
• Multiple (e.g., 5~20) layers of multiple neurons
- “Weights” updated using stochastic gradient descent

[Figures from Jeff Dean]


Why multi-layer neural networks?
• It can nonlinearly distort the input space
- As a result, a simple classifier can easily separate the classes
→→It learns the right transformation for a given learning task

[Figures from [LeCun, Bengio, and Hinton, 15]]


Why multi-layer neural networks?
• It can nonlinearly distort the input space
- As a result, a simple classifier can easily separate the classes
→→It learns the right transformation for a given learning task

[Figures from Christopher Olah’s blog]


Founders of deep neural networks
• 3 Godfathers of neural networks [LeCun, Bengio, and Hinton, Deep learning, 2015]
- Geoff Hinton (U. of Toronto, Google)
- Yann LeCun (NYU, Facebook)
- Yoshua Bengio (U. of Montreal)
• Main concepts already proposed in the 80’s~90’s
- Back-propagation
[Rumelhart et al., Learning representations by back-propagating errors, 1986]
- Convolutional neural networks
[LeCun et al., Handwritten digit recognition with a back-propagation network, 1990]
- Recurrent neural networks
[Bengio et al., Learning long-term dependencies with gradient descent is difficult, 1994]
Dark age of neural networks : 90’s
• Vanishing gradient problem
- Gradient becomes very small at the lower layers
→→Lower layers may not be learned appropriately
[Figure: backward error information vanishing from outputs y toward inputs x]

• Local optima problem
- Parameter space is highly non-convex
→→Solution can be trapped in a bad local minimum
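A toy illustration of the vanishing gradient effect (illustrative numbers only): the back-propagated signal is multiplied by an activation derivative at every layer, and sigmoid derivatives are at most 0.25, so the signal shrinks quickly.

import numpy as np

np.random.seed(0)
grad = 1.0
for layer in range(20):
    z = np.random.randn()                  # a typical pre-activation value
    s = 1.0 / (1.0 + np.exp(-z))           # sigmoid activation
    grad *= s * (1.0 - s) * 0.5            # sigmoid derivative times a typical weight magnitude
print(grad)                                # tiny after 20 layers -> lower layers barely learn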
Two catalysts for the renaissance
• Unsupervised pre-training (2006)
- First pre-train layer-by-layer without
labels (unsupervised training)
→→Fine-tune with labels in the end
→→Re-ignited interests in deep learning
[Hinton and Salakhutdinov, Reducing dimensionality of data with neural networks, 2006]

• Big data & GPU


- Big data became available in many applications
- Large-scale experiments became possible due to fast GPUs (Nvidia CUDA)
→→Parallelizes matrix-matrix multiplication
[Raina et al., Large-scale deep unsupervised learning using graphics processors, 2009]
[Coates et al., Deep learning with COTS HPC systems, 2013]
Speech recognition was the first target
• Estimate original spoken text from the speech waveforms

• Breakthrough in acoustic models (recognize phonemes)


→→DNNs replaced GMMs (20% relative improvements)
[Hinton et al., Deep neural networks for acoustic modeling in speech recognition, 2012]

Gaussian Mixture Models (GMM) < Deep Neural Networks (DNN)
Two pillars of recent advances
• Convolutional Neural Networks (CNN)
→→Excellent for image data

• Recurrent Neural Networks (RNN)


→→Excellent for sequential data
Applications of CNN: Image recognition
• ImageNet
- 15 million labeled images (224x224) for 22,000 classes
- Managed by Stanford (Prof. Fei-Fei Li) & UNC Chapel Hill

• ILSVRC (ImageNet Large-Scale Visual Recognition Challenge)


- 1.2 million (training), 50K (validation), 150K (testing)
- 1000 classes (roughly 1000 images/class)
- Annual challenge since 2010
2 main tasks in ILSVRC : Classification
• Classification
- Predict the class of an image with objects
- Ex)

“Siberian husky” “Eskimo dog”


- Metric: top-5 error
→→Correct if the true class is among the top 5 predicted classes
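A small sketch of the top-5 error metric (assumed array shapes; not ILSVRC's official evaluation code):

import numpy as np

def top5_error(scores, labels):
    # scores: (N, num_classes) class scores; labels: (N,) true class ids
    top5 = np.argsort(-scores, axis=1)[:, :5]          # 5 highest-scoring classes per image
    correct = np.any(top5 == labels[:, None], axis=1)  # correct if the true class is among them
    return 1.0 - correct.mean()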
2 main tasks in ILSVRC : Detection
• Detection
- Locate the objects in image and predict their classes
- Ex)

- Metric: mAP (mean average precision)


→→AP per class :
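The slide's per-class AP formula is not reproduced here; as a hedged sketch, average precision is commonly computed by ranking detections by confidence and averaging the precision at each true-positive hit (mAP is then the mean of AP over classes):

import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    # scores: detection confidences; is_true_positive: boolean array, one entry per detection
    order = np.argsort(-scores)                  # rank detections by confidence
    hits = is_true_positive[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(~hits)
    precision = tp / (tp + fp)
    # average the precision at each recall increment (one increment per true positive)
    return np.sum(precision * hits) / num_ground_truth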
Evolution of CNN models
• Winning CNN classification models
CNN not built overnight
• 1998 vs. 2012
CNN being a base model for many tasks

• More sophisticated vision tasks and others (e.g., Go)


Motivation for RNN
: Learning with sequential data
• What about sequential data?
- Ex) language model (predicting next word given past), etc.

• RNN: recurrence (Markov connection) among hidden units


- General non-linear dynamical system
- Model weights shared across all time steps
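A minimal sketch of the recurrence (names and sizes are illustrative): the same weight matrices are reused at every time step.

import numpy as np

def rnn_forward(xs, Wxh, Whh, bh, h0):
    # xs: list of input vectors; Wxh, Whh, bh are shared across all time steps
    h, hs = h0, []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h + bh)   # hidden state depends on the input and the previous state
        hs.append(h)
    return hs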
Learning RNNs
: Back-propagation through time (BPTT)
• When unfolded, RNN is a “deep” network in time

• Forward pass and backward pass are done in the usual way (back-propagation)
→→Gradient averaged over the time steps
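Continuing the sketch above (illustrative only), BPTT unfolds the recurrence and accumulates the gradient of the shared weight matrix over all time steps:

import numpy as np

def bptt_grad_Whh(xs, hs, dhs, Whh, h0):
    # dhs: gradient of the loss w.r.t. each hidden state (assumed already computed)
    dWhh = np.zeros_like(Whh)
    dh_next = np.zeros_like(hs[0])
    for t in reversed(range(len(xs))):
        dh = dhs[t] + dh_next                 # loss gradient plus gradient flowing back from step t+1
        dz = dh * (1.0 - hs[t] ** 2)          # back through tanh
        h_prev = hs[t - 1] if t > 0 else h0
        dWhh += np.outer(dz, h_prev)          # contribution accumulated over the unfolded time steps
        dh_next = Whh.T @ dz                  # pass the gradient on to step t-1
    return dWhh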
Problems with BPTT
: Vanishing/exploding gradients
• When unfolded, depth can reach 1000s
- Deltas get multiplied →→gradients either vanish or explode

• Language model example →→RNNs tend to predict better for sentence 1
A variant of RNN
: Long Short-Term Memory (LSTM)
• Invented by Jürgen Schmidhuber (IDSIA, Switzerland)

• Vanishing/exploding gradient problem mitigated


- Long-term dependency controlled by gates and cells

[Graves, Supervised sequence labeling with recurrent neural networks, 2012]


A variant of RNN
: Long Short-Term Memory (LSTM)
• Additional gates and memory cells (many more parameters)

[Graves and Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, 2005]
[Greff et al., LSTM: A search space odyssey, 2015]
A variant of RNN
: Long Short-Term Memory (LSTM)
• Instead of simple hidden nodes, LSTM has memory block
Forward Pass
Input gate

Forget gate

Cell

Output gate

Cell output
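A minimal sketch of one LSTM step following the components listed above (standard formulation; variable names and the packed weight layout are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    # W maps the concatenated [x, h_prev] to the four gate pre-activations; b is the bias
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    g = np.tanh(g)                                 # candidate cell input
    c = f * c_prev + i * g                         # memory cell carries long-term state
    h = o * np.tanh(c)                             # cell output
    return h, c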
A variant of RNN
: Long Short-Term Memory (LSTM)
• Instead of simple hidden nodes, LSTM has memory block
Backward Pass
Cell output

Output gate

States (grad from t+1)

Cells

Forget gate

Input gate
A variant of RNN
: Long Short-Term Memory (LSTM)
• Deep, bidirectional LSTM
- Multiple layers of LSTMs
- LSTMs running in both directions
Deep LSTM / Bidirectional LSTM (BLSTM)
Applications of RNN (LSTM)
: Speech Recognition
• 3 components
- AM : estimate phoneme probability given input waveform
- LM : estimate word probability given past word sequence
- Decoder : combine AM+LM to estimate best sentence
Speech Recognition System
[Diagram: speech → Acoustic Model (AM) (e.g., GMM, DNN, …); text corpus → Language Model (LM) (e.g., n-gram, DNN, …); both feed the Decoder (e.g., WFST-based) → "I love you"]
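A toy sketch of how the decoder combines the two models (a real decoder, e.g., WFST-based, searches over word sequences; the weighting and hypothesis list here are illustrative):

def sentence_score(am_log_prob, lm_log_prob, lm_weight=1.0):
    # combine acoustic-model and language-model scores in the log domain
    return am_log_prob + lm_weight * lm_log_prob

def decode(candidates):
    # candidates: list of (sentence, am_log_prob, lm_log_prob) hypotheses
    return max(candidates, key=lambda c: sentence_score(c[1], c[2]))[0]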
Applications of RNN (LSTM)
: Speech Recognition (acoustic model)
• BLSTM takes entire speech for recognition at time t
- Long-term memory can improve the accuracy!

GMM / DNN:
• Use a finite-size window
- Model complexity exponential in window size

Bidirectional LSTM:
• Use the entire sequence
- Model complexity fixed with sequence length
Applications of RNN (LSTM)
: Speech Recognition (acoustic model)
• TIMIT: standard benchmark for phoneme recognition
- 3.5 hours (small set)
[Chart: Phone Error Rate (PER) on TIMIT, 1990–2014 — drops at the start of using DNNs (20.7%), again at the start of using DBLSTM (18.0%), and reaches 16.3% with SAIT DBLSTM + RNNDrop]

• Similar result in a much larger set (>2000 hours) with large vocabulary as well!
• LSTM-based LM also gives a significant performance boost!

[Graves et al., Speech recognition with deep recurrent neural networks, 2013]
[Hannun et al., Deep speech: Scaling up end-to-end speech recognition, 2014]
Applications of RNN (LSTM)
: Machine Translation
• Statistical Machine Translation (SMT)
- Statistically estimate target sentence from source sentence
- Challenge: word order difference, one-to-many
→→Find (stochastic) mapping between sentences
SMT for English →→French
Applications of RNN (LSTM)
: Machine Translation
• Neural Machine Translation : LSTM plays a central role
• Main idea: Use Encoder-Decoder idea
- ENC: find a representation of the source
- DEC: generate a translation from the encoded representation

[Diagram: source sentence → Encoder LSTM → encoded representation v of the source sentence → Decoder LSTM → target sentence]

[Cho et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, 2014]
[Sutskever et al., Sequence to sequence learning with neural networks, 2014]
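A hedged sketch of the encoder-decoder idea with toy RNN cells and greedy decoding (all names, sizes, the embedding table, and the output projection are illustrative assumptions):

import numpy as np

def step(x, h, Wx, Wh, b):
    return np.tanh(Wx @ x + Wh @ h + b)

def translate(source_vecs, params, start_vec, embed, out_proj, max_len=20, eos_id=0):
    Wx_e, Wh_e, b_e, Wx_d, Wh_d, b_d = params
    # encoder: compress the source sentence into a single representation v
    v = np.zeros(Wh_e.shape[0])
    for x in source_vecs:
        v = step(x, v, Wx_e, Wh_e, b_e)
    # decoder: generate target words one at a time from the encoded representation
    h, x, output = v, start_vec, []
    for _ in range(max_len):
        h = step(x, h, Wx_d, Wh_d, b_d)
        word_id = int(np.argmax(out_proj @ h))   # greedy choice of the next word
        if word_id == eos_id:
            break
        output.append(word_id)
        x = embed[word_id]                       # feed the chosen word back in
    return output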
Advanced applications
: Image captioning
• “Translate” image to text
– Same principle as machine translation
– Combine CNN (encoder) + RNN/LSTM (decoder)

[Karpathy and Fei-Fei Li, Deep visual-semantic alignments for generating image descriptions, 2015] @ Stanford
[Donahue et al., Long-term recurrent convolutional networks for visual recognition and description, 2015] @ UC Berkeley
[Vinyals et al., Show and tell: A neural image caption generator, 2015] @ Google
[Mao et al., Explain images with multimodal recurrent neural networks, 2015] @ Baidu & UCLA
[Kiros et al., Unifying visual-semantic embeddings with multimodal neural language models, 2015] @ U. Toronto
RNN summary
• Flexible for applications involving sequential data

Vanilla NN / Image Captioning / Sentiment Classification / Machine Translation / Speech Recognition / Video Classification
Open source tools
• Caffe
- Maintained by UC Berkeley BVLC

• Theano
- Maintained by University of Montreal
- Strong Python integration

• Torch
- Maintained by NYU, Facebook
- Based on Lua

• TensorFlow
- Maintained by Google (most recent)
