Machine Learning & Deep Learning Overview (AIST)
Deep learning
Hoang Van Nam
MICA Institute - HUST
Agenda
• Introduction
• Machine Learning
• Deep Learning
• CNNs
• Discussion
Introduction
Artificial Intelligence (AI)
• What is Artificial Intelligence (AI)?
• Using computers to solve problems or make automated decisions
• For tasks that, when done by humans, typically require intelligence
Timeline of Intelligent Machines
• 1952: Machine playing checkers (Arthur Samuel)
• 1979: Stanford Cart
• 1997: Deep Blue beats Kasparov
• 2012: Google NN recognizes cats in YouTube videos
• 2016: DeepMind wins at Go
Limits of Artificial Intelligence
• “Strong” Artificial Intelligence ✘
• Computers thinking at a level that meets or surpasses people
• Computers engaging in abstract reasoning & thinking
• This is not what we have today
• There is no evidence that we are close to Strong AI
Hand-Coding Business Rules
A human programmer inspects historical purchase data (the input) and writes the business rules (the output) by hand:

Date        Age   Gender   Purchased Items
3/1/2017    30    M        Toy
1/3/2017    40    M        Books
....        ....  ....     ....

• Rule 1: 15 < Age < 30
• Rule 2: Bought Toy = 'Y', Last Purchase < 30 days
• Rule 3: Gender = 'M', Bought Toy = 'Y'
• Rule 4: ........
• Rule 5: ........

Problems with Hand-Designed Rules
• Scalability
• Adaptability
• Closed loop
Option 2 - Learn The Business Rules From Data
Instead of hand-writing rules, treat historical purchase data (the training data) as input X with known outcomes Y, and learn a function f with f(X) = Y' such that Y' ≈ Y. The trained model then fills in the output for new, unseen data:

Input - new, unseen data:
Age   Gender   Items
35    F        ?   (to be predicted)
39    M        Toy (predicted output)
Machine learning as programming
We Call This Approach Machine Learning
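As a sketch of this approach, a decision-tree learner can induce purchase rules directly from data; the scikit-learn call is real, but the training rows below are invented for illustration:

```python
# A minimal sketch of learning business rules from data with scikit-learn.
# The training rows are made up; they are not the slide's actual dataset.
from sklearn.tree import DecisionTreeClassifier

# Historical purchase data: [age, gender (0=F, 1=M)] -> purchased item
X_train = [[30, 1], [40, 1], [22, 0], [35, 0], [28, 1]]
y_train = ["Toy", "Books", "Toy", "Books", "Toy"]

model = DecisionTreeClassifier().fit(X_train, y_train)  # learn f(X) = Y', Y' ~ Y

# New, unseen customers: the model predicts the item instead of a human
# writing Rule 1, Rule 2, ... by hand.
print(model.predict([[39, 1], [35, 0]]))
```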
Why Use Machine Learning?
• Use ML when you can’t code it
  • Complex tasks where deterministic solutions don’t suffice
  • E.g. recognizing speech/images
• Use ML when you can’t scale it
  • Replace repetitive tasks needing human-like expertise
  • E.g. recommendations, spam, fraud detection, machine translation
• Use ML when you have to adapt/personalize
  • E.g. recommendation and personalization
Supervised Learning – How Machines Learn
e.g. photo classification and tagging (“It is a cat.”); human intervention and validation are required.
• Training: labeled training data goes into the machine learning algorithm; each prediction is compared with its label and the model is adjusted.
• Inference: a new input goes through the trained model to produce a prediction.
Literature Review on ML
Learning hierarchical representations through deep supervised, unsupervised, and reinforcement learning.
• Bellman 1957: Dynamic Programming
• Baum 1966: Hidden Markov Models (HMM)
• Dempster 1977: Expectation-Maximization (EM)
• Good Old-Fashioned Artificial Intelligence (GOFAI)
• SVM, Kernel-SVM
Model Training
• Split the data: 70% for training, 30% held out for testing.
• Train a trial model on the 70% training split.
• Evaluate the trial model on the held-out 30% test split.
• Performance measurement: compute metrics such as accuracy from the evaluation results.
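The split/train/evaluate loop above can be sketched with scikit-learn; the iris dataset stands in for the purchase data, and the classifier choice is arbitrary:

```python
# A minimal sketch of the 70/30 split, training, and accuracy measurement.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)          # stand-in training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)

trial_model = DecisionTreeClassifier().fit(X_train, y_train)  # train on 70%
predictions = trial_model.predict(X_test)                     # evaluate on 30%
print("Accuracy:", accuracy_score(y_test, predictions))
```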
Deep Learning
What is Deep Learning?
• Deep Learning is a subfield of machine learning
concerned with algorithms inspired by the structure
and function of the brain called artificial neural
networks.
Traditional Machine Learning Algorithms
[Figure: model performance vs. amount of data; deep learning keeps improving with more data while traditional machine learning algorithms plateau]
Sample Deep Learning Use Cases
[Figure: example use cases, e.g. natural language processing]
The Neuron
• Input: a vector of training data x = (x0, x1, ..., xn)
• Output: a linear function of the inputs, ⟨w, x⟩ + b, passed through a nonlinearity σ that transforms the output into the desired range of values
• Training: learn the weights w and the bias b by minimizing the loss

f(x) = σ(⟨w, x⟩ + b)
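A minimal sketch of this neuron in NumPy, assuming a sigmoid for the nonlinearity σ (any squashing function would do):

```python
import numpy as np

def sigmoid(z):
    # nonlinearity: squashes the linear output into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])    # input vector x
w = np.array([0.1, 0.4, -0.2])    # weights w (learned during training)
b = 0.05                          # bias b (learned during training)

output = sigmoid(np.dot(w, x) + b)  # f(x) = sigma(<w, x> + b)
print(output)
```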
Human Brain Neuron
[Figure: a biological neuron, with inputs (dendrites) and an output (axon)]
Neural Network
Neurons are organized into layers: an input layer, hidden layers, and an output layer, with weighted connections (w10, w11, w12, w13, ...) between consecutive layers. In the forward pass, an input X = (X0, ..., Xn) flows through Hidden Layer 1 and Hidden Layer 2 to the output neuron, producing a prediction. During training the prediction is compared against the label (e.g. the digit 4), and the mismatch is measured as an error/loss.

Input Layer → Hidden Layer 1 → Hidden Layer 2 → Output Layer
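A sketch of the forward pass for this layout in NumPy; the layer sizes, random weights, ReLU activations, and softmax cross-entropy loss are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)                       # input layer (8 features)

# small random weights for two hidden layers and an output layer
W1, b1 = 0.1 * rng.standard_normal((16, 8)), np.zeros(16)
W2, b2 = 0.1 * rng.standard_normal((16, 16)), np.zeros(16)
W3, b3 = 0.1 * rng.standard_normal((10, 16)), np.zeros(10)

h1 = np.maximum(0, W1 @ x + b1)                  # hidden layer 1 (ReLU)
h2 = np.maximum(0, W2 @ h1 + b2)                 # hidden layer 2 (ReLU)
scores = W3 @ h2 + b3                            # output layer: one score per class

label = 4                                        # ground-truth class
loss = -scores[label] + np.log(np.sum(np.exp(scores)))  # softmax cross-entropy
print(loss)
```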
Neural Network – Backpropagation
The error/loss at the output is propagated backwards through the network, and each weight is updated (w10 → w'10, w11 → w'11, ...) in the direction that reduces the loss.

Input Layer → Hidden Layer 1 → Hidden Layer 2 → Output Layer (the loss flows backwards and the weights are updated)
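The weight-update step can be illustrated on a single neuron with a squared-error loss; all numbers below are made up:

```python
import numpy as np

x = np.array([1.0, 2.0])    # input
y = 4.0                     # label
w = np.array([0.5, -0.3])   # weights to be updated
b = 0.1                     # bias
lr = 0.05                   # learning rate

for step in range(3):
    y_pred = np.dot(w, x) + b        # forward pass
    loss = 0.5 * (y_pred - y) ** 2   # error/loss
    grad = y_pred - y                # dLoss/dy_pred
    w -= lr * grad * x               # backpropagate: dLoss/dw = grad * x
    b -= lr * grad                   # dLoss/db = grad
    print(step, loss)                # the loss shrinks each step
```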
NNs & DL: Neural Networks
Naming conventions:
◦ “N-layer network”: the count does not include the input layer
◦ Also called “Artificial Neural Networks” (ANN) or “Multi-Layer Perceptrons” (MLP)
Output layer: normally has no activation function (equivalently, a linear identity activation); it outputs scores (e.g. probabilities in the range 0-1 for classification).
Sizing neural networks: by the number of neurons, or more commonly by the number of parameters.
DL: an NN-like model with many such stages.
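For example, counting the parameters of a small 2-layer network (sizes chosen arbitrarily):

```python
# Sizing a network by parameter count: 3 inputs, a hidden layer of 4
# neurons, and 2 output neurons (illustrative sizes).
n_in, n_hidden, n_out = 3, 4, 2

weights = n_in * n_hidden + n_hidden * n_out   # 3*4 + 4*2 = 20
biases = n_hidden + n_out                      # 4 + 2 = 6
print(weights + biases)                        # 26 learnable parameters
```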
NNs & DL: Neural Networks Variations
NNs & DL: Neural Network Drawbacks
◦ Local minima: “Who is Afraid of Non-Convex Loss Functions?” (Yann LeCun)
◦ Unsupervised learning
◦ No-memory networks: addressed by recurrent nets, LSTM
◦ Computational cost of conv layers: addressed by GPUs
◦ Memory bottleneck:
  ◦ Network compression (SqueezeNet)
  ◦ Model re-design
NNs & DL: DL vs Traditional
NNs & DL: Use Cases
1. Data security: malware prediction, detecting abnormal data-access behaviour
2. Personal security: speeding up screening, spotting things human screeners miss
3. Financial trading: stock market prediction
4. Healthcare: cancer prediction
5. Marketing personalization: targeting audiences
6. Fraud detection: spotting potential cases of fraud
7. Recommendations: Amazon, Netflix
8. Online search
9. NLP
10. Smart cars
And so on...
NNs & DL: DL vs Traditional
NNs & DL: Number and Size of Layers
◦ Ratio of weight updates to weight magnitudes
◦ First-layer visualizations
◦ Choice of solver
http://cs231n.github.io/neural-networks-3/#loss
NNs & DL: Training Networks
Introduction to CNNs: How the brain’s visual system works
Introduction to CNNs: Neural networks
Introduction to CNNs: Image Convolution
Introduction to CNNs: Convolution Layer
Convolution as a neural layer:
◦ Goal: not to use predefined kernels, but instead to learn data-specific kernels.
Introductionto CNNs: ConvolutionLayer
n Convolutional layers are locallyconnected
u A filter/kernel/window slides on the image or the
previous map
u The position of the filter explicitlyprovides
information for localizing
n Convolutional layers share weightsspatially:
translation-invariant
u Translation-invariant: a translated region will produce
the same response at the correspondingly translated
position
u A local pattern’s convolutional response can be re-
used by different candidate regions
n Convolutional layers can be applied toimages of
any sizes, yielding proportionally-sizedoutputs
Convolution: Principle
Cross-correlation: computing a series of dot products and putting them into an output vector.
Convolution: the same as cross-correlation, but with the kernel flipped.
Key feature: shift-invariant and linear, hence simple.
Differences:
◦ Convolution is associative: F * (G * I) = (F * G) * I
◦ Cross-correlation matches a template to an image
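The flip relationship can be checked numerically with SciPy (arbitrary values):

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [-1.0, 2.0]])

conv = convolve2d(image, kernel, mode="valid")
corr = correlate2d(image, np.flip(kernel), mode="valid")
print(np.allclose(conv, corr))  # True: convolution = cross-correlation
                                # with a flipped kernel
```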
How to Calculate Convolution
Complexity: O(w · h · Fw · Fh) for a w × h image and an Fw × Fh filter.
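A naive sliding-window implementation that makes the O(w·h·Fw·Fh) cost explicit (cross-correlation form, “valid” output size; shapes are assumptions):

```python
import numpy as np

def conv2d_naive(image, kernel):
    h, w = image.shape
    fh, fw = kernel.shape
    out = np.zeros((h - fh + 1, w - fw + 1))
    for i in range(out.shape[0]):             # slide over rows ...
        for j in range(out.shape[1]):         # ... and over columns
            window = image[i:i + fh, j:j + fw]
            out[i, j] = np.sum(window * kernel)   # one dot product per position
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0                # 3x3 averaging filter
print(conv2d_naive(image, kernel))
```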
Introduction to CNNs: HOG by Convolutional Layers
CNNs: FC and ReLU Layers
Fully-connected (FC) layers:
◦ As in a regular neural network: full connections to all activations in the previous layer
◦ Common use: predicting a label
◦ FC-to-CONV conversion: an FC layer with K=4096 on a 7×7×512 input volume can be equivalently expressed as a CONV layer with F=7, P=0, S=1, K=4096
◦ The filter size is then exactly the size of the input volume, so the output is simply 1×1×4096
ReLU (Rectified Linear Units):
◦ Applies the non-saturating activation function f(x) = max(0, x)
◦ Increases the nonlinear properties of the decision function without affecting the receptive fields of the convolution layer
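The FC-to-CONV equivalence can be sketched in PyTorch (the framework choice is ours, not the slide’s); copying the FC weights into the conv filters makes both layers compute the same values:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 512, 7, 7)                  # input volume 7x7x512

fc = nn.Linear(512 * 7 * 7, 4096)              # FC layer with K=4096
conv = nn.Conv2d(512, 4096, kernel_size=7, padding=0, stride=1)  # F=7,P=0,S=1

# Reuse the FC weights as conv filters of exactly the input volume's size.
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))
    conv.bias.copy_(fc.bias)

out_fc = fc(x.flatten(1))                      # shape (1, 4096)
out_conv = conv(x)                             # shape (1, 4096, 1, 1)
print(torch.allclose(out_fc, out_conv.flatten(1), atol=1e-4))  # True
```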
Introduction to CNNs: Convolutional Neural Network
CNNs: Learning
CNNs: Transfer Learning
Very few people train from scratch (random initialization):
◦ Few datasets are of sufficient size compared to ImageNet (1.2M images, 1000 classes)
◦ Training on ImageNet takes 2-3 weeks on a modern GPU (e.g. a Titan X)
VGGNet, from Karen Simonyan and Andrew Zisserman: the runner-up in ILSVRC 2014.
◦ Showed that the depth of the network is a critical component of good performance
◦ Two well-known architectures: VGG-16, VGG-19
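A minimal transfer-learning sketch with torchvision’s pretrained VGG-16; the 10-class head and frozen feature extractor are illustrative choices:

```python
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights="IMAGENET1K_V1")  # weights pretrained on ImageNet

for param in model.features.parameters():      # freeze the convolutional layers
    param.requires_grad = False

# Replace the final FC layer (4096 -> 1000) with a new head for our task.
model.classifier[6] = nn.Linear(4096, 10)      # 10 classes: an assumption

# Train only the new head, e.g.:
# optimizer = torch.optim.SGD(model.classifier[6].parameters(), lr=1e-3)
```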
CNNs: Applications
References
1. CS231n: Convolutional Neural Networks for Visual Recognition, Stanford University
2. Jürgen Schmidhuber, 2015: Deep Learning in Neural Networks: An Overview
3. Yann LeCun: Unsupervised Learning: The Next Frontier in AI
Useful Links
• Understanding LSTM
• YOLO: real-time object detection
• Visualize CNN
• Google DeepDream