CS771: Introduction To Machine Learning Piyush Rai
[Figure: a feedforward neural network with one hidden layer]
Input Layer (with D = 3 visible units)
Hidden Layer (with K = 2 hidden units): the hidden units/nodes act as new features, connected to the inputs via learnable weights
Output Layer (with a scalar-valued output)
The effective input-to-output mapping is nonlinear (will see justification shortly)
Illustration: Neural Net with One Hidden Layer
Each input is transformed into several pre-activations using linear models.
The output activation can even be the identity (e.g., for regression, y_n = s_n).
Note: the hidden layer's pre-activations and post-activations will be shown together for brevity, and we will directly show the final output.
Different layers may use different nonlinear activations; the output layer may have none.
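As a concrete sketch of this computation (the variable names, sizes, and the sigmoid activation are illustrative assumptions, not taken from the slides), one forward pass through a single hidden layer might look as follows:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Illustrative sizes: D = 3 inputs, K = 2 hidden units, scalar output
D, K = 3, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(K, D))   # hidden-layer weights (learnable)
b = np.zeros(K)               # hidden-layer biases
v = rng.normal(size=K)        # output-layer weights

x = rng.normal(size=D)        # one input
s = W @ x + b                 # pre-activations: one linear model per hidden unit
h = sigmoid(s)                # post-activations: the hidden units act as new features
y = v @ h                     # final output (identity output activation, as in regression)
```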
Activation Functions: Some Common Choices
Sigmoid and tanh: for both, the gradients saturate (become close to zero) as the function tends to its extreme values.
tanh is preferred over sigmoid: it helps keep the mean of the next layer's inputs close to zero (with sigmoid, it is close to 0.5).
ReLU and Leaky ReLU are among the most popular choices; Leaky ReLU helps fix the "dead neuron" problem of ReLU when the pre-activation is a negative number.
Without a nonlinear activation, a deep neural network is equivalent to a linear model, no matter how many layers we use.
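A minimal NumPy sketch of these four activations (the 0.01 slope for Leaky ReLU is a commonly used default, assumed here rather than taken from the slides):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))      # output in (0, 1); gradients saturate at extreme values

def tanh(a):
    return np.tanh(a)                     # output in (-1, 1); keeps next layer's inputs near zero mean

def relu(a):
    return np.maximum(0.0, a)             # zero for negative pre-activations ("dead neurons" possible)

def leaky_relu(a, slope=0.01):
    return np.where(a > 0, a, slope * a)  # small slope for negative inputs helps fix dead neurons
```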
MLP Can Learn Nonlinear Functions: A Brief Justification
An MLP can be seen as a composition of multiple linear models, combined nonlinearly.
Each individual linear model's score increases monotonically in one direction (a one-sided increase), which is not ideal for learning nonlinear decision boundaries.
Composing two such one-sided increasing score functions (using output weights of 1 and -1 to "flip" the second one before adding) gives a score that is high in the middle and low on either side of it.
This is exactly what we want for the given nonlinear classification problem: the composition can now learn a nonlinear decision boundary.
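A small numerical sketch of this justification (the sigmoid activation and the shift values are illustrative assumptions): subtracting one shifted one-sided increasing function from another produces a score that is high in the middle and low on both sides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x = np.linspace(-5, 5, 11)

# Two one-sided, monotonically increasing score functions (shifted sigmoids)
score1 = sigmoid(x + 2)    # "turns on" around x = -2
score2 = sigmoid(x - 2)    # "turns on" around x = +2

# Combine with weights 1 and -1: the second score is "flipped" before adding
combined = 1.0 * score1 + (-1.0) * score2

print(np.round(combined, 2))   # high near the middle, low on either side
```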
Single Hidden Layer and a Single Output
One hidden layer with K nodes and a single output (e.g., scalar-valued regression or binary classification)
Single Hidden Layer and Multiple Outputs
One hidden layer with K nodes and a vector of outputs (e.g., vector-valued regression, multi-class classification, or multi-label classification)
Multiple Hidden Layers (One/Multiple Outputs)
Most general case: multiple hidden layers (with the same or a different number of hidden nodes in each) and a scalar- or vector-valued output
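A minimal NumPy sketch of this general case (the layer sizes, the ReLU activation, and the random initialization are illustrative assumptions):

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def mlp_forward(x, weights, biases):
    """Forward pass through an MLP with any number of hidden layers.

    weights/biases: lists of per-layer parameters; layer sizes may differ.
    The output layer here has no nonlinearity (e.g., regression or pre-softmax scores).
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)                  # hidden layers: linear map + nonlinearity
    return weights[-1] @ h + biases[-1]      # output layer: linear only

# Illustrative sizes: 3 inputs -> 4 hidden -> 4 hidden -> 2 outputs
rng = np.random.default_rng(0)
sizes = [3, 4, 4, 2]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
y = mlp_forward(rng.normal(size=3), weights, biases)   # vector-valued output
```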
Neural Nets are Feature Learners
Hidden layers can be seen as learning a feature representation for each input.
Kernel Methods vs Neural Nets
Recall the prediction rule for a kernel method (e.g., kernel SVM): the prediction is a weighted combination of kernel evaluations between the test input and the training inputs.
In a neural network, in contrast, the basis functions are learned from data (possibly after multiple layers of nonlinear transformations).
Both kernel methods and deep NNs can be seen as using nonlinear basis functions for making predictions: kernel methods use fixed basis functions (defined by the kernel), whereas a NN learns the basis functions adaptively from data.
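As a sketch of this contrast (the notation here is assumed, not copied from the slides), the two prediction rules can be written as:

```latex
% Kernel method: weights \alpha_n learned, basis functions fixed by the kernel k
f(\mathbf{x}) = \sum_{n=1}^{N} \alpha_n \, k(\mathbf{x}_n, \mathbf{x})

% Neural network: both the output weights w and the basis functions
% \phi_\theta (the hidden layers, with parameters \theta) are learned from data
f(\mathbf{x}) = \mathbf{w}^\top \phi_{\theta}(\mathbf{x})
```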
Features Learned by a Neural Network
Node values in each hidden layer tell us how much a "learned" feature is active in a given input.
Hidden-layer weights act like pattern detectors / feature detectors / filters.
Why Neural Networks Work Better: Another View
Linear models tend to learn only the "average" pattern.
Deep models can learn multiple patterns (each hidden node can learn one pattern).
Thus deep models can capture more subtle variations than a simpler linear model can.
Backpropagation
Backpropagation = gradient descent using the chain rule of derivatives.
Chain rule of derivatives: for example, if z = f(y) and y = g(x), then dz/dx = (dz/dy)(dy/dx).
Start by taking the derivatives of the loss function w.r.t. the parameters of the last layer, and then proceed backwards.
When moving backwards, each layer reuses the gradients already computed for the layer after it: derivative computations can be reused due to the recursive nature of the neural net architecture.
Backpropagation through an example
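The slide's worked example is a figure and is not reproduced here. As a substitute, the following is a minimal sketch, assuming a one-hidden-layer regression network with squared-error loss (names and sizes are my own), of the backward computation written out by hand:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
D, K = 3, 2
W = rng.normal(size=(K, D))      # hidden-layer weights
v = rng.normal(size=K)           # output-layer weights
x, t = rng.normal(size=D), 1.0   # one training input and its target

# Forward pass
s = W @ x                        # hidden pre-activations
h = sigmoid(s)                   # hidden post-activations
y = v @ h                        # scalar output
loss = 0.5 * (y - t) ** 2        # squared-error loss

# Backward pass: start at the last layer, reuse earlier derivatives (chain rule)
dL_dy = y - t                    # dL/dy
dL_dv = dL_dy * h                # dL/dv   (reuses dL/dy)
dL_dh = dL_dy * v                # dL/dh   (reuses dL/dy)
dL_ds = dL_dh * h * (1 - h)      # dL/ds   (sigmoid'(s) = h * (1 - h))
dL_dW = np.outer(dL_ds, x)       # dL/dW   (reuses dL/ds)
```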
Backpropagation
Backprop iterates between a forward pass and a backward pass.
Forward pass: computes the loss using the current values of the parameters.
Backward pass: computes the gradients of the loss w.r.t. the parameters, using computational graphs.
Software frameworks such as TensorFlow and PyTorch already support this, so you don't need to implement it by hand (no worries about computing derivatives, etc.).
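For instance, a minimal PyTorch sketch (the model, data, and loss below are placeholders of my own choosing): the forward pass builds the computational graph, and a single .backward() call runs the backward pass automatically.

```python
import torch

# Tiny placeholder model and data (all shapes are illustrative)
model = torch.nn.Sequential(
    torch.nn.Linear(3, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 1),
)
x = torch.randn(8, 3)    # a batch of 8 inputs
t = torch.randn(8, 1)    # targets

y = model(x)                               # forward pass: computes outputs, builds the graph
loss = torch.nn.functional.mse_loss(y, t)  # loss under the current parameter values
loss.backward()                            # backward pass: gradients for every parameter
# Each parameter p of the model now has its gradient in p.grad
```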
Neural Nets: Some Aspects
Much of the magic lies in the hidden layers
Representational Power of Neural Nets
Consider a single hidden layer neural net with K hidden nodes.
[Figure: fits obtained with K = 3, K = 6, and K = 20 hidden units]
Recall that each hidden unit "adds" a function to the overall function.
Increasing K (the number of hidden units) will result in a more complex function.
A very large K seems to overfit (see the figure above). Should we instead prefer a small K?
No! It is better to use a large K and regularize well. Reason/justification:
A simple NN with small K will have a few local optima, some of which may be bad.
A complex NN with large K will have many local optima, all equally good (there are theoretical results on this).
We can also use multiple hidden layers (each sufficiently large) and regularize well.
Preventing Overfitting in Neural Nets
Neural nets can overfit. There are many ways to avoid overfitting, such as:
Standard regularization on the weights, such as L2, L1, etc. (L2 regularization is also called weight decay)
[Figure: single hidden layer NN with K = 20 hidden units and L2 regularization]
Early stopping (traditionally used): stop when the validation error starts increasing
Dropout: randomly remove units (with some probability p) during training
Various other tricks, such as weight sharing across different hidden units of the same layer (used in convolutional neural nets, or CNNs)
Fig courtesy: "Dropout: A Simple Way to Prevent Neural Networks from Overfitting" (Srivastava et al., 2014)
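A hedged PyTorch sketch of two of these remedies, L2 regularization via the optimizer's weight_decay argument and dropout via an nn.Dropout layer (the architecture and hyperparameter values are illustrative):

```python
import torch

# Single hidden layer net with dropout on the hidden units (sizes are illustrative)
model = torch.nn.Sequential(
    torch.nn.Linear(2, 20),     # K = 20 hidden units
    torch.nn.ReLU(),
    torch.nn.Dropout(p=0.5),    # randomly drop hidden units during training
    torch.nn.Linear(20, 1),
)

# L2 regularization ("weight decay") is applied through the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

model.train()   # dropout is active in training mode
# ... training loop: forward pass, loss, loss.backward(), optimizer.step() ...
model.eval()    # dropout is disabled at evaluation time
```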
Wide or Deep?
While a very wide single hidden layer can approximate any function, we often prefer many, less wide, hidden layers.
Higher layers help learn more directly useful/interpretable features (also useful for compressing data using a small number of features).
Using a Pre-trained Network
A deep NN already trained on some "generic" data can be useful for other tasks, e.g.:
Feature extraction: use a pre-trained net, remove the output layer, and use the rest of the network as a feature extractor for a related dataset.
[Figure: the part of a pre-trained net below the output layer can be used as a feature extractor on some new task]
Fine-tuning: use a pre-trained net's weights as the initialization to train a deep net for a new task.
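A hedged sketch of the feature-extraction use, with torchvision's ResNet-18 as an example pre-trained network (the choice of model and the nn.Identity replacement are my own illustration, not from the slides):

```python
import torch
import torchvision

# A network pre-trained on "generic" data (ImageNet weights)
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Identity()   # remove the output layer; the rest becomes a feature extractor
model.eval()

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224)   # placeholder batch from a new, related dataset
    features = model(images)               # learned feature representations (512-dim per image)
```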
Deep Neural Nets: Some Comments
Highly effective in learning good feature representations from data in an "end-to-end" manner.
Also useful for unsupervised learning problems (will see some examples):
Autoencoders for dimensionality reduction
Deep generative models for generating data and (unsupervisedly) learning features; examples include generative adversarial networks (GANs) and variational autoencoders (VAEs)
Coming up next
Convolutional neural nets
Neural nets for sequential data
Neural networks for unsupervised learning and generation