CS826 Deep Learning Theory and Practice

Week 1: Review and Getting Started with Neural Networks


Edited by Nur Naim, 20/9/22
Learning outcomes:
 Understand the fundamentals of deep learning and its general concepts
 Be able to discuss current trends in deep learning and provide solutions and opinions on deep learning problems
 Understand the mathematical steps involved
 Understand the overall architecture
 Use Google Colab or other Python IDEs or code editors
 Since the 1980s, the form of the models hasn't changed much, but there are lots of new tricks:
– More hidden units
– Better (online) optimization
– New nonlinear functions (ReLUs)
– Faster computers (CPUs and GPUs)
Common challenges in machine learning:
 Not enough training data
 Poor quality of data
 Irrelevant features
 Nonrepresentative training data
 Overfitting and underfitting
Even for a basic neural network, there are many design decisions to make (see the sketch below):
1. # of hidden layers (depth)
2. # of units per hidden layer (width)
3. Type of activation function (nonlinearity)
4. Form of objective function

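A minimal sketch of how these four decisions show up in code, assuming TensorFlow/Keras (pre-installed in Google Colab); the layer sizes, input width, and optimizer are arbitrary illustrative choices, not values prescribed by the slides:

import tensorflow as tf

# Decisions 1 & 2: two hidden layers (depth), each 8 units wide (width).
# Decision 3: ReLU nonlinearity in the hidden layers, sigmoid at the output.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),                      # e.g. three input features
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output read as a probability
])

# Decision 4: form of the objective function (cross-entropy for binary classification).
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()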
[Figure (© Eric Xing @ CMU, 2006-2011), shown across several slides: a small feed-forward network with inputs Age = 34, Gender = 2, Stage = 4, a hidden layer of sigmoid units (S), and a sigmoid output unit whose value 0.6 is read as the "probability of being alive". Edges carry example weights such as .6, .4, .2, .1, .5, .3, .7 and .8. Labels: independent variables → weights → hidden layer → weights → dependent variable (the prediction).]
 Independent variable = input variable
 Dependent variable = output variable
 Coefficients = “weights”
 Estimates = “targets”
 Logistic Regression Model (the sigmoid unit)
[Figure (© Eric Xing @ CMU, 2006-2011): the same inputs feed a single sigmoid unit directly, i.e. logistic regression; inputs x1, x2, x3 (Age = 34, Gender = 1, Stage = 4) with coefficients a, b, c give the output 0.6, the "probability of being alive". Labels: independent variables → coefficients → dependent variable (prediction).]
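In code, the sigmoid unit above is just a weighted sum squashed by the logistic function. The following NumPy sketch uses made-up weights and a bias purely for illustration; only the input values come from the figure:

import numpy as np

def sigmoid(u):
    # logistic(u) = 1 / (1 + e^(-u))
    return 1.0 / (1.0 + np.exp(-u))

# Independent variables (inputs): age, gender, stage -- values from the figure
x = np.array([34.0, 1.0, 4.0])

# Coefficients ("weights") a, b, c and a bias term -- illustrative values only
w = np.array([0.01, 0.10, 0.05])
b = -0.5

# Dependent variable (output): probability of "being alive"
p = sigmoid(np.dot(w, x) + b)
print(p)   # a value in (0, 1)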
Decision Functions
[Figure: a feed-forward network drawn as Input → Hidden Layer → Output.]
 0 hidden layers: a linear classifier; the decision boundaries are hyperplanes
[Figure: a hyperplane separating two classes in the (x1, x2) plane.]

 1 hidden layer: the boundary of a convex region (open or closed); see the XOR sketch below
[Figure: a convex decision region in the (x1, x2) plane.]

 2 hidden layers: combinations of convex regions
[Figure: a non-convex decision region over inputs (x1, x2) with output y, built from convex pieces.]

(Examples adapted from Eric Postma via Jason Eisner.)
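To make the 0-versus-1 hidden layer difference concrete, here is a small NumPy sketch of XOR: no single hyperplane separates the two classes, but one hidden layer with hand-picked weights does. The weights below are chosen for illustration only, not learned:

import numpy as np

def step(u):
    # hard threshold unit, used here only to keep the example easy to trace
    return (u >= 0).astype(float)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs (x1, x2)
y_xor = np.array([0, 1, 1, 0])  # XOR labels: not separable by any single hyperplane

# One hidden layer with two units: h1 ~ (x1 OR x2), h2 ~ (x1 AND x2)
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
h = step(X @ W1 + b1)

# Output unit fires when h1 is on and h2 is off, carving out a convex strip
w2 = np.array([1.0, -1.0])
b2 = -0.5
y_hat = step(h @ w2 + b2)

print(y_hat)  # [0. 1. 1. 0.], matching y_xor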


 We don't know the "right" levels of abstraction
 So let the model figure it out!

Face Recognition:
 A deep network can build up increasingly higher levels of abstraction
 Lines, parts, regions

(Example from Honglak Lee, NIPS 2010)
[Figure (example from Honglak Lee, NIPS 2010): a deep network drawn as Input → Hidden Layer 1 → Hidden Layer 2 → Hidden Layer 3 → Output.]
Neural Network with sigmoid activation functions
[Figure: Input → Hidden Layer → Output, with sigmoid units at each node.]
Neural Network with arbitrary nonlinear activation functions
[Figure: Input → Hidden Layer → Output, with an arbitrary nonlinear activation at each node.]
Sigmoid / Logistic Function
So far, we've assumed that the activation function (nonlinearity) is always the sigmoid function:
logistic(u) = 1 / (1 + e^(-u))
 A new change: modifying the nonlinearity
 The logistic function is not widely used in modern ANNs

Alternate 1: tanh
Like the logistic function, but shifted to the range [-1, +1]

Slide from William Cohen


 A new change: modifying the nonlinearity
 ReLU is often used in vision tasks

Alternate 2: rectified linear unit (ReLU)
Linear with a cutoff at zero
(Implementation: clip the gradient when you pass zero)

Slide from William Cohen
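For reference, all three nonlinearities can be written in a few lines of NumPy (a sketch only; in practice, frameworks such as TensorFlow and PyTorch provide these built in):

import numpy as np

def logistic(u):
    # sigmoid: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-u))

def tanh(u):
    # like the logistic, but with outputs in (-1, +1)
    return np.tanh(u)

def relu(u):
    # rectified linear unit: linear with a cutoff at zero
    return np.maximum(0.0, u)

u = np.linspace(-4.0, 4.0, 9)
print(logistic(u))
print(tanh(u))
print(relu(u))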


 Regression:
 Use the same objective as linear regression
 Quadratic loss (i.e. mean squared error)

 Classification:
 Use the same objective as logistic regression
 Cross-entropy (i.e. negative log likelihood)
 This requires probabilities, so we add an additional "softmax" layer at the end of our network

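The two objectives, together with the softmax layer mentioned above, can be sketched in NumPy as follows (toy numbers, chosen only to show the computations):

import numpy as np

def quadratic_loss(y_true, y_pred):
    # mean squared error, as used for regression
    return np.mean((y_true - y_pred) ** 2)

def softmax(scores):
    # turns arbitrary scores into probabilities that sum to 1
    e = np.exp(scores - np.max(scores))   # subtract the max for numerical stability
    return e / np.sum(e)

def cross_entropy(one_hot, probs):
    # negative log likelihood of the correct class
    return -np.sum(one_hot * np.log(probs + 1e-12))

# Regression: quadratic loss on toy targets and predictions
print(quadratic_loss(np.array([1.0, 2.0]), np.array([1.1, 1.8])))

# Classification: scores from the last layer -> softmax -> cross-entropy
scores = np.array([2.0, 0.5, -1.0])
probs = softmax(scores)
print(probs)
print(cross_entropy(np.array([1.0, 0.0, 0.0]), probs))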
[Figure: a network with multiple units per layer, drawn as Input → Hidden Layer → Output.]
Softmax:
softmax(u)_j = e^(u_j) / Σ_k e^(u_k)
[Figure: Input → Hidden Layer → Output, with the softmax applied to the output layer to produce class probabilities.]
References:
 https://www.slideshare.net/databricks/introduction-to-neural-networks-122033415
 https://www.cs.wmich.edu/~elise/courses/cs6800/Neural-Networks.ppt
 https://www.cs.cmu.edu/~mgormley/courses/10601b-f16/lectureSlides/lecture15-neural-nets.pptx
 Google (images) – deep learning, why deep learning now, applications.
