
Charu C. Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights, NY

An Introduction to Neural Networks

Neural Networks and Deep Learning, Springer, 2018
Chapter 1, Sections 1.1–1.2
Neural Networks: Two Views

• A way to simulate biological learning by simulating the nervous system.

• A way to increase the power of known models in machine learning by stacking them in careful ways as computational graphs.

– The number of nodes in the computational graph controls learning capacity with increasing data.

– The architecture of the computational graph incorporates domain-specific insights (e.g., images, speech).

– The success of deep computational graphs has led to the coining of the term “deep learning.”
Historical Origins

• The first model of a computational unit was the perceptron (1958).

– Was roughly inspired by the biological model of a neuron.

– Was implemented using a large piece of hardware.

– Generated great excitement but failed to live up to inflated expectations.

• Was not any more powerful than a simple linear model that can be implemented in a few lines of code today.

The Perceptron [Image Courtesy: Smithsonian Institute]
The First Neural Winter: Minsky and Papert’s Book

• Minsky and Papert’s book “Perceptrons” (1969) showed that the perceptron only had limited expressive power.

– Essential to put together multiple computational units.


The Biological Inspiration

• Neural networks were originally designed to simulate the learning process in biological organisms.

• The human nervous system contains cells called neurons.

• The neurons are connected to one another with the use of axons and dendrites, and the connecting regions between axons and dendrites are referred to as synapses.

– The strengths of synaptic connections often change in response to external stimuli.

– This change causes learning in living organisms.


Neural Networks: The Biological Inspiration

[Figure: (a) Biological neural network (b) Artificial neural network]

• Neural networks contain computation units ⇒ Neurons.

• An artificial neural network computes a function of the inputs by propagating the computed values from the input neurons to the output neuron(s) and using the weights as intermediate parameters.

• The computational units are connected to one another through weights ⇒ Strengths of synaptic connections in biological organisms.

• Learning occurs by changing the weights connecting the neurons.

• Each input to a neuron is scaled with a weight, which affects the function computed at that unit.
Learning in Biological vs Artificial Networks

• In living organisms, synaptic weights change in response to external stimuli.

– An unpleasant experience will change the synaptic weights of an organism, which will train the organism to behave differently.

• In artificial neural networks, the weights are learned with the use of training data, which are input-output pairs (e.g., images and their labels).

– An error made in predicting the label of an image is the unpleasant “stimulus” that changes the weights of the neural network.

– By successively adjusting the weights between neurons over many input-output pairs, the function computed by the neural network is refined over time so that it provides more accurate predictions.

– When trained over many images, the network learns to classify images correctly.

– This ability to accurately compute functions of unseen inputs by training over a finite set of input-output pairs is referred to as model generalization.
Machine Learning versus Deep Learning

[Figure: Accuracy vs. amount of data; deep learning overtakes conventional machine learning as the amount of data grows.]

• For smaller data sets, traditional machine learning methods often provide slightly better performance.

• Traditional models often provide more choices, interpretable insights, and ways to handcraft features.

• For larger data sets, deep learning methods tend to dominate.
Reasons for Recent Popularity

• The recent success of neural networks has been caused by an increase in data and computational power.

– Increased computational power has reduced the cycle times for experimentation.

– If it requires a month to train a network, one cannot try more than 12 variations in a year on a single platform.

– Reduced cycle times have also led to a larger number of successful tweaks of neural networks in recent years.

– Most of the models have not changed dramatically from an era when neural networks were seen as impractical.

• We are now operating in a data and computational regime where deep learning has become attractive compared to traditional machine learning.
Single Layer Networks: The Perceptron

Neural Networks and Deep Learning, Springer, 2018
Chapter 1, Section 1.3
Perceptron

• In the single-layer network, a set of inputs is directly mapped to an output by using a generalized variation of a linear function.

• This simple instantiation of a neural network is referred to as the perceptron.

• In multi-layer neural networks, the neurons are arranged in layered fashion, in which the input and output layers are separated by a group of hidden layers.

• This layer-wise architecture of the neural network is also referred to as a feed-forward network.
Binary Classification and Linear Regression Problems

• In the binary classification problem, each training pair (X, y) contains feature variables X = (x1, ..., xd) and a label y drawn from {−1, +1} (the observed value of the binary class variable, given to us as part of the training data).

– Example: Feature variables might be frequencies of words in an email, and the class variable might be an indicator of spam (a small data sketch follows this slide).

– Given labeled emails, recognize incoming spam.

• In linear regression, the dependent variable y is real-valued.

– Example: Feature variables are frequencies of words in a Web page, and the dependent variable is a prediction of the number of accesses in a fixed period.

• The perceptron is designed for the binary setting.
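To make the training-pair format concrete, here is a minimal Python sketch; the word list, frequency values, and labels are hypothetical, chosen purely for illustration.

```python
# Hypothetical training pairs (X, y) for spam classification.
# Each X holds frequencies of the words ["free", "meeting", "winner"];
# y is +1 for spam and -1 for non-spam. All numbers are made up.
training_data = [
    ([3, 0, 2], +1),   # many "free"/"winner" occurrences -> spam
    ([0, 4, 0], -1),   # mostly "meeting" -> not spam
    ([1, 1, 1], -1),
    ([5, 0, 3], +1),
]
```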


The Perceptron: Earliest Historical Architecture

[Figure: Input nodes x1, ..., x5 connected through weights w1, ..., w5 to a single output node that aggregates them (∑) and produces y.]

• The d nodes in the input layer only transmit the d features X = [x1 ... xd] without performing any computation.

• The output node multiplies the inputs with the weights W = [w1 ... wd] on the incoming edges, aggregates them, and applies the sign activation:

ŷ = sign(W · X) = sign(w1·x1 + ... + wd·xd)

• The sign function maps a real value to either +1 or −1, which is appropriate for binary classification (a small prediction sketch follows this slide).
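As a concrete illustration of this computation, here is a minimal Python sketch of the prediction step; the function name and the convention that a zero activation maps to +1 are our own choices, not from the slides.

```python
def perceptron_predict(W, X):
    """Return sign(W . X) for one instance: +1 or -1 (0 is mapped to +1 here)."""
    activation = sum(w * x for w, x in zip(W, X))
    return 1 if activation >= 0 else -1

# Example with three features and hand-chosen weights.
print(perceptron_predict([0.4, -0.3, 0.9], [3, 0, 2]))  # prints 1
```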
What is the Perceptron Doing?

• Tries to find a linear separator W · X = 0 between the two classes.

• Ideally, all positive instances (y = +1) should be on the side of the separator satisfying W · X > 0.

• All negative instances (y = −1) should be on the side of the separator satisfying W · X < 0.
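The slides do not show the training procedure at this point. As a hedged sketch, the classic perceptron update rule W ← W + α(y − ŷ)X (using the perceptron_predict function sketched above and an assumed learning rate α) nudges the weights whenever an instance is misclassified:

```python
def perceptron_train(data, d, alpha=0.1, epochs=20):
    """Classic perceptron updates; `data` is a list of (X, y) pairs with y in {-1, +1},
    and `d` is the number of features. Illustrative sketch only."""
    W = [0.0] * d
    for _ in range(epochs):
        for X, y in data:
            y_hat = perceptron_predict(W, X)
            if y_hat != y:  # misclassified: move W toward the correct side
                W = [w + alpha * (y - y_hat) * x for w, x in zip(W, X)]
    return W
```

On linearly separable data, this loop eventually stops changing the weights, which corresponds to finding a separator with W · X > 0 for positive instances and W · X < 0 for negative ones.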
Bias Neurons

[Figure: The same perceptron architecture with an additional bias neuron that always emits +1 and connects to the output node with weight b.]

• In many settings (e.g., skewed class distribution) we need an invariant part of the prediction with a bias variable b:

ŷ = sign(W · X + b)

• On setting the extra weight w_{d+1} = b and taking x_{d+1} = +1 as the input from the bias neuron, it makes little difference to learning procedures ⇒ Often implicit in architectural diagrams.
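A minimal sketch of this trick, reusing the hypothetical perceptron_predict from above: appending a constant +1 input lets the bias b be treated as just another weight.

```python
def add_bias_input(X):
    """Append the constant +1 bias input, so the extra weight plays the role of b."""
    return list(X) + [1]

# sign(W . X + b) becomes sign(W' . X') with W' = [w1, ..., wd, b] and X' = [x1, ..., xd, 1].
print(perceptron_predict([0.4, -0.3, 0.9, -2.5], add_bias_input([3, 0, 2])))  # prints 1
```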
Multilayer Neural Networks

[Figure: A feed-forward network with an input layer (x1, ..., x5), a hidden layer, and an output layer producing y.]

• In multilayer networks, the output of a node can feed into other hidden nodes, which in turn can feed into other hidden or output nodes.

• Multilayer neural networks are always directed acyclic graphs and usually arranged in layerwise fashion.
Multilayer Neural Networks

[Figure: The same feed-forward network with input, hidden, and output layers.]

• The layers between the input and output are referred to as hidden because they perform intermediate computations.

• Each hidden node uses a combination of a linear transformation and an activation function Φ(·) (like the output node of the perceptron).

• The use of nonlinear activation functions in the hidden layer is crucial in increasing learning capacity.
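As an illustration of this layer-wise computation, here is a minimal forward-pass sketch for one hidden layer; the plain-Python list representation, the tanh activation, and the example weights are our assumptions for illustration only.

```python
import math

def forward(x, W_hidden, W_output, phi=math.tanh):
    """Forward pass: each hidden unit computes phi(linear combination of inputs),
    and the output node linearly combines the hidden values."""
    hidden = [phi(sum(w * xi for w, xi in zip(row, x))) for row in W_hidden]
    return sum(w * h for w, h in zip(W_output, hidden))

# Example: 3 inputs, 2 hidden units, 1 output.
x = [1.0, -2.0, 0.5]
W_hidden = [[0.2, 0.4, -0.1], [-0.3, 0.1, 0.5]]
W_output = [0.7, -1.2]
print(forward(x, W_hidden, W_output))
```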
Role of Hidden Layers

• Nonlinear hidden layers perform the role of hierarchical feature engineering.

– Early layers learn primitive features and later layers learn more complex features.

– Image data: Early layers learn elementary edges, the middle layers contain complex features like honeycombs, and later layers contain complex features like a part of a face.

– Deep learners are masters of feature engineering.

• The final output layer is often able to perform inference with the transformed features in the penultimate layer relatively easily.

• Perceptron: Cannot classify linearly inseparable data, but can do so with nonlinear hidden layers (see the XOR sketch below).
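To illustrate the last point, here is a minimal sketch of a one-hidden-layer network that computes XOR, a classic linearly inseparable function; the step activation and the hand-chosen (not learned) weights are our own, for illustration only.

```python
def step(z):
    """Threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xor_network(x1, x2):
    """Two hidden units and one output unit with hand-chosen weights."""
    h1 = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h2 = step(x1 + x2 - 1.5)    # fires only if both inputs are 1
    return step(h1 - h2 - 0.5)  # fires if exactly one input is 1

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_network(a, b))  # outputs 0, 1, 1, 0
```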
Training a Neural Network with Backpropagation

 Backpropagation is one of the most important concepts in neural networks.

 Our task is to classify our data as well as possible.

 For this, we have to update the weights and the bias. In the linear regression model, we use gradient descent to optimize the parameters.

 Similarly, here we also use the gradient descent algorithm, with the gradients supplied by backpropagation.

 For a single training example, the backpropagation algorithm calculates the gradient of the error function with respect to the weights.
Training a Neural Network with Backpropagation

The backpropagation algorithm contains two main phases, referred to as the forward and backward phases, respectively (a small sketch of both phases follows this slide).

1. Forward phase: In this phase, the inputs for a training instance are fed into the neural network. This results in a forward cascade of computations across the layers, using the current set of weights. The final predicted output can be compared to that of the training instance, and the derivative of the loss function with respect to the output is computed. The derivative of this loss now needs to be computed with respect to the weights in all layers in the backward phase.

2. Backward phase: The main goal of the backward phase is to learn the gradient of the loss function with respect to the different weights by using the chain rule of differential calculus. These gradients are used to update the weights. Since these gradients are learned in the backward direction, starting from the output node, this learning process is referred to as the backward phase.
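Below is a minimal sketch of the two phases for a tiny network with one hidden unit and squared-error loss; the architecture, the sigmoid activation, and the variable names are our assumptions for illustration, not taken from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, y, w1, w2, lr=0.1):
    """One forward/backward pass for the tiny network y_hat = sigmoid(w2 * sigmoid(w1 * x))."""
    # Forward phase: cascade of computations using the current weights.
    h = sigmoid(w1 * x)
    y_hat = sigmoid(w2 * h)
    loss = 0.5 * (y_hat - y) ** 2

    # Backward phase: chain rule, starting from the output node.
    dL_dyhat = y_hat - y
    dyhat_dz2 = y_hat * (1 - y_hat)
    dL_dw2 = dL_dyhat * dyhat_dz2 * h
    dL_dh = dL_dyhat * dyhat_dz2 * w2
    dh_dz1 = h * (1 - h)
    dL_dw1 = dL_dh * dh_dz1 * x

    # Gradient descent update of both weights.
    return w1 - lr * dL_dw1, w2 - lr * dL_dw2, loss
```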
Training a Neural Network with Backpropagation

 In the single-layer neural network, the training process is relatively straightforward because the error (or loss function) can be computed as a direct function of the weights, which allows easy gradient computation.

 In the case of multi-layer networks, the problem is that the loss is a complicated composition function of the weights in earlier layers. The gradient of a composition function is computed using the backpropagation algorithm.

 The backpropagation algorithm leverages the chain rule of differential calculus, which computes the error gradients in terms of summations of local-gradient products over the various paths from a node to the output.

 Backpropagation is thus a method for efficiently training artificial neural networks following a gradient descent approach that exploits the chain rule.
Illustration of the Chain Rule in Computational Graphs

Example to understand how exactly the weights are updated using backpropagation.
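As a worked illustration with our own numbers (a single weight w, input x = 2, target y = 1, squared-error loss L = ½(ŷ − y)², and current w = 0.75):

• Forward phase: ŷ = w · x = 0.75 · 2 = 1.5, so L = ½(1.5 − 1)² = 0.125.

• Backward phase (chain rule): ∂L/∂w = (∂L/∂ŷ) · (∂ŷ/∂w) = (ŷ − y) · x = 0.5 · 2 = 1.0.

• Weight update (learning rate 0.1): w ← w − 0.1 · ∂L/∂w = 0.75 − 0.1 = 0.65, which reduces the loss on this example.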
