Mod 2.1, 2.2
Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights, NY
[Figure: accuracy vs. amount of data, comparing deep learning with conventional machine learning.]
[Figure: a single-layer network with input nodes x1–x5, weights w1–w5, and a summation output node computing y; a second variant adds a bias neuron (+1) connected to the output with weight b.]
[Figure: output layer of a network producing y from inputs x3, x4, x5.]
The backpropagation algorithm contains two main phases, referred to as the forward and
backward phases, respectively.
1. Forward phase: In this phase, the inputs for a training instance are fed into the neural
network. This results in a forward cascade of computations across the layers, using the
current set of weights. The final predicted output is compared to that of the training
instance, and the derivative of the loss function with respect to the output is computed. The
derivative of this loss must then be computed with respect to the weights in all layers during
the backward phase.
2. Backward phase: The main goal of the backward phase is to learn the gradient of the loss
function with respect to the different weights by using the chain rule of differential
calculus. These gradients are used to update the weights. Since these gradients are learned
in the backward direction, starting from the output node, this learning process is referred to
as the backward phase (a code sketch of both phases follows this list).
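As a minimal sketch of the two phases, assume a tiny two-layer network with a sigmoid hidden layer, a linear output unit, and squared loss; the layer sizes, data, and learning rate below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative shapes: 3 inputs, 4 hidden units, 1 output (assumed).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))    # hidden-layer weights
W2 = rng.normal(size=(1, 4))    # output-layer weights
x = np.array([0.5, -1.0, 2.0])  # one training instance
y = np.array([1.0])             # its target output

# Forward phase: cascade of computations using the current weights.
h = sigmoid(W1 @ x)                   # hidden activations
y_hat = W2 @ h                        # predicted output (linear unit)
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward phase: chain rule, starting from the output node.
dL_dyhat = y_hat - y                       # derivative of loss w.r.t. output
dL_dW2 = np.outer(dL_dyhat, h)             # gradient for output-layer weights
dL_dh = W2.T @ dL_dyhat                    # gradient propagated to hidden layer
dL_dW1 = np.outer(dL_dh * h * (1 - h), x)  # sigmoid'(z) = h * (1 - h)

# Gradient-descent update of all weights.
lr = 0.1
W1 -= lr * dL_dW1
W2 -= lr * dL_dW2
```

The same two phases repeat for every training instance (or mini-batch) until the loss converges.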
Training a Neural Network
with Backpropagation
In the single-layer neural network, the training process is relatively straightforward
because the error (or loss function) can be computed as a direct function of the
weights, which allows easy gradient computation.
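For concreteness (a standard textbook case, not verbatim from these slides): with prediction ŷ = w̄ · x̄ and squared loss, the gradient follows in a single step of calculus:

```latex
L = \tfrac{1}{2}(y - \hat{y})^2, \qquad \hat{y} = \bar{w} \cdot \bar{x}
\quad\Longrightarrow\quad
\frac{\partial L}{\partial \bar{w}} = -(y - \hat{y})\,\bar{x}
```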
In the case of multi-layer networks, the problem is that the loss is a complicated
composition function of the weights in earlier layers. The gradient of a composition
function is computed using the backpropagation algorithm.
The backpropagation algorithm leverages the chain rule of differential calculus, which
computes the error gradients in terms of summations of local-gradient products over
the various paths from a node to the output.
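A small worked case of this path summation (K, g, and h below are hypothetical functions used only for illustration): if the output is o = K(p, q) with p = g(w) and q = h(w), there are two paths from w to o, and the chain rule contributes one product of local gradients per path:

```latex
\frac{\partial o}{\partial w}
= \underbrace{\frac{\partial K}{\partial p}\, g'(w)}_{\text{path via } p}
\;+\;
\underbrace{\frac{\partial K}{\partial q}\, h'(w)}_{\text{path via } q}
```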
Backpropagation algorithms are thus a family of methods for efficiently training artificial
neural networks with gradient descent, exploiting the chain rule to compute the required
gradients.
Illustration of the Chain Rule in Computational Graphs
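As a numeric sketch of this idea, consider the small computational graph p = w², q = 3w, o = p·q (these functions are assumptions chosen purely for illustration). The following Python fragment evaluates the local gradient on each edge and sums the products over both paths from w to o:

```python
# Computational graph with two paths from w to the output o:
#   p = w**2,  q = 3*w,  o = p * q
w = 2.0
p, q = w ** 2, 3.0 * w
o = p * q

# Local gradients on each edge of the graph.
do_dp, do_dq = q, p           # do/dp = q,  do/dq = p
dp_dw, dq_dw = 2.0 * w, 3.0   # dp/dw,      dq/dw

# Chain rule: sum of local-gradient products over the two paths.
do_dw = do_dp * dp_dw + do_dq * dq_dw
print(do_dw)  # 36.0, matching d(3w**3)/dw = 9w**2 at w = 2
```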
Example: understanding exactly how backpropagation updates the weights.
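To make the update concrete, here is a hedged end-to-end sketch: a few training steps on a network with one input, one hidden unit, and one output, printing the loss and weights after each update (all numeric values and the learning rate are illustrative assumptions):

```python
import math

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

# Tiny network: y_hat = w2 * sigmoid(w1 * x).
x, y = 1.0, 0.0           # one training instance and its target
w1, w2, lr = 0.6, 0.4, 0.5

for step in range(3):
    # Forward phase.
    h = sigmoid(w1 * x)
    y_hat = w2 * h
    loss = 0.5 * (y_hat - y) ** 2

    # Backward phase: chain rule from the output back to each weight.
    delta = y_hat - y                       # dL/dy_hat
    grad_w2 = delta * h                     # dL/dw2
    grad_w1 = delta * w2 * h * (1 - h) * x  # dL/dw1 via the hidden unit

    # Gradient-descent weight update.
    w1 -= lr * grad_w1
    w2 -= lr * grad_w2
    print(f"step {step}: loss={loss:.4f}, w1={w1:.4f}, w2={w2:.4f}")
```

Because the target is 0 here, each update shrinks the prediction, and the printed loss decreases step by step.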