
Deep Learning

By Gagandeep Kaur
From Cats to CNN
 Around 1959, Hubel and Wiesel performed a famous experiment on a cat.
 The cat sat in front of a screen on which lines were displayed at different locations and in different orientations: slanted, horizontal, vertical, and so on.
 Electrodes fitted to the cat measured which parts of the brain actually respond to different visual stimuli.
 The outcome of the study was that different neurons in the brain fire only for specific types of stimuli; it is not the case that all neurons fire for every kind of visual stimulus.
Biological Neurons
MP Neuron (McCulloch and Pitts Neuron)
 The first computational model of a neuron was proposed by Warren McCulloch (neuroscientist) and Walter Pitts (logician) in 1943.
 The model may be divided into 2 parts.
 The first part, g, takes the inputs (dendrites) and performs an aggregation; based on the aggregated value, the second part, f, makes a decision.
 Let's suppose that I want to predict my own decision, whether to watch a random football game on TV or not.
 The inputs are all boolean, i.e., {0, 1}, and my output variable is also boolean {1: will watch it, 0: won't watch it}.
MP Neuron (McCulloch and Pitts Neuron)
 So, x_1 could be isPremierLeagueOn (I like Premier League more)
 x_2 could be isItAFriendlyGame (I tend to care less about the friendlies)
 x_3 could be isNotHome (Can’t watch it when I’m running errands. Can
I?)
 x_4 could be isManUnitedPlaying (I am a big Man United fan. GGMU!)
and so on.
 These inputs can either be excitatory or inhibitory.
 Inhibitory inputs are those that have the maximum effect on the decision making irrespective of other inputs, i.e., if x_3 is 1 (not home) then my output will always be 0; the neuron will never fire, so x_3 is an inhibitory input.
 Excitatory inputs are NOT the ones that will make the neuron fire on their own, but they might cause it to fire when combined together.
MP Neuron (McCulloch and Pitts Neuron)
 Formally, this is what is going on:
 We can see that g(x) is just doing a sum of the inputs, a simple aggregation.
 And theta here is called the thresholding parameter.
 For example, if I always watch the game when the sum turns out to be 2 or more, then theta is 2 here. This is called the Thresholding Logic (a minimal sketch follows below).
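As a minimal sketch (my own illustration, not part of the original slides), the aggregation-and-threshold logic can be written in a few lines of Python:

    def mp_neuron(inputs, theta):
        """McCulloch-Pitts neuron: g aggregates the boolean inputs, f thresholds."""
        g = sum(inputs)                  # g(x): plain sum of the inputs
        return 1 if g >= theta else 0    # f: fire only if the sum reaches theta

    # Watch the game when at least 2 of the boolean conditions hold (theta = 2)
    print(mp_neuron([1, 0, 1], theta=2))  # -> 1 (fires)
    print(mp_neuron([1, 0, 0], theta=2))  # -> 0 (does not fire)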
Boolean Functions Using M-P Neuron

 So far we have seen how the M-P neuron works.


 Now let's look at how this very neuron can be used to represent a few boolean functions.
 Mind you that our inputs are all boolean and the output is also boolean so
essentially, the neuron is just trying to learn a boolean function.
 Many boolean decision problems, such as whether to watch a movie, can be cast into this form with appropriate input variables and represented by the M-P neuron.
M-P Neuron: A Concise Representation

 This representation just denotes that, for the boolean inputs x_1, x_2 and x_3 if
the g(x) i.e., sum ≥ theta, the neuron will fire otherwise, it won’t.
AND Function

 An AND function neuron (with three inputs) would only fire when ALL the inputs are ON, i.e., g(x) ≥ 3 here.
OR Function

 We know that an OR function neuron would fire if ANY of the inputs is ON, i.e., g(x) ≥ 1 here.
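Reusing the mp_neuron sketch from earlier (still an illustration, not from the slides), AND and OR differ only in the threshold:

    # Three-input truth-table check: AND needs theta = 3, OR needs theta = 1
    for x1 in (0, 1):
        for x2 in (0, 1):
            for x3 in (0, 1):
                x = [x1, x2, x3]
                print(x, "AND:", mp_neuron(x, theta=3), "OR:", mp_neuron(x, theta=1))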
A Function With An Inhibitory Input

 Now this might look like a tricky one but it’s really not.
 Here, we have an inhibitory input i.e., x_2 so whenever x_2 is 1, the
output will be 0.
 Keeping that in mind, we know that x_1 AND !x_2 would output 1 only
when x_1 is 1 and x_2 is 0 so it is obvious that the threshold parameter
should be 1.
 Let's verify: g(x), i.e., x_1 + x_2, would be ≥ 1 in only 3 cases:
 Case 1: when x_1 is 1 and x_2 is 0
Case 2: when x_1 is 1 and x_2 is 1
Case 3: when x_1 is 0 and x_2 is 1
 But in both Case 2 and Case 3, we know that the output will be 0
because x_2 is 1 in both of them, thanks to the inhibition. And we also
know that x_1 AND !x_2 would output 1 for Case 1 (above) so our
threshold parameter holds good for the given function.
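To double-check this argument, here is a small sketch (the encoding of inhibition is my own assumption: any active inhibitory input forces the output to 0) that enumerates the truth table for x_1 AND (NOT x_2) with theta = 1:

    def mp_neuron_inhibitory(excitatory, inhibitory, theta):
        """M-P neuron where any active inhibitory input forces the output to 0."""
        if any(inhibitory):
            return 0
        return 1 if sum(excitatory) >= theta else 0

    # x_1 is excitatory, x_2 is inhibitory, theta = 1
    for x1 in (0, 1):
        for x2 in (0, 1):
            print((x1, x2), "->", mp_neuron_inhibitory([x1], [x2], theta=1))
    # Only (1, 0) fires, which is exactly x_1 AND (NOT x_2)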
NOR Function

 For a NOR neuron to fire, we want ALL the inputs to be 0, so the threshold parameter should also be 0 and we take all of them as inhibitory inputs.
NOT Function

 For a NOT neuron, 1 outputs 0 and 0 outputs 1. So we take the input as an inhibitory input and set the threshold parameter to 0. It works!
 Can any boolean function be represented using the M-P neuron? Before you answer that, let's understand what the M-P neuron is doing geometrically.
Geometric Interpretation Of M-P Neuron

 We already discussed that the OR function's threshold parameter theta is 1, for obvious reasons.

 The inputs are obviously boolean, so only 4 combinations are possible: (0,0), (0,1), (1,0) and (1,1).

 Now, plotting them on a 2D graph and using the OR function's aggregation equation, x_1 + x_2 ≥ 1, we can draw the decision boundary.
 Note that this is not a real-valued graph; only the four boolean points matter.


Geometric Interpretation Of M-P Neuron

 We just used the aggregation equation, x_1 + x_2 = 1, to show graphically that all inputs whose output from the OR-function M-P neuron is 1 lie ON or ABOVE that line, and all the input points that lie BELOW that line output 0.

 The M-P neuron just learnt a linear decision boundary

 The M-P neuron is splitting the input sets into two classes — positive and negative

 Positive ones (which output 1) are those that lie ON or ABOVE the decision boundary
and negative ones (which output 0) are those that lie BELOW the decision boundary.
AND Function

 In this case, the decision boundary equation is x_1 + x_2 = 2.
 Here, the only input point that lies ON or ABOVE the boundary is (1,1), and it outputs 1 when passed through the AND-function M-P neuron.
 It fits! The decision boundary works!
OR Function With 3 Inputs
 Let's just generalize this by looking at a 3-input OR function M-P unit. In this case, the possible inputs are 8 points: (0,0,0), (0,0,1), (0,1,0), (1,0,0), (1,0,1), ... you get the point(s). We can map these on a 3D graph, and this time we draw a decision boundary in 3 dimensions.
Summary

 Just by hand-coding a threshold parameter, the M-P neuron can conveniently represent boolean functions which are linearly separable.

 Linear separability (for boolean functions): there exists a line (plane) such that all inputs which produce a 1 lie on one side of the line (plane) and all inputs which produce a 0 lie on the other side of the line (plane).
Limitations Of M-P Neuron

 What about non-boolean (say, real) inputs?


 Do we always need to hand code the threshold?
 Are all inputs equal? What if we want to assign more importance to some inputs?
 What about functions which are not linearly separable? Say XOR function.
 I hope it is now clear why we are not using the M-P neuron today.
 Overcoming the limitations of the M-P neuron, Frank Rosenblatt, an American psychologist, proposed the classical perceptron model, the mighty artificial neuron, in 1958.
 It is a more generalized computational model than the McCulloch-Pitts neuron, in which weights and thresholds can be learnt over time.
Perceptron: The Artificial Neuron (An Essential
Upgrade To The McCulloch-Pitts Neuron)
 The most fundamental unit of a deep neural network is called an artificial neuron, which takes an input, processes it, passes it through an activation function like the sigmoid, and returns the activated output.
 Here, we only talk about the perceptron model proposed before the 'activation' part came into the picture.
 Frank Rosenblatt, an American psychologist, proposed the classical perceptron model in 1958.
 It was further refined and carefully analyzed by Minsky and Papert (1969); their model is referred to as the perceptron model.
Perceptron

 The perceptron model, proposed by Minsky-Papert, is a more general computational model than the McCulloch-Pitts neuron.
 It overcomes some of the limitations of the M-P neuron by introducing the concept
of numerical weights (a measure of importance) for inputs, and a mechanism for
learning those weights.
 Inputs are no longer limited to boolean values as in the case of the M-P neuron; it supports real inputs as well, which makes it more useful and generalized.
Perceptron

 Now, this is very similar to an M-P neuron, but we take a weighted sum of the inputs and set the output to one only when that sum is more than an arbitrary threshold (theta).
 However, according to convention, instead of hand-coding the threshold parameter theta, we add it as one of the inputs with the weight -theta, as shown below, which makes it learnable.
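A minimal sketch of that convention in Python (my own illustration: a step activation at 0, with a constant input x_0 = 1 carrying the bias w_0 = -theta):

    def perceptron(x, w):
        """Perceptron: fire (output 1) iff the weighted sum, bias included, is >= 0."""
        z = sum(wi * xi for wi, xi in zip(w, x))
        return 1 if z >= 0 else 0

    # x = [1, x_1, x_2, x_3]; the leading 1 multiplies the bias w_0 = -theta
    w = [-2.0, 1.0, 1.0, 1.0]              # hypothetical weights, threshold theta = 2
    print(perceptron([1, 1, 1, 0], w))     # weighted sum =  0 -> 1 (fires)
    print(perceptron([1, 1, 0, 0], w))     # weighted sum = -1 -> 0 (does not fire)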
Perceptron
 Consider the task of predicting whether I would watch a random game of football on TV or not (the
same example from my M-P neuron post) using the behavioral data available.
 And let's assume my decision is solely dependent on 3 binary inputs (binary for simplicity).
Perceptron
 Here, w_0 is called the bias because it represents the prior (prejudice).

 A football freak may have a very low threshold and may watch any football game
irrespective of the league, club or importance of the game [theta = 0].

 On the other hand, a selective viewer may only watch a football game that is a Premier League game, features Man United and is not a friendly [theta = 2].

 The point is, the weights and the bias will depend on the data (viewing history in this case).

 Based on the data, if needed the model may have to give a lot of importance (high
weight) to the isManUnitedPlaying input and penalize the weights of other inputs.
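Purely as an illustration (these numbers are made up, not learned from any real viewing history), the two viewers above could correspond to weight vectors like the following, reusing the perceptron sketch from the previous slide:

    # x = [1, isPremierLeagueOn, isItAFriendlyGame, isManUnitedPlaying]
    w_selective = [-2.0, 1.0, -1.0, 1.0]   # high bar: wants PL + Man United, no friendly
    w_freak     = [ 0.0, 0.1,  0.0, 0.1]   # effectively theta = 0: watches anything

    print(perceptron([1, 1, 0, 1], w_selective))  # PL, not a friendly, United playing -> 1
    print(perceptron([1, 0, 1, 0], w_selective))  # a friendly without United          -> 0
    print(perceptron([1, 0, 1, 0], w_freak))      # the football freak watches anyway  -> 1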
Perceptron vs McCulloch-Pitts Neuron

What kind of functions can be implemented using a perceptron? How different is it from McCulloch-Pitts neurons?
Perceptron vs McCulloch-Pitts Neuron

 From the equations, it is clear that even a perceptron separates the input space into two
halves, positive and negative.

 All the inputs that produce an output 1 lie on one side (positive half space) and all the inputs
that produce an output 0 lie on the other side (negative half space).

 In other words, a single perceptron can only be used to implement linearly separable functions, just like the M-P neuron.

 Then what is the difference?

 Why do we claim that the perceptron is an updated version of an M-P neuron?

 Here, the weights, including the threshold, can be learned and the inputs can be real values.
Boolean Functions Using Perceptron
OR Function
 Just revisiting the good old OR function the perceptron way.
 The 'possible solution' above was obtained by solving the linear system of equations on the left (one such solution is checked in the sketch below).
 It is clear that the solution separates the input space into two spaces,
negative and positive half spaces.
 Now if you actually try and solve the linear equations above, you will realize
that there can be multiple solutions.
 But which solution is the best?
 To more formally define the 'best' solution, we need to understand errors and error surfaces.
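For concreteness (the particular numbers are my choice; the slides only state that a solution exists), w_0 = -1, w_1 = 1, w_2 = 1 is one valid solution for OR, and the check below confirms it using the perceptron sketch from earlier:

    w = [-1.0, 1.0, 1.0]   # one of many valid (w_0, w_1, w_2) for the OR function

    for x1 in (0, 1):
        for x2 in (0, 1):
            print((x1, x2), "->", perceptron([1, x1, x2], w))
    # Output is 0 only for (0, 0) and 1 otherwise, i.e. the OR function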
XOR Function — Can’t Do!
 Now let's look at a non-linear boolean function i.e., you cannot draw a line to
separate positive inputs from the negative ones.
 Notice that the fourth equation contradicts the second and the third equation.
 The point is, there are no perceptron solutions for non-linearly separable data. So the key takeaway is that a single perceptron cannot learn to separate data that is non-linear in nature.
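Writing the XOR constraints out (a standard argument, added here to make the contradiction explicit), a perceptron that outputs 1 when w_0 + w_1·x_1 + w_2·x_2 ≥ 0 would need:

  (0,0) outputs 0:  w_0 < 0
  (0,1) outputs 1:  w_0 + w_2 ≥ 0
  (1,0) outputs 1:  w_0 + w_1 ≥ 0
  (1,1) outputs 0:  w_0 + w_1 + w_2 < 0

Adding the second and third inequalities gives 2·w_0 + w_1 + w_2 ≥ 0, while adding the first and fourth gives 2·w_0 + w_1 + w_2 < 0, so no choice of weights satisfies all four constraints.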
The XOR Affair

 In the book published by Minsky and Papert in 1969, the authors implied that, since a single artificial neuron is incapable of implementing some functions such as the XOR logical function, larger networks would have similar limitations and the approach should therefore be dropped.
 Later research on three-layered perceptrons showed how to implement such functions, thereby saving the technique from being abandoned.
Motivation For Sigmoid Neurons
 The artificial neurons we use today are slightly different from the perceptron we
looked at, the difference is the activation function.
 Some might say that the threshold logic used by a perceptron is very harsh.
 For example, if you look at the problem of deciding whether I will watch a movie or not, based only on one real-valued input (x_1 = criticsRating), and if the threshold we set is 0.5 (w_0 = -0.5) and w_1 = 1, then our setup would look like this:
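With those numbers (w_0 = -0.5, w_1 = 1, the example values set up above), the hard threshold is easy to see in a couple of lines (a sketch only):

    def step_decision(rating, w0=-0.5, w1=1.0):
        """Perceptron-style decision: watch (1) iff w0 + w1 * criticsRating >= 0."""
        return 1 if w0 + w1 * rating >= 0 else 0

    print(step_decision(0.51))  # -> 1: watch
    print(step_decision(0.49))  # -> 0: skip, despite an almost identical rating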
Motivation For Sigmoid Neurons

 What would be the decision for a movie with criticsRating = 0.51? Yes
 What would be the decision for a movie with criticsRating = 0.49? No!
 Some might say that it's harsh that we would watch a movie with a rating of 0.51 but not one with a rating of 0.49, and this is where the sigmoid comes into the picture.
 Now convince yourself that this harsh threshold is not specific to the problem we chose here; it could happen with any problem we deal with.
 It is a characteristic of the perceptron function itself, which behaves like a step function.
Motivation For Sigmoid Neurons

 There will be a sudden change in the decision (from 0 to 1) when the value of z crosses the threshold (-w_0).
 For most real-world applications we would expect a smoother decision function which
gradually changes from 0 to 1.
 Introducing sigmoid neurons where the output function is much smoother than the
step function seems like a logical and obvious thing to do.
 Mind you that a sigmoid function is a mathematical function with a characteristic “S”-
shaped curve, also called the sigmoid curve.
 There are many functions that can do the job for you
Motivation For Sigmoid Neurons

 One of the simplest ones to work with is the logistic function, sketched below.
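The logistic function is σ(z) = 1 / (1 + e^(-z)); a small sketch applying it to the movie example (same assumed w_0 = -0.5 and w_1 = 1 as before):

    import math

    def sigmoid(z):
        """Logistic function: smooth, S-shaped, output strictly between 0 and 1."""
        return 1.0 / (1.0 + math.exp(-z))

    # z = w_0 + w_1 * criticsRating with w_0 = -0.5, w_1 = 1
    print(sigmoid(-0.5 + 1.0 * 0.51))  # ~0.502: barely leaning towards "watch"
    print(sigmoid(-0.5 + 1.0 * 0.49))  # ~0.498: barely leaning towards "skip"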


Sigmoid Neurons

 We no longer see a sharp transition at the threshold (-w_0).
 Also, the output is no longer binary but a real value between 0 and 1 which
can be interpreted as a probability.
 So instead of yes/no decision, we get the probability of yes.
 The output here is smooth, continuous and differentiable and just how any
learning algorithm likes it.
 To verify this, we will go through the backpropagation concept in deep learning.
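One reason this smooth output is convenient for learning algorithms is the logistic function's simple derivative, σ'(z) = σ(z)·(1 − σ(z)), which is exactly the quantity backpropagation reuses when pushing the error back through a sigmoid neuron.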
