Introduction to Deep Learning
By Gagandeep Kaur
From Cats to CNN
Around 1959, Hubel and Wiesel performed a famous experiment on a cat. The cat was placed in front of a screen on which lines were displayed at different locations and in different orientations: slanted, horizontal, vertical, and so on. Electrodes were fitted to the cat to measure which parts of the brain respond to different visual stimuli.
The outcome of the study was that different neurons in the brain fire only for particular types of stimuli; it is not the case that all neurons fire for every kind of visual stimulus.
Biological Neurons
MP Neuron (McCulloch and Pitts Neuron)
The first computational model of a neuron was proposed by Warren McCulloch (neuroscientist) and Walter Pitts (logician) in 1943.
It may be divided into two parts.
The first part, g, takes the inputs (analogous to dendrites) and performs an aggregation; based on the aggregated value, the second part, f, makes a decision.
Let's suppose that I want to predict my own decision, whether to watch a random football game on TV or not.
The inputs are all boolean, i.e., {0, 1}, and my output variable is also boolean {1: Will watch it, 0: Won't watch it}.
MP Neuron (McCulloch and Pitts Neuron)
So, x_1 could be isPremierLeagueOn (I like Premier League more)
x_2 could be isItAFriendlyGame (I tend to care less about the friendlies)
x_3 could be isNotHome (Can't watch it when I'm running errands. Can I?)
x_4 could be isManUnitedPlaying (I am a big Man United fan. GGMU!)
and so on.
These inputs can either be excitatory or inhibitory.
Inhibitory inputs are those that have the maximum effect on the decision making irrespective of the other inputs, i.e., if x_3 is 1 (not home) then my output will always be 0, i.e., the neuron will never fire; so x_3 is an inhibitory input.
Excitatory inputs are NOT the ones that will make the neuron fire on their own, but they might cause it to fire when combined together.
MP Neuron (McCulloch and Pitts Neuron)
Formally, this is what is going on:
We can see that g(x) is just doing a sum of the inputs, a simple aggregation.
And theta here is called the thresholding parameter.
For example, if I always watch the game when the sum turns out to be 2 or more, then theta is 2 here. This is called the Thresholding Logic.
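Putting the two parts together, for n boolean inputs the M-P neuron can be written as:
g(x_1, x_2, ..., x_n) = x_1 + x_2 + ... + x_n
y = f(g(x)) = 1, if g(x) ≥ theta
y = f(g(x)) = 0, if g(x) < theta
with the caveat from above that if any inhibitory input is 1, the output is 0 regardless of the sum.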
Boolean Functions Using M-P Neuron
This representation just denotes that, for the boolean inputs x_1, x_2 and x_3, if g(x), i.e., the sum, is ≥ theta, the neuron will fire; otherwise, it won't.
AND Function
An AND function neuron would only fire when ALL the inputs are ON
i.e., g(x) ≥ 3 here.
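As a quick sanity check, here is a minimal Python sketch of an M-P neuron (the function name and structure are only illustrative) verifying the AND behaviour for three inputs with theta = 3:

```python
from itertools import product

def mp_neuron(inputs, theta):
    """McCulloch-Pitts neuron: fire (return 1) if the sum of the
    boolean inputs reaches the thresholding parameter theta."""
    return 1 if sum(inputs) >= theta else 0

# AND over three boolean inputs: fires only when all inputs are 1
for x in product([0, 1], repeat=3):
    print(x, "->", mp_neuron(x, theta=3))
# Only (1, 1, 1) -> 1; every other combination -> 0
```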
OR Function
An OR function neuron fires when ANY of the inputs is ON, i.e., g(x) ≥ 1 here.
x_1 AND !x_2 Function
Now this might look like a tricky one but it's really not.
Here, we have an inhibitory input i.e., x_2 so whenever x_2 is 1, the
output will be 0.
Keeping that in mind, we know that x_1 AND !x_2 would output 1 only
when x_1 is 1 and x_2 is 0 so it is obvious that the threshold parameter
should be 1.
Let's verify that: g(x), i.e., x_1 + x_2, would be ≥ 1 in only 3 cases:
Case 1: when x_1 is 1 and x_2 is 0
Case 2: when x_1 is 1 and x_2 is 1
Case 3: when x_1 is 0 and x_2 is 1
But in both Case 2 and Case 3, we know that the output will be 0
because x_2 is 1 in both of them, thanks to the inhibition. And we also
know that x_1 AND !x_2 would output 1 for Case 1 (above) so our
threshold parameter holds good for the given function.
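The inhibition rule can be sketched in the same toy style (again, purely illustrative): whenever any inhibitory input is 1 the neuron outputs 0, otherwise the usual thresholding applies.

```python
def mp_neuron_with_inhibition(excitatory, inhibitory, theta):
    """M-P neuron: any active inhibitory input forces the output to 0;
    otherwise fire if the sum of excitatory inputs reaches theta."""
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= theta else 0

# x_1 AND !x_2 with theta = 1, taking x_2 as the inhibitory input
for x1 in (0, 1):
    for x2 in (0, 1):
        y = mp_neuron_with_inhibition([x1], [x2], theta=1)
        print((x1, x2), "->", y)
# Only (1, 0) -> 1, matching x_1 AND !x_2
```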
NOR Function
For a NOR neuron to fire, we want ALL the inputs to be 0, so the thresholding parameter should also be 0 and we take all of them as inhibitory inputs.
NOT Function
For a NOT neuron, an input of 1 should output 0 and an input of 0 should output 1, so we take the single input as an inhibitory input and set the thresholding parameter to 0.
Geometric Interpretation
We just used the aggregation equation, i.e., x_1 + x_2 = 1, to graphically show that all the inputs whose output through the OR function M-P neuron is 1 lie ON or ABOVE that line, and all the input points that lie BELOW that line output 0.
The M-P neuron is splitting the input sets into two classes — positive and negative
Positive ones (which output 1) are those that lie ON or ABOVE the decision boundary
and negative ones (which output 0) are those that lie BELOW the decision boundary.
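Concretely, for the OR neuron (theta = 1) the decision boundary is the line x_1 + x_2 = 1: the points (0, 1), (1, 0) and (1, 1) lie ON or ABOVE it and output 1, while (0, 0) lies BELOW it and outputs 0. For a two-input AND neuron the boundary would be x_1 + x_2 = 2, with only (1, 1) on the positive side.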
Perceptron
Now, this is very similar to an M-P neuron, but here we take a weighted sum of the inputs and set the output to 1 only when the sum is more than an arbitrary threshold (theta).
A football freak may have a very low threshold and may watch any football game
irrespective of the league, club or importance of the game [theta = 0].
On the other hand, a selective viewer may only watch a football game that is a Premier League game, features Man United and is not a friendly [theta = 2].
The point is, the weights and the bias will depend on the data (viewing history in this case).
Based on the data, if needed the model may have to give a lot of importance (high
weight) to the isManUnitedPlaying input and penalize the weights of other inputs.
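Here is a small Python sketch of that idea. The weights and the threshold below are made-up numbers, just to illustrate how a Man United biased viewer could be encoded; in practice they would be learned from the viewing history.

```python
def perceptron(inputs, weights, theta):
    """Perceptron: fire (return 1) if the weighted sum of the inputs
    reaches the threshold theta."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= theta else 0

# Inputs: [isPremierLeagueOn, isItAFriendlyGame, isNotHome, isManUnitedPlaying]
# Illustrative weights: Man United matters a lot, friendlies count against
weights = [0.5, -0.5, -1.0, 1.5]
theta = 1.0

print(perceptron([1, 0, 0, 1], weights, theta))  # PL game, Man United playing -> 1
print(perceptron([0, 1, 0, 0], weights, theta))  # a random friendly -> 0
```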
Perceptron vs McCulloch-Pitts Neuron
From the equations (written out below), it is clear that even a perceptron separates the input space into two halves, positive and negative.
All the inputs that produce an output 1 lie on one side (positive half space) and all the inputs
that produce an output 0 lie on the other side (negative half space).
Here, the weights, including the threshold, can be learned, and the inputs can be real values.
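Written out (with the bias convention w_0 = -theta and a fixed input x_0 = 1, the same convention the later slides use):
y = 1, if w_1 x_1 + w_2 x_2 + ... + w_n x_n ≥ theta
y = 0, otherwise
or equivalently:
y = 1, if w_0 + w_1 x_1 + ... + w_n x_n ≥ 0
y = 0, otherwise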
Boolean Functions Using Perceptron
OR Function
Just revisiting the good old OR function the perceptron way.
Boolean Functions Using Perceptron
OR Function
A 'possible solution' can be obtained by solving the linear system of inequalities given by the OR truth table (sketched below).
It is clear that the solution separates the input space into two spaces, the negative and positive half spaces.
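Writing the system out explicitly (using the bias convention w_0 = -theta):
w_0 < 0 (from input (0, 0), which must give 0)
w_0 + w_1 ≥ 0 (from input (1, 0), which must give 1)
w_0 + w_2 ≥ 0 (from input (0, 1), which must give 1)
w_0 + w_1 + w_2 ≥ 0 (from input (1, 1), which must give 1)
One possible solution is w_0 = -1, w_1 = 1, w_2 = 1, i.e., the decision boundary x_1 + x_2 = 1 from earlier.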
Now if you actually try and solve the linear equations above, you will realize
that there can be multiple solutions.
But which solution is the best?
To more formally define the 'best' solution, we need to understand errors and error surfaces.
XOR Function — Can’t Do!
Now let's look at a non-linear boolean function i.e., you cannot draw a line to
separate positive inputs from the negative ones.
Notice that the fourth equation contradicts the second and the third equation.
The point is, there are no perceptron solutions for data that are not linearly separable. So the key takeaway is that a single perceptron cannot learn to separate data that are non-linearly separable.
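Writing the XOR constraints out in the same way makes the contradiction explicit:
w_0 < 0 (from input (0, 0), which must give 0)
w_0 + w_1 ≥ 0 (from input (1, 0), which must give 1)
w_0 + w_2 ≥ 0 (from input (0, 1), which must give 1)
w_0 + w_1 + w_2 < 0 (from input (1, 1), which must give 0)
Since the first constraint makes -w_0 positive, the second and third together give w_1 + w_2 ≥ -2w_0 > -w_0, while the fourth requires w_1 + w_2 < -w_0; the fourth therefore cannot hold together with the second and third.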
The XOR Affair
In the book published by Minsky and Papert in 1969, the authors implied that, since a single artificial neuron is incapable of implementing some functions such as the XOR logical function, larger networks also have similar limitations and the approach should therefore be dropped.
Later research on three-layered perceptrons showed how to implement such functions, thereby saving the technique from being abandoned.
Motivation For Sigmoid Neurons
The artificial neurons we use today are slightly different from the perceptron we looked at; the difference is the activation function.
Some might say that the threshold logic used by a perceptron is very harsh.
For example, look at the problem of deciding whether I will watch a movie or not, based only on one real-valued input (x_1 = criticsRating). If the threshold we set is 0.5 (w_0 = -0.5) and w_1 = 1, then our setup would look like this:
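y = 1 (watch the movie), if w_0 + w_1 x_1 = criticsRating - 0.5 ≥ 0
y = 0 (won't watch), otherwise
so the movie is watched exactly when criticsRating ≥ 0.5.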
Motivation For Sigmoid Neurons
What would be the decision for a movie with criticsRating = 0.51? Yes
What would be the decision for a movie with criticsRating = 0.49? No!
Some might say that it's harsh that we would watch a movie with a rating of 0.51 but not the one with a rating of 0.49, and this is where the Sigmoid comes into the picture.
Now convince yourself that this harsh threshold is not attributed to just one specific
problem we chose here, it could happen with any or every problem we deal with.
It is a characteristic of the perceptron function itself which behaves like a step
function.
Motivation For Sigmoid Neurons
There will be a sudden change in the decision (from 0 to 1) when the value of z crosses the threshold (-w_0).
For most real-world applications we would expect a smoother decision function which
gradually changes from 0 to 1.
Introducing sigmoid neurons where the output function is much smoother than the
step function seems like a logical and obvious thing to do.
Mind you that a sigmoid function is a mathematical function with a characteristic “S”-
shaped curve, also called the sigmoid curve.
There are many functions that can do the job for you.
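The most commonly used choice is the logistic function, sigmoid(z) = 1 / (1 + e^(-z)), which maps any real z smoothly into the range (0, 1). A tiny Python sketch (reusing the criticsRating numbers from above, purely for illustration) shows how it softens the hard jump of the step function:

```python
import math

def step(z):
    """Perceptron-style decision: hard jump from 0 to 1 at z = 0."""
    return 1 if z >= 0 else 0

def sigmoid(z):
    """Logistic sigmoid: smooth, S-shaped transition from 0 to 1."""
    return 1 / (1 + math.exp(-z))

w_0, w_1 = -0.5, 1.0  # same setup as the criticsRating example
for rating in (0.49, 0.51):
    z = w_0 + w_1 * rating
    print(rating, "step:", step(z), "sigmoid:", round(sigmoid(z), 4))
# 0.49 -> step: 0, sigmoid: 0.4975
# 0.51 -> step: 1, sigmoid: 0.5025
```

Instead of flipping from "No" to "Yes" at 0.5, the sigmoid output barely moves between the two ratings, which is exactly the smoother decision behaviour we were asking for.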