
ECE/CS 559 - Neural Networks Lecture Notes #2

Mathematical models for the neuron, Neural network architectures


Erdem Koyuncu

1 Mathematical models for the neuron


1.1 The biology of the neuron

(Image taken from Wikipedia)

• Neuron: A highly-specialized cell that can transmit and receive information through electrical/chemical
signaling. The fundamental building block of the brain.
• Synapse: The structure that connects two given neurons, i.e. the functional unit that mediates the
interactions between neurons.

• A chemical synapse: Electrical signal through axon of one neuron → chemical signal / neurotransmitter
→ postsynaptic electrical signal at the dendrite of another neuron.
• Information flows from dendrites to axons: Dendrite (of neuron A) → Cell body (Soma) → Axon
→ Synapses → Dendrite (we are now at neuron B) → Cell body (Soma) → Axon → Synapses → · · · .

• A neuron can receive thousands of synaptic contacts and it can project onto thousands of target
neurons.
• A synapse can impose excitation or inhibition (but not both) on the target neuron. The “strength” of
excitation/inhibition may vary from one synapse to another.
• There are thousands of different types of neurons. There are, however, 3 main categories.

– Sensory neurons: Convey sensory (light, sound, touch) information.


– Interneurons: Convey information between different types of neurons.
– Motor neurons: Transmit muscle/gland control information from the brain/spinal cord.
• The soma is a few to a hundred micrometers in diameter. Axons are generally much longer, and can
in fact be up to a meter long! (e.g. neurons of the sciatic nerve).
• Speed of neural impulses: A few to a hundred meters per second.
• Around $10^{11}$ neurons in the brain, $10^{12}$ neurons in the entire nervous system. Also, on average, a neuron
makes thousands of connections to other neurons, resulting in around $10^{14}$ synapses in the brain.

1.2 A mathematical model for the neuron

(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)
• Input signals are from dendrites of other neurons.
• The synaptic weights correspond to the synaptic strengths: positivity/negativity → excitation/inhibition.
• The summing unit models the operation of the cell body (soma).

• The nonlinearity ϕ(·) (activation function) models the axon operation.


• The output may be connected to the dendrite of another neuron through another synapse.
• $v_k = \sum_{j=1}^{n} w_{kj} x_j + b_k$ is called the induced local field of neuron k.

• $y_k = \varphi(v_k) = \varphi\!\left(\sum_{j=1}^{n} w_{kj} x_j + b_k\right)$.

• Alternatively, we may consider a fixed input $x_0 = 1$ with weight $w_{k0} = b_k$:

(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)
• $y_k = \varphi\!\left(\sum_{j=0}^{n} w_{kj} x_j\right)$. Note that now the summation starts from index 0.
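Below is a minimal numerical sketch of this model, assuming NumPy; the function names and example numbers are illustrative and not from the notes. It evaluates both the bias-separate form and the equivalent augmented form with the fixed input $x_0 = 1$.

```python
import numpy as np

def neuron_output(x, w, b, phi):
    """Single-neuron model: induced local field v_k = sum_j w_kj x_j + b_k, output y_k = phi(v_k)."""
    v = np.dot(w, x) + b                # induced local field
    return phi(v)

def neuron_output_augmented(x, w_aug, phi):
    """Equivalent form with fixed input x_0 = 1 and weight w_k0 = b_k."""
    x_aug = np.concatenate(([1.0], x))  # prepend the fixed input x_0 = 1
    return phi(np.dot(w_aug, x_aug))    # summation now starts from index 0

# Illustrative numbers with a threshold activation (activation functions are discussed in Section 1.3).
x = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 0.5, -0.25])
b = 0.1
step = lambda v: 1.0 if v >= 0 else 0.0

print(neuron_output(x, w, b, step))                               # v = -0.4, so the output is 0.0
print(neuron_output_augmented(x, np.concatenate(([b], w)), step)) # same result: 0.0
```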

1.3 Types of activation function


Typically, ϕ(·) has bounded image (e.g. [0, 1] or [−1, 1]), and thus is also called a squashing function. It
limits the amplitude range of the neuron output.

1.3.1 Step function
• Threshold function (or the Heaviside/step function):

$$\varphi(v) = \begin{cases} 1, & v \ge 0 \\ 0, & v < 0. \end{cases}$$

(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)

Such a neuron is referred to as the McCulloch-Pitts model (1943).

1.3.2 Sigmoid function


• The sigmoid function is defined as
$$\varphi(v) = \frac{1}{1 + \exp(-av)},$$
where a is called the slope parameter.

(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)

• As a → ∞ the sigmoid function approaches the step function.


• Unlike the step function, the sigmoid function is continuous and differentiable. Differentiability turns
out to be a desirable property of an activation function, as we shall see later.
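As a quick worked step (not spelled out in the notes, but standard), the derivative of the sigmoid can be written in terms of the sigmoid itself:
$$\varphi'(v) = \frac{a \exp(-av)}{\bigl(1 + \exp(-av)\bigr)^{2}} = a\,\varphi(v)\bigl(1 - \varphi(v)\bigr),$$
which is part of what makes differentiable activations convenient for the learning rules discussed later.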

1.3.3 Signum function



$$\varphi(v) = \operatorname{sgn}(v) = \begin{cases} 1, & v > 0 \\ 0, & v = 0 \\ -1, & v < 0. \end{cases}$$

1.3.4 Hyperbolic tangent

$$\varphi(v) = \tanh(av) = \frac{e^{av} - e^{-av}}{e^{av} + e^{-av}}$$
for some parameter a > 0. Approaches the signum function as a → ∞.
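A minimal sketch of these four activation functions, assuming NumPy (the function names are mine, not from the notes); increasing the slope parameter a illustrates the limiting behavior mentioned above.

```python
import numpy as np

def step(v):
    """Threshold (Heaviside) function: 1 for v >= 0, 0 otherwise."""
    return np.where(v >= 0, 1.0, 0.0)

def sigmoid(v, a=1.0):
    """Sigmoid with slope parameter a."""
    return 1.0 / (1.0 + np.exp(-a * v))

def signum(v):
    """Signum function: +1 for v > 0, 0 for v = 0, -1 for v < 0."""
    return np.sign(v)

def tanh_act(v, a=1.0):
    """Hyperbolic tangent with parameter a > 0."""
    return np.tanh(a * v)

v = np.linspace(-3.0, 3.0, 7)
print(step(v))
print(sigmoid(v, a=10.0))   # large a: close to the step function
print(signum(v))
print(tanh_act(v, a=10.0))  # large a: close to the signum function
```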

2 Neural network architectures
Having introduced our basic model of a (mathematical) neuron, we now introduce the different neural network
architectures that we will keep revisiting throughout the course.

2.1 Single-layer feedforward networks

(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)

Figure 1: Feedforward network with a single layer of neurons

• We just count the number of layers consisting of neurons (not the layer of source nodes, as no computation is performed there). Thus, the network in Fig. 1 is called a single-layer network.

• Also, note that information flows in only one direction: the input layer of source nodes projects directly onto the output layer of neurons (through the nonlinear transformations specified by the neurons). There is no feedback from the network's output to the network's input. We thus say that the network in Fig. 1 is of feedforward type.
• Let us try to formulate the input-output relationship of the network. Putting in the symbols, we have
the following diagram:

We have
$$y_1 = \varphi\!\left(b_1 + \sum_{i=1}^{n} w_{1i} x_i\right)$$
$$y_2 = \varphi\!\left(b_2 + \sum_{i=1}^{n} w_{2i} x_i\right)$$
$$\vdots$$
$$y_k = \varphi\!\left(b_k + \sum_{i=1}^{n} w_{ki} x_i\right)$$

We can rewrite all k equations via a single equation:

$$y_j = \varphi\!\left(b_j + \sum_{i=1}^{n} w_{ji} x_i\right), \qquad j = 1, \ldots, k.$$

Here $w_{ji}$ is the weight from input i to neuron j. We can further rewrite everything in a simple matrix form.
Define
$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix}, \qquad
\mathbf{W}' = \begin{bmatrix} b_1 & w_{11} & \cdots & w_{1n} \\ b_2 & w_{21} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_k & w_{k1} & \cdots & w_{kn} \end{bmatrix}, \qquad
\mathbf{x}' = \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$
Then, the above input-output relationship can just be written as $\mathbf{y} = \varphi(\mathbf{W}'\mathbf{x}')$, in the sense that $\varphi$ is applied
component-wise. Sometimes biases are treated separately. Defining
$$\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{bmatrix}, \qquad
\mathbf{W} = \begin{bmatrix} w_{11} & \cdots & w_{1n} \\ w_{21} & \cdots & w_{2n} \\ \vdots & \ddots & \vdots \\ w_{k1} & \cdots & w_{kn} \end{bmatrix}, \qquad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},$$
we can write $\mathbf{y} = \varphi(\mathbf{W}'\mathbf{x}') = \varphi(\mathbf{W}\mathbf{x} + \mathbf{b})$.
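The following is a minimal sketch of this single-layer computation, assuming NumPy; the dimensions and random numbers are illustrative. It verifies that the augmented form $\varphi(\mathbf{W}'\mathbf{x}')$ and the separated form $\varphi(\mathbf{W}\mathbf{x} + \mathbf{b})$ agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3                         # n inputs, k neurons

W = rng.standard_normal((k, n))     # w_ji: weight from input i to neuron j
b = rng.standard_normal(k)          # bias vector
x = rng.standard_normal(n)          # input vector
phi = np.tanh                       # any component-wise activation

# Separated form: y = phi(W x + b)
y_separated = phi(W @ x + b)

# Augmented form: prepend the bias column to W and the fixed input 1 to x
W_aug = np.hstack([b[:, None], W])
x_aug = np.concatenate(([1.0], x))
y_augmented = phi(W_aug @ x_aug)

print(np.allclose(y_separated, y_augmented))  # True: both forms give the same outputs
```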

2.2 Multilayer feedforward networks

(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)

Figure 2: Fully-connected 2-layer feedforward network with one hidden layer and one output layer.

• We can add more layers to the feedforward network in Fig. 1.


• The end-result is a multilayer network with one or more hidden layers.
• Hidden layers refer to those layers that are not seen directly from either the input or the output of the
network.

• We call a network with m source nodes in its input layer, $h_1$ hidden nodes in its first hidden layer, $h_2$
nodes in its second hidden layer, ..., $h_K$ nodes in its Kth hidden layer, and finally q nodes in its output
layer an $m$-$h_1$-$h_2$-$\cdots$-$h_K$-$q$ network. For example, the network in Fig. 2 is called a 10-4-2 network as it
has 10 source nodes in its input layer, 4 nodes in the hidden layer, and 2 nodes in its output layer.
• Fully-connected network: Every node in each layer of the network is connected to every node in the
adjacent layer. Example: The network in Fig. 2 is fully connected. Otherwise, the network is
partially connected.
• Deep network: Many (usually assumed to be > 1) hidden layers. Shallow network: The opposite.
• As will be made more precise later on, theoretically, a single hidden layer is sufficient for almost any
application, provided that one can afford a large number of neurons. On the other hand, a deep network
can perform the same tasks as a shallow network with the extra advantage of possibly requiring fewer
neurons. Hence, deep networks, provided that they can be properly designed, may be better suited
for complex practical applications.
• The input-output relationships may be formulated in a similar manner as for the single-layer network
discussed previously. For example, consider a two-layer network with n inputs, $L_1$ neurons in the first
layer, and $L_2$ neurons in the second (output) layer. Let $\mathbf{x} \in \mathbb{R}^{n \times 1}$ be the vector of inputs, $\mathbf{W}_1 \in \mathbb{R}^{L_1 \times n}$
be the matrix of weights connecting the input layer to the first layer of neurons (where the entry in row i,
column j of $\mathbf{W}_1$ is the weight between input node j and neuron i of the first layer),
$\mathbf{b}_1 \in \mathbb{R}^{L_1 \times 1}$ be the vector of biases for the first layer of neurons, $\mathbf{W}_2 \in \mathbb{R}^{L_2 \times L_1}$ be the matrix of weights
connecting the first layer of neurons to the second layer of neurons (where the entry in row i, column j of
$\mathbf{W}_2$ is the weight between neuron j of the first layer and neuron i of the second layer),
$\mathbf{b}_2 \in \mathbb{R}^{L_2 \times 1}$ be the vector of biases for the second layer of neurons, and $\mathbf{y} \in \mathbb{R}^{L_2 \times 1}$ be the vector
of outputs. Then, we have $\mathbf{y} = \varphi(\mathbf{W}_2\,\varphi(\mathbf{W}_1\mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2)$.
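A minimal sketch of this two-layer forward pass, assuming NumPy; the dimensions and random weights are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, L1, L2 = 5, 4, 2                 # inputs, first-layer neurons, output neurons

x  = rng.standard_normal(n)
W1 = rng.standard_normal((L1, n))   # weights: input layer -> first layer
b1 = rng.standard_normal(L1)        # biases of the first layer
W2 = rng.standard_normal((L2, L1))  # weights: first layer -> output layer
b2 = rng.standard_normal(L2)        # biases of the output layer
phi = np.tanh                       # component-wise activation

h = phi(W1 @ x + b1)                # first (hidden) layer outputs
y = phi(W2 @ h + b2)                # network outputs: y = phi(W2 phi(W1 x + b1) + b2)
print(y)                            # vector of L2 = 2 outputs
```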

2.3 Recurrent networks

(Image taken from our coursebook: S. Haykin, “Neural Networks and Learning Machines,” 3rd ed.)

Figure 3: Recurrent network with no self-feedback loops and no hidden neurons.

• What distinguishes recurrent networks from feedforward networks is that they incorporate feedback.
• In Fig. 3, the boxes labeled $z^{-1}$ represent unit discrete-time delays.
• No self-feedback: The output of a given neuron is not fed back to its own input.

• One may have variations of the structure in Fig. 3. We shall discuss these variations and the details
of recurrent networks later on.
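As a rough illustration only (the notes postpone the details to later lectures), the sketch below assumes a discrete-time recurrence in the spirit of Fig. 3: q neurons, q external inputs, and each neuron's unit-delayed output fed back to the inputs of the other neurons, with a zero diagonal in the feedback matrix to enforce the no-self-feedback condition. The specific update equation is an assumption, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
q = 3                                  # number of neurons (no hidden neurons)

W_in = rng.standard_normal((q, q))     # weights applied to the external inputs
W_fb = rng.standard_normal((q, q))     # feedback weights acting through the unit delays z^-1
np.fill_diagonal(W_fb, 0.0)            # no self-feedback: a neuron's output never returns to itself
phi = np.tanh

y = np.zeros(q)                        # delayed outputs, initialized to zero
for t in range(5):
    x_t = rng.standard_normal(q)       # external input at time t
    y = phi(W_in @ x_t + W_fb @ y)     # new outputs depend on the previous (delayed) outputs
    print(t, y)
```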
