
13 - Introduction to Neural Networks

UCLA Math156: Machine Learning


Instructor: Lara Kassab
Neural Networks
Neural networks were originally inspired by information-processing models of biological systems, in particular the human brain.
Neural networks are also called Artificial Neural Networks (ANN) or Neural Nets (NN).
They consist of connected artificial neurons, called units or nodes, which loosely model the neurons in a brain.
Neural Networks
Deep learning refers to training neural networks with multiple
hidden layers.
Feedforward Neural Network

A Feedforward Neural Network (FNN) is one of the two main types of NNs. A FNN has a uni-directional flow of information between its layers.

The direction of the flow or connections from: input nodes → (multiple) hidden nodes → output nodes is forward, without any cycles or loops.

This is in contrast to Recurrent Neural Networks (RNN), which have a bi-directional flow.

FNNs can be regression or classification models depending on the activation function used in the output layer.
Multilayer Perceptron
A Multilayer Perceptron (MLP) is a FNN where all the nodes of
the previous layer are connected to each input of the succeeding
layer (except for the bias node). This architecture is called
fully-connected.
Review of Linear Models

Generalized linear models for regression and classification have the form:

y(x, w) = f\left( \sum_{j=0}^{M-1} w_j \phi_j(x) \right)

The basis functions ϕ_j(x) are fixed nonlinear functions such as Gaussian RBFs, sigmoidal functions, etc.

For regression, f is usually the identity function. For classification, f is usually a nonlinear activation function such as the logistic sigmoid or sign function.
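Below is a minimal NumPy sketch of such a fixed-basis generalized linear model, assuming Gaussian RBF basis functions with hand-picked centers and a logistic-sigmoid f; the function names, centers, and weights are illustrative, not taken from the slides.

```python
import numpy as np

def gaussian_rbf(x, center, width=1.0):
    # Fixed nonlinear basis function: phi(x) = exp(-||x - c||^2 / (2 * width^2))
    return np.exp(-np.sum((x - center) ** 2) / (2 * width ** 2))

def glm_predict(x, w, centers, f=lambda a: 1.0 / (1.0 + np.exp(-a))):
    # y(x, w) = f( sum_j w_j * phi_j(x) ), with phi_0(x) = 1 carrying the bias
    phi = np.array([1.0] + [gaussian_rbf(x, c) for c in centers])
    return f(w @ phi)

# Illustrative usage: D = 2 features, M = 3 basis functions (including the bias)
centers = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
w = np.array([0.1, 0.5, -0.3])   # one weight per basis function
print(glm_predict(np.array([0.5, 0.2]), w, centers))
```

Note that the basis functions themselves stay fixed here; only the weights w would be fitted to data.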
Simple Linear Models: Limitations

These (fixed) linear basis function models have limited practical applicability on large-scale problems due to the curse of dimensionality.

The number of coefficients needed to adapt the basis functions to the data grows with the number of features.

To extend to large-scale problems we need to adapt the basis functions ϕ_j to the data. Both SVMs and neural networks address this limitation in different ways.
SVM Approach

The number of basis functions in SVM is not pre-defined. SVM uses basis functions centered on the training samples and varies how many are kept:

SVM selects a subset of these during training (the support vectors). This number depends on the characteristics of the data, the choice of kernel, hyperparameters (e.g. the regularization coefficient), etc.

Although training involves nonlinear optimization, the objective function is convex.

In SVM, the number of basis functions is much smaller than the number of training points, but it can still be large and grow with the size of the training set.
Neural Networks Approach

Neural networks fix the number of basis functions in advance, but allow them to be adaptive:

The basis functions ϕ_j have their own parameters {w_ji} which are adapted during training.

Neural networks involve a non-convex optimization during training (many local minima), but we get a more compact and faster model at the expense of a harder training problem.
Basic Neural Network Model

A neural network can also be represented similarly to linear models, but the basis functions are generalized:

y(x, w) = f\left( \sum_{j=0}^{M-1} w_j \phi_j(x) \right)

There can be several activation functions f, and the process is repeated layer after layer:

generalized model = nonlinear function ( linear model )

The parameters w_j of the nonlinear basis functions ϕ_j are adjusted during training.
Basic Neural Network Model

A basic FNN model can be described by a series of functional transformations:

We have input x = (x_1, \cdots, x_D)^\top and M linear combinations of the form:

a_j^{(1)} = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \qquad \text{for } j = 1, \cdots, M

The superscript (1) indicates that these parameters belong to the first layer of the network; the parameters w_{ji} are referred to as weights, and the parameters w_{j0} are biases (these can be absorbed into the sum by defining x_0 = 1).
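As a small illustration, the M first-layer activations can be computed with a single matrix-vector product. This sketch assumes a NumPy weight matrix W1 of shape (M, D) and a bias vector b1 of length M; all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 3, 4                    # number of input features and hidden units (illustrative)
W1 = rng.normal(size=(M, D))   # first-layer weights  w_ji^(1)
b1 = rng.normal(size=M)        # first-layer biases   w_j0^(1)

x = rng.normal(size=D)         # a single input vector x = (x_1, ..., x_D)^T
a1 = W1 @ x + b1               # activations a_j^(1) for j = 1, ..., M
print(a1.shape)                # (4,)
```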
Two-layer Perceptron Model
Basic Neural Network Model

Recall from above:

a_j^{(1)} = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \qquad \text{for } j = 1, \cdots, M

The quantities a_j are known as activations; they are the inputs to the activation functions.

The number of hidden units in a layer (M in this case) can be regarded as the number of basis functions.

In neural networks, each basis function has parameters w_{ji} which can be adjusted (learned through the training process).
Basic Neural Network Model

Each activation a_j is transformed using a differentiable nonlinear activation function h,

z_j = h(a_j).

So, for the nodes of the (first) hidden layer we have:

z_j = \underbrace{h\Big( \underbrace{\textstyle\sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}}_{\text{linear model}} \Big)}_{\text{generalized linear model}} \qquad \text{for } j = 1, \cdots, M

This process is repeated for each pair of consecutive layers until we reach the output layer.
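Continuing the sketch above, the hidden-layer units are obtained by passing the activations elementwise through h; tanh is chosen here purely for illustration.

```python
import numpy as np

def h(a):
    # Illustrative choice of differentiable nonlinear activation (hyperbolic tangent)
    return np.tanh(a)

a1 = np.array([0.2, -1.3, 0.7, 0.0])   # activations a_j^(1), e.g. from the previous sketch
z1 = h(a1)                             # hidden-layer outputs z_j = h(a_j)
print(z1)
```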
Activation Functions for Hidden Layers
Examples of activation functions for hidden layers:

Logistic sigmoid, R → (0, 1): \sigma(a) = \frac{1}{1 + e^{-a}}

Hyperbolic tangent, R → (−1, 1): \tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}}

Rectified linear unit, R → R_{+}: f(a) = \max(0, a)

There are many choices of activation functions. We will later discuss key properties.
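A minimal NumPy sketch of these three hidden-layer activation functions, written so they apply elementwise to arrays:

```python
import numpy as np

def logistic_sigmoid(a):
    # R -> (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    # R -> (-1, 1); same as (e^a - e^-a) / (e^a + e^-a)
    return np.tanh(a)

def relu(a):
    # R -> [0, +inf)
    return np.maximum(0.0, a)

a = np.linspace(-3.0, 3.0, 7)
print(logistic_sigmoid(a))
print(tanh(a))
print(relu(a))
```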
Two-layer Perceptron Model

To give a brief start-to-finish picture, we will consider only a 2-layer perceptron (input layer + 1 hidden layer + output layer).
Second Layer

So, the second layer is the output layer.

The values z_i (i = 1, \cdots, M) are linearly combined to give output unit activations:

a_k^{(2)} = \sum_{i=1}^{M} w_{ki}^{(2)} z_i + w_{k0}^{(2)} \qquad \text{for } k = 1, \cdots, K

where K is the total number of outputs.

This corresponds to the second layer of the network, and again the w_{k0} are bias parameters.

The output unit activations a_k are transformed using an appropriate activation function f to give the network outputs y_k.
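A sketch of this output layer in NumPy, assuming an illustrative weight matrix W2 of shape (K, M) and bias vector b2 of length K, continuing the earlier sketches:

```python
import numpy as np

rng = np.random.default_rng(1)
M, K = 4, 2                    # hidden units and number of outputs (illustrative)
W2 = rng.normal(size=(K, M))   # second-layer weights  w_ki^(2)
b2 = rng.normal(size=K)        # second-layer biases   w_k0^(2)

z1 = np.array([0.2, -0.9, 0.6, 0.0])   # hidden-layer outputs from the previous sketch
a2 = W2 @ z1 + b2              # output unit activations a_k^(2) for k = 1, ..., K
print(a2)
```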
Activation Functions for Output Layer

The choice of the activation function in the output layer is determined by the task (e.g. regression, classification), the nature of the data, the assumed distribution of the target variables, etc.

For standard regression problems the activation function is usually the identity function, so that y_k = a_k. Note that the number of output nodes K can be equal to 1.

For multiple binary classification problems, each output unit activation is usually transformed using a logistic sigmoid function, so that y_k = σ(a_k).

For multiclass problems, a softmax activation function is usually used.
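A hedged sketch of these three output-layer choices in NumPy; the max-subtraction in the softmax is a standard numerical-stability detail not discussed in the slides:

```python
import numpy as np

def identity(a):
    # Regression: y_k = a_k
    return a

def logistic_sigmoid(a):
    # Multiple binary classification: y_k = sigma(a_k)
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    # Multiclass classification: y_k = exp(a_k) / sum_j exp(a_j)
    e = np.exp(a - np.max(a))   # subtract the max for numerical stability
    return e / e.sum()

a2 = np.array([2.0, -1.0, 0.5])
print(identity(a2))
print(logistic_sigmoid(a2))
print(softmax(a2))
```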
Two-layer Perceptron Model

→ Forward propagation is the process where the input data is passed through the network’s layers (i.e. evaluated) to generate an output.

Putting the 2-layer perceptron model together, the forward propagation is:

y_k(x, w) = f\left( \sum_{j=1}^{M} w_{kj}^{(2)} \, h\!\left( \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \right) + w_{k0}^{(2)} \right)

We can write this more generally for a MLP with L layers. Note how this architecture is fully-connected.
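Putting the pieces together, here is a minimal NumPy sketch of forward propagation through this fully-connected 2-layer perceptron; the tanh hidden activation, the logistic-sigmoid output f, and the dimensions are illustrative assumptions rather than choices made in the slides:

```python
import numpy as np

def forward(x, W1, b1, W2, b2, h=np.tanh,
            f=lambda a: 1.0 / (1.0 + np.exp(-a))):
    # Forward propagation through a fully-connected 2-layer perceptron
    a1 = W1 @ x + b1    # first-layer activations   a_j^(1)
    z1 = h(a1)          # hidden-layer outputs      z_j = h(a_j)
    a2 = W2 @ z1 + b2   # output unit activations   a_k^(2)
    return f(a2)        # network outputs           y_k = f(a_k)

# Illustrative dimensions: D inputs, M hidden units, K outputs
rng = np.random.default_rng(42)
D, M, K = 3, 4, 2
W1, b1 = rng.normal(size=(M, D)), rng.normal(size=M)
W2, b2 = rng.normal(size=(K, M)), rng.normal(size=K)

y = forward(rng.normal(size=D), W1, b1, W2, b2)
print(y)                # K values in (0, 1) since f is a logistic sigmoid here
```

Stacking further (W, b, h) triples in the same fashion gives the more general L-layer MLP mentioned above.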
Remarks

A few more remarks on FNN:

1. Multiple distinct choices for a weight vector w in FNN can give rise to the same mapping function from inputs to outputs. This property is called weight-space symmetry (Section 5.1.1).

2. FNN can be sparse with not all connections being present (i.e. not fully-connected).

3. A convolutional neural network (CNN) is a special kind of FNN with significant use in image and text processing.