The document provides an introduction to neural networks, focusing on their structure, types, and functionality, particularly Feedforward Neural Networks (FNN) and Multilayer Perceptrons (MLP). It discusses the limitations of simple linear models and compares the approaches of Support Vector Machines (SVM) and neural networks in adapting basis functions for large-scale problems. Additionally, it outlines the process of forward propagation in a two-layer perceptron model and the role of activation functions in both hidden and output layers.
13 - Introduction to Neural Networks
UCLA Math156: Machine Learning
Instructor: Lara Kassab

Neural Networks

The origin of neural networks is inspired by information processing models of biological systems, in particular the human brain. Neural networks are also called Artificial Neural Networks (ANN) or Neural Nets (NN). They consist of connected artificial neurons, called units or nodes, which loosely model the neurons in a brain.

Deep learning refers to training neural networks with multiple hidden layers.

Feedforward Neural Network
A Feedforward Neural Network (FNN) is one of the two main types of NNs. An FNN has a uni-directional flow of information between its layers: connections run forward from input nodes → (multiple) hidden nodes → output nodes, without any cycles or loops. This is in contrast to Recurrent Neural Networks (RNNs), whose connections form cycles, so information can also flow backward through the network.
FNNs can be regression or classification models depending on the activation function used in the output layer.

Multilayer Perceptron

A Multilayer Perceptron (MLP) is an FNN in which every node of one layer is connected to every node of the succeeding layer (except for the bias node). This architecture is called fully-connected.

Review of Linear Models
Generalized linear models for regression and classification have the form:

y(x, w) = f\Big( \sum_{j=0}^{M-1} w_j \phi_j(x) \Big)
The basis functions ϕ_j(x) are fixed nonlinear functions such as Gaussian RBFs, sigmoidal functions, etc. For regression, f is usually the identity function. For classification, f is usually a nonlinear activation function such as the logistic sigmoid or the sign function.
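As a small illustration, here is a minimal NumPy sketch of such a fixed-basis model, assuming hand-picked Gaussian RBF basis functions (the centers, width, and weights below are illustrative choices, not from the notes):

import numpy as np

# Sketch of a fixed-basis linear model y(x, w) = f( sum_j w_j * phi_j(x) ).
# The RBF centers `mu`, width `s`, and weights `w` are illustrative choices.

def gaussian_rbf_features(x, mu, s):
    # phi_j(x) = exp(-||x - mu_j||^2 / (2 s^2)), plus a constant phi_0(x) = 1
    dists = np.sum((x[None, :] - mu) ** 2, axis=1)
    return np.concatenate(([1.0], np.exp(-dists / (2 * s ** 2))))

def generalized_linear_model(x, w, mu, s, f=lambda a: a):
    # f is the identity for regression; swap in a sigmoid for classification
    return f(w @ gaussian_rbf_features(x, mu, s))

mu = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]])   # M - 1 = 3 centers, D = 2
w = np.array([0.1, 0.5, -0.3, 0.2])                    # M = 4 weights incl. bias w_0
print(generalized_linear_model(np.array([0.2, -0.4]), w, mu, s=1.0))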
Simple Linear Models: Limitations

These (fixed) linear basis function models have limited practical applicability on large-scale problems due to the curse of dimensionality: the number of basis functions (and hence of coefficients) needed to fit the data grows rapidly with the number of input features.

To extend to large-scale problems, we need to adapt the basis functions ϕ_j to the data. Both SVMs and neural networks address this limitation, in different ways.

SVM Approach
In an SVM, the number of basis functions is not fixed in advance. Basis functions are centered on the training samples, and the SVM selects a subset of them during training (the support vectors). How many are selected depends on the characteristics of the data, the choice of kernel, the hyperparameters (e.g. the regularization coefficient), etc. Although training involves nonlinear optimization, the objective function is convex. The resulting number of basis functions is typically much smaller than the number of training points, but it can still be large and tends to grow with the size of the training set.
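As a small illustration of how the number of support vectors varies with the regularization coefficient, here is a sketch using scikit-learn's SVC (the synthetic data and C values are arbitrary choices, not from the notes):

import numpy as np
from sklearn.svm import SVC

# The number of selected basis functions (support vectors) depends on the data,
# the kernel, and the regularization coefficient C.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # a nonlinear decision boundary

for C in (0.1, 1.0, 10.0):
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    print(f"C={C}: {len(clf.support_vectors_)} support vectors out of {len(X)} samples")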
Neural Networks Approach

Neural networks fix the number of basis functions in advance, but allow the basis functions to be adaptive: each ϕ_j has its own parameters {w_ji} which are adapted during training. Training a neural network involves a non-convex optimization (with many local minima), but in return we obtain a more compact and faster model at the expense of a harder training problem.

Basic Neural Network Model
A neural network can be represented similarly to a linear model, but with generalized basis functions:

y(x, w) = f\Big( \sum_{j=0}^{M-1} w_j \phi_j(x) \Big)

There can be several activation functions f, and the construction is repeated layer after layer:

generalized model = nonlinear function( linear model )

The parameters of the nonlinear basis functions ϕ_j are adjusted during training.

Basic Neural Network Model
A basic FNN model can be described by a series of functional transformations. Given an input x = (x_1, ..., x_D)^T, we first construct M linear combinations of the form:

a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)},  for j = 1, ..., M

The superscript (1) indicates that the parameters belong to the first layer of the network; the parameters w_{ji} are referred to as weights and the parameters w_{j0} as biases (the biases can be absorbed into the sum by defining x_0 = 1).

Two-layer Perceptron Model

Basic Neural Network Model
Recall from above:
a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)},  for j = 1, ..., M
The quantities a_j are known as activations; they are the inputs to the activation functions.
The number of hidden units in a layer (M in this case) can be
regarded as the number of basis functions.
In neural networks, each basis function has parameters w_{ji} which can be adjusted (learned through the training process).

Basic Neural Network Model
Each activation a_j is transformed using a differentiable nonlinear activation function h,

z_j = h(a_j).

So, for the nodes of the (first) hidden layer we have:

z_j = \underbrace{h\Big( \underbrace{\sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}}_{\text{linear model}} \Big)}_{\text{generalized linear model}},  for j = 1, ..., M
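A minimal NumPy sketch of this hidden-layer computation (the sizes, random weights, and tanh nonlinearity are illustrative choices):

import numpy as np

# Hidden layer: a = W1 x + b1 (activations), z = h(a) (hidden unit outputs).
# Shapes: x is (D,), W1 is (M, D), b1 is (M,); tanh is one possible choice of h.
D, M = 3, 4
rng = np.random.default_rng(0)
W1 = rng.normal(size=(M, D))   # weights w_ji^(1)
b1 = rng.normal(size=M)        # biases  w_j0^(1)

x = np.array([0.5, -1.0, 2.0])
a = W1 @ x + b1                # activations a_j
z = np.tanh(a)                 # hidden unit values z_j = h(a_j)
print(a, z)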
This process is repeated for each pair of consecutive layers until we reach the output layer.

Activation Functions for Hidden Layers

Examples of activation functions for hidden layers:

1. Logistic sigmoid, R → (0, 1): σ(a) = \frac{1}{1 + e^{-a}}
2. Hyperbolic tangent, R → (−1, 1): tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}}
3. Rectified Linear Unit (ReLU), R → R_+: f(a) = max(0, a)
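In NumPy, these three functions can be sketched as follows (the function names are my own):

import numpy as np

def sigmoid(a):
    # logistic sigmoid: R -> (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    # hyperbolic tangent: R -> (-1, 1)
    return np.tanh(a)

def relu(a):
    # rectified linear unit: R -> [0, inf)
    return np.maximum(0.0, a)

a = np.linspace(-3.0, 3.0, 7)
print(sigmoid(a), tanh(a), relu(a), sep="\n")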
There are many choices of activation functions; we will discuss their key properties later.

Two-layer Perceptron Model
To give a brief start-to-finish picture, we will consider only a two-layer perceptron model.
The values z_i (i = 1, ..., M) are linearly combined to give the output unit activations:

a_k = \sum_{i=1}^{M} w_{ki}^{(2)} z_i + w_{k0}^{(2)},  for k = 1, ..., K

where K is the total number of outputs.
This corresponds to the second layer of the network, and again the w_{k0} are bias parameters. The output unit activations a_k are transformed by an appropriate activation function f to give the network outputs y_k.

Activation Functions for Output Layer
The choice of the activation function in the output layer is
determined by the task (e.g. regression, classification), the nature of the data, the assumed distribution of the target variables, etc.
For standard regression problems, the activation function is usually the identity function, so that y_k = a_k (note that the number of output nodes K can be equal to 1). For multiple binary classification problems, each output unit activation is usually transformed using a logistic sigmoid function, so that y_k = σ(a_k). For multiclass problems, a softmax activation function is usually used.
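The identity and logistic sigmoid were sketched above; a softmax output activation can be sketched as follows (the max-subtraction is a standard numerical-stability trick, not something discussed in the notes):

import numpy as np

def softmax(a):
    # multiclass output activation: y_k = exp(a_k) / sum_j exp(a_j)
    # subtracting max(a) avoids overflow without changing the result
    e = np.exp(a - np.max(a))
    return e / np.sum(e)

a = np.array([2.0, -1.0, 0.5])          # example output activations a_k
print(softmax(a), softmax(a).sum())     # class probabilities summing to 1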
Two-layer Perceptron Model

→ Forward propagation is the process where the input data is passed through the network's layers (i.e. evaluated) to generate an output.

Putting the 2-layer perceptron model together, the forward propagation is:

y_k(x, w) = f\Big( \sum_{j=1}^{M} w_{kj}^{(2)} \, h\Big( \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \Big) + w_{k0}^{(2)} \Big)
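A minimal NumPy sketch of this two-layer forward pass (the layer sizes, random weights, tanh hidden activation, and identity output are illustrative choices):

import numpy as np

def forward(x, W1, b1, W2, b2, h=np.tanh, f=lambda a: a):
    # two-layer perceptron forward propagation: y = f(W2 h(W1 x + b1) + b2)
    z = h(W1 @ x + b1)      # hidden layer: z_j = h(a_j)
    return f(W2 @ z + b2)   # output layer: y_k = f(a_k)

# Illustrative sizes: D = 3 inputs, M = 4 hidden units, K = 2 outputs.
D, M, K = 3, 4, 2
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(M, D)), np.zeros(M)
W2, b2 = rng.normal(size=(K, M)), np.zeros(K)

x = np.array([0.5, -1.0, 2.0])
print(forward(x, W1, b1, W2, b2))   # identity output, i.e. regression-style outputs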
This can be written more generally for an MLP with L layers. Note how this architecture is fully-connected.
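For instance, the general L-layer case can be sketched as a loop over layers (again, the layer sizes and activations below are illustrative choices):

import numpy as np

def mlp_forward(x, weights, biases, h=np.tanh, f=lambda a: a):
    # forward propagation through a fully-connected MLP:
    # apply h at every layer except the last, which uses the output activation f
    z = x
    for l, (W, b) in enumerate(zip(weights, biases)):
        a = W @ z + b
        z = f(a) if l == len(weights) - 1 else h(a)
    return z

# Illustrative 3-layer network with layer sizes 3 -> 5 -> 4 -> 2.
rng = np.random.default_rng(0)
sizes = [3, 5, 4, 2]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
print(mlp_forward(np.array([0.5, -1.0, 2.0]), weights, biases))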
Remarks

A few more remarks on FNNs:

1. Multiple distinct choices for the weight vector w in an FNN can give rise to the same mapping function from inputs to outputs. This property is called weight-space symmetry (Section 5.1.1).
2. An FNN can be sparse, with not all connections being present (i.e. not fully-connected).
3. A convolutional neural network (CNN) is a special kind of FNN with significant use in image and text processing.