Soft Computing 2
Neural Networks
NN Architecture vs. Learning Methods

Single-layer FFN
Gradient descent: ADALINE (Adaptive Linear Neuron Element), Hopfield, Perceptron
Hebbian: AM (Associative Memory), Hopfield
Competitive: LVQ (Learning Vector Quantization), SOFM (Self-Organizing Feature Map)

Multilayer FFN
Gradient descent: CCM (Cauchy Machine), RBF (Radial Basis Function)
Hebbian: Neocognitron
Stochastic: CCM (Cauchy Machine)

Recurrent networks
Gradient descent: RNN
Hebbian: BAM (Bidirectional AM), BSB (Brain State in a Box), Hopfield
Competitive: ART (Adaptive Resonance Theory)
Stochastic: Boltzmann and Cauchy Machines
Adaptive Filtering Problem
Unconstrained optimization techniques
Newton's Method
Steepest Descent
Gauss-Newton Method
LMS Algorithm
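The steepest-descent idea underlying several of these techniques can be sketched in a few lines. The quadratic objective f(w) = (w - 3)^2, its gradient, and the step size below are illustrative assumptions chosen so the minimum is easy to verify; they are not taken from the slides.

```python
# Steepest descent sketch on a simple quadratic f(w) = (w - 3)^2,
# whose gradient is f'(w) = 2*(w - 3). The function, step size eta,
# and iteration count are assumptions for illustration only.
def steepest_descent(grad, w0, eta=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= eta * grad(w)  # move against the local gradient
    return w

w_star = steepest_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
# w_star converges toward the minimum at w = 3
```

Each step moves the weight opposite the gradient, so for this quadratic the error shrinks geometrically as long as the step size keeps eta * f''(w) below 2.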
The Least Mean Square (LMS) algorithm, introduced by Widrow and Hoff in 1959, is an adaptive algorithm which uses a gradient-based method of steepest descent. The LMS algorithm uses estimates of the gradient vector from the available data. LMS incorporates an iterative procedure that makes successive corrections to the weight vector in the direction of the negative of the gradient vector, which eventually leads to the minimum mean square error. Compared to other algorithms, the LMS algorithm is relatively simple; it requires neither correlation function calculations nor matrix inversions.
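The procedure just described (filter output, estimation error, correction along the negative of the estimated gradient) can be sketched in pure Python. The target linear system d = 2*x1 - 1*x2, the step size, and the sample count are illustrative assumptions, not taken from the slides.

```python
import random

random.seed(0)

# LMS sketch: adapt the weight vector w so that the filter output w.x
# tracks the desired response d. The system d = 2*x1 - 1*x2 is an
# illustrative assumption.
def lms(samples, n_weights, eta=0.05, epochs=50):
    w = [0.0] * n_weights
    for _ in range(epochs):
        for x, d in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))  # filter output
            e = d - y                                 # estimation error
            # successive correction along the negative of the
            # instantaneous (estimated) gradient: no correlation
            # matrix, no matrix inversion
            w = [wi + eta * e * xi for wi, xi in zip(w, x)]
    return w

inputs = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(100)]
samples = [((x1, x2), 2.0 * x1 - 1.0 * x2) for x1, x2 in inputs]

w = lms(samples, n_weights=2)  # w approaches [2.0, -1.0]
```

Because each update uses only the current sample's instantaneous error, the weight trajectory is random rather than the well-defined path of true steepest descent.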
LMS Algorithm
Signal-flow graph representation
The solution follows a random trajectory, hence it is called a stochastic gradient algorithm, while steepest descent follows a well-defined trajectory.
LMS does not require knowledge of the statistics of the environment.
Simple and robust, as it is model independent.
Slow rate of convergence.
Learning curves
Multilayer Neural Network (Perceptrons)
MNN Characteristics
Model of each neuron includes a nonlinear activation function
Hidden layers
Highly connected
Backpropagation Algorithm
Backpropagation is a common method of teaching artificial neural networks how to perform a given task. It was first described by Arthur E. Bryson and Yu-Chi Ho in 1969, but it wasn't until 1974 and later, through the work of Paul Werbos, David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams, that it gained recognition, and it led to a renaissance in the field of artificial neural network research.
It is a supervised learning method, and is a generalization of the delta rule. It requires a teacher that knows, or can calculate, the desired output for any input in the training set. It is most useful for feedforward networks (networks that have no feedback, or simply, that have no connections that loop). The term is an abbreviation for "backward propagation of errors". Backpropagation requires that the activation function used by the artificial neurons (or "nodes") be differentiable.
Backpropagation networks are necessarily multilayer perceptrons (usually with one input, one hidden, and one output layer). In order for the hidden layer to serve any useful function, multilayer networks must have nonlinear activation functions for the multiple layers: a multilayer network using only linear activation functions is equivalent to some single-layer, linear network. Nonlinear activation functions that are commonly used include the logistic function, the softmax function, and the Gaussian function.
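The three activation functions named above are straightforward to write down; the implementations below are minimal stdlib sketches (the max-subtraction in softmax is a standard numerical-stability trick, not something stated in the slides).

```python
import math

def logistic(x):
    """Logistic (sigmoid) activation; differentiable everywhere."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Softmax over a list of scores; the outputs sum to 1."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian(x, sigma=1.0):
    """Gaussian activation centred at 0, as used in RBF units."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

logistic(0.0)   # -> 0.5
gaussian(0.0)   # -> 1.0
```

All three are differentiable, which is exactly the property backpropagation requires of node activations.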
When to use BP
A large amount of input/output data is available, but you're not sure how to relate it to the output.
The problem appears to have overwhelming complexity, but there is clearly a solution.
It is easy to create a number of examples of the correct behavior.
The solution to the problem may change over time, within the bounds of the given input and output parameters (i.e., today 2+2=4, but in the future we may find that 2+2=3.8).
Outputs can be "fuzzy", or non-numeric.
Limitations
The convergence obtained from backpropagation learning is very slow.
The convergence in backpropagation learning is not guaranteed.
The result may generally converge to any local minimum on the error surface, since stochastic gradient descent exists on a surface which is not flat.
Backpropagation learning requires input scaling or normalization. Inputs are usually scaled into the range of +0.1 to +0.9 for best performance.
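The input scaling mentioned in the last limitation can be done with a simple linear rescale. The helper below is a hypothetical sketch (its name and the per-feature min/max strategy are assumptions); it maps raw values into the [0.1, 0.9] range the slides recommend.

```python
# Hypothetical helper: linearly rescale a list of raw input values
# into [0.1, 0.9], as suggested for backpropagation inputs.
def scale_inputs(values, lo=0.1, hi=0.9):
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # constant feature: map everything to the midpoint
        return [(lo + hi) / 2.0 for _ in values]
    span = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * span for v in values]

scale_inputs([0.0, 5.0, 10.0])  # -> approximately [0.1, 0.5, 0.9]
```

Keeping inputs away from the saturated tails of the sigmoid keeps the derivative terms, and hence the weight updates, usefully large.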
Training a Two-Layer Feedforward Network
1. Take the set of training patterns you wish the network to learn {in_i^p, out_j^p : i = 1...ninputs, j = 1...noutputs, p = 1...npatterns}.
2. Set up your network with ninputs input units fully connected to nhidden nonlinear hidden units via connections with weights, which in turn are fully connected to noutputs output units via connections with weights.
3. Generate random initial weights, e.g. from the range [-smwt, +smwt].
4. Select an appropriate error function and learning rate.
5. Apply the weight update equation for each training pattern p. One set of updates of all the weights for all the training patterns is called one epoch of training.
6. Repeat step 5 until the network error function is small enough.
The extension to networks with more hidden layers should be obvious.
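The six steps above can be sketched end to end in pure Python. The task (XOR), the layer sizes, and all parameter values (smwt, learning rate, epoch count) are illustrative assumptions, not prescriptions from the slides; the weight updates are the standard delta-rule form for sigmoid units.

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Step 2: network sizes (assumed for illustration: 2 inputs, 3 hidden
# units, 1 output, learning the XOR mapping).
n_in, n_hid, n_out = 2, 3, 1
smwt, eta = 0.5, 0.5

# Step 3: random initial weights in [-smwt, +smwt]; the last entry of
# each row is a bias weight driven by a constant input of 1.0.
w_hid = [[random.uniform(-smwt, smwt) for _ in range(n_in + 1)]
         for _ in range(n_hid)]
w_out = [[random.uniform(-smwt, smwt) for _ in range(n_hid + 1)]
         for _ in range(n_out)]

# Step 1: the training patterns.
patterns = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]

def forward(x):
    xb = x + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
    o = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w_out]
    return h, o

def total_error():  # Step 4: sum-of-squares error over all patterns
    return sum((t[k] - forward(x)[1][k]) ** 2
               for x, t in patterns for k in range(n_out))

err_init = total_error()

# Steps 5-6: delta-rule weight updates for every pattern; one full pass
# over the patterns is one epoch of training.
for epoch in range(20000):
    for x, t in patterns:
        h, o = forward(x)
        xb, hb = x + [1.0], h + [1.0]
        d_out = [(t[k] - o[k]) * o[k] * (1.0 - o[k]) for k in range(n_out)]
        # hidden deltas: errors propagated backward through w_out
        d_hid = [h[j] * (1.0 - h[j]) *
                 sum(d_out[k] * w_out[k][j] for k in range(n_out))
                 for j in range(n_hid)]
        for k in range(n_out):
            for j in range(n_hid + 1):
                w_out[k][j] += eta * d_out[k] * hb[j]
        for j in range(n_hid):
            for i in range(n_in + 1):
                w_hid[j][i] += eta * d_hid[j] * xb[i]

err_final = total_error()  # falls well below err_init as the net learns
```

Updating after each pattern (rather than after the whole set) is the per-pattern choice raised in question 4 of the next slide; the batch variant simply accumulates the updates over an epoch before applying them.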
Practical Considerations for Back-Propagation Learning
Most of the practical considerations necessary for general Back-Propagation learning can be posed as the following questions:
1. Do we need to pre-process the training data? If so, how?
2. How do we choose the initial weights from which we start the training?
3. How do we choose an appropriate learning rate?
4. Should we change the weights after each training pattern, or after the whole set?
5. Are some activation/transfer functions better than others?
6. How can we avoid flat spots in the error function?
7. How can we avoid local minima in the error function?
8. How do we know when we should stop the training?
However, there are also two important issues:
9. How many hidden units do we need?
10. Should we have different learning rates for the different layers?
How Many Hidden Units?
The best number of hidden units depends in a complex way on many factors, including:
1. The number of training patterns
2. The numbers of input and output units
3. The amount of noise in the training data
4. The complexity of the function or classification to be learned
5. The type of hidden unit activation function
6. The training algorithm
Too few hidden units will generally leave high training and generalisation errors due to underfitting. Too many hidden units will result in low training errors, but will make the training unnecessarily slow, and will result in poor generalisation unless some other technique (such as regularisation) is used to prevent overfitting.
Virtually all rules of thumb you hear about are actually nonsense. A sensible strategy is to try a range of numbers of hidden units and see which works best.
Different Learning Rates for Different Layers?
A network as a whole will usually learn most efficiently if all its neurons are learning at roughly the same speed. So maybe different parts of the network should have different learning rates. There are a number of factors that may affect the choices:
1. The later network layers (nearer the outputs) will tend to have larger local gradients (deltas) than the earlier layers (nearer the inputs).
2. The activations of units with many connections feeding into or out of them tend to change faster than units with fewer connections.
3. Activations required for linear units will be different from those for sigmoidal units.
4. There is empirical evidence that it helps to have different learning rates for the thresholds/biases compared with the real connection weights.
In practice, it is often quicker to just use the same rate for all the weights and thresholds, rather than spending time trying to work out appropriate differences. A very powerful approach is to use evolutionary strategies to determine good learning rates.
NN Architecture
Hopfield Network
Kohonen Self-Organizing Map
Radial Basis Function Network
ART (Adaptive Resonance Theory)
ADALINE (Adaptive Linear Neuron Element)
BSB (Brain State in a Box Model)
Markov Chains
Helmholtz Machines
Boltzmann Machine
Simulated Annealing
Kalman Filters
Spatio-Temporal Models of a Neuron
Bellman Theorem
Kullback-Leibler Divergence
Planning
Expansion: Generation, Transmission, Distribution, Structural
Reactive Power
Reliability
Plant: Generation scheduling, Economic dispatch, OPF, Unit commitment, Reactive power dispatch, Voltage control, Security assessment (Static, Dynamic), Maintenance scheduling, Contract management, Equipment monitoring
System: Load forecasting, Load management, Alarm processing/Fault diagnosis, Service restoration, Network switching, Contingency analysis, FACTS, State estimation
Analysis/Modeling: Power flow, Harmonics, Transient stability, Dynamic stability, Control design, Simulation/operators, Protection