Deep Learning Basics in Machine Learning 1
Learning framework, review of fundamental learning techniques. Feed forward neural network: Artificial Neural Network, activation function, multi-layer neural network
Deep Learning
Deep learning is a branch of machine learning, which is itself a subset of artificial intelligence. Just as neural networks imitate the human brain, so does deep learning. In deep learning, nothing is programmed explicitly. It is a class of machine learning that makes use of numerous layers of nonlinear processing units to perform feature extraction as well as transformation, with each successive layer taking the output of the preceding layer as its input.
Deep learning models are capable of focusing on the relevant features themselves, requiring only a little guidance from the programmer, and are very helpful in dealing with the problem of dimensionality. Deep learning algorithms are used especially when we have a huge number of inputs and outputs.
Since deep learning evolved from machine learning, which is itself a subset of artificial intelligence, and since the idea behind artificial intelligence is to mimic human behavior, the idea of deep learning is likewise to build algorithms that can mimic the brain.
Deep learning is implemented with the help of neural networks, and the motivation behind neural networks is the biological neuron, which is nothing but a brain cell.
Deep learning is a collection of statistical machine learning techniques for learning feature hierarchies, and it is based on artificial neural networks. So basically, deep learning is implemented with the help of deep networks, which are nothing but neural networks with multiple hidden layers.
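To make this concrete, here is a minimal sketch in Python (numpy) of such a deep network: a forward pass through two hidden layers. The layer sizes, the ReLU activation, and the random weights are illustrative assumptions, not something specified in this text.

import numpy as np

def relu(z):
    # ReLU supplies the non-linearity between layers
    return np.maximum(0, z)

rng = np.random.default_rng(0)
# Illustrative sizes: 4 inputs -> two hidden layers -> 1 output
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)
W3, b3 = rng.normal(size=(1, 3)), np.zeros(1)

x = rng.normal(size=4)            # one input example
h1 = relu(W1 @ x + b1)            # first hidden layer
h2 = relu(W2 @ h1 + b2)           # second hidden layer
y = W3 @ h2 + b3                  # output layer
print(y)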
Example of Deep Learning
Consider the example of face recognition. We provide the raw image data to the input layer. The input layer then determines the patterns of local contrast, that is, it differentiates on the basis of colors, luminosity, etc. The first hidden layer then determines face features, i.e., it fixates on eyes, nose, lips, etc., and matches those face features to the correct face template. The second hidden layer then actually determines the correct face, after which the result is sent to the output layer. Likewise, more hidden layers can be added to solve more complex problems, for example, if you want to find a particular kind of face having a large or light complexion. So, as the number of hidden layers increases, we are able to solve more complex problems.
Architectures
o Deep Neural Networks
A deep neural network is a neural network with a certain level of complexity, meaning that several hidden layers are encompassed between the input and output layers. Deep neural networks are highly proficient at modeling and processing non-linear associations.
o Deep Belief Networks
A deep belief network is a class of deep neural network that comprises multiple layers of belief networks.
Steps to perform DBN:
1. With the help of the Contrastive Divergence algorithm, a layer of features is learned from the perceptible (visible) units.
2. Next, the formerly trained features are treated as visible units, which then learn a new layer of features.
3. Lastly, when the learning of the final hidden layer is accomplished, the whole DBN is trained.
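As a rough illustration of step 1, below is a minimal numpy sketch of training a single layer of features with one step of Contrastive Divergence (CD-1). The layer sizes, learning rate, and Bernoulli sampling scheme are illustrative assumptions; a full DBN would train one such layer, then stack the next one on top as described above.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden, lr = 6, 4, 0.1
W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)          # visible biases
b_h = np.zeros(n_hidden)           # hidden biases

v0 = rng.integers(0, 2, size=n_visible).astype(float)  # a binary training vector

# Positive phase: sample hidden units from the data
p_h0 = sigmoid(v0 @ W + b_h)
h0 = (rng.random(n_hidden) < p_h0).astype(float)

# Negative phase: reconstruct the visible units, then recompute hidden probabilities
p_v1 = sigmoid(h0 @ W.T + b_v)
v1 = (rng.random(n_visible) < p_v1).astype(float)
p_h1 = sigmoid(v1 @ W + b_h)

# CD-1 update: difference of outer products between the two phases
W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
b_v += lr * (v0 - v1)
b_h += lr * (p_h0 - p_h1)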
o Recurrent Neural Networks
Recurrent neural networks permit parallel as well as sequential computation, and they are similar to the human brain (a large feedback network of connected neurons). Since they are capable of remembering all of the important things about the input they have received, they can be more precise.
Types of Deep Learning Networks
1. Feed Forward Neural Network
A feed-forward neural network is an Artificial Neural Network in which the nodes do not form a cycle. In this kind of neural network, all the perceptrons are organized in layers, such that the input layer takes the input and the output layer generates the output. The hidden layers are so named because they do not link with the outside world. Each of the perceptrons contained in one single layer is connected to each node in the subsequent layer, so it can be concluded that all of the nodes are fully connected. There is no connection, visible or invisible, between nodes in the same layer, and there are no back-loops in the feed-forward network. To minimize the prediction error, the backpropagation algorithm can be used to update the weight values.
Applications:
o Data Compression
o Pattern Recognition
o Computer Vision
o Sonar Target Recognition
o Speech Recognition
o Handwritten Characters Recognition
2. Recurrent Neural Network
Recurrent neural networks are yet another variation of feed-forward networks. Here each of the neurons present in the hidden layers receives an input with a specific delay in time. The recurrent neural network mainly accesses the preceding information of the current iteration. For example, to guess the succeeding word in a sentence, one must know the words that were previously used. A recurrent network not only processes the inputs but also shares weights across time steps, so the size of the model does not increase with the size of the input. However, the problems with recurrent neural networks are their slow computational speed, the fact that they do not take any future input into account for the current state, and their difficulty remembering information from far in the past.
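The recurrence just described can be sketched minimally in numpy as an Elman-style RNN cell that carries a hidden state across time steps; the dimensions and the tanh activation are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, seq_len = 3, 5, 4

W_x = rng.normal(0, 0.1, size=(n_hidden, n_in))      # input-to-hidden weights
W_h = rng.normal(0, 0.1, size=(n_hidden, n_hidden))  # hidden-to-hidden weights, shared across time
b = np.zeros(n_hidden)

xs = rng.normal(size=(seq_len, n_in))  # an input sequence
h = np.zeros(n_hidden)                 # initial hidden state

for t in range(seq_len):
    # The hidden state carries the preceding information into the current step
    h = np.tanh(W_x @ xs[t] + W_h @ h + b)
print(h)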
Applications:
o Machine Translation
o Robot Control
o Time Series Prediction
o Speech Recognition
o Speech Synthesis
o Time Series Anomaly Detection
o Rhythm Learning
o Music Composition
3. Convolutional Neural Network
Convolutional neural networks are a special kind of neural network mainly used for image classification, clustering of images, and object recognition. CNNs enable the unsupervised construction of hierarchical image representations. To achieve the best accuracy, deep convolutional neural networks are preferred over any other neural network.
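To illustrate the core operation of a CNN, here is a minimal numpy sketch of a single 2D convolution (strictly, a cross-correlation, as implemented in most deep learning libraries). The 5x5 image and the vertical-edge kernel are illustrative assumptions.

import numpy as np

def conv2d(image, kernel):
    # Valid cross-correlation: slide the kernel over the image, no padding
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25.0).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])  # a vertical-edge detector
print(conv2d(image, kernel))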
Applications:
o Identify Faces, Street Signs, Tumors.
o Image Recognition.
o Video Analysis.
o NLP.
o Anomaly Detection.
o Drug Discovery.
o Checkers Game.
o Time Series Forecasting.
4. Restricted Boltzmann Machine
RBMs are yet another variant of Boltzmann machines. Here the neurons present in the input layer and the hidden layer have symmetric connections between them. However, there is no internal connection within either layer. In contrast to RBMs, general Boltzmann machines do have internal connections inside the hidden layer. These restrictions in RBMs help the model to train efficiently.
Applications:
o Filtering.
o Feature Learning.
o Classification.
o Risk Detection.
o Business and Economic analysis.
5. Autoencoders
An autoencoder neural network is another kind of unsupervised machine learning algorithm. Here the number of hidden cells is smaller than the number of input cells, but the number of input cells is equal to the number of output cells. An autoencoder network is trained to produce an output similar to the fed input, which forces it to find common patterns and generalize the data. Autoencoders are mainly used for producing a smaller representation of the input and for reconstructing the original data from the compressed data. This algorithm is comparatively simple, as it only requires the output to be identical to the input.
o Encoder: Converts the input data to a lower dimension.
o Decoder: Reconstructs the compressed data.
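Below is a minimal numpy sketch of this idea: a one-hidden-layer autoencoder forward pass with fewer hidden cells than input cells and a reconstruction (mean squared error) loss. The sizes and the sigmoid activations are illustrative assumptions; training would adjust the weights to reduce this loss.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_input, n_hidden = 8, 3   # bottleneck: fewer hidden cells than input cells
W_enc = rng.normal(0, 0.1, size=(n_hidden, n_input))
b_enc = np.zeros(n_hidden)
W_dec = rng.normal(0, 0.1, size=(n_input, n_hidden))
b_dec = np.zeros(n_input)

x = rng.random(n_input)

code = sigmoid(W_enc @ x + b_enc)        # encoder: compress to lower dimension
x_hat = sigmoid(W_dec @ code + b_dec)    # decoder: reconstruct the input

loss = np.mean((x - x_hat) ** 2)         # reconstruction error to be minimized
print(code, loss)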
Applications:
o Classification.
o Clustering.
o Feature Compression.
Artificial Neural Networks
Artificial neural networks contain artificial neurons, which are called units. These units are arranged in a series of layers that together constitute the whole artificial neural network in a system. A layer can have just a dozen units or millions of units; this depends on how complex the neural network must be to learn the hidden patterns in the dataset. Commonly, an artificial neural network has an input layer, an output layer, and hidden layers. The input layer receives data from the outside world which the neural network needs to analyze or learn about. This data then passes through one or more hidden layers that transform the input into data that is valuable for the output layer. Finally, the output layer provides an output in the form of the artificial neural network's response to the input data provided.
In the majority of neural networks, units are interconnected from one layer to another. Each of these connections has a weight that determines the influence of one unit on another unit. As the data transfers from one unit to another, the neural network learns more and more about the data, which eventually results in an output from the output layer.
The structures and operations of human neurons serve as the basis for artificial neural networks, which are also known as neural networks or neural nets. The input layer of an artificial neural network is the first layer; it receives input from external sources and releases it to the hidden layer, which is the second layer. In the hidden layer, each neuron receives input from the previous layer's neurons, computes the weighted sum, and sends it to the neurons in the next layer. These connections are weighted, meaning that the effects of the inputs from the previous layer are scaled up or down by assigning different weights to each input, and the weights are adjusted during the training process to improve model performance.
Artificial neurons vs Biological neurons
The concept of artificial neural networks comes from biological neurons found in animal brains, so they share a lot of similarities in structure and function.
• Structure: The structure of artificial neural networks is inspired by biological neurons. A biological neuron has a cell body or soma to process the impulses, dendrites to receive them, and an axon that transfers them to other neurons. The input nodes of artificial neural networks receive input signals, the hidden layer nodes compute these input signals, and the output layer nodes compute the final output by processing the hidden layer's results using activation functions.

Biological Neuron          Artificial Neuron
Dendrite                   Inputs
Cell nucleus or Soma       Nodes
Synapses                   Weights
Axon                       Output
• Synapses: Synapses are the links between biological neurons that enable the transmission of impulses from dendrites to the cell body. In artificial neurons, synapses correspond to the weights that join the nodes of one layer to the nodes of the next layer. The strength of a link is determined by its weight value.
• Learning: In biological neurons, learning happens in the cell body nucleus or soma, which has a nucleus that helps to process the impulses. An action potential is produced and travels through the axons if the impulses are powerful enough to reach the threshold. This becomes possible through synaptic plasticity, which is the ability of synapses to become stronger or weaker over time in reaction to changes in their activity. In artificial neural networks, backpropagation is the technique used for learning; it adjusts the weights between nodes according to the error, i.e., the difference between predicted and actual outcomes.
Activation functions in Neural Networks
Elements of a Neural Network
Input Layer: This layer accepts input features. It provides information from the outside world to the network; no computation is performed at this layer, and its nodes just pass on the information (features) to the hidden layer.
Hidden Layer: Nodes of this layer are not exposed to the outer world; they are part of the abstraction provided by any neural network. The hidden layer performs all sorts of computation on the features entered through the input layer and transfers the result to the output layer.
Output Layer: This layer brings the information learned by the network to the outer world.
What is an activation function and why use them?
The activation function decides whether a neuron should be activated or not by calculating the weighted sum and further adding a bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.
Explanation: We know the neural network has neurons that work in correspondence with their weights, biases, and respective activation functions. In a neural network, we update the weights and biases of the neurons on the basis of the error at the output. This process is known as back-propagation. Activation functions make back-propagation possible, since the gradients are supplied along with the error to update the weights and biases.
Why do we need a non-linear activation function?
A neural network without an activation function is essentially just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
Mathematical proof
Suppose we have a neural net with two inputs (i1 and i2), a hidden layer whose neurons carry weights w1, w2, w3, and w4 and biases b1 and b2, and an output layer. The elements are as follows:
Hidden layer, i.e., layer 1:
z(1) = W(1)X + b(1)
a(1) = z(1)
Here,
• z(1) is the vectorized output of layer 1
• W(1) is the vectorized weights assigned to the neurons of the hidden layer, i.e., w1, w2, w3, and w4
• X is the vectorized input features, i.e., i1 and i2
• b is the vectorized bias assigned to the neurons in the hidden layer, i.e., b1 and b2
• a(1) is the vectorized form of any linear function
(Note: We are not considering an activation function here.)
Layer 2, i.e., the output layer:
Note: The input for layer 2 is the output from layer 1.
z(2) = W(2)a(1) + b(2)
a(2) = z(2)
Calculation at the output layer:
z(2) = (W(2) * [W(1)X + b(1)]) + b(2)
z(2) = [W(2) * W(1)] * X + [W(2) * b(1) + b(2)]
Let
[W(2) * W(1)] = W
[W(2) * b(1) + b(2)] = b
Final output: z(2) = W*X + b, which is again a linear function.
This observation results in a linear function even after applying a hidden layer. Hence we can conclude that no matter how many hidden layers we attach in a neural net, all layers will behave in the same way, because the composition of two linear functions is a linear function itself. A neuron cannot learn with just a linear function attached to it; a non-linear activation function lets it learn according to the difference with respect to the error. Hence we need an activation function.
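A quick numeric check of this collapse, with illustrative random matrices: composing two purely linear layers gives exactly the same outputs as one linear layer with W = W(2)W(1) and b = W(2)b(1) + b(2).

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)

x = rng.normal(size=2)

two_layers = W2 @ (W1 @ x + b1) + b2       # two linear layers, no activation
W, b = W2 @ W1, W2 @ b1 + b2               # collapsed single layer
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True: the same linear function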
Variants of Activation Function
Linear Function
• Equation: A linear function has an equation similar to that of a straight line, i.e., y = x.
• No matter how many layers we have, if all of them are linear in nature, the final activation function of the last layer is nothing but a linear function of the input of the first layer.
• Range: -inf to +inf
• Uses: The linear activation function is used in just one place, i.e., the output layer.
• Issues: If we differentiate a linear function, the result no longer depends on the input x, and the function becomes constant; it therefore won't introduce any useful non-linear behavior into our algorithm.
For example: Calculating the price of a house is a regression problem. A house price may take any large or small value, so we can apply a linear activation at the output layer. Even in this case, the neural net must have a non-linear activation function in its hidden layers.
Sigmoid Function
• It is a function which is plotted as an 'S'-shaped graph.
• Equation: A = 1 / (1 + e^-x)
• Value Range: 0 to 1
• Nature: non-linear
Tanh Function
• The activation that works almost always better than the sigmoid function is the tanh function, also known as the Tangent Hyperbolic function. It is actually a mathematically shifted version of the sigmoid function; both are similar and can be derived from each other.
• Equation: A = (e^x - e^-x) / (e^x + e^-x)
• Value Range: -1 to +1
• Nature: non-linear
• Uses: Usually used in the hidden layers of a neural network, as its values lie between -1 and 1; the mean of the hidden layer therefore comes out to be 0 or very close to it, which helps in centering the data by bringing the mean close to 0. This makes learning for the next layer much easier.
RELU Function
• It stands for Rectified Linear Unit and is chiefly implemented in the hidden layers of a neural network.
• Equation: A(x) = max(0, x). It gives an output x if x is positive and 0 otherwise.
• Value Range: 0 to +inf
• Nature: non-linear
Softmax Function
The softmax function is also a type of sigmoid function but is handy when we are trying to handle multi-class classification problems.
• Nature: non-linear
• Uses: Usually used when trying to handle multiple classes. The softmax function is commonly found in the output layer of image classification problems. It squeezes the output for each class between 0 and 1 and divides by the sum of the outputs.
• Output: The softmax function is ideally used in the output layer of the classifier, where we are actually trying to attain the probabilities that define the class of each input.
Choosing the right activation function
• The basic rule of thumb is that if you really don't know what activation function to use, then simply use RELU, as it is a general activation function for hidden layers and is used in most cases these days.
• If your output is for binary classification, then the sigmoid function is a very natural choice for the output layer.
• If your output is for multi-class classification, then softmax is very useful for predicting the probabilities of each class.
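For comparison, here is a minimal numpy sketch of the variants above; the test input values are illustrative.

import numpy as np

def linear(x):
    return x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    # Shift by the max for numerical stability; outputs sum to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (linear, sigmoid, tanh, relu, softmax):
    print(f.__name__, np.round(f(x), 3))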
What is a Multi Layer Perceptron Neural Network?
A multilayer perceptron (MLP) neural network belongs to the feedforward neural networks. It is an artificial neural network in which all nodes are interconnected with nodes of different layers.
The word perceptron was first defined by Frank Rosenblatt in his perceptron program. A perceptron is a basic unit of an artificial neural network that defines the artificial neuron in the neural network. It is a supervised learning algorithm that contains node values, activation functions, inputs, and node weights to calculate the output.
The multilayer perceptron (MLP) neural network works only in the forward direction. All nodes are fully connected in the network. Each node passes its value to the next node only in the forward direction. The MLP neural network uses a backpropagation algorithm to increase the accuracy of the training model.
Structure of Multi Layer Perceptron Neural Network
This network has three main layers that combine to form a complete artificial neural network. These layers are as follows:
Input Layer
It is the initial or starting layer of the multilayer perceptron. It takes input from the training data set and forwards it to the hidden layer. There are n input nodes in the input layer; the number of input nodes depends on the number of dataset features. Each input vector variable is distributed to each of the nodes of the hidden layer.
Hidden Layer
It is the heart of all artificial neural networks. This layer comprises all the computations of the neural network. The edges of the hidden layer have weights, which are multiplied by the node values. This layer uses the activation function.
There can be one or two hidden layers in the model.
The number of hidden layer nodes should be chosen carefully: too few nodes make the model unable to work efficiently with complex data, while too many nodes will result in an overfitting problem.
Output Layer
This layer gives the estimated output of the neural network. The number of nodes in the output layer depends on the type of problem. For a single target variable, use one node; for an N-class classification problem, the ANN uses N nodes in the output layer.
Working of Multi Layer Perceptron Neural Network
• The input nodes represent the features of the dataset.
• Each input node passes the vector input value to the hidden layer.
• In the hidden layer, each edge has a weight that is multiplied by the input variable. All the product values from the hidden nodes are summed together to generate the output.
• The activation function is used in the hidden layer to identify the active nodes.
• The output is passed to the output layer.
• The difference between the predicted and actual output is calculated at the output layer.
• The model uses backpropagation after calculating the predicted output.
Back Propagation Algorithm
The backpropagation algorithm is used in a multilayer perceptron neural network to increase the accuracy of the output by reducing the error between the predicted output and the actual output.
According to this algorithm,
• After calculating the output from the multilayer perceptron neural network, calculate the error.
• This error is the difference between the output generated by the neural network and the actual output. The calculated error is fed back into the network, from the output layer to the hidden layer.
• Now, this error signal becomes the input to the network.
• The model reduces the error by adjusting the weights in the hidden layer.
• Calculate the predicted output with the adjusted weights and check the error. The process is repeated until there is minimal or no error.
• This algorithm helps in increasing the accuracy of the neural network.
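Below is a minimal, hedged sketch of this training loop in numpy: a one-hidden-layer MLP trained with backpropagation on the classic XOR problem. The architecture, sigmoid activations, learning rate, and XOR data are illustrative assumptions, not taken from the text.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative data: XOR, a classic non-linear problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, size=(4, 1)); b2 = np.zeros(1)
lr = 1.0

for epoch in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)           # hidden layer activations
    out = sigmoid(h @ W2 + b2)         # predicted output

    # Error at the output layer
    err = out - y

    # Backward pass: propagate the error from output to hidden layer
    d_out = err * out * (1 - out)      # includes the sigmoid derivative
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Adjust weights and biases against the gradient
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out, 2))   # predictions should approach [0, 1, 1, 0]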
Difference Between Multilayer Perceptron Neural Network and Convolutional Neural Network
[Comparison table: Multi Layer Perceptron Neural Network vs. Convolutional Neural Network]
Advantages of Multi Layer Perceptron Neural Network
• Multi layer perceptron neural networks can easily work with non-linear problems.
• They can handle complex problems while dealing with large datasets.
• Developers use this model to deal with the fitting problems of neural networks.
• It has a higher accuracy rate and reduces prediction error by using backpropagation.
• After training the model, the multilayer perceptron neural network quickly predicts the output.
Disadvantages of Multi Layer Perceptron Neural Network
• This neural network involves large amounts of computation, which sometimes increases the overall cost of the model.
• The model will perform well only when it is trained well.
• Due to this model's dense connections, the number of parameters and node redundancy increase.
Why are neural networks used?
Neural networks can theoretically estimate any function, regardless of its complexity.
Supervised learning is a method of determining the correct Y for a fresh X by learning a function that translates a given X into a specified Y. But what are the differences between neural networks and other methods of machine learning? The answer is based on the notion of Inductive Bias. Machine learning models are built on assumptions about how X and Y are related. The inductive bias of linear regression is that X and Y have a linear relationship; in this way, a line or hyperplane gets fitted to the data.
When X and Y have a complex relationship, it can get difficult for a linear regression method to predict Y. For this situation, the curve must be multi-dimensional or must approximate the relationship.
A manual adjustment is sometimes needed based on the complexity of the function and the number of layers within the network. In most cases, trial-and-error methods combined with experience are used to accomplish this. Hence, this is the reason these parameters are called hyperparameters.
What is a feedforward neural network?
Feedforward neural networks are artificial neural networks in which nodes do not form loops. This type of neural network is also known as a multi-layer neural network, as all information is only passed forward.
During data flow, input nodes receive data, which travels through hidden layers and exits through output nodes. No links exist in the network that could be used to send information back from the output node.
A feedforward neural network approximates functions in the following way:
• An algorithm calculates classifiers by using the formula y = f*(x).
• Input x is therefore assigned to category y.
• According to the feedforward model, y = f(x; θ), and the value of θ determines the closest approximation of the function.
Feedforward neural networks serve as the basis for object detection in photos, as shown in the Google Photos app.
What is the working principle of a feedforward neural network?
When the feedforward neural network is simplified, it can appear as a single-layer perceptron.
This model multiplies inputs with weights as they enter the layer. Afterward, the weighted input values are added together to obtain the sum. As long as the sum of the values rises above a certain threshold, set at zero, the output value is usually 1, while if it falls below the threshold, it is usually -1.
As a feedforward neural network model, the single-layer perceptron often gets used for classification. Machine learning can also get integrated into single-layer perceptrons. Through training, neural networks can adjust their weights based on a property called the delta rule, which helps them compare their outputs with the intended values.
As a result of training and learning, gradient descent occurs. Multi-layered perceptrons update their weights similarly, but this process is known as back-propagation. In that case, the network's hidden layers get adjusted according to the output values produced by the final layer.
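Here is a minimal numpy sketch of the single-layer perceptron just described: a thresholded weighted sum with outputs of 1 and -1, trained with a simple perceptron-style weight update. The linearly separable toy data and the learning rate are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def predict(x, w, b):
    # Weighted sum, then threshold at zero: output 1 or -1
    return 1 if (w @ x + b) > 0 else -1

# Illustrative linearly separable data: class = sign(x0 + x1)
X = rng.normal(size=(20, 2))
y = np.where(X.sum(axis=1) > 0, 1, -1)

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(10):
    for xi, yi in zip(X, y):
        if predict(xi, w, b) != yi:
            # Move the weights toward the correct side of the boundary
            w += lr * yi * xi
            b += lr * yi

print(sum(predict(xi, w, b) == yi for xi, yi in zip(X, y)), "of", len(y), "correct")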
Layers of feedforward neural network
• Input layer:
The neurons of this layer receive input and pass it on to the other layers of the network. The number of features or attributes in the dataset must match the number of neurons in the input layer.
• Output layer:
According to the type of model being built, this layer represents the forecasted feature.
• Hidden layer:
Input and output layers are separated by hidden layers. Depending on the type of model, there may be several hidden layers.
There are several neurons in hidden layers that transform the input before actually transferring it to the next layer. This network gets constantly updated with weights in order to make it easier to predict.
• Neuron weights:
Neurons are connected by weights, which measure their strength or magnitude. Input weights can be compared to linear regression coefficients. A weight is normally a value between 0 and 1.
• Neurons:
Feedforward networks use artificial neurons, which are adapted from biological neurons. A neural network consists of artificial neurons.
Neurons function in two ways: first, they create weighted input sums, and second, they apply an activation function to normalize those sums.
Activation functions can either be linear or nonlinear. Neurons have weights based on their inputs. During the learning phase, the network learns these weights.
• Activation Function:
Neurons are responsible for making decisions in this area.
According to the activation function, the neurons determine whether to make a linear or nonlinear decision. Since the signal passes through so many layers, the activation function prevents a cascading effect from increasing the neuron outputs.
An activation function can be classified into three major categories: sigmoid, Tanh, and Rectified Linear Unit (ReLU).
• Sigmoid:
Input values are mapped to output values between 0 and 1.
• Tanh:
Input values are mapped to values between -1 and 1.
• Rectified Linear Unit:
Only positive values are allowed to flow through this function. Negative values are mapped to 0.
Function in feedforward neural network
Cost function
In a feedforward neural network, the cost function plays an important role. The categorized data points are little affected by minor adjustments to weights and biases.
Thus, a smooth cost function can be used to determine a method of adjusting weights and biases to improve performance.
Following is a definition of the mean squared error cost function:

C(w, b) = (1 / 2n) Σx ‖y(x) − a‖²

where
w = the weights gathered in the network
b = the biases
n = the number of training inputs
a = the output vectors from the network
x = an input
y(x) = the desired output for input x
‖v‖ = the usual length (norm) of vector v
Loss function
The loss function of a neural network is used to determine whether an adjustment needs to be made in the learning process.
The number of neurons in the output layer equals the number of classes, and the loss shows the difference between the predicted and actual probability distributions. The cross-entropy loss for binary classification is:

L = −(y log(p) + (1 − y) log(1 − p))

For multiclass categorization, the cross-entropy loss is:

L = −Σi yi log(pi)
Gradient learning algorithm
In the gradient descent algorithm, the next point is calculated by scaling the gradient at the current position by a learning rate and subtracting the obtained value from the current position.
To decrease the function, it subtracts the value (to increase it, it would add). The procedure can be written as:

p(n+1) = p(n) − η ∇f(p(n))

The parameter η scales the gradient and thus determines the step size. Performance is significantly affected by the learning rate in machine learning.
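A minimal numpy sketch of this update rule on an illustrative function f(p) = ‖p‖², whose gradient is 2p; the starting point and learning rate are assumptions.

import numpy as np

def grad_f(p):
    # Gradient of the illustrative function f(p) = ||p||^2
    return 2 * p

p = np.array([4.0, -3.0])    # starting position
eta = 0.1                    # learning rate: scales the gradient / step size

for step in range(50):
    p = p - eta * grad_f(p)  # move against the gradient to decrease f

print(p)  # approaches the minimum at [0, 0]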
Output units
In the output layer, output units are those units that provide the desired output or
prediction, thereby fulfilling the task that the neural network needs to complete.
There is a close relationship between the choice of output units and the cost function. Any unit that can serve as a hidden unit can also serve as an output unit in a neural network.
Advantages of feedforward Neural Networks
• Machine learning can be boosted with feedforward neural networks' simplified architecture.
• Multiple networks in the feedforward architecture operate independently, with a moderated intermediary between them.
• Complex tasks need several neurons in the network.
• Neural networks can handle and process nonlinear data easily compared to perceptrons and sigmoid neurons, which otherwise struggle with it.
• A neural network deals with the complicated problem of decision boundaries.
• Depending on the data, the neural network architecture can vary. For example, convolutional neural networks (CNNs) perform exceptionally well in image processing, whereas recurrent neural networks (RNNs) perform well in text and voice processing.
• Neural networks need graphics processing units (GPUs) to handle large datasets with massive computational and hardware performance. Several GPU-backed environments are widely used, including Kaggle Notebooks and Google Colab Notebooks.
Applications of feedforward neural networks
There are many applications for these neural networks. The following are a few of them.
Physiological feedforward system
It is possible to identify feedforward management in this situation because the central nervous system involuntarily regulates the heartbeat before exercise.
Gene regulation and feedforward
Detecting non-temporary changes in the environment is a function of this motif as a feedforward system. You can find the majority of this pattern in well-known networks.
Automation and machine management
Feedforward automation control is one of the disciplines in automation.
Parallel feedforward compensation with derivative
An open-loop transfer function converts non-minimum phase systems into minimum phase systems using this technique.
Understanding the math behind neural networks
Typical deep learning algorithms are neural networks (NNs). As a result of their unique structure, their popularity results from their 'deep' understanding of data. Furthermore, NNs are flexible in terms of complexity and structure. Despite all the advanced extras, they can't work without the basic elements: they may work better with the advanced extras, but the underlying structure remains the same.
Let's begin. NNs are constructed similarly to our biological neurons. In neural networks, neurons are arranged into layers: input is the first layer, output is the last, with the hidden layers in the middle.
An NN consists of two main elements that compute mathematical operations. Neurons calculate weighted sums using input data and synaptic weights, since neural networks are just mathematical computations based on synaptic links. In matrix format, a layer's computation is z = Wx + b, where x is the layer's input vector, W the weight matrix, and b the bias vector.
Using the output value, we can calculate the result. Understanding these fundamental concepts will make building an NN much easier, and you will be amazed at how quickly you can do it. Every layer's output becomes the following layer's input.
The architecture of the network
In a network, the architecture refers to the number of hidden layers and the number of units in each layer that make up the network.
According to the Universal Approximation Theorem, a feedforward network must have a "squashing" activation function on at least one hidden layer.
The network can then approximate any Borel measurable function within a finite-dimensional space with some non-zero amount of error, provided there are enough hidden units.
It simply states that we can always represent any function using a multi-layer perceptron (MLP), regardless of what function we try to learn.
Thus, we now know there will always be an MLP to solve our problem, but there is no specific method for finding it.
It is impossible to say in advance whether N layers with M hidden units will be able to solve a given problem.
Research is still ongoing, and for now, the only way to determine this configuration is by experimenting with it.
Finding the appropriate architecture is challenging, and we may need to try many configurations before finding one that can represent the target function.
Even then, there are two possible failure modes. Firstly, the optimization algorithm may not find the correct parameters, and secondly, the training algorithm may choose the wrong function because of overfitting.
What is backpropagation in a feedforward neural network?
Backpropagation is a technique based on gradient descent. Each stage of a gradient descent process involves iteratively moving in the opposite direction of the function's gradient (the slope).
The goal is to reduce the cost function, given the training data, while learning a neural network. The cost function is determined by the network weights and biases of all neurons in each layer. Backpropagation is used to calculate the gradient of the cost function iteratively, and then the weights and biases are updated in the opposite direction of the gradient.
We must define the error of the i-th neuron in the l-th layer of the network for the j-th training example, denoted δ[l]i(j) (in which z[l](j) represents the weighted input to the neurons of layer l, and L represents the loss).
Below are the backpropagation formulas. In each formula, L also stands for the output layer, g for the activation function, ∇ for the gradient, and W[l]T for the layer-l weights transposed; b[l]i is the bias of neuron i at layer l, w[l]ik is the weight from neuron k of layer l−1 to neuron i of layer l, and a[l−1]k(j) is the activation of neuron k at layer l−1 for training example j. The error is defined as follows:

(1) δ[L](j) = ∇a L ⊙ g′(z[L](j))
(2) δ[l](j) = (W[l+1]T δ[l+1](j)) ⊙ g′(z[l](j))
(3) ∂L/∂b[l]i = δ[l]i(j)
(4) ∂L/∂w[l]ik = a[l−1]k(j) δ[l]i(j)
The first equation shows how to calculate the error at the output layer for sample j. Following that, we can use the second equation to calculate the error in the layer just before the output layer.
Based on the error values of the next layer, the second equation can calculate the error in any layer. Because this algorithm calculates errors backward, it is known as backpropagation.
For sample j, the third and fourth equations give the gradient of the loss function with respect to the biases and weights.
We can update the biases and weights by averaging the gradients of the loss function with respect to the biases and weights over all samples.
This process is known as batch gradient descent. If we have too many samples, we will have to wait a long time.
It is also possible to update the biases/weights according to the gradient of each individual sample. This process is known as stochastic gradient descent.
Even though this algorithm is faster than batch gradient descent, a gradient calculated from a single sample is not a good estimate of the true gradient.
Finally, it is possible to update the biases and weights based on the average gradients of small batches. This is referred to as mini-batch gradient descent and is preferred over the other two.
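Below is a minimal numpy sketch contrasting these update schemes on an illustrative least-squares problem; the data, the one-parameter linear model, and the batch size are assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear regression data: y = 3x + noise
X = rng.normal(size=(100, 1))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=100)

def gradient(w, xb, yb):
    # Gradient of the mean squared error for a linear model y ≈ w * x
    return 2 * np.mean((xb[:, 0] * w - yb) * xb[:, 0])

w, eta, batch_size = 0.0, 0.1, 10
for epoch in range(20):
    idx = rng.permutation(len(X))
    # Mini-batch gradient descent: average gradients over small batches
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        w -= eta * gradient(w, X[batch], y[batch])
    # Batch GD would use all samples at once; stochastic GD one sample at a time

print(w)  # approaches 3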
Ending note
The field of deep learning is one of the most studied in software engineering. Recurrent neural systems are commonly used for speech and text processing, while convolutional neural systems are best for handling images.
When processing large datasets, neural networks require massive amounts of computation power and hardware accelerators, which can be obtained by clustering graphics processing units (GPUs).
If you are new to GPUs, you can use free GPU environments available on the internet. The most popular are Kaggle Notebooks and Google Colab Notebooks.