
UNIT-I: Introduction: Various paradigms of learning problems, Perspectives and Issues in deep learning framework, review of fundamental learning techniques. Feed forward neural network: Artificial Neural Network, activation function, multi-layer neural network.

Deep Learning
Deep learning is a branch of machine learning, which is itself a subset of artificial intelligence. Just as neural networks imitate the human brain, so does deep learning. In deep learning, nothing is programmed explicitly. Basically, it is a class of machine learning that makes use of numerous nonlinear processing units to perform feature extraction as well as transformation; the output of each preceding layer is taken as the input to each successive layer.
Deep learning models are capable of focusing on the relevant features themselves, requiring only a little guidance from the programmer, and they are very helpful in tackling the problem of dimensionality. Deep learning algorithms are used especially when we have a huge number of inputs and outputs.
Since deep learning evolved from machine learning, which is itself a subset of artificial intelligence, and since the idea behind artificial intelligence is to mimic human behavior, the idea of deep learning is likewise to build algorithms that can mimic the brain.
Deep learning is implemented with the help of neural networks, and the motivation behind neural networks is the biological neuron, which is nothing but a brain cell.
Deep learning is a collection of statistical machine learning techniques for learning feature hierarchies, based on artificial neural networks.
So, basically, deep learning is implemented with the help of deep networks, which are nothing but neural networks with multiple hidden layers.
Example of Deep Learning

In the example above, we provide raw image data to the input layer. The input layer then determines patterns of local contrast, i.e., it differentiates on the basis of colors, luminosity, etc. The first hidden layer then determines the face features: it fixates on eyes, nose, lips, etc., and matches those face features against the correct face template. The second hidden layer then actually determines the correct face, as can be seen in the image above, after which the result is sent to the output layer. Likewise, more hidden layers can be added to solve more complex problems, for example, finding a particular kind of face with a dark or light complexion. So, as the number of hidden layers increases, we are able to solve more complex problems.
Architectures
o Deep Neural Networks
It is a neural network that incorporates a certain level of complexity, which means several hidden layers are encompassed between the input and output layers. They are highly proficient at modeling and processing non-linear associations.
o Deep Belief Networks
A deep belief network is a class of deep neural network that comprises multiple layers of belief networks.
Steps to perform DBN:
1. With the help of the Contrastive Divergence algorithm, a layer of features is learned from the visible units.
2. Next, the formerly trained features are treated as visible units, which then perform the learning of the next layer of features.
3. Lastly, when the learning of the final hidden layer is accomplished, the whole DBN is trained.
o Recurrent Neural Networks
They permit parallel as well as sequential computation, similarly to the human brain (a large feedback network of connected neurons). Since they are capable of remembering all of the important things about the input they have received, they can be more precise.
Types of Deep Learning Networks
1. Feed Forward Neural Network
A feed-forward neural network is an Artificial Neural Network in which the nodes do not form a cycle. In this kind of neural network, all the perceptrons are organized into layers, such that the input layer takes the input and the output layer generates the output. Since the hidden layers do not link to the outside world, they are named hidden layers. Each perceptron contained in one layer is associated with each node in the subsequent layer; it can be concluded that all of the nodes are fully connected. There are no visible or invisible connections between the nodes in the same layer, and there are no back-loops in the feed-forward network. To minimize the prediction error, the backpropagation algorithm can be used to update the weight values. A minimal sketch of such a network appears after the applications list below.
Applications:
o Data Compression
o Pattern Recognition
o Computer Vision
o Sonar Target Recognition
o Speech Recognition
o Handwritten Character Recognition
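As a rough illustration of such a network, here is a minimal sketch in PyTorch. The layer sizes (784 inputs, 128 hidden units, 10 outputs) are illustrative assumptions, not values taken from this text:

```python
import torch
import torch.nn as nn

# A minimal fully connected feed-forward network: input -> hidden -> output.
# Every node of one layer connects to every node of the next; no cycles.
model = nn.Sequential(
    nn.Linear(784, 128),  # input layer -> hidden layer
    nn.ReLU(),            # non-linear activation in the hidden layer
    nn.Linear(128, 10),   # hidden layer -> output layer
)

x = torch.randn(32, 784)  # a batch of 32 dummy inputs
logits = model(x)         # forward pass only; information flows one way
print(logits.shape)       # torch.Size([32, 10])
```

In practice, the backpropagation step mentioned above would be handled by an optimizer that updates these weights to reduce the prediction error.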
2. Recurrent Neural Network
Recurrent neural networks are yet another variation of feed-forward networks. Here, each of the neurons in the hidden layers receives an input with a specific time delay. A recurrent neural network mainly accesses information from preceding iterations; for example, to guess the next word in a sentence, one must know the words that were previously used. It not only processes the inputs but also shares weights across time steps, and it does not let the size of the model increase with the size of the input. However, recurrent neural networks suffer from slow computational speed, they do not take any future input into account for the current state, and they have trouble remembering information from far in the past.
Applications:
o Machine Translation
o Robot Control
o Time Series Prediction
o Speech Recognition
o Speech Synthesis
o Time Series Anomaly Detection
o Rhythm Learning
o Music Composition
3. Convolutional Neural Network
Convolutional neural networks are a special kind of neural network mainly used for image classification, clustering of images, and object recognition. They enable unsupervised construction of hierarchical image representations. To achieve the best accuracy on such tasks, deep convolutional neural networks are preferred over other neural networks.
Applications:
o Identifying Faces, Street Signs, and Tumors
o Image Recognition
o Video Analysis
o NLP
o Anomaly Detection
o Drug Discovery
o Checkers Game
o Time Series Forecasting
4. Restricted Boltzmann Machine
RBMs are yet another variant of Boltzmann machines. Here, the neurons in the input layer and the hidden layer have symmetric connections between them; however, there are no connections within a layer. In contrast to RBMs, general Boltzmann machines do have internal connections inside the hidden layer. These restrictions in RBMs help the model train efficiently.
Applications:
o Filtering.
o Feature Learning.
o Classification.
o Risk Detection.
o Business and Economic analysis.
5. Autoencoders
An autoencoder neural network is another kind of unsupervised machine learning algorithm. Here, the number of hidden units is smaller than the number of input units, while the number of input units equals the number of output units. An autoencoder network is trained to reproduce its input at the output, which forces it to find common patterns and generalize from the data. Autoencoders are mainly used to produce a smaller representation of the input, which helps in the reconstruction of the original data from the compressed data. The algorithm is comparatively simple, as it only requires the output to be identical to the input. A minimal sketch appears after the applications list below.
o Encoder: converts the input data into a lower-dimensional representation.
o Decoder: reconstructs the compressed data.
Applications:
o Classification.
o Clustering.
o Feature Compression.
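A minimal autoencoder sketch in PyTorch, assuming 784-dimensional inputs compressed to a 32-dimensional code; the sizes are illustrative, not taken from this text:

```python
import torch
import torch.nn as nn

# Encoder compresses the input; decoder reconstructs it from the code.
encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

x = torch.rand(16, 784)                            # dummy batch in [0, 1)
reconstruction = decoder(encoder(x))               # output has the input's shape
loss = nn.functional.mse_loss(reconstruction, x)   # train output to match input
```

Training with this reconstruction loss is what forces the 32-dimensional code to capture the common patterns in the data.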

Deep learning applications


o Self-Driving Cars
A self-driving car captures the images around it and processes a huge amount of data to decide which action to take: turn left, turn right, or stop. It decides which actions to take accordingly, which will further reduce the accidents that happen every year.
o Voice-Controlled Assistance
When we talk about voice-controlled assistance, Siri is the one thing that comes to mind. You can tell Siri whatever you want it to do for you, and it will search for it and display the result for you.
o Automatic Image Caption Generation
Whatever image you upload, the algorithm works in such a way that it generates a caption accordingly. For instance, for a picture of a blue-colored eye, it will display the blue-colored eye with a caption at the bottom of the image.
o Automatic Machine Translation
With the help of automatic machine translation based on deep learning, we are able to convert text in one language into another.
Limitations
o It learns only through observations.
o It can suffer from bias issues.
Advantages
o It lessens the need for feature engineering.
o It eliminates needless costs.
o It easily identifies difficult defects.
o It can deliver best-in-class performance on certain problems.
Disadvantages
o It requires an ample amount of data.
o It is quite expensive to train.
o It does not have a strong theoretical groundwork.
Perspectives and Issues in the Deep Learning Framework
Deep learning is a subfield of machine learning that focuses on artificial neural networks, particularly deep neural networks. While deep learning has achieved remarkable success in various domains, it also comes with a range of perspectives and issues. Here are some key perspectives and issues in the deep learning framework:
Perspectives:
1. Powerful Representation Learning: Deep learning excels at learning hierarchical representations from raw data. This ability to automatically discover features makes it well-suited for tasks like image recognition, natural language processing, and speech recognition.
2. Scalability: Deep learning models can scale with more data and computational resources. This scalability has allowed researchers to train larger and more complex models, which has led to state-of-the-art results in various domains.
3. Transfer Learning: Pre-trained deep learning models can be fine-tuned for specific tasks, enabling the transfer of knowledge from one domain to another. This has reduced the need for massive datasets and computational resources for every new task.
4. Interpretability: Researchers are actively working on improving the interpretability of deep learning models. Techniques like attention mechanisms and visualization tools help users understand how models make decisions.
5. Wide Range of Applications: Deep learning has found applications in a wide range of fields, including healthcare, finance, autonomous vehicles, robotics, and more, making it a versatile technology.
Issues:
1. Data Requirements: Deep learning models often require large amounts of labeled data for training. Acquiring and annotating such datasets can be expensive and time-consuming, particularly in domains with limited data availability.
2. Overfitting: Deep learning models are prone to overfitting, especially when the dataset is small or noisy. Regularization techniques are used to mitigate this issue, but it remains a significant concern.
3. Computation and Resource Demands: Training deep neural networks can be computationally intensive and require specialized hardware, such as GPUs or TPUs. This limits the accessibility of deep learning for researchers with limited resources.
4. Ethical Concerns: Deep learning models can inherit biases present in the training data, leading to biased or unfair predictions. Ensuring fairness and addressing bias is an ongoing challenge in deep learning research.
5. Interpretability: Despite progress, deep learning models are often considered "black boxes." Understanding why a model makes a particular prediction remains a challenge, especially for complex models like deep neural networks.
6. Adversarial Attacks: Deep learning models can be vulnerable to adversarial attacks, where small, carefully crafted perturbations to input data can lead to incorrect predictions. Developing models robust against such attacks is an active area of research.
7. Generalization: Achieving good generalization to unseen data is a key challenge. While deep learning models may perform well on the training data, ensuring their performance on real-world, out-of-distribution data is crucial.
8. Environmental Concerns: The energy consumption associated with training large deep learning models has raised environmental concerns. Efforts are underway to develop more energy-efficient training methods.
9. Regulatory and Legal Challenges: The use of deep learning in critical applications like healthcare and autonomous vehicles has raised regulatory and legal questions about safety, liability, and accountability.
10. Reproducibility: Reproducing and replicating research results in deep learning can be challenging due to variations in hardware, software, and data. Ensuring reproducibility is a concern for the scientific community.
In conclusion, deep learning has made significant strides in various domains, but it is not without its challenges. Researchers continue to work on addressing these issues to make deep learning more accessible, reliable, and interpretable for a wide range of applications.
Review of Fundamental Learning Techniques in Deep Learning
Fundamental learning techniques in deep learning are the foundational methods and principles that form the basis for training and fine-tuning deep neural networks. These techniques are essential for achieving optimal performance and robustness in various deep learning applications. Here is a review of some of the fundamental learning techniques in deep learning:
1. Gradient Descent and Optimization Algorithms:
• Gradient Descent: Gradient descent is the fundamental optimization technique used to update the parameters of neural networks by minimizing a loss function. It calculates the gradient of the loss with respect to the model's parameters and updates them in the opposite direction of the gradient to minimize the loss (a small sketch follows this list).
• Stochastic Gradient Descent (SGD): SGD optimizes the model's parameters using a random subset (mini-batch) of the training data at each iteration, making it computationally efficient and capable of escaping local minima.
• Adam, RMSprop, and Adagrad: These are popular optimization algorithms that adaptively adjust the learning rates for each parameter, leading to faster convergence in many cases.
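As a minimal sketch of plain gradient descent, here is NumPy code minimizing a toy quadratic loss; the function, starting point, and learning rate are illustrative assumptions:

```python
import numpy as np

def loss(w):
    return ((w - 3.0) ** 2).sum()  # toy loss, minimized at w = 3

def grad(w):
    return 2.0 * (w - 3.0)         # analytic gradient of the loss

w = np.array([0.0])                # arbitrary starting parameter
lr = 0.1                           # learning rate (step size)
for step in range(100):
    w -= lr * grad(w)              # step opposite to the gradient
print(w, loss(w))                  # w is close to 3, loss close to 0
```

SGD, Adam, RMSprop, and Adagrad follow the same pattern but differ in how the gradient is estimated and how the step size is adapted.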
2. Backpropagation:
• Backpropagation is the algorithm used for computing gradients in neural networks. It efficiently calculates the gradients of the loss with respect to each layer's parameters by applying the chain rule of calculus, allowing for efficient weight updates during training.
3. Activation Functions:
• Activation functions introduce non-linearity into neural networks, allowing them to model complex relationships in data. Common activation functions include ReLU (Rectified Linear Unit), Sigmoid, and Tanh.
4. Weight Initialization:
• Proper weight initialization is crucial for training deep networks. Techniques like Xavier/Glorot initialization and He initialization help prevent issues like vanishing or exploding gradients during training (a small sketch follows below).
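A minimal NumPy sketch of the two initialization schemes named above; the layer sizes are arbitrary assumptions:

```python
import numpy as np

fan_in, fan_out = 784, 128  # illustrative layer sizes

# Xavier/Glorot initialization: variance scaled by fan-in and fan-out.
w_xavier = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / (fan_in + fan_out))

# He initialization: variance scaled by fan-in, suited to ReLU layers.
w_he = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)
```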
5. Regularization Techniques:
• Regularization methods, such as L1 and L2 regularization, dropout, and batch normalization, are used to prevent overfitting and improve model generalization. Dropout randomly drops neurons during training to reduce co-dependencies among them, while batch normalization normalizes activations within each mini-batch (a dropout sketch follows below).
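A minimal sketch of inverted dropout in NumPy, assuming a keep probability of 0.8; frameworks implement this internally, so this is only to show the idea:

```python
import numpy as np

def dropout(activations, keep_prob=0.8, training=True):
    if not training:
        return activations  # no dropout at inference time
    mask = np.random.rand(*activations.shape) < keep_prob
    # Scale by 1/keep_prob so the expected activation stays the same.
    return activations * mask / keep_prob

h = np.random.randn(4, 8)   # dummy hidden-layer activations
h_dropped = dropout(h)      # roughly 20% of units zeroed out
```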
6. Learning Rate Scheduling:
• Adjusting the learning rate during training can help the model converge to better solutions. Techniques like learning rate annealing and learning rate decay gradually reduce the learning rate as training progresses (a small sketch follows below).
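A minimal step-decay schedule as a sketch; the initial rate, decay factor, and drop interval are arbitrary assumptions:

```python
def step_decay(initial_lr=0.1, epoch=0, drop=0.5, epochs_per_drop=10):
    """Halve the learning rate every 10 epochs (illustrative values)."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

print([step_decay(epoch=e) for e in (0, 10, 20)])  # [0.1, 0.05, 0.025]
```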
7. Loss Functions:
• Choosing an appropriate loss function depends on the nature of the task. Common loss functions include Mean Squared Error (MSE) for regression, Cross-Entropy for classification, and custom loss functions for specialized tasks.
8. Batch Training vs. Mini-Batch Training:
• Training on entire datasets (batch training) can be computationally expensive. Mini-batch training is more commonly used, where the dataset is divided into smaller batches for more efficient gradient updates.
9. Early Stopping:
• Early stopping is a regularization technique that monitors the model's performance on a validation dataset and stops training when performance starts to degrade, preventing overfitting (a sketch of the loop follows below).
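A minimal early-stopping loop as a sketch; train_one_epoch and validation_loss are hypothetical stand-ins for whatever training and evaluation routines a real project defines:

```python
import random

def train_one_epoch():      # hypothetical stand-in for one training pass
    pass

def validation_loss():      # hypothetical stand-in: a noisy, plateauing loss
    return 1.0 + random.random() * 0.1

best_loss, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    train_one_epoch()
    val_loss = validation_loss()
    if val_loss < best_loss:
        best_loss, bad_epochs = val_loss, 0  # improvement: reset the counter
    else:
        bad_epochs += 1                      # one more epoch without progress
    if bad_epochs >= patience:
        print(f"stopping early at epoch {epoch}")
        break                                # halt before overfitting sets in
```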
10. Hyperparameter Tuning:
• Fine-tuning hyperparameters, such as the learning rate, batch size, and network architecture, is a critical aspect of deep learning. Techniques like grid search and random search are used to find optimal hyperparameter values.
11. Transfer Learning:
• Transfer learning leverages models pre-trained on large datasets and fine-tunes them for specific tasks. This approach saves time and resources while often achieving competitive results.
12. Data Augmentation:
• Data augmentation involves applying random transformations to training data, such as rotations, flips, and cropping, to increase the effective size of the training dataset and improve model generalization (a sketch follows below).
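A minimal sketch using torchvision's transform pipeline (assuming the torchvision package is available); the rotation angle and crop size are arbitrary choices:

```python
from torchvision import transforms

# Random flips, small rotations, and crops applied on the fly during training.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),      # rotate by up to +/- 10 degrees
    transforms.RandomResizedCrop(224),  # crop and resize to 224x224
    transforms.ToTensor(),
])
# augmented = augment(pil_image)  # applied to each PIL image in the dataset
```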
13. Ensemble Methods:
• Combining the predictions of multiple neural networks (e.g., bagging, boosting) can lead to improved performance and model robustness.
These fundamental learning techniques in deep learning provide a solid foundation for building and training deep neural networks for a wide range of tasks. Proper understanding and application of these techniques are crucial for achieving state-of-the-art results and addressing various challenges in the field of deep learning.

Artificial Neural Networks
Artificial Neural Networks contain artificial neurons, which are called units. These units are arranged in a series of layers that together constitute the whole Artificial Neural Network. A layer can have only a dozen units or millions of units, depending on how complex the neural network must be to learn the hidden patterns in the dataset. Commonly, an Artificial Neural Network has an input layer, an output layer, and hidden layers. The input layer receives data from the outside world which the neural network needs to analyze or learn about. This data then passes through one or multiple hidden layers that transform the input into data that is valuable for the output layer. Finally, the output layer provides an output in the form of the Artificial Neural Network's response to the input data provided.
In the majority of neural networks, units are interconnected from one layer to another. Each of these connections has a weight that determines the influence of one unit on another. As the data transfers from one unit to another, the neural network learns more and more about the data, which eventually results in an output from the output layer.

The structures and operations of human neurons serve as the basis for artificial neural networks, which are also known as neural networks or neural nets. The input layer of an artificial neural network is the first layer; it receives input from external sources and passes it to the hidden layer, which is the second layer. In the hidden layer, each neuron receives input from the previous layer's neurons, computes the weighted sum, and sends it to the neurons in the next layer. These connections are weighted, meaning that the effect of each input from the previous layer is scaled up or down by the weight assigned to it, and these weights are adjusted during the training process to improve model performance.
Artificial Neurons vs. Biological Neurons
The concept of artificial neural networks comes from the biological neurons found in animal brains, so the two share many similarities in structure and function.
• Structure: The structure of artificial neural networks is inspired by biological neurons. A biological neuron has a cell body (or soma) to process impulses, dendrites to receive them, and an axon that transfers them to other neurons. Analogously, the input nodes of an artificial neural network receive input signals, the hidden layer nodes compute these input signals, and the output layer nodes compute the final output by processing the hidden layer's results using activation functions.
Biological Neuron          Artificial Neuron
Dendrite                   Inputs
Cell nucleus or soma       Nodes
Synapses                   Weights
Axon                       Output

• Synapses: Synapses are the links between biological neurons that enable the transmission of impulses from the dendrites to the cell body. In artificial neurons, the synapses are the weights that join the nodes of one layer to the nodes of the next layer. The strength of a link is determined by its weight value.
• Learning: In biological neurons, learning happens in the cell body nucleus, or soma, which processes the impulses. An action potential is produced and travels through the axon if the impulses are powerful enough to reach the threshold. This becomes possible through synaptic plasticity, which is the ability of synapses to become stronger or weaker over time in reaction to changes in their activity. In artificial neural networks, backpropagation is the technique used for learning; it adjusts the weights between nodes according to the error, i.e., the difference between the predicted and actual outcomes.

Biological Neuron          Artificial Neuron
Synaptic plasticity        Backpropagation

• Activation: In biological neurons, activation is the firing rate of the neuron, which happens when the impulses are strong enough to reach the threshold. In artificial neural networks, a mathematical function known as an activation function maps the input to the output and performs the activation.
How do Artificial Neural Networks learn?
Artificial neural networks are trained using a training set. For example, suppose you want to teach an ANN to recognize a cat. It is shown thousands of different images of cats so that the network can learn to identify a cat. Once the neural network has been trained enough using images of cats, you need to check whether it can identify cat images correctly. This is done by making the ANN classify the images it is provided, deciding whether they are cat images or not. The output produced by the ANN is corroborated by a human-provided description of whether the image is a cat image or not. If the ANN identifies incorrectly, back-propagation is used to adjust whatever it has learned during training. Backpropagation is done by fine-tuning the weights of the connections between ANN units based on the error rate obtained. This process continues until the artificial neural network can correctly recognize a cat in an image with the minimal possible error rate.
What are the types of Artificial Neural Networks?
• Feedforward Neural Network: The feedforward neural network is one of the most basic artificial neural networks. In this ANN, the data or input provided travels in a single direction: it enters the ANN through the input layer and exits through the output layer, while hidden layers may or may not exist. So the feedforward neural network has a front-propagated wave only and usually does not have backpropagation.
• Convolutional Neural Network: A convolutional neural network has some similarities to the feed-forward neural network, where the connections between units have weights that determine the influence of one unit on another unit. But a CNN has one or more convolutional layers that apply a convolution operation to the input and then pass the result obtained as output to the next layer. CNNs have applications in speech and image processing and are particularly useful in computer vision.
• Modular Neural Network: A modular neural network contains a collection of different neural networks that work independently towards obtaining the output, with no interaction between them. Each of the different neural networks performs a different sub-task on inputs unique to it compared to the other networks. The advantage of this modular neural network is that it breaks a large and complex computational process down into smaller components, thus decreasing its complexity while still obtaining the required output.
• Radial Basis Function Neural Network: Radial basis functions are functions that consider the distance of a point with respect to a center. RBF networks have two layers: in the first, the input is mapped onto all the radial basis functions in the hidden layer, and then the output layer computes the output in the next step. Radial basis function nets are normally used to model data that represents an underlying trend or function.
• Recurrent Neural Network: The recurrent neural network saves the output of a layer and feeds this output back to the input to better predict the outcome of the layer. The first layer in the RNN is quite similar to the feed-forward neural network, and the recurrent behavior starts once the output of the first layer is computed. After this layer, each unit remembers some information from the previous step so that it can act as a memory cell when performing computations.
Applications of Artificial Neural Networks
1. Social Media: Artificial neural networks are used heavily in social media. For example, take the 'People you may know' feature on Facebook, which suggests people that you might know in real life so that you can send them friend requests. This effect is achieved by using artificial neural networks that analyze your profile, your interests, your current friends, their friends, and various other factors to predict the people you might potentially know. Another common application of machine learning in social media is facial recognition, which is done by finding around 100 reference points on the person's face and then matching them with those already available in the database using convolutional neural networks.
2. Marketing and Sales: When you log onto e-commerce sites like Amazon and Flipkart, they recommend products to buy based on your previous browsing history. Similarly, if you love pasta, then Zomato, Swiggy, etc. will show you restaurant recommendations based on your tastes and previous order history. This is true across all new-age marketing segments, such as book sites, movie services, hospitality sites, etc., and it is done by implementing personalized marketing. This uses artificial neural networks to identify a customer's likes, dislikes, previous shopping history, etc., and then tailor the marketing campaigns accordingly.
3. Healthcare: Artificial neural networks are used in oncology to train algorithms that can identify cancerous tissue at the microscopic level with the same accuracy as trained physicians. Various rare diseases that manifest in physical characteristics can be identified in their early stages by applying facial analysis to patient photos. The full-scale implementation of artificial neural networks in healthcare can therefore enhance the diagnostic abilities of medical experts and ultimately improve the quality of medical care all over the world.
4. Personal Assistants: You have surely heard of Siri, Alexa, Cortana, etc., depending on the phone you have. These are personal assistants and an example of speech recognition; they use natural language processing to interact with users and formulate a response accordingly. Natural language processing uses artificial neural networks built to handle many of these personal assistants' tasks, such as managing language syntax, semantics, correct speech, and the ongoing conversation.

Activation Functions in Neural Networks
Elements of a Neural Network
Input Layer: This layer accepts the input features. It provides information from the outside world to the network; no computation is performed at this layer. Its nodes just pass the information (features) on to the hidden layer.
Hidden Layer: The nodes of this layer are not exposed to the outer world; they are part of the abstraction provided by the neural network. The hidden layer performs all sorts of computation on the features entered through the input layer and transfers the result to the output layer.
Output Layer: This layer brings the information learned by the network up to the outer world.
What is an activation function and why use them?
The activation function decides whether a neuron should be activated or not by calculating the weighted sum and adding a bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.
Explanation: We know that a neural network has neurons that work according to their weights, biases, and respective activation functions. In a neural network, we update the weights and biases of the neurons on the basis of the error at the output. This process is known as back-propagation. Activation functions make back-propagation possible, since the gradients are supplied along with the error to update the weights and biases.
Why do we need a non-linear activation function?
A neural network without an activation function is essentially just a linear regression model. The activation function performs the non-linear transformation of the input, making the network capable of learning and performing more complex tasks.
Mathematical proof
Suppose we have a neural net with two inputs, one hidden layer, and one output layer. The elements are as follows:
Hidden layer, i.e. layer 1:
z(1) = W(1) X + b(1)
a(1) = z(1)
Here,
• z(1) is the vectorized output of layer 1
• W(1) is the vectorized weights assigned to the neurons of the hidden layer, i.e. w1, w2, w3 and w4
• X is the vectorized input features, i.e. i1 and i2
• b(1) is the vectorized bias assigned to the neurons of the hidden layer, i.e. b1 and b2
• a(1) is the vectorized form of a linear function
(Note: we are not considering an activation function here.)
Layer 2, i.e. the output layer:
(Note: the input for layer 2 is the output from layer 1.)
z(2) = W(2) a(1) + b(2)
a(2) = z(2)
Calculation at the output layer:
z(2) = W(2) [W(1) X + b(1)] + b(2)
z(2) = [W(2) W(1)] X + [W(2) b(1) + b(2)]
Let
W = W(2) W(1)
b = W(2) b(1) + b(2)
Final output: z(2) = W X + b, which is again a linear function.
This observation shows that we get a linear function again even after applying a hidden layer. Hence we can conclude that no matter how many hidden layers we attach to a neural net, all layers will behave the same way, because the composition of two linear functions is itself a linear function. A neuron cannot learn anything richer with just a linear function attached to it; a non-linear activation function lets it learn according to the difference with respect to the error. Hence we need an activation function.
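As a quick numeric check of this argument, here is a NumPy sketch showing that two stacked linear layers (with arbitrary random weights) compute exactly the same function as a single collapsed linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2,))                                # input features i1, i2

W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=(2,))  # hidden layer
W2, b2 = rng.normal(size=(1, 2)), rng.normal(size=(1,))  # output layer

# Two linear layers with no activation function in between:
z2 = W2 @ (W1 @ X + b1) + b2

# A single collapsed linear layer W X + b:
W, b = W2 @ W1, W2 @ b1 + b2
print(np.allclose(z2, W @ X + b))  # True: it is the same linear function
```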

Variants of Activation Functions
Linear Function
• Equation: The linear function has an equation similar to that of a straight line, i.e. y = x.
• No matter how many layers we have, if all of them are linear in nature, the final activation of the last layer is nothing but a linear function of the input of the first layer.
• Range: -inf to +inf
• Uses: The linear activation function is used in just one place, i.e. the output layer.
• Issues: If we differentiate the linear function, the result no longer depends on the input 'x'; the derivative is a constant, so the function cannot introduce any non-linear behavior into our algorithm.
For example: Calculating the price of a house is a regression problem. A house price may take any large or small value, so we can apply a linear activation at the output layer. Even in this case, the neural net must have a non-linear activation function at the hidden layers.
Sigmoid Function
• It is a function which is plotted as an 'S'-shaped graph.
• Equation: A = 1 / (1 + e^(-x))
• Nature: Non-linear. Notice that for X values between -2 and 2, the Y values are very steep: small changes in x bring about large changes in the value of Y.
• Value Range: 0 to 1
• Uses: Usually used in the output layer of a binary classifier, where the result is either 0 or 1. Since the value of the sigmoid function lies only between 0 and 1, the result can easily be predicted to be 1 if the value is greater than 0.5 and 0 otherwise.
Tanh Function
• The activation that almost always works better than the sigmoid function is the tanh function, also known as the hyperbolic tangent function. It is effectively a mathematically shifted version of the sigmoid function: the two are similar and can be derived from each other.
• Equation: A = (e^x - e^(-x)) / (e^x + e^(-x))
• Value Range: -1 to +1
• Nature: non-linear
• Uses: Usually used in the hidden layers of a neural network, as its values lie between -1 and 1; hence the mean of the hidden layer activations comes out to be 0 or very close to it, which helps in centering the data by bringing the mean close to 0. This makes learning for the next layer much easier.
ReLU Function
• It stands for Rectified Linear Unit. It is the most widely used activation function, chiefly implemented in the hidden layers of a neural network.
• Equation: A(x) = max(0, x). It gives an output of x if x is positive and 0 otherwise.
• Value Range: [0, inf)
• Nature: non-linear, which means we can easily backpropagate the errors and have multiple layers of neurons being activated by the ReLU function.
• Uses: ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At any given time only a few neurons are activated, making the network sparse and efficient to compute.
In simple words, ReLU learns much faster than the sigmoid and tanh functions.
Softmax Function
The softmax function is also a type of sigmoid function, but it is handy when we are trying to handle multi-class classification problems.
• Nature: non-linear
• Uses: Usually used when trying to handle multiple classes. The softmax function is commonly found in the output layer of image classification problems. It squeezes the output for each class to between 0 and 1 and divides by the sum of the outputs, so that the outputs can be interpreted as probabilities.
• Output: The softmax function is ideally used in the output layer of the classifier, where we are actually trying to obtain the probabilities that define the class of each input.
• The basic rule of thumb is that if you really don't know which activation function to use, simply use ReLU: it is a general-purpose activation function for hidden layers and is used in most cases these days.
• If your output is for binary classification, the sigmoid function is a very natural choice for the output layer.
• If your output is for multi-class classification, softmax is very useful for predicting the probability of each class.
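A minimal NumPy sketch of the four activation functions discussed above, applied to a small example vector:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes values to (0, 1)

def tanh(x):
    return np.tanh(x)                # squashes values to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)        # zeroes out negative values

def softmax(x):
    e = np.exp(x - x.max())          # shift for numerical stability
    return e / e.sum()               # outputs are positive and sum to 1

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x), tanh(x), relu(x), softmax(x), sep="\n")
```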
What is a Multi-Layer Perceptron Neural Network?
A multilayer perceptron (MLP) neural network belongs to the family of feedforward neural networks. It is an artificial neural network in which all nodes are interconnected with the nodes of adjacent layers.
The word 'perceptron' was first defined by Frank Rosenblatt in his perceptron program. A perceptron is the basic unit of an artificial neural network: it defines the artificial neuron in the network. It is a supervised learning algorithm that combines node values, activation functions, inputs, and node weights to calculate the output.
The multilayer perceptron (MLP) neural network works only in the forward direction. All nodes are fully connected in the network, and each node passes its value to the next layer only in the forward direction. The MLP neural network uses a backpropagation algorithm to increase the accuracy of the training model.

Structure of Multi-Layer Perceptron Neural Network
This network has three main layers that combine to form a complete artificial neural network. These layers are as follows:
Input Layer
It is the initial or starting layer of the multilayer perceptron. It takes input from the training data set and forwards it to the hidden layer. There are n input nodes in the input layer; the number of input nodes depends on the number of dataset features. Each input vector variable is distributed to each of the nodes of the hidden layer.
Hidden Layer
It is the heart of all artificial neural networks. This layer comprises all the computations of the neural network. The edges of the hidden layer have weights that are multiplied by the node values, and this layer uses the activation function.
There can be one or more hidden layers in the model.
The number of hidden layer nodes should be chosen carefully: too few nodes leave the model unable to work efficiently with complex data, while too many nodes will result in an overfitting problem.
Output Layer
This layer gives the estimated output of the neural network. The number of nodes in the output layer depends on the type of problem: for a single target variable, use one node; for an N-class classification problem, the ANN uses N nodes in the output layer.
Working of Multi-Layer Perceptron Neural Network
• The input nodes represent the features of the dataset.
• Each input node passes its vector input value to the hidden layer.
• In the hidden layer, each edge has a weight that is multiplied by the corresponding input variable. All the products from the hidden nodes are summed together to generate the output.
• The activation function is used in the hidden layer to identify the active nodes.
• The output is passed to the output layer.
• The difference between the predicted and actual output is calculated at the output layer.
• The model uses backpropagation after calculating the predicted output.
Back Propagation Algorithm
The backpropagation algorithm is used in a multilayer perceptron neural network to increase the accuracy of the output by reducing the error between the predicted output and the actual output.
According to this algorithm:
• After calculating the output of the multilayer perceptron neural network, calculate the error.
• This error is the difference between the output generated by the neural network and the actual output. The calculated error is fed back into the network, from the output layer to the hidden layer.
• This error signal now drives the updates to the network.
• The model reduces the error by adjusting the weights in the hidden layer.
• Calculate the predicted output with the adjusted weights and check the error. The process is repeated until there is minimal or no error.
• This algorithm helps in increasing the accuracy of the neural network.
Difference Between Multilayer Perceptron Neural Network and Convolutional Neural Network

                 Multi-Layer Perceptron          Convolutional Neural Network
                 Neural Network

Types of Input   It takes vector inputs.         It takes both vectors and
                                                 matrices as input.

Network Type     It is a fully connected         It is a spatially connected
                 neural network.                 neural network.

Focus Problem    It can deal with non-linear     Can only deal with linear
                 problems.                       problems.

Application      It is good for simple image     It is mostly used for complex
                 classification.                 image classification.

Advantages of Multi-Layer Perceptron Neural Network
• Multi-layer perceptron neural networks can easily work with non-linear problems.
• It can handle complex problems while dealing with large datasets.
• Developers use this model to deal with the fitness problem of neural networks.
• It has a higher accuracy rate and reduces prediction error by using backpropagation.
• After training the model, the multilayer perceptron neural network quickly predicts the output.
Disadvantages of Multi-Layer Perceptron Neural Network
• This neural network requires a large amount of computation, which sometimes increases the overall cost of the model.
• The model will perform well only when it is trained well.
• Due to this model's dense connections, the number of parameters and the node redundancy increase.
Why are neural networks used?
Neural networks can theoretically estimate any function, regardless of its complexity.
Supervised learning is a method of determining the correct Y for a fresh X by learning a function that translates a given X into a specified Y. But what are the differences between neural networks and other methods of machine learning? The answer is based on inductive bias. Machine learning models are built on assumptions such as the one about how X and Y are related; an inductive bias of linear regression is the linear relationship between X and Y. In this way, a line or hyperplane gets fitted to the data.
When X and Y have a complex relationship, it can get difficult for a linear regression method to predict Y. For this situation, the curve must be multi-dimensional or approximate the relationship.
A manual adjustment is sometimes needed based on the complexity of the function and the number of layers within the network. In most cases, trial-and-error methods combined with experience are used to accomplish this. This is the reason these parameters are called hyperparameters.
What is a feedforward neural network?
Feedforward neural networks are artificial neural networks in which nodes do not form loops. This type of neural network is also known as a multi-layer neural network, as all information is only passed forward.
During data flow, input nodes receive data, which travels through hidden layers and exits through output nodes. No links exist in the network that could be used to send information back from the output node.
A feedforward neural network approximates functions in the following way:
• An algorithm calculates a classifier by using the formula y = f*(x).
• Input x is thereby assigned to category y.
• According to the feedforward model, y = f(x; θ). This value determines the closest approximation of the function.
Feedforward neural networks serve as the basis for object detection in photos, as shown in the Google Photos app.
What is the working principle of a feedforward neural network?
When the feedforward neural network is simplified, it can appear as a single-layer perceptron.
This model multiplies inputs by weights as they enter the layer. Afterward, the weighted input values are added together to get the sum. As long as the sum of the values rises above a certain threshold, set at zero, the output value is usually 1, while if it falls below the threshold, it is usually -1.
As a feedforward neural network model, the single-layer perceptron is often used for classification. Machine learning can also be integrated into single-layer perceptrons. Through training, neural networks can adjust their weights based on a property called the delta rule, which helps them compare their outputs with the intended values.
As a result of training and learning, gradient descent occurs. Multi-layered perceptrons update their weights in a similar way, but this process is known as back-propagation. In that case, the network's hidden layers get adjusted according to the output values produced by the final layer. (A perceptron sketch follows below.)
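A minimal sketch of the single-layer perceptron just described, in NumPy, with a thresholded output of 1 or -1 and a delta-rule-style weight update; the toy data and learning rate are illustrative assumptions:

```python
import numpy as np

def predict(w, b, x):
    return 1 if x @ w + b > 0 else -1  # weighted sum thresholded at zero

# Toy linearly separable data: the label is the sign of the first feature.
X = np.array([[2.0, 1.0], [1.5, -0.5], [-1.0, 0.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(20):
    for xi, yi in zip(X, y):
        error = yi - predict(w, b, xi)  # compare output with intended value
        w += lr * error * xi            # adjust weights toward the target
        b += lr * error
print([predict(w, b, xi) for xi in X])  # matches y after training
```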
Layers of a feedforward neural network
• Input layer:
The neurons of this layer receive input and pass it on to the other layers of the network. The number of features or attributes in the dataset must match the number of neurons in the input layer.
• Output layer:
According to the type of model being built, this layer represents the forecasted feature.
• Hidden layer:
Input and output layers are separated by hidden layers. Depending on the type of model, there may be several hidden layers.
There are several neurons in the hidden layers that transform the input before actually transferring it to the next layer. The network's weights get constantly updated to make its predictions more accurate.
• Neuron weights:
Neurons are connected by weights, which measure the strength or magnitude of a connection. Input weights can be compared to linear regression coefficients. Weight values normally fall between 0 and 1.
• Neurons:
Feedforward networks use artificial neurons, which are adapted from biological neurons. A neural network consists of artificial neurons.
Neurons function in two ways: first, they create weighted input sums, and second, they apply an activation function to those sums to normalize them.
Activation functions can either be linear or nonlinear. Neurons have weights based on their inputs, and the network learns these weights during the training phase.
• Activation function:
Neurons are responsible for making decisions in this area.
According to the activation function, a neuron decides whether to make a linear or nonlinear decision. Since the signal passes through so many layers, the activation function prevents the cascading effect from blowing up neuron outputs.
An activation function can be classified into three major categories: sigmoid, tanh, and Rectified Linear Unit (ReLU).
• Sigmoid:
Input values get mapped to output values between 0 and 1.
• Tanh:
Input values get mapped to output values between -1 and 1.
• Rectified Linear Unit:
Only positive values are allowed to flow through this function. Negative values get mapped to 0.
Functions in a feedforward neural network

Cost function
In a feedforward neural network, the cost function plays an important role. Categorized data points are only slightly affected by minor adjustments to weights and biases; thus, a smooth cost function can be used to determine a method of adjusting weights and biases to improve performance.
The mean squared error cost function is defined as follows:

C(w, b) = (1 / 2n) Σx ‖y(x) − a‖²

Where:
w = the weights gathered in the network
b = the biases
n = the number of training inputs
a = the output vectors
x = an input
y(x) = the desired output for input x
‖v‖ = the usual length (norm) of vector v
Loss function
The loss function of a neural network is used to determine whether an adjustment needs to be made in the learning process.
The number of neurons in the output layer equals the number of classes, and the loss measures the difference between the predicted and actual probability distributions. The cross-entropy loss for binary classification is:

L = −[y log(p) + (1 − y) log(1 − p)]

For multiclass categorization, the cross-entropy loss is:

L = −Σc yc log(pc)

where y denotes the true label (one-hot in the multiclass case) and p the predicted probability.
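A minimal NumPy sketch of the mean squared error and cross-entropy losses defined above, evaluated on illustrative dummy values:

```python
import numpy as np

def mse(y_true, y_pred):
    return 0.5 * np.mean(np.sum((y_true - y_pred) ** 2, axis=-1))

def binary_cross_entropy(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_cross_entropy(y_onehot, p):
    return -np.mean(np.sum(y_onehot * np.log(p), axis=-1))

y = np.array([1.0, 0.0])           # true binary labels
p = np.array([0.9, 0.2])           # predicted probabilities
print(binary_cross_entropy(y, p))  # small, since the predictions are good
```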

Gradient learning algorithm
In the gradient descent algorithm, the next point is calculated by scaling the gradient at the current position by a learning rate and subtracting the obtained value from the current position. To decrease the function it subtracts the value (to increase it, it would add). This procedure can be written as:

p(n+1) = p(n) − η ∇f(p(n))

The parameter η scales the gradient and thus determines the step size. Performance is significantly affected by the learning rate in machine learning.
Output units
In the output layer, output units are those units that provide the desired output or prediction, thereby fulfilling the task that the neural network needs to complete. There is a close relationship between the choice of output units and the cost function: any unit that can serve as a hidden unit can also serve as an output unit in a neural network.
Advantages of feedforward neural networks
• Machine learning can be boosted by the feedforward neural network's simplified architecture.
• Multiple networks in the feedforward architecture operate independently, with a moderated intermediary.
• Complex tasks need several neurons in the network.
• Neural networks can handle and process nonlinear data easily compared to perceptrons and sigmoid neurons, which otherwise struggle with it.
• A neural network deals with the complicated problem of decision boundaries.
• Depending on the data, the neural network architecture can vary. For example, convolutional neural networks (CNNs) perform exceptionally well in image processing, whereas recurrent neural networks (RNNs) perform well in text and voice processing.
• Neural networks need graphics processing units (GPUs) to handle large datasets that demand massive computational and hardware performance. Several GPU-backed environments are widely used, including Kaggle Notebooks and Google Colab Notebooks.
Applications of feedforward neural networks
There are many applications for these neural networks. The following are a few of them.
Physiological feedforward system
Feedforward management can be identified in this situation because the central involuntary system regulates the heartbeat before exercise.
Gene regulation and feedforward
Detecting non-temporary changes in the environment is a function of this motif as a feedforward system. You can find the majority of this pattern in well-known networks.
Automation and machine management
Feedforward control is one of the disciplines in automation.
Parallel feedforward compensation with derivative
An open-loop transfer function converts non-minimum phase systems into minimum phase systems using this technique.
Understanding the math behind neural networks
Typical deep learning algorithms are neural networks (NNs). Their popularity results from their 'deep' understanding of data, which comes from their unique structure. Furthermore, NNs are flexible in terms of complexity and structure. Despite all the advanced machinery, they can't work without the basic elements: they may work better with the advanced machinery, but the underlying structure remains the same.
Let's begin. NNs are constructed analogously to our biological neurons. Neurons are arranged into layers: input is the first layer and output is the last, with the hidden layers in the middle.
A NN consists of two main elements that compute mathematical operations. Neurons calculate weighted sums using the input data and the synaptic weights, since neural networks are just mathematical computations based on synaptic links. In matrix form, each layer multiplies its input vector by a weight matrix; in the final step, the output of the hidden layer is combined (for example, multiplied by a vector of ones) to produce the output of the network. Using the output value, we can calculate the result. Understanding these fundamental concepts will make building a NN much easier, and you will be amazed at how quickly you can do it. Every layer's output becomes the following layer's input.
The architecture of the network
In a network, the architecture refers to the number of hidden layers and the number of units in each layer that make up the network.
According to the Universal Approximation Theorem, a feedforward network must have a "squashing" activation function on at least one hidden layer. Given enough hidden units, the network can then approximate any Borel measurable function on a finite-dimensional space to within some non-zero error. It simply states that we can always represent any function using a multi-layer perceptron (MLP), regardless of what function we try to learn.
Thus, we know there will always be an MLP that solves our problem, but there is no specific method for finding it. It is impossible to say in advance whether N layers with M hidden units will solve the given problem. Research is still ongoing, and for now, the only way to determine this configuration is by experimenting. Finding the appropriate architecture is challenging, and we may need to try many configurations before finding one that can represent the target function.
There are two possible explanations for failure: firstly, the optimization algorithm may not find the correct parameters, and secondly, the training algorithm may settle on the wrong function because of overfitting.
What is backpropagation in a feedforward neural network?
Backpropagation is a technique based on gradient descent. Each stage of a gradient descent process involves iteratively moving a function in the opposite direction of its gradient (the slope).
The goal is to reduce the cost function given the training data while learning a neural network. The cost function is determined by the network weights and biases of all the neurons in each layer. Backpropagation is used to calculate the gradient of the cost function iteratively, and the weights and biases are then updated in the opposite direction of the gradient to reduce it.
To specify the formulas, we define the backpropagation error of the i-th neuron in the l-th layer of the network for the j-th training example as follows, in which z[l]i(j) represents the weighted input to the neuron and L represents the loss:

δ[l]i(j) = ∂L(j) / ∂z[l]i(j)

In the backpropagation formulas below, the error is defined as above; L stands for the output layer, g for the activation function, ∇ for the gradient, W[l]T for the layer-l weights transposed, a[l]i(j) for the activation of neuron i at layer l, b[l]i for the bias of neuron i at layer l, w[l]ik for the weight from neuron k of layer l-1 to neuron i of layer l, and a[l-1]k(j) for the activation of neuron k at layer l-1 for training example j:

δ[L](j) = ∇a L ⊙ g'(z[L](j))
δ[l](j) = (W[l+1]T δ[l+1](j)) ⊙ g'(z[l](j))
∂L(j)/∂b[l]i = δ[l]i(j)
∂L(j)/∂w[l]ik = a[l-1]k(j) δ[l]i(j)

The first equation shows how to calculate the error at the output layer for sample j. Using the second equation, we can then calculate the error in the layer just before the output layer; more generally, based on the error values of the next layer, the second equation can calculate the error in any layer. Because this algorithm calculates errors backward, it is known as backpropagation.
For sample j, the third and fourth equations give the gradients of the loss function with respect to the biases and the weights.
We can update the biases and weights using the average of these gradients over all samples. This process is known as batch gradient descent; if we have too many samples, we will have to wait a long time for each update.
Alternatively, it is possible to update the biases/weights using the gradient of each individual sample. This process is known as stochastic gradient descent. Even though this algorithm is faster than batch gradient descent, the gradient calculated from a single sample is not a good estimate of the full gradient.
Finally, it is possible to update the biases and weights based on the average gradients of small batches. This is referred to as mini-batch gradient descent and is preferred over the other two.
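To make the four equations concrete, here is a minimal NumPy sketch of backpropagation for a one-hidden-layer network with sigmoid activations and MSE loss, trained with mini-batch gradient descent; the data, layer sizes, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 3))                  # 64 samples, 3 features
Y = (X.sum(axis=1, keepdims=True) > 0) * 1.0  # toy binary target

W1, b1 = rng.normal(size=(3, 5)) * 0.5, np.zeros(5)  # hidden layer
W2, b2 = rng.normal(size=(5, 1)) * 0.5, np.zeros(1)  # output layer

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr, batch = 0.5, 16

for epoch in range(200):
    for i in range(0, len(X), batch):  # mini-batch gradient descent
        x, y = X[i:i+batch], Y[i:i+batch]
        # Forward pass: keep the weighted inputs z and activations a.
        z1 = x @ W1 + b1;  a1 = sigmoid(z1)
        z2 = a1 @ W2 + b2; a2 = sigmoid(z2)
        # First equation: error at the output layer (MSE gradient times g'(z2)).
        d2 = (a2 - y) * a2 * (1 - a2)
        # Second equation: propagate the error back to the hidden layer.
        d1 = (d2 @ W2.T) * a1 * (1 - a1)
        # Third/fourth equations: gradients w.r.t. biases and weights,
        # averaged over the mini-batch, followed by a descent step.
        W2 -= lr * a1.T @ d2 / len(x); b2 -= lr * d2.mean(axis=0)
        W1 -= lr * x.T @ d1 / len(x);  b1 -= lr * d1.mean(axis=0)

acc = ((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5) == Y).mean()
print("training accuracy:", acc)
```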
Ending note
The field of deep learning is one of the most studied in software engineering. Recurrent neural systems are commonly used for speech and text processing, while convolutional neural systems are best for handling images.
When processing large datasets, neural networks require massive amounts of computation power and hardware accelerators, which can be obtained by clustering graphics processing units (GPUs).
If you are new to GPUs, you can find free custom GPU environments on the internet. The most popular notebooks are Kaggle Notebooks and Google Colab Notebooks.
