Unit-Ii DLL

The document provides an overview of deep learning architectures, highlighting the differences between machine learning and deep learning, and discussing concepts such as representation learning, neural network width and depth, activation functions, and unsupervised training methods. It explains the advantages and challenges of deeper and wider networks, as well as various activation functions like ReLU and its variants. Additionally, it covers unsupervised learning techniques including autoencoders and Restricted Boltzmann Machines, emphasizing their applications in dimensionality reduction and feature extraction.

Uploaded by

bitsmid167

UNIT-II DEEP LEARNING ARCHITECTURES

Machine Learning and Deep Learning, Representation Learning, Width and Depth of
Neural Networks, Activation Functions: RELU, LRELU, ERELU, Unsupervised
Training of Neural Networks, Restricted Boltzmann Machines, AutoEncoders, Deep
Learning Applications.

Machine Learning and Deep Learning


Machine learning and deep learning are both subfields of artificial intelligence (deep learning is itself a subset of machine learning). They share many similarities but differ in the following ways:

| Machine Learning | Deep Learning |
|---|---|
| Applies statistical algorithms to learn the hidden patterns and relationships in the dataset. | Uses artificial neural network architectures to learn the hidden patterns and relationships in the dataset. |
| Can work with a smaller dataset. | Requires a larger volume of data than machine learning. |
| Better for simpler, low-level tasks. | Better for complex tasks such as image processing and natural language processing. |
| Takes less time to train the model. | Takes more time to train the model. |
| A model is built from relevant features that are manually extracted (for example, from images to detect an object in the image). | Relevant features are extracted automatically from images. It is an end-to-end learning process. |
| Less complex, and the results are easy to interpret. | More complex; it works like a black box, so the results are not easy to interpret. |
| Can run on a CPU and requires less computing power than deep learning. | Requires a high-performance computer with a graphics processing unit (GPU). |

Representation Learning
Representation Learning is a process that simplifies raw data into understandable patterns for
machine learning. It enhances interpretability, uncovers hidden features, and aids in transfer
learning.
Data in its raw form (words and letters in text, pixels in images) is too complex for
machines to process directly. Representation learning transforms the data into a
representation that machines can use for classification or predictions.

Deep learning, a subset of machine learning, has been revolutionary over the past two decades. Its success relies heavily on the advances made in representation learning.

Previously, manual feature engineering constrained model capabilities, as it required extensive expertise and effort to identify relevant features. Deep learning automated this feature extraction.

The breakthrough by Hinton and co-authors in 2006 marked a pivotal point, shifting the focus of representation learning towards deep learning architectures. Their concept of greedy layer-wise pre-training followed by fine-tuning of deep neural networks led to further developments, and 2006 is often taken as the start of the deep neural network era.
Deep neural network models can learn complex, hierarchical representations of data through multiple layers. Examples include CNNs, RNNs, autoencoders, and Transformers.
A good representation has three characteristics: information, compactness, and generalization.

• Information: The representation encodes important features of the data in a compressed form.

• Compactness:

   o Low Dimensionality: Learned embedding representations should be much smaller than the original raw input. This allows for efficient storage and retrieval, and also discards noise from the data, allowing the model to focus on relevant features and converge faster.
   o Preserves Essential Information: Despite being lower-dimensional, the representation retains important features. This balance between dimensionality reduction and information preservation is essential.

• Generalization (Transfer Learning): The aim is to learn versatile representations for transfer learning, starting with a pre-trained model (computer vision models are often trained on ImageNet first) and then fine-tuning it for specific tasks that require less data.

Width and Depth of Neural Networks

The architecture of a neural network is often specified by its width and its depth. The depth of a neural network is defined as its number of layers (including the output layer but excluding the input layer), while the width of a neural network is defined as the maximal number of nodes in a layer.
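The definitions above can be checked with a short sketch. The function name `depth_and_width` and the example layer sizes are hypothetical, and counting the input layer when taking the maximum for width is an assumption (the example is chosen so that it makes no difference).

```python
def depth_and_width(layer_sizes):
    """Return (depth, width) for a network described by its layer sizes.

    layer_sizes lists the node count of every layer, from the input
    layer to the output layer (hypothetical convention for this sketch).
    """
    if len(layer_sizes) < 2:
        raise ValueError("need at least an input layer and an output layer")
    depth = len(layer_sizes) - 1   # output layer counted, input layer excluded
    width = max(layer_sizes)       # maximal number of nodes in any layer
    return depth, width


# Example: 4 inputs, two hidden layers of 8 nodes, 3 outputs.
example_layers = [4, 8, 8, 3]
depth, width = depth_and_width(example_layers)
print("depth:", depth, "width:", width)
```

Under these assumptions the example network has three layers that count towards depth (two hidden layers plus the output layer).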
Advantages of Deeper Networks:

Hierarchical Feature Extraction: Deep networks are adept at learning hierarchical representations, allowing them to extract increasingly abstract features from raw data. This is particularly advantageous for tasks with complex structure, such as object recognition in images.

Representation Power: Deeper architectures can capture a wide range of patterns, which helps in modeling complicated relationships in data. This is evident in natural language processing tasks, where deep models learn subtle grammatical nuances.

Transfer Learning: Pre-trained deep networks, such as convolutional neural networks (CNNs) and transformer models, have shown remarkable transferability. By leveraging learned features from one task, these networks can be fine-tuned for new tasks with relatively small amounts of labeled data.

Challenges of Deeper Networks:

Vanishing Gradients: As gradients propagate backward through numerous layers, they can
diminish to near-zero values, hindering the learning process. Techniques like batch
normalization and skip connections have alleviated this issue to some extent.
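A rough numerical illustration of vanishing gradients, as a simplified sketch rather than a full backpropagation implementation: it assumes a chain of sigmoid layers with one neuron each, every weight equal to 1, and every pre-activation equal to 0 (where the sigmoid derivative is at its maximum of 0.25). The function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def backprop_gradient_scale(num_layers, pre_activation=0.0, weight=1.0):
    """Multiply the per-layer chain-rule factors weight * sigmoid'(z).

    This is the factor by which a gradient is scaled after flowing
    backward through num_layers identical one-neuron sigmoid layers.
    """
    scale = 1.0
    for _ in range(num_layers):
        scale *= weight * sigmoid_grad(pre_activation)
    return scale


for n in (1, 5, 10, 20):
    print(n, "layers ->", backprop_gradient_scale(n))
```

Even in this best case for the sigmoid, each extra layer multiplies the gradient by at most 0.25, so the scale shrinks geometrically with depth.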

Computational Demands: Deeper architectures require more computational resources, leading to increased training times and hardware requirements. This can limit the feasibility of deploying deep models on resource-constrained devices.

Overfitting: Deep networks are susceptible to overfitting, especially when training data is
limited. The high capacity of these models can result in memorization of training examples
rather than generalization.

Advantages of Wider Networks:


Enhanced Parallelization: Wider networks can process multiple features in parallel,
accelerating training and inference times. This is advantageous in applications that require real-
time or near-real-time processing, such as autonomous vehicles.

Generalization: The abundance of neurons in wider networks enables them to capture a broader range of features, leading to improved generalization across various data distributions. This is crucial when dealing with noisy or diverse datasets.

Robustness: Wider networks tend to be more robust to adversarial attacks and input
perturbations. The redundancy of information in the increased neuron count can help mitigate
the impact of small perturbations.

Challenges of Wider Networks:

Overfitting: Just as with deeper networks, wider architectures can also be prone to
overfitting, especially when the training dataset is limited. Regularization techniques like
dropout and L2 regularization are often employed to mitigate this issue.

Diminishing Returns: Increasing the width beyond a certain point may lead to diminishing
returns in terms of performance improvement. This implies that while wider networks offer
increased expressiveness, there is a trade-off between computational efficiency and
performance gains.

Curse of Dimensionality: Wider networks can exacerbate the curse of dimensionality, where
the number of parameters grows rapidly with network width. This can lead to increased
memory consumption and training times.
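To make the growth in parameters concrete, here is a small sketch that counts weights and biases in a fully connected network. The helper name `dense_parameter_count` and the layer sizes are hypothetical; the sketch assumes every layer is dense with one bias per node.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    Each pair of consecutive layers contributes fan_in * fan_out weights
    plus fan_out biases.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out
    return total


# 10 inputs, two hidden layers of the given width, 1 output.
for hidden_width in (16, 32, 64, 128):
    sizes = [10, hidden_width, hidden_width, 1]
    print(hidden_width, dense_parameter_count(sizes))
```

Because the hidden-to-hidden connection contributes width times width weights, doubling the width roughly quadruples that term.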

The decision to opt for a deeper or wider architecture should be driven by the specific task at
hand, the available data, and computational resources.

Activation Functions: RELU, LRELU, ERELU


An activation function is a mathematical function applied to the output of a neuron. It
introduces non-linearity into the model, allowing the network to learn and represent
complex patterns in the data.
Neural networks consist of neurons that operate using weights, biases, and activation
functions.

In the learning process, these weights and biases are updated based on the error produced
at the output—a process known as backpropagation. Activation functions enable
backpropagation by providing gradients that are essential for updating the weights and
biases.

Without non-linearity, even deep networks would be limited to solving only simple, linearly separable problems. Activation functions empower neural networks to model highly complex data distributions and solve advanced deep learning tasks. Adding non-linear activation functions introduces flexibility and enables the network to learn more complex and abstract patterns from data.
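The claim that a network without non-linearity stays linear can be verified directly. In this minimal sketch (biases omitted for brevity, weight values chosen arbitrarily for illustration), two stacked linear layers give exactly the same output as a single layer whose weight matrix is their product, while inserting a non-linearity between them changes the result.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def relu_vector(vector):
    return [max(0.0, value) for value in vector]


W1 = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # first layer: 2 inputs -> 3 nodes
W2 = [[1.0, -1.0, 2.0]]                      # second layer: 3 nodes -> 1 output
x = [0.5, -2.0]

two_linear_layers = matvec(W2, matvec(W1, x))
one_merged_layer = matvec(matmul(W2, W1), x)
with_nonlinearity = matvec(W2, relu_vector(matvec(W1, x)))

print(two_linear_layers, one_merged_layer, with_nonlinearity)
```

The first two results coincide, which is why depth only adds expressive power when a non-linear activation sits between the layers.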

Some types of activation functions are:

Sigmoid Activation Function

This function takes any real value as input and outputs values in the range of 0 to 1.

The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to 0.0.
Mathematically it is represented by:

\[ f(x) = \frac{1}{1 + e^{-x}} \]

Tanh Function (Hyperbolic Tangent)


The tanh function is very similar to the sigmoid/logistic activation function, and even has the same S-shape, with the difference that its output range is -1 to 1. In tanh, the larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to -1.0.
Mathematically it can be represented as:

\[ f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \]

The output of the tanh activation function is zero-centered; hence we can easily map the output values as strongly negative, neutral, or strongly positive.
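A short sketch comparing the two S-shaped functions on a few sample inputs, using only the standard `math` module. The sample values are arbitrary.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent: output in (-1, 1), zero-centered."""
    return math.tanh(x)


for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(value, round(sigmoid(value), 4), round(tanh(value), 4))
```

At an input of 0 the sigmoid returns 0.5 while tanh returns 0, which is what zero-centered means in practice.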

ReLU Activation Function


• ReLU stands for Rectified Linear Unit.
• Although it looks like a linear function, ReLU is non-linear. It has a derivative (everywhere except at 0) and so allows backpropagation, while also being computationally efficient.

• The key point is that ReLU does not activate all the neurons at the same time.

• A neuron is deactivated only if the output of its linear transformation is less than 0.

Mathematically it can be represented as:

\[ f(x) = \max(0, x) \]

• Since only some of the neurons are activated, ReLU is far more computationally efficient than the sigmoid and tanh functions.
• The drawback of ReLU is that it suffers from the dying ReLU problem.
• On the negative side of the graph the gradient is zero. For this reason, during backpropagation the weights and biases of some neurons are not updated. This can create dead neurons which never get activated.

• All negative input values become zero immediately, which decreases the model's ability to fit or train on the data properly.
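The dying ReLU problem can be seen in a minimal sketch. The neuron below has a hypothetical weight and bias chosen so that its pre-activation is negative for every sample input; the upstream gradient is assumed to be 1, and the derivative at exactly 0 is set to 0 by convention.

```python
def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Derivative of ReLU; the value at x == 0 is a convention (0 here).
    return 1.0 if x > 0 else 0.0


inputs = [0.5, 1.0, 2.0]
weight, bias = 1.0, -5.0   # hypothetical values that keep the neuron inactive

outputs = [relu(weight * x + bias) for x in inputs]
# Gradient of the summed outputs with respect to the weight (chain rule).
weight_gradient = sum(relu_grad(weight * x + bias) * x for x in inputs)

print(outputs, weight_gradient)
```

Every output is zero and so is the weight gradient, so gradient descent has no signal with which to move this neuron back into the active region.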

Leaky ReLU Function


Leaky ReLU is an improved version of the ReLU function that addresses the dying ReLU problem, as it has a small positive slope in the negative region.

Mathematically it can be represented as:

\[ f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases} \]

where \( \alpha \) is a small positive constant (for example, 0.01).

• The advantages of Leaky ReLU are the same as those of ReLU, with the addition that it enables backpropagation even for negative input values.
• With this minor modification for negative input values, the gradient on the left side of the graph is non-zero. Therefore, we no longer encounter dead neurons in that region.
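A minimal sketch of Leaky ReLU and its derivative. The default slope `alpha=0.01` is an assumed, commonly used value rather than one given in these notes.

```python
def leaky_relu(x, alpha=0.01):
    """Identity for positive inputs, small slope alpha for the rest."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.01):
    """Derivative: 1 on the positive side, alpha on the negative side."""
    return 1.0 if x > 0 else alpha


for value in (-2.0, -0.5, 0.5, 2.0):
    print(value, leaky_relu(value), leaky_relu_grad(value))
```

The gradient for negative inputs is small but never zero, so a neuron that is currently inactive can still receive weight updates.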

Exponential Linear Unit (ELU) Activation Function

• Exponential Linear Unit, or ELU for short, is also a variant of ReLU that modifies the slope of the negative part of the function.
• ELU is a strong alternative to ReLU because of the following advantages:

   o ELU saturates smoothly towards -α for large negative inputs, whereas ReLU has a sharp corner at zero.
   o It avoids the dying ReLU problem by using an exponential curve for negative input values. This helps the network nudge weights and biases in the right direction.
Mathematically it can be represented as:

\[ f(x) = \begin{cases} x, & x \ge 0 \\ \alpha (e^{x} - 1), & x < 0 \end{cases} \]
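A minimal sketch of ELU in its usual form: the identity for non-negative inputs and α(e^x − 1) for negative inputs. The default `alpha=1.0` is an assumed, commonly used value.

```python
import math


def elu(x, alpha=1.0):
    """x for x >= 0, alpha * (exp(x) - 1) for x < 0."""
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)


def elu_grad(x, alpha=1.0):
    """Derivative: 1 for x >= 0, alpha * exp(x) for x < 0."""
    return 1.0 if x >= 0 else alpha * math.exp(x)


for value in (-10.0, -1.0, 0.0, 1.0):
    print(value, elu(value), elu_grad(value))
```

For large negative inputs the output approaches -alpha, and the gradient stays positive (though small) rather than dropping to exactly zero.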

Unsupervised Training of Neural Networks

An unsupervised neural network is a type of artificial neural network (ANN) used in unsupervised learning tasks. Unlike supervised neural networks, which are trained on labeled data with explicit input-output pairs, unsupervised neural networks are trained on unlabeled data. In unsupervised learning, the network is not guided by labels. Instead, it is provided with unlabeled data sets (containing only the input data) and left to discover the patterns in the data and build a model from them.
These neural networks aim to discover patterns, structures, or representations within the data without specific guidance.
The key components of unsupervised learning are:

1. Encoder-Decoder: As the name suggests, it is used to encode and decode the data. The encoder is responsible for transforming the input data into a lower-dimensional representation on which the neural network works, whereas the decoder takes the encoded representation and reconstructs the input data from it. Their architecture and parameters are learned during the training of the network.
2. Latent Space: It is the intermediate representation created by the encoder. It contains the abstract representation or features that capture important information about the data's structure.
3. Training Algorithm: Unsupervised neural network models use specific training algorithms to learn their parameters. Common optimization algorithms include stochastic gradient descent and Adam. The choice depends on the type of model and loss function.
4. Loss Function: It is a common component of all machine learning models. It measures the difference between the model's output and the actual/measured output, and so quantifies how well the model understands the data.
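As one concrete example of a loss function in this setting, the sketch below computes a mean squared reconstruction error between an input and the model's reconstruction of it. Mean squared error is an assumed choice for illustration; other losses are used depending on the model.

```python
def mean_squared_error(original, reconstruction):
    """Average squared difference between two equal-length vectors."""
    if len(original) != len(reconstruction):
        raise ValueError("vectors must have the same length")
    squared = [(a - b) ** 2 for a, b in zip(original, reconstruction)]
    return sum(squared) / len(original)


original = [0.0, 0.5, 1.0, 0.25]
good_reconstruction = [0.0, 0.5, 1.0, 0.25]
poor_reconstruction = [1.0, 0.5, 0.0, 0.25]

print(mean_squared_error(original, good_reconstruction))
print(mean_squared_error(original, poor_reconstruction))
```

A perfect reconstruction gives a loss of zero, and the loss grows as the reconstruction drifts away from the input.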
Autoencoders
Autoencoders are a type of deep learning algorithm designed to receive an input and transform it into a different representation. They play an important part in image construction.

An autoencoder neural network is an unsupervised machine learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. Autoencoders are used to reduce the size of the inputs into a smaller representation. If the original data is needed, it can be reconstructed (approximately) from the compressed data.

Autoencoders are often preferred over Principal Component Analysis (PCA), which is also used for dimensionality reduction, for the following reasons:

• An autoencoder can learn non-linear transformations with a non-linear activation function and multiple layers.
• It doesn't have to learn dense layers. It can use convolutional layers, which are better suited to video, image and series data.
• It is more efficient to learn several layers with an autoencoder than to learn one huge transformation with PCA.
• An autoencoder provides a representation of each layer as the output.
• It can make use of pre-trained layers from another model to apply transfer learning to enhance the encoder/decoder.
Applications of Autoencoders
1) Image Coloring
Autoencoders are used for converting a black and white picture into a colored image. Depending on what is in the picture, it is possible to tell what the color should be.

2) Feature Variation
It extracts only the required features of an image and generates the output by removing any noise or unnecessary interruption.

3) Dimensionality Reduction
The reconstructed image is the same as the input but with reduced dimensions. It helps in providing a similar image with a reduced pixel value.

4) Denoising Image
The input seen by the autoencoder is not the raw input but a stochastically corrupted version. A denoising autoencoder is thus trained to reconstruct the original input from the noisy version.

5) Watermark Removal
It is also used for removing watermarks.

Architecture of Autoencoders
An autoencoder consists of three components:

1. Encoder
2. Code
3. Decoder

• Encoder: This part of the network compresses the input into a latent space representation. The encoder layer encodes the input image as a compressed representation in a reduced dimension. The compressed image is a distorted version of the original image.
• Code: This part of the network represents the compressed input which is fed to the decoder.
• Decoder: This layer decodes the encoded image back to the original dimension. The decoded image is a lossy reconstruction of the original image, and it is reconstructed from the latent space representation.

The code layer between the encoder and the decoder is also known as the bottleneck. This is a well-designed approach to decide which aspects of observed data are relevant information and which aspects can be discarded.

An autoencoder consists of two parts: an encoder network and a decoder network. The encoder network compresses the input data, while the decoder network reconstructs the compressed data back into its original form. The compressed data, also known as the bottleneck layer, is typically much smaller than the input data.

The encoder network takes the input data and maps it to a lower-dimensional representation. This lower-dimensional representation is the compressed data. The decoder network takes this compressed data and maps it back to the original input data. The decoder network acts as an approximate inverse of the encoder network.

The bottleneck layer is the layer in the middle of the autoencoder that contains the compressed data. This layer is much smaller than the input data, which is what allows for compression. The size of the bottleneck layer determines the amount of compression that can be achieved.
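The encoder, bottleneck and decoder can be sketched with a deliberately tiny model. The code below is an illustrative linear autoencoder (no activation function, no biases) with 2 inputs and a 1-value bottleneck, trained by plain per-sample gradient descent on squared reconstruction error. The data, initial weights, learning rate and epoch count are all hypothetical choices for the sketch.

```python
def encode(x, enc):
    """Encoder: map a 2-value input to a single bottleneck value."""
    return sum(w * v for w, v in zip(enc, x))


def decode(z, dec):
    """Decoder: map the bottleneck value back to 2 values."""
    return [w * z for w in dec]


def dataset_loss(data, enc, dec):
    """Mean (over samples) of the summed squared reconstruction error."""
    total = 0.0
    for x in data:
        recon = decode(encode(x, enc), dec)
        total += sum((r - v) ** 2 for r, v in zip(recon, x))
    return total / len(data)


def train(data, enc, dec, learning_rate=0.02, epochs=500):
    enc, dec = list(enc), list(dec)
    for _ in range(epochs):
        for x in data:
            z = encode(x, enc)
            recon = decode(z, dec)
            # The target equals the input, so the error is recon - x.
            grad_recon = [2.0 * (r - v) for r, v in zip(recon, x)]
            grad_z = sum(g * w for g, w in zip(grad_recon, dec))
            dec = [w - learning_rate * g * z for w, g in zip(dec, grad_recon)]
            enc = [w - learning_rate * grad_z * v for w, v in zip(enc, x)]
    return enc, dec


# Points on the line x1 = 2 * x0, so one number is enough to describe each.
data = [[t / 4.0, 2.0 * t / 4.0] for t in range(-4, 5)]
initial_enc, initial_dec = [0.5, 0.1], [0.1, 0.5]

initial_loss = dataset_loss(data, initial_enc, initial_dec)
enc, dec = train(data, initial_enc, initial_dec)
final_loss = dataset_loss(data, enc, dec)
print("loss before:", initial_loss, "loss after:", final_loss)
```

Because the sample points lie on a single line, a one-value bottleneck is enough to describe them, which is why the reconstruction error can shrink even though the code is half the size of the input.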
Restricted Boltzmann Machines
• A Restricted Boltzmann Machine (RBM) is a type of artificial neural network used for unsupervised learning. It is a generative model capable of learning a probability distribution over a set of input data.
• It consists of two layers of neurons: a visible layer and a hidden layer. The visible layer represents the input data, while the hidden layer represents a set of features that are learned by the network.
• RBMs are widely used for dimensionality reduction, classification, regression, collaborative filtering, feature learning, etc.
• RBMs have no intra-layer connections; connections exist only between nodes in different layers.
• This restriction simplifies the training process, reduces the computational complexity, and makes the learning process more efficient.

Autoencoders vs. Restricted Boltzmann Machine

An RBM has two sets of biases, which is one of the most important aspects that distinguishes it from autoencoders. The hidden biases help the RBM produce the activations on the forward pass, while the visible layer biases help the RBM learn the reconstruction on the backward pass.

Working of Restricted Boltzmann Machine

• Each visible node takes a low-level feature from an item in the dataset to be learned; for example, from a dataset of grayscale images, each visible node would receive one pixel value for each pixel in one image.

• Let's follow a single pixel value X through the two-layer net. At the first node of the hidden layer, X is multiplied by a weight, and the result is added to the bias.
• The result is then passed to the activation function to produce the output of that node.
• Each input X is multiplied by an individual weight w at each hidden node. For example, with 4 input nodes and 3 hidden nodes, a single input encounters three weights, giving a total of 12 weights (4 input nodes x 3 hidden nodes). The weights between the two layers always form a matrix in which the number of rows equals the number of input nodes and the number of columns equals the number of output (hidden) nodes.
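The forward pass described above can be sketched for the 4 x 3 case. The weight values, biases and input are made up for illustration, and the sigmoid is an assumed choice of activation function (the usual one for binary RBMs).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probabilities(visible, weights, hidden_bias):
    """For each hidden node: weighted sum of inputs, plus bias, then activation."""
    probs = []
    for j in range(len(hidden_bias)):
        activation = hidden_bias[j] + sum(
            visible[i] * weights[i][j] for i in range(len(visible))
        )
        probs.append(sigmoid(activation))
    return probs


# 4 visible (input) nodes x 3 hidden nodes: rows are inputs, columns are hidden nodes.
weights = [
    [0.1, -0.2, 0.3],
    [0.0, 0.4, -0.1],
    [0.2, 0.1, 0.0],
    [-0.3, 0.2, 0.1],
]
hidden_bias = [0.0, 0.0, 0.0]
visible = [1.0, 0.0, 1.0, 0.0]

total_weights = sum(len(row) for row in weights)
probs = hidden_probabilities(visible, weights, hidden_bias)
print(total_weights, probs)
```

The matrix has 4 rows and 3 columns, which gives the 12 weights counted in the text, and the forward pass produces one value per hidden node.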

The training of a Restricted Boltzmann Machine differs from the training of standard neural networks via stochastic gradient descent. The two main training steps are:

o Gibbs Sampling

o Contrastive Divergence Step
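The two steps can be combined into a single update, shown here as a minimal sketch of contrastive divergence with one Gibbs sampling step (commonly called CD-1) for a binary RBM. All names, the initial weights, the learning rate and the random seed are hypothetical; `b` denotes the visible biases and `c` the hidden biases.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    """Draw binary states from the given activation probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def cd1_update(v0, W, b, c, rng, learning_rate=0.1):
    # Gibbs sampling: visible -> hidden -> visible -> hidden.
    ph0 = hidden_probs(v0, W, c)
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)
    ph1 = hidden_probs(v1, W, c)
    # Contrastive divergence: data statistics minus reconstruction statistics.
    new_W = [[W[i][j] + learning_rate * (v0[i] * ph0[j] - v1[i] * ph1[j])
              for j in range(len(c))] for i in range(len(b))]
    new_b = [b[i] + learning_rate * (v0[i] - v1[i]) for i in range(len(b))]
    new_c = [c[j] + learning_rate * (ph0[j] - ph1[j]) for j in range(len(c))]
    return new_W, new_b, new_c


rng = random.Random(0)
W = [[0.0] * 3 for _ in range(4)]   # 4 visible x 3 hidden
b = [0.0] * 4
c = [0.0] * 3
W, b, c = cd1_update([1.0, 0.0, 1.0, 0.0], W, b, c, rng)
print(W, b, c)
```

In practice the update is repeated over many training examples; this sketch only shows the shape of a single step.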

Deep Learning Applications

Applications of Deep Learning:


1. Fraud detection

Deep learning algorithms can identify security issues to help protect against fraud. For example, deep learning algorithms can detect suspicious attempts to log into your accounts and notify you, as well as inform you if your chosen password isn't strong enough.

2. Customer service

You may have used online customer service and interacted with a chatbot to help answer your questions, or used a virtual assistant on your smartphone. Deep learning allows these systems to learn over time how to respond.

3. Financial services

Several financial services can rely on assistance from deep learning. Predictive analytics helps support investment portfolios and trading assets in the stock market, as well as allowing banks to mitigate risk relating to loan approvals.

4. Natural language processing

Natural language processing is an important part of deep learning applications that rely on interpreting text and speech. Customer service chatbots, language translators, and sentiment analysis are all examples of applications benefitting from natural language processing.

5. Facial recognition

An area of deep learning known as computer vision allows deep learning algorithms to recognize specific features in pictures and videos. With this technique, you can use deep learning for facial recognition, identifying you by your own unique features.

6. Self-driving vehicles

Autonomous vehicles use deep learning to learn how to operate and handle
different situations while driving, and it allows vehicles to detect traffic
lights, recognize signs, and avoid pedestrians.

7. Predictive analytics

Deep learning models can analyze large amounts of historical information to make accurate predictions about the future. Predictive analytics helps businesses in several aspects, including forecasting revenue, product development, decision-making, and manufacturing.

8. Recommender systems

Online services often use recommender systems with enhanced capabilities provided by deep learning models. With enough data, these deep learning models can predict the probabilities of certain interactions based on the history of previous interactions. Industries such as streaming services, e-commerce, and social media implement recommender systems.

9. Health care
Deep learning applications in the health care industry serve multiple
purposes. Not only can they assist in developing treatment solutions, but
deep learning algorithms are also capable of understanding medical images
and helping doctors diagnose patients by detecting cancer cells.

10. Industrial

Deep learning applications in industrial automation help keep workers safe in factories by enabling machines to detect dangerous situations, such as when objects or people are too close to the machines.

Advantages of Deep Learning:


High accuracy: Deep Learning algorithms can achieve state-of-the-art performance in various tasks, such as image recognition and natural language processing.

Automated feature engineering: Deep Learning algorithms can automatically discover and learn relevant features from data without the need for manual feature engineering.

Scalability: Deep Learning models can scale to handle large and complex datasets, and can learn from massive amounts of data.

Flexibility: Deep Learning models can be applied to a wide range of tasks and can handle various types of data, such as images, text, and speech.

Continual improvement: Deep Learning models can continually improve their performance as more data becomes available.

Disadvantages of Deep Learning:

High computational requirements: Deep Learning models require large amounts of data and computational resources to train and optimize.

Requires large amounts of labeled data: Deep Learning models often require a large amount of labeled data for training, which can be expensive and time-consuming to acquire.

Interpretability: Deep Learning models can be challenging to interpret, making it difficult to understand how they make decisions.

Overfitting: Deep Learning models can sometimes overfit to the training data, resulting in poor performance on new and unseen data.

Black-box nature: Deep Learning models are often treated as black boxes, making it difficult to understand how they work and how they arrived at their predictions.
