Unit-II DLL
Machine Learning and Deep Learning, Representation Learning, Width and Depth of
Neural Networks, Activation Functions: ReLU, LReLU, ELU, Unsupervised
Training of Neural Networks, Restricted Boltzmann Machines, Autoencoders, Deep
Learning Applications.
Machine Learning vs Deep Learning
- Data: Machine learning can work with a smaller amount of data, whereas deep learning requires a larger volume of data.
- Task complexity: Machine learning is better suited to simpler tasks, whereas deep learning is better for complex tasks like image processing, natural language processing, etc.
- Training time: Machine learning takes less time to train a model, whereas deep learning takes more time.
- Feature extraction: In machine learning, a model is built from relevant features that are manually extracted from images to detect an object in the image; in deep learning, relevant features are extracted automatically, making it an end-to-end learning process.
Representation Learning
Representation Learning is a process that simplifies raw data into understandable patterns for
machine learning. It enhances interpretability, uncovers hidden features, and aids in transfer
learning.
Data in its raw form (words and letters in text, pixels in images) is too complex for
machines to process directly. Representation learning transforms the data into a
representation that machines can use for classification or predictions.
Deep Learning, a subset of Machine Learning, has been revolutionary over the past two
decades. This success of Deep Learning heavily relies on the advancements made in
representation learning.
Hinton and co-authors’ breakthrough discovery in 2006 marks a pivotal point, shifting the
focus of representation learning towards Deep Learning Architectures. The researchers’
concept of employing greedy layer-wise pre-training followed by fine-tuning deep neural
networks led to further developments.
Deep Neural Network models can learn complex, hierarchical representations of data
through multiple layers, e.g., CNNs, RNNs, Autoencoders, and Transformers.
The era of Deep Neural Networks started in the year 2006.
A good representation has three characteristics: information, compactness, and
generalization.
Information: The representation encodes the important features of the data in a compressed
form.
Compactness: The representation stores this information using as few dimensions as possible,
which keeps storage and computation efficient.
Generalization: The representation captures patterns that remain useful for new, unseen data
and for related tasks.
Width and Depth of Neural Networks
The architecture of a neural network is often specified by the width and the depth of the
network. The depth of a neural network is defined as its number of layers (including the output
layer but excluding the input layer), while the width of a neural network is defined as the
maximal number of nodes in a layer.
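As a quick worked example (a minimal sketch in plain Python, using hypothetical layer sizes), a network with an input layer of 4 nodes, hidden layers of 8 and 6 nodes, and an output layer of 2 nodes has depth 3 and width 8 under these definitions:

    # Hypothetical layer sizes: input, two hidden layers, output
    layer_sizes = [4, 8, 6, 2]

    depth = len(layer_sizes) - 1   # count all layers except the input layer -> 3
    width = max(layer_sizes)       # maximal number of nodes in any layer -> 8

    print(f"depth = {depth}, width = {width}")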
Advantages of Deeper Networks:
Representation Power: Deeper architectures can capture a diverse range of patterns, helping the
model represent complicated relationships in the data. This is evident in natural language
processing tasks, where deep models learn subtle syntactic nuances.
Transfer Learning: Pre-trained deep networks, such as convolutional neural networks (CNNs)
and transformer models, have shown remarkable transferability. By leveraging learned features
from one task, these networks can be fine-tuned for new tasks with relatively small amounts of
labeled data.
Challenges of Deeper Networks:
Vanishing Gradients: As gradients propagate backward through numerous layers, they can
diminish to near-zero values, hindering the learning process. Techniques like batch
normalization and skip connections have alleviated this issue to some extent (a sketch of both
appears after this list).
Overfitting: Deep networks are susceptible to overfitting, especially when training data is
limited. The high capacity of these models can result in memorization of training examples
rather than generalization.
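The following is a minimal sketch (in PyTorch, with a hypothetical feature size of 64) of the two remedies mentioned above: batch normalization and a skip connection that gives gradients a direct path around the stacked transformations.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """A fully connected block with batch normalization and a skip connection."""
        def __init__(self, dim: int = 64):
            super().__init__()
            self.fc1 = nn.Linear(dim, dim)
            self.bn1 = nn.BatchNorm1d(dim)
            self.fc2 = nn.Linear(dim, dim)
            self.bn2 = nn.BatchNorm1d(dim)
            self.relu = nn.ReLU()

        def forward(self, x):
            out = self.relu(self.bn1(self.fc1(x)))
            out = self.bn2(self.fc2(out))
            # The "+ x" skip connection lets gradients flow backward directly,
            # which helps counter the vanishing-gradient problem.
            return self.relu(out + x)

    block = ResidualBlock(dim=64)
    y = block(torch.randn(32, 64))   # a batch of 32 feature vectors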
Advantages of Wider Networks:
Robustness: Wider networks tend to be more robust to adversarial attacks and input
perturbations. The redundancy of information in the increased neuron count can help mitigate
the impact of small perturbations.
Challenges of Wider Networks:
Overfitting: Just as with deeper networks, wider architectures can also be prone to
overfitting, especially when the training dataset is limited. Regularization techniques like
dropout and L2 regularization are often employed to mitigate this issue.
Diminishing Returns: Increasing the width beyond a certain point may lead to diminishing
returns in terms of performance improvement. This implies that while wider networks offer
increased expressiveness, there is a trade-off between computational efficiency and
performance gains.
Curse of Dimensionality: Wider networks can exacerbate the curse of dimensionality, where
the number of parameters grows rapidly with network width. This can lead to increased
memory consumption and training times.
The decision to opt for a deeper or wider architecture should be driven by the specific task at
hand, the available data, and computational resources.
Activation Functions
In the learning process, the network's weights and biases are updated based on the error
produced at the output, a process known as backpropagation. Activation functions enable
backpropagation by providing the gradients that are essential for updating the weights and
biases.
Without non-linearity, even deep networks would be limited to solving only simple,
linearly separable problems. Activation functions empower neural networks to model
highly complex data distributions and solve advanced deep learning tasks. Adding non-linear
activation functions introduces flexibility and enables the network to learn more complex and
abstract patterns from the data.
Sigmoid
This function takes any real value as input and outputs values in the range of 0 to 1.
The larger the input (more positive), the closer the output value will be to 1.0, whereas the
smaller the input (more negative), the closer the output will be to 0.0.
Mathematically it is represented as: f(x) = 1 / (1 + e^(-x))
Tanh
The tanh function is similar to the sigmoid but outputs values in the range of -1 to 1.
Mathematically it is represented as: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
The output of the tanh activation function is zero-centered; hence we can easily map the output
values as strongly negative, neutral, or strongly positive.
ReLU (Rectified Linear Unit)
Mathematically it is represented as: f(x) = max(0, x)
The main catch here is that the ReLU function does not activate all the neurons at the same
time. A neuron is deactivated only if the output of the linear transformation is less than 0.
Since only a certain number of neurons are activated, the ReLU function is far more
computationally efficient when compared to the sigmoid and tanh functions.
The drawback of ReLU is that it suffers from the dying ReLU problem.
The negative side of the graph makes the gradient value zero. Due to this reason, during the
backpropagation process, the weights and biases for some neurons are not updated. This can
create dead neurons which never get activated.
All the negative input values become zero immediately, which decreases the model’s ability
to fit or train from the data properly.
Leaky ReLU (LReLU)
Leaky ReLU replaces the zero output for negative inputs with a small, non-zero slope.
Mathematically it is represented as: f(x) = x for x > 0, and f(x) = αx (with a small α such as
0.01) for x <= 0.
The advantages of Leaky ReLU are the same as those of ReLU, with the addition that it enables
backpropagation even for negative input values.
By making this minor modification for negative input values, the gradient on the left side of
the graph becomes a non-zero value. Therefore, we no longer encounter dead neurons in that
region.
ELU (Exponential Linear Unit)
Exponential Linear Unit, or ELU for short, is also a variant of ReLU that modifies the slope of
the negative part of the function.
ELU is a strong alternative to ReLU because of the following advantages:
ELU becomes smooth slowly until its output equals -α, whereas ReLU has a sharp kink at zero.
It avoids the dead ReLU problem by introducing an exponential curve for negative input values,
which helps the network nudge weights and biases in the right direction.
Mathematically it can be represented as: f(x) = x for x > 0, and f(x) = α(e^x - 1) for x <= 0.
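The following is a minimal NumPy sketch of the activation functions discussed above (the alpha values are common illustrative defaults, not mandated by the text):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))            # squashes input into (0, 1)

    def tanh(x):
        return np.tanh(x)                          # zero-centered, range (-1, 1)

    def relu(x):
        return np.maximum(0.0, x)                  # zero for negative inputs

    def leaky_relu(x, alpha=0.01):
        return np.where(x > 0, x, alpha * x)       # small non-zero slope for x < 0

    def elu(x, alpha=1.0):
        return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))  # smooth saturation toward -alpha

    x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
    print(relu(x))
    print(leaky_relu(x))
    print(elu(x))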
Unsupervised Training of Neural Networks
Unsupervised neural network models learn from unlabeled data and are typically built from the
following components:
1. Encoder-Decoder: As the name suggests, this component is used to encode and decode the data.
The encoder is responsible for transforming the input data into a lower-dimensional
representation on which the neural network works, whereas the decoder takes the encoded
representation and reconstructs the input data from it. Their architecture and parameters are
learned during the training of the network.
2. Latent Space: It is the intermediate representation created by the encoder. It contains the
abstract representation or features that capture important information about the data's
structure. It is also known as the bottleneck representation.
3. Training Algorithm: Unsupervised neural network models use specific training algorithms
to learn the parameters. Some common optimization algorithms are stochastic gradient
descent (SGD), Adam, etc. They are chosen depending on the type of model and loss function.
4. Loss Function: It is a common component among all machine learning models. It
measures the difference between the model's output and the actual/expected output,
quantifying how well the model understands the data.
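The following is a minimal sketch (in PyTorch, with hypothetical dimensions: 784-dimensional inputs and a 32-dimensional latent space) tying the four components together: an encoder-decoder model, a reconstruction loss, and the Adam optimizer updating the parameters without any labels.

    import torch
    import torch.nn as nn

    # Encoder-decoder model: 784 -> 32 (latent space) -> 784 (hypothetical sizes)
    model = nn.Sequential(
        nn.Linear(784, 32), nn.ReLU(),     # encoder
        nn.Linear(32, 784), nn.Sigmoid()   # decoder
    )
    loss_fn = nn.MSELoss()                                      # loss function
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # training algorithm

    x = torch.rand(64, 784)                # a batch of unlabeled inputs
    for step in range(100):
        x_hat = model(x)                   # encode, then decode
        loss = loss_fn(x_hat, x)           # compare reconstruction with the input
        optimizer.zero_grad()
        loss.backward()                    # backpropagation
        optimizer.step()                   # update weights and biases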
Autoencoders
Autoencoders are a type of deep learning algorithm designed to receive an input
and transform it into a different representation. They play an important part in image
reconstruction. Some of their uses are listed below.
2) Feature variation
It extracts only the required features of an image and generates the output
by removing any noise or unnecessary interruption.
3) Dimensionality Reduction
The reconstructed image is the same as our input but with reduced
dimensions. It helps in producing a similar image with reduced pixel dimensions.
4) Denoising Image
The input seen by the autoencoder is not the raw input but a stochastically
corrupted version. A denoising autoencoder is thus trained to reconstruct the
original input from the noisy version.
5) Watermark Removal
Architecture of Autoencoders
An Autoencoder consists of three layers:
1. Encoder
2. Code
3. Decoder
Encoder: This part of the network compresses the input into a latent
space representation. The encoder layer encodes the input image as a
compressed representation in a reduced dimension. The
compressed image is the distorted version of the original image.
The encoder network takes the input data and maps it to a lower-dimensional
representation. This lower-dimensional representation is the compressed data. The
decoder network takes this compressed data and maps it back to the original input
data. The decoder network is essentially the inverse of the encoder network.
The bottleneck layer is the layer in the middle of the autoencoder that contains the
compressed data. This layer is much smaller than the input data, which is what allows
for compression. The size of the bottleneck layer determines the amount of
compression that can be achieved.
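The following is a minimal sketch of this three-part architecture (in PyTorch, with hypothetical sizes: 784-dimensional inputs and a 16-unit bottleneck); the size of the bottleneck layer controls how strongly the input is compressed.

    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, input_dim: int = 784, bottleneck_dim: int = 16):
            super().__init__()
            # Encoder: maps the input down to the small code/bottleneck layer.
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 128), nn.ReLU(),
                nn.Linear(128, bottleneck_dim)
            )
            # Decoder: maps the code back up to a reconstruction of the input.
            self.decoder = nn.Sequential(
                nn.Linear(bottleneck_dim, 128), nn.ReLU(),
                nn.Linear(128, input_dim), nn.Sigmoid()
            )

        def forward(self, x):
            code = self.encoder(x)      # compressed representation (bottleneck)
            return self.decoder(code)   # reconstructed input

    model = Autoencoder()
    x_hat = model(torch.rand(8, 784))   # reconstruct a batch of 8 inputs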
Restricted Boltzmann Machines
Restricted Boltzmann Machine (RBM) is a type of artificial neural
network that is used for unsupervised learning. It is a type of
generative model that is capable of learning a probability
distribution over a set of input data.
It is a type of neural network that consists of two layers of neurons –
a visible layer and a hidden layer. The visible layer represents the
input data, while the hidden layer represents a set of features that are
learned by the network.
They are widely used for dimensionality reduction, classification,
regression, collaborative filtering, feature learning, etc.
RBMs have no intra-layer connections; connections only exist
between nodes in different layers.
This restriction simplifies the training process, reduces the
computational complexity, and makes the learning process more
efficient.
RBMs have two biases, which is one of the most important aspects that
distinguish them from other autoencoders. The hidden bias helps the
RBM produce the activations on the forward pass, while the visible-layer
biases help the RBM learn the reconstructions on the backward pass.
Let's follow a single input value X through the two-layer net. At the
very first node of the hidden layer, X gets multiplied by a weight,
and the bias is then added to the result.
Again, the result is provided to the activation function to produce the
output of that node.
Each input X gets multiplied by an individual weight w at each
hidden node. In other words, a single input encounters three weights
(one per hidden node), which results in a total of 12 weights
(4 input nodes x 3 hidden nodes). The weights between
the two layers always form a matrix where the number of rows equals
the number of input nodes and the number of columns equals the number of hidden nodes.
Training of an RBM relies on sampling-based procedures such as:
o Gibbs Sampling
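The following is a minimal NumPy sketch (with the hypothetical sizes used above: 4 visible and 3 hidden units) of the forward pass just described and of one Gibbs sampling step, in which the hidden activations are used to reconstruct the visible layer.

    import numpy as np

    rng = np.random.default_rng(0)
    n_visible, n_hidden = 4, 3
    W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))  # 4 x 3 weight matrix
    b_visible = np.zeros(n_visible)                       # visible-layer bias
    b_hidden = np.zeros(n_hidden)                         # hidden-layer bias

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample(p):
        return (rng.random(p.shape) < p).astype(float)    # binary sample from probabilities

    v = np.array([1.0, 0.0, 1.0, 1.0])                    # visible input X

    # Forward pass: each input is multiplied by its weight, the hidden bias is
    # added, and the result goes through the activation function.
    p_hidden = sigmoid(v @ W + b_hidden)
    h = sample(p_hidden)

    # Backward (reconstruction) half of one Gibbs step: project the hidden
    # activations back to the visible layer using the visible bias.
    p_visible = sigmoid(h @ W.T + b_visible)
    v_reconstructed = sample(p_visible)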
Deep Learning Applications
2. Customer service
You may have seen or used online customer service and interacted
with a chatbot to help answer your questions, or utilized a virtual assistant
on your smartphone. Deep learning allows these systems to learn over time
how to respond.
3. Financial services
5. Facial recognition
6. Self-driving vehicles
Autonomous vehicles use deep learning to learn how to operate and handle
different situations while driving, and it allows vehicles to detect traffic
lights, recognize signs, and avoid pedestrians.
7. Predictive analytics
8. Recommender systems
9. Health care
Deep learning applications in the health care industry serve multiple
purposes. Not only can they assist in developing treatment solutions, but
deep learning algorithms are also capable of understanding medical images
and helping doctors diagnose patients by detecting cancer cells.
10. Industrial