Deep Learning
Training: Deep learning models are trained by feeding large amounts of data into a neural network and adjusting the weights of the network through a process known as backpropagation.
2. ARTIFICIAL INTELLIGENCE
Artificial Intelligence exists when a machine has human-like skills such as learning, reasoning, and solving problems.
It is believed that AI is not a new technology; some people say that, as per Greek myth, there were mechanical men in early days which could work and behave like humans.
Why Artificial Intelligence?
Before learning about Artificial Intelligence, we should know the importance of AI and why we should learn it. Following are some main reasons to learn about AI:
o With the help of AI, you can create software or devices which can solve real-world problems easily and with accuracy, such as health issues, marketing, traffic issues, etc.
o With the help of AI, you can create your own personal virtual assistant, such as Cortana, Google Assistant, Siri, etc.
o With the help of AI, you can build robots which can work in environments where the survival of humans may be at risk.
o AI opens a path for other new technologies, new devices, and new opportunities.
2. General AI:
o General AI is a type of intelligence which could perform any intellectual task with efficiency like a human.
o The idea behind general AI is to make a system which could be smarter and think like a human on its own.
o Currently, there is no system in existence which could come under general AI and perform any task as perfectly as a human.
o Researchers worldwide are now focused on developing machines with General AI.
o Systems with general AI are still under research, and it will take a lot of effort and time to develop such systems.
3. Super AI:
o Super AI is a level of intelligence of systems at which machines could surpass human intelligence and perform any task better than a human, with cognitive properties. It is an outcome of general AI.
o Some key characteristics of strong AI include the ability to think, to reason, to solve puzzles, to make judgments, to plan, to learn, and to communicate on its own.
o Super AI is still a hypothetical concept of Artificial Intelligence. Developing such systems in reality is still a world-changing task.
3. Theory of Mind
o Theory of Mind AI should understand human emotions, people, and beliefs, and be able to interact socially like humans.
o This type of AI machine has still not been developed, but researchers are making many efforts and improvements towards developing such AI machines.
4. Self-Awareness
o Self-awareness AI is the future of Artificial Intelligence. These machines will be super intelligent and will have their own consciousness, sentiments, and self-awareness.
o These machines will be smarter than the human mind.
o Self-Awareness AI does not yet exist in reality; it is a hypothetical concept.
The history of machine learning dates back to the mid-20th century when
researchers started exploring the idea of creating machines that could learn from
data. Here is a brief overview of the major milestones in the history of machine
learning:
➢ 1970s-1980s: During the 1970s and 1980s, there was significant progress in
the field of machine learning. Researchers developed a range of algorithms
and techniques, such as decision trees, linear regression, and clustering,
which are still used today.
➢ 2010s: The 2010s saw a rapid increase in the use of deep learning, which
uses neural networks with many layers to model complex patterns in data.
Deep learning has been applied successfully in a wide range of applications,
such as image recognition, speech recognition, and natural language
processing.
Overall, the history of machine learning is one of steady progress and innovation,
driven by advances in computing power, algorithmic development, and a growing
understanding of the principles of artificial intelligence.
PROBABILISTIC MODELLING IN ML
• Hidden Markov models are used in sequential data modeling, such as speech
recognition or language modeling.
• One limitation of early neural networks was that they could only learn
linearly separable functions, meaning that they could only classify data that
could be separated by a single straight line. This limitation was addressed in
the 1980s with the development of more sophisticated neural network
architectures, such as multi-layer perceptrons and convolutional neural
networks, which allowed for more complex non-linear functions to be
learned.
Kernels or kernel methods (also called kernel functions) are sets of different types of algorithms that are used for pattern analysis. They are used to solve a non-linear problem by using a linear classifier.
In higher dimensions it is more difficult to imagine how we can separate the data linearly and what the decision boundary looks like. In p dimensions, a hyperplane is a (p-1)-dimensional "flat" subspace within the larger p-dimensional space. In two dimensions, the hyperplane is simply a line.
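As a brief illustration (a minimal sketch, assuming scikit-learn is installed; the make_moons dataset and the RBF kernel are chosen only for demonstration), a kernel allows a linear classifier such as an SVM to separate data that is not linearly separable in the original space:

# Minimal sketch: a non-linear problem solved with a linear classifier plus a kernel
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)    # non-linearly separable data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)        # straight-line decision boundary
rbf_svm = SVC(kernel="rbf", gamma=1.0).fit(X_train, y_train)   # RBF kernel maps data implicitly

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:", rbf_svm.score(X_test, y_test))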
2. Adaptive Filter
An adaptive filter uses a linear filter whose transfer function is controlled by a set of parameters; these parameters are fine-tuned by an adaptation (optimization) algorithm. Because of the complexity of the optimization algorithm, almost every adaptive filter is a digital filter.
3. Kernel perceptron
Kernel methods are applied in areas such as:
3D reconstruction
Bioinformatics
Geostatistics
Chemoinformatics
Handwriting recognition
Information extraction
4. Principal component analysis (PCA)
The first principal component captures the largest variation in the data. The second principal component is orthogonal to the first and captures the remaining variation left over from the first principal component, and so on. The principal components are uncorrelated and ordered so that the first few principal components explain most of the variation in the actual data. Kernel principal component analysis extends PCA using kernel methods. In contrast to standard linear PCA, the kernel variant works for a large number of attributes but becomes slow for a large number of examples.
5. Spectral clustering
In the context of image classification, it is known as segmentation-based object categorization. In spectral clustering, a dimensionality reduction is performed before clustering in fewer dimensions, and this is accomplished by using the eigenvalues (spectrum) of the similarity matrix of the data.
Its roots can be traced back to graph theory, where this method is used to identify communities of nodes in a graph based on the edges connecting them. The method is adaptable enough to allow us to apply it to data that does not come from graphs as well.
Soft kernel spectral clustering (SKSC) uses Algorithm 1 to compute an initial hard clustering of the training data. Next, the soft cluster assignments are computed from the cosine distance between each point and the cluster prototypes in the projection space e(l). In particular, given the projections of the training points and the initial hard assignments, the cluster prototypes s1, …, sp, …, sk, with sp ∈ R^(k-1), are computed as
sp = (1/np) Σ_{i=1}^{np} e_i (1.6)
where np is the number of points assigned to cluster p during the initialization step by KSC.
o In a decision tree, for predicting the class of a given dataset, the algorithm starts from the root node of the tree. The algorithm compares the value of the root attribute with the corresponding attribute of the record (the real dataset) and, based on the comparison, follows the branch and jumps to the next node.
o For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues the process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:
o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain possible values for the best attribute.
o Step-4: Generate the decision tree node which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot classify the nodes further; the final node is called a leaf node.
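The steps above can be sketched with scikit-learn's decision tree implementation (a minimal, hedged example; the Iris dataset, the Gini criterion, and the depth limit are assumed only for illustration):

# Minimal sketch of training and querying a decision tree classifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The splitting criterion plays the role of the Attribute Selection Measure (ASM)
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
tree.fit(X_train, y_train)                         # Steps 1-5: grow the tree recursively

print("test accuracy:", tree.score(X_test, y_test))
print("prediction for first test record:", tree.predict(X_test[:1]))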
Random Forest Algorithm
Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and
Regression problems in ML. It is based on the concept of ensemble
learning, which is a process of combining multiple classifiers to solve a complex
problem and to improve the performance of the model.
A greater number of trees in the forest generally leads to higher accuracy and helps prevent the problem of overfitting.
The diagram below explains the working of the Random Forest algorithm:
Assumptions for Random Forest
Since the random forest combines multiple trees to predict the class of the
dataset, it is possible that some decision trees may predict the correct
output, while others may not. But together, all the trees predict the correct
output. Therefore, below are two assumptions for a better Random forest
classifier:
The working process can be explained in the following steps and diagram:
Step-1: Select K random data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
It enables us to combine the predictions from various learner models and build a
final predictive model having the correct prediction.
But here one question may arise if we are applying the same algorithm then how
multiple decision trees can give better predictions than a single decision tree?
Moreover, how does each decision tree capture different information from the
same data?
So, the answer to these questions is that a different subset of features is taken by the nodes of each decision tree to select the best split. This means that each tree behaves differently, and hence captures different signals from the same data.
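A minimal sketch of a Random Forest with scikit-learn (the dataset and hyperparameter values are assumed only for illustration):

# Random Forest: an ensemble of decision trees, each trained on a random subset of data/features
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators is the number N of trees; max_features controls the random feature subset per split
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))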
Gradient boosting consists of three main elements:
o Loss function
o Weak learners
o Additive model
1. Continuous response, y ∈ R:
o Gaussian L2 loss function
o Laplace L1 loss function
o Huber loss function, δ specified
o Quantile loss function, α specified
2. Categorical response, y ∈ {0, 1}:
o Binomial loss function
o Adaboost loss function
3. Other families of response variables:
o Loss functions for survival models
o Loss functions for count data
o Custom loss functions
2. Weak Learner:
Weak learners are the base learner models that learn from past errors and help in building a strong predictive model design for boosting algorithms in machine learning. Generally, decision trees work as weak learners in boosting algorithms.
Boosting is defined as the framework that continuously works to improve the output from the base models. Many gradient boosting applications allow you to "plug in" various classes of weak learners at your disposal. In practice, decision trees are most often used as the weak (base) learners.
Continue this process until some mechanism (e.g., cross-validation) tells us to stop. The final model here is a stagewise additive model of B individual trees:
f(x) = Σ_{b=1}^{B} f_b(x)
Hence, trees are constructed greedily, choosing the best split points based on purity scores like Gini or by minimizing the loss.
3. Additive Model:
The additive model is defined as adding trees to the model one at a time. Multiple trees should not be added at once; only a single tree is added at each step, so that the existing trees in the model are not changed. Gradient descent is used when adding trees in order to reduce the loss.
In the past, the gradient descent method was used to minimize a set of parameters, such as the coefficients of a regression equation or the weights in a neural network. After calculating the error or loss, the weight parameters are updated to minimize that error. But more recently, most ML practitioners prefer weak learner sub-models, or decision trees, as a substitute for these parameters: a tree is added to the model to reduce the error and improve its performance. In this way, the prediction from the newly added tree is combined with the predictions from the existing series of trees to get the final prediction. This process continues until the loss reaches an acceptable level or no further improvement is possible.
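The stagewise additive process described above can be sketched with scikit-learn's gradient boosting implementation (a hedged example; the synthetic dataset and hyperparameters are chosen only for illustration):

# Gradient boosting: trees are added one at a time, each fitted to reduce the remaining loss
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The default loss is the squared (L2) error; each of the 200 shallow trees is a weak learner
gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0)
gbr.fit(X_train, y_train)
print("R^2 on test set:", gbr.score(X_test, y_test))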
Instead of level-wise growth, LightGBM prefers leaf-wise growth of the nodes of the tree. In LightGBM, the primary node is split into two secondary nodes, and it then chooses one secondary node to split further. Which secondary node is split depends on which of the two nodes has the higher loss.
Hence, due to its leaf-wise splits, the Light Gradient Boosting Machine (LightGBM) algorithm is often preferred when a large amount of data is given.
CATBOOST
The CatBoost algorithm is primarily used to handle the categorical features in a dataset. Whereas the GBM, XGBM, and LightGBM algorithms are suited to numeric data, CatBoost is designed to convert categorical variables into numeric data. Hence, the CatBoost algorithm includes an essential preprocessing step that converts categorical features into numerical variables, a step which is not present in the other algorithms.
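A minimal sketch of this behaviour (assuming the catboost package is installed; the toy DataFrame, column names, and hyperparameters are hypothetical and used only for illustration):

# CatBoost encodes categorical columns internally; they are passed via cat_features
import pandas as pd
from catboost import CatBoostClassifier

X = pd.DataFrame({
    "city": ["delhi", "mumbai", "delhi", "pune", "mumbai", "pune", "delhi", "mumbai"],  # categorical
    "age":  [25, 32, 47, 51, 23, 36, 41, 29],                                           # numeric
})
y = [0, 1, 1, 0, 1, 0, 0, 1]

model = CatBoostClassifier(iterations=50, learning_rate=0.1, depth=3, verbose=False)
model.fit(X, y, cat_features=["city"])        # categorical features handled without manual encoding
print(model.predict(X))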
The main goal of the supervised learning technique is to map the input
variable(x) with the output variable(y). Some real-world applications of
supervised learning are Risk Assessment, Fraud Detection, Spam filtering, etc.
Categories of Supervised Machine Learning
Supervised machine learning can be classified into two types of problems, which
are given below:
o Classification
o Regression
a) Classification
Classification algorithms are used to solve the classification problems in which the
output variable is categorical, such as "Yes" or No, Male or Female, Red or
Blue, etc. The classification algorithms predict the categories present in the
dataset. Some real-world examples of classification algorithms are Spam
Detection, Email filtering, etc.
b) Regression
o Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior
experience.
In unsupervised learning, the models are trained with data that is neither classified nor labelled, and the model acts on that data without any supervision.
The main aim of an unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences. Machines are instructed to find the hidden patterns in the input dataset.
So, now the machine will discover its patterns and differences, such as colour
difference, shape difference, and predict the output when it is tested with the test
dataset.
o Clustering
o Association
1) Clustering
The clustering technique is used when we want to find the inherent groups from
the data. It is a way to group the objects into a cluster such that the objects with the
most similarities remain in one group and have fewer or no similarities with the
objects of other groups. An example of the clustering algorithm is grouping the
customers by their purchasing behaviour.
2) Association
Association rule learning is an unsupervised technique used to find relationships between variables in a large dataset, such as items that are frequently purchased together.
o Signal: It refers to the true underlying pattern of the data that helps the
machine learning model to learn from the data.
o Noise: Noise is unnecessary and irrelevant data that reduces the performance
of the model.
o Bias: Bias is a prediction error that is introduced in the model due to
oversimplifying the machine learning algorithms. Or it is the difference
between the predicted values and the actual values.
o Variance: If the machine learning model performs well with the training
dataset, but does not perform well with the test dataset, then variance occurs.
Overfitting
Overfitting occurs when our machine learning model tries to cover all the data points, or more than the required data points, present in the given dataset. Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and all these factors reduce the efficiency and accuracy of the model. An overfitted model has low bias and high variance.
Example: The concept of overfitting can be understood from the graph of the linear regression output below:
As we can see from the above graph, the model tries to cover all the data points present in the scatter plot. It may look efficient, but in reality it is not. The goal of the regression model is to find the best-fit line, but here we have not got a best fit, so it will generate prediction errors.
Techniques to reduce overfitting include the following (a short Keras sketch of early stopping and regularization follows this list):
o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling
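As referenced above, here is a minimal sketch of two of these techniques, L2 regularization and early stopping, using Keras (the layer sizes and the randomly generated data are assumed purely for illustration):

# L2 regularization and early stopping in Keras (toy data, for illustration only)
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

x_train = np.random.rand(200, 20)
y_train = np.random.randint(0, 2, 200)
x_val = np.random.rand(50, 20)
y_val = np.random.randint(0, 2, 50)

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(0.01)),  # L2 penalty
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop training when the validation loss stops improving (early stopping)
stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=100, callbacks=[stop], verbose=0)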
Underfitting
Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data. To avoid overfitting, the feeding of training data can be stopped at an early stage, but as a result the model may not learn enough from the training data and may fail to find the dominant trend in the data.
In the case of underfitting, the model is not able to learn enough from the training data, and hence its accuracy is reduced and it produces unreliable predictions.
Example: We can understand the underfitting using below output of the linear
regression model:
As we can see from the above diagram, the model is unable to capture the data
points present in the plot.
Goodness of Fit
The term "goodness of fit" is taken from statistics, and the goal of a machine learning model is to achieve a good fit. In statistical modeling, it defines how closely the results or predicted values match the true values of the dataset.
The model with a good fit lies between the underfitted and the overfitted model; ideally it makes predictions with zero error, but in practice this is difficult to achieve.
When we train our model for a while, the errors on the training data go down, and the same happens with the test data. But if we train the model for too long, the performance of the model may decrease due to overfitting, as the model also learns the noise present in the dataset. The errors on the test dataset then start increasing, so the point just before the errors start rising is the good point, and we can stop there to achieve a good model.
UNIT – II
Deep learning is a subfield of machine learning that is inspired by the structure and
function of the human brain, and it has become increasingly popular in recent years due
to its impressive ability to solve complex problems in a wide range of domains. Deep
learning algorithms are capable of learning from large amounts of data, using artificial
neural networks that are composed of many layers of interconnected nodes or neurons.
The term "deep" refers to the depth of the neural network, meaning that it has many
layers, with each layer learning increasingly complex features of the input data. These
layers are typically trained using a form of supervised learning, where the network is
presented with input data and corresponding output data, and the weights between the
neurons are adjusted in order to minimize the difference between the predicted output
and the actual output.
Some of the most common applications of deep learning include image recognition,
speech recognition, natural language processing, and autonomous driving. Deep
learning has also been applied to many other fields, such as drug discovery, financial
modeling, and predictive maintenance.
Biological Vision:
Biological vision refers to the study of how the human visual system works.
The human visual system is responsible for processing and interpreting visual
information from the environment.
The human visual system is highly complex and has inspired the development of
deep learning models that attempt to simulate its structure and function.
The visual cortex in the brain is responsible for processing visual information,
and has a layered structure similar to deep learning neural networks.
Deep learning models, such as convolutional neural networks (CNNs), are
modeled after the structure of the visual cortex.
These models can learn to recognize complex patterns in images and video by
extracting increasingly complex features from the input data.
Biological vision has inspired the development of deep learning models for tasks
such as object recognition, image classification, and image segmentation.
One of the challenges of deep learning models inspired by biological vision is the
need for large amounts of labeled data to train the neural network.
Another challenge is the potential for overfitting, which occurs when the model
is trained too well on the training data and fails to generalize to new, unseen data.
The human visual system is highly adaptable and can learn to recognize new
objects and patterns with minimal training data, which is an area of active
research in deep learning.
The human visual system is also highly sensitive to context and can recognize
objects in a variety of different settings, which is another area of active research
in deep learning.
Deep learning models inspired by biological vision have been used for
applications such as medical diagnosis, autonomous driving, and even artistic
style transfer.
Biological vision research is continuing to inspire the development of new deep
learning models and architectures, such as attention mechanisms and capsule
networks.
By combining insights from biological and machine vision, researchers can
continue to develop more sophisticated and powerful deep learning algorithms
for a wide range of applications.
Overall, biological vision has played a significant role in the development of deep
learning and will continue to do so in the future.
Machine Vision:
Machine vision is the study of how computers can process and analyze visual
information.
It is a subfield of computer vision, which focuses on developing algorithms that
can interpret visual information.
Deep learning has enabled significant advances in machine vision, particularly in
areas such as image recognition, object detection, and autonomous driving.
Deep learning models, such as convolutional neural networks (CNNs), are
capable of analyzing images and video and identifying objects and patterns with
high accuracy.
Machine vision has numerous real-world applications, such as industrial quality
control, robotics, and security systems.
One of the challenges in machine vision is dealing with noisy or incomplete data,
which can be mitigated by using deep learning models that can learn to extract
useful features from the data.
Another challenge is the need for large amounts of labeled data to train deep
learning models, which can be time-consuming and expensive to obtain.
Machine vision algorithms are often designed to be robust to changes in lighting,
angle, and other environmental factors that can affect image quality.
Machine vision is a rapidly evolving field, with new applications and techniques
emerging on a regular basis.
One area of active research in machine vision is developing deep learning models
that can reason and make decisions based on visual input, such as autonomous
vehicles navigating complex environments.
Machine vision algorithms have been used to solve a wide range of problems,
including facial recognition, medical diagnosis, and object tracking.
Deep learning models used in machine vision can be fine-tuned to improve their
performance on specific tasks, such as identifying specific types of objects or
detecting subtle changes in images over time.
Overall, machine vision is a powerful tool that is enabling new applications and
driving innovation in a wide range of industries.
Artificial Neural Networks contain artificial neurons which are called units. These units
are arranged in a series of layers that together constitute the whole Artificial Neural
Networks in a system. A layer can have only a dozen units or millions of units as this
depends on the complexity of the system. Commonly, Artificial Neural Network has an
input layer, output layer as well as hidden layers. The input layer receives data from the
outside world which the neural network needs to analyze or learn about. Then this data
passes through one or multiple hidden layers that transform the input into data that is
valuable for the output layer. Finally, the output layer provides an output in the form of
a response of the Artificial Neural Networks to input data provided.
In the majority of neural networks, units are interconnected from one layer to another.
Each of these connections has weights that determine the influence of one unit on
another unit. As the data transfers from one unit to another, the neural network learns
more and more about the data which eventually results in an output from the output
layer.
Artificial neural networks are trained using a training set. For example, suppose you
want to teach an ANN to recognize a cat. Then it is shown thousands of different images
of cats so that the network can learn to identify a cat. Once the neural network has been
trained enough using images of cats, then you need to check if it can identify cat images
correctly. This is done by making the ANN classify the images it is provided by deciding
whether they are cat images or not. The output obtained by the ANN is corroborated by
a human-provided description of whether the image is a cat image or not. If the ANN
identifies incorrectly then back-propagation is used to adjust whatever it has learned
during training. Back-propagation is done by fine-tuning the weights of the connections
in ANN units based on the error rate obtained. This process continues until the artificial
neural network can correctly recognize a cat in an image with minimal possible error
rates.
1. Feedforward Neural Network : The feedforward neural network is one of the most
basic artificial neural networks. In this ANN, the data or the input provided travels in a
single direction. It enters into the ANN through the input layer and exits through the
output layer while hidden layers may or may not exist. So the feedforward neural
network has a front propagated wave only and usually does not have backpropagation.
2. Recurrent Neural Network : The Recurrent Neural Network saves the output of a
layer and feeds this output back to the input to better predict the outcome of the layer.
The first layer in the RNN is quite similar to the feed-forward neural network and the
recurrent neural network starts once the output of the first layer is computed. After this
layer, each unit will remember some information from the previous step so that it can
act as a memory cell in performing computations.
5. Radial basis function Neural Network : Radial basis functions are functions that consider the distance of a point with respect to a center. RBF networks have two layers. In the first layer, the input is mapped onto all the radial basis functions in the hidden layer, and then the output layer computes the output in the next step. Radial basis function networks are normally used to model data that represents an underlying trend or function.
Applications of Artificial Neural Networks
1. Social Media : Artificial Neural Networks are used heavily in Social Media. For
example, let’s take the ‘People you may know’ feature on Facebook that suggests people you might know in real life so that you can send them friend requests. Well,
this magical effect is achieved by using Artificial Neural Networks that analyze your
profile, your interests, your current friends, and also their friends and various other
factors to calculate the people you might potentially know. Another common application
of Machine Learning in social media is facial recognition. This is done by finding
around 100 reference points on the person’s face and then matching them with those
already available in the database using convolutional neural networks.
2. Marketing and Sales : When you log onto e-commerce sites like Amazon and Flipkart, they will recommend products for you to buy based on your previous browsing history. Similarly, if you love pasta, then Zomato, Swiggy, etc. will show you restaurant recommendations based on your tastes and previous order history. This is true across all new-age marketing segments like book sites, movie services, hospitality sites, etc., and it is done by implementing personalized marketing. This uses Artificial Neural Networks to identify the customer's likes, dislikes, previous shopping history, etc., and then tailor the marketing campaigns accordingly.
3. Healthcare : Artificial Neural
Networks are used in Oncology to train algorithms that can identify cancerous tissue at
the microscopic level at the same accuracy as trained physicians. Various rare diseases
may manifest in physical characteristics and can be identified in their premature stages
by using Facial Analysis on the patient photos. So the full-scale implementation of
Artificial Neural Networks in the healthcare environment can only enhance the
diagnostic abilities of medical experts and ultimately lead to the overall improvement
in the quality of medical care all over the world.
4. Personal Assistants : I am sure you all have heard of Siri, Alexa, Cortana, etc., and have also heard them on whichever phone you have! These are personal assistants and an
example of speech recognition that uses Natural Language Processing to interact with
the users and formulate a response accordingly. Natural Language Processing uses
artificial neural networks that are made to handle many tasks of these personal assistants
such as managing the language syntax, semantics, correct speech, the conversation that
is going on, etc.
Parallel processing capability: Artificial neural networks can perform more than one task simultaneously.
Storing data on the entire network: The data is stored across the whole network rather than in a database. The disappearance of a couple of pieces of data in one place does not prevent the network from working.
Capability to work with incomplete knowledge: After ANN training, the network may produce output even with inadequate data. The loss of performance here depends on the significance of the missing data.
Having fault tolerance: Corruption of one or more cells of the ANN does not prohibit it from generating output, and this feature makes the network fault-tolerant.
The duration of the network is unknown: The network is trained until the error is reduced to a specific value, and this value does not guarantee optimum results.
Deep learning is a subfield of machine learning that involves the training of deep
neural networks, which are composed of multiple layers of interconnected nodes.
The training of deep neural networks involves feeding large amounts of data into
the network, which then learns to recognize patterns and features in the data.
The data is typically divided into training, validation, and test sets. The training
set is used to train the network, the validation set is used to optimize
hyperparameters, and the test set is used to evaluate the performance of the
network.
During training, the network adjusts its weights and biases in order to minimize
a cost function, which measures the difference between the predicted output and
the true output.
Gradient descent is a common optimization algorithm used in deep learning,
which involves iteratively adjusting the weights and biases of the network in the
direction of the negative gradient of the cost function.
Backpropagation is a technique used to compute the gradient of the cost function
with respect to the weights and biases of the network, which is necessary for
gradient descent.
Stochastic gradient descent is a variant of gradient descent that randomly selects
a small batch of samples from the training set at each iteration, which can improve
convergence and reduce computation time.
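As a concrete illustration of the gradient descent idea described above (a toy example with a single parameter, not taken from the text), minimizing the cost J(w) = (w - 3)^2 by repeatedly stepping in the direction of the negative gradient:

# Toy gradient descent: minimize J(w) = (w - 3)^2, whose gradient is dJ/dw = 2*(w - 3)
w = 0.0                                 # initial weight
learning_rate = 0.1
for step in range(50):
    grad = 2 * (w - 3)                  # gradient of the cost with respect to w
    w = w - learning_rate * grad        # move against the gradient
print(w)                                # w approaches 3, the minimum of the cost function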
Dropout is a regularization technique used to prevent overfitting in deep neural
networks, which involves randomly setting some of the nodes in the network to
zero during training.
Batch normalization is a technique used to improve the stability and performance
of deep neural networks, which involves normalizing the inputs to each layer of
the network.
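A minimal sketch of how dropout and batch normalization appear as layers in a Keras model (the layer sizes and input shape are assumed only for illustration):

# Dropout and Batch Normalization layers in a small Keras model
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.BatchNormalization(),    # normalizes the inputs to the next layer
    layers.Dropout(0.5),            # randomly zeroes 50% of the activations during training
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()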
Convolutional neural networks (CNNs) are a type of deep neural network
commonly used for image and video recognition, which exploit the local spatial
correlation of the data.
Recurrent neural networks (RNNs) are a type of deep neural network commonly
used for sequential data processing, which can maintain an internal memory of
previous inputs.
Long short-term memory (LSTM) networks are a variant of RNNs that can
selectively forget or remember previous inputs, which can improve their ability
to handle long-term dependencies.
Transfer learning is another technique that can be used to improve the
performance of deep networks, by using pre-trained models on similar tasks and
fine-tuning them on new tasks.
Deep networks can be trained on a variety of tasks, including image classification,
object detection, natural language processing, speech recognition, and generative
modeling.
The architecture of the network, including the number of layers, the size of the
layers, and the connectivity between layers, can have a significant impact on the
performance of the network.
Hyperparameter tuning is an important part of training deep networks, and
involves selecting the optimal values for various parameters such as learning rate,
batch size, and regularization strength.
Deep networks can require a significant amount of computational resources to
train, particularly for large datasets and complex architectures.
To accelerate training, techniques such as distributed training, parallel
computing, and hardware accelerators such as GPUs and TPUs can be used.
Visualization techniques such as t-SNE and PCA can be used to visualize the
high-dimensional representations learned by deep networks, and gain insights
into the structure of the data.
Deep networks have achieved state-of-the-art performance on a wide range of
tasks, and have been used to develop a variety of real-world applications in fields
such as healthcare, finance, and autonomous driving.
More data: One way to improve deep networks is to use more data to train them.
Larger datasets help the model learn more robust and generalized features that
can be applied to unseen data.
Better initialization: Initializing the weights and biases in a deep network can
have a big impact on its performance. Techniques such as Xavier initialization
can improve the convergence and accuracy of the network.
Regularization: Deep networks are prone to overfitting, especially when the
number of parameters is high. Regularization techniques such as Dropout or
L1/L2 regularization can help reduce overfitting and improve generalization.
Improved activation functions: The choice of activation function can have a big
impact on the performance of a deep network. Recent advances in activation
functions such as ReLU, LeakyReLU, and Swish have shown improved
performance over traditional functions such as sigmoid and tanh.
Advanced optimization algorithms: Gradient descent is the most common
optimization algorithm used in deep learning. Advanced optimization algorithms
such as Adam, Adagrad, and RMSprop can help speed up the convergence and
improve accuracy.
Network architecture: The choice of network architecture can have a significant
impact on the performance of a deep network. Popular architectures such as
ResNet, VGG, and Inception have shown improved performance on various
tasks.
Transfer learning: Transfer learning is a technique where a pre-trained network
is used as a starting point for a new task. This can help improve performance and
reduce the amount of data required for training.
Data augmentation: Data augmentation involves generating new training
examples by applying various transformations to the existing data. This can help
improve the performance of the network by reducing overfitting and improving
generalization.
Hyperparameter tuning: The performance of a deep network can be highly
dependent on the choice of hyperparameters such as learning rate, batch size, and
regularization strength. Careful tuning of these hyperparameters can help
improve the performance of the network.
Ensemble methods: Ensemble methods involve combining the predictions of
multiple deep networks to improve performance.
UNIT III
Anatomy of Neural networks:
Neural networks are a type of machine learning model that are inspired by the
structure and function of the human brain. They consist of layers of
interconnected nodes, called neurons, which process and transmit information.
As a result, we can say that ANNs are composed of multiple nodes that imitate the biological neurons of the human brain. These neurons are connected by links and interact with each other. Nodes take input data and perform simple operations on it; the results of these operations are passed on to other neurons. The output at each node is called its activation or node value.
Input layer: This is the first layer of the network where the input data is received.
Each neuron in this layer corresponds to one input feature.
Hidden layer(s): These are the layers that come between the input and output
layers. They process the input data and learn to recognize patterns in the data. A
neural network can have one or more hidden layers.
Output layer: This is the final layer of the network where the output data is
produced. Each neuron in this layer corresponds to one output feature.
Neurons: These are the basic units of a neural network. They receive input from
other neurons and perform a mathematical operation on the input to produce an
output.
Weights: These are the values that determine the strength of the connections
between neurons. They are adjusted during training to optimize the performance
of the network.
Bias: Bias is an additional parameter that is added to the input of each neuron. It
allows the neuron to adjust the output based on a specific threshold.
Activation function: This is a mathematical function that is applied to the output
of each neuron. It allows the neuron to produce a non-linear output that can model
complex relationships between input and output data.
Backpropagation: This is the process of adjusting the weights and biases of the
network during training to minimize the difference between the predicted output
and the actual output. It is an iterative process that continues until the network
produces accurate predictions.
Each link is associated with a weight, and the network is capable of learning by altering these weight values. The following illustration shows a simple ANN –
a. Feedforward ANN
In this network, the flow of information is unidirectional: a unit sends information to another unit from which it does not receive any information. No feedback loops are present. Feed-forward networks are used in pattern recognition, as they contain fixed inputs and outputs.
Figure 2: Feed-forward neural network structure
Introduction to keras:
Deep learning is becoming more popular in data science fields like robotics, artificial
intelligence (AI), audio & video recognition and image recognition. Artificial neural
network is the core of deep learning methodologies. Deep learning is supported by various libraries such as Theano, TensorFlow, Caffe, MXNet, etc. Keras is one of the most powerful and easy-to-use Python libraries, built on top of popular deep learning libraries like TensorFlow and Theano, for creating deep learning models.
Overview of Keras:
Features:
Benefits:
Keras is a highly powerful and dynamic framework and comes with the following advantages:
• Larger community support.
• Easy to test.
• Keras neural networks are written in Python, which makes things simpler.
• Keras supports both convolutional and recurrent networks.
TensorFlow :
TensorFlow is an open-source machine learning library used for numerical computational tasks, developed by Google. Keras is a high-level API built on top of TensorFlow or Theano. We already know how to install TensorFlow using pip. If it is not installed, you can install it using the command below:
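For example (assuming the standard pip package name):
pip install tensorflow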
Once Keras has been run once, we can see that its configuration file is located in the home directory at .keras/keras.json.
keras.json:
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
Here,
• image_data_format represents the data format.
• epsilon represents a numeric constant. It is used to avoid DivideByZero errors.
• floatx represents the default data type, float32. You can also change it to float16 or float64 using the set_floatx() method.
• backend denotes the current backend.
Theano:
Theano is an open-source deep learning library that allows you to evaluate multi-dimensional arrays effectively. We can easily install it using the command below:
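For example (assuming the standard pip package name):
pip install theano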
keras.json
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}
Now save the file, restart your terminal, and start Keras; your backend will be changed.
PC Hardware Setup:
First of all, to perform machine learning and deep learning on any dataset, the software/program requires a computer system powerful enough to handle the necessary computing power. So the following is required:
Note: In the case of laptops, the ideal option would be to purchase a gaming laptop
from any vendor deemed suitable such as Alienware, ASUS, Lenovo Legion, Acer
Predator, etc.
Nvidia GeForce Experience
This tool is designed to update your NVIDIA GPU drivers; it is far easier to do it this way, and it is highly recommended to install it if you have an NVIDIA GPU.
Table of Contents
1. Download Anaconda
2. Install Anaconda & Python
3. Start and Update Anaconda
4. Install CUDA Toolkit & cuDNN
5. Create an Anaconda Environment
6. Install Deep Learning API’s (TensorFlow & Keras)
In this step, we will download the Anaconda Python package for your platform.
In this step, we will install the Anaconda Python software on your system.
Installation is very easy and quick once you download the setup. Open the setup and
follow the wizard instructions.
#Note: It will automatically install Python and some basic libraries with it.
Open Anaconda Prompt to type the following command(s). Don’t worry Anaconda
Prompt works the same as cmd.
#Version Support: Here is a guide to check whether your version supports your Nvidia graphics card.
2. Download cuDNN
Download the latest version of cuDNN. Choose your version depending on your
Operating System and CUDA. Membership registration is required. Don’t worry you
can easily create an account using your email.
C:\cudnn-9.0-windows10-x64-v7
1. Open Run dialogue using (Win + R) and run the command sysdm.cpl
2. In Window-10 System Properties, please select the Tab Advanced.
3. Select Environment Variables
4. Add the following path to your Environment.
C:\cudnn-9.0-windows10-x64-v7\cuda\bin
Step 6: Create an Anaconda Environment
Here we will create a new anaconda environment for our specific usage so that it will
not affect the root of Anaconda.
1. Create a conda environment named “tensorflow” (you can change the name) by
invoking the following command:
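A typical form of this command (the Python version shown is only illustrative) is:
conda create -n tensorflow python=3.6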
activate tensorflow
(tensorflow)C:> # Your prompt should change
In this step, we will install Python libraries used for deep learning, specifically:
TensorFlow, and Keras.
1. TensorFlow
=> For installing TensorFlow, Open Anaconda Prompt to type the following
commands.
If your machine only supports CPU, you can install the CPU version for basic learning and practice.
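The usual pip commands for this step (assuming the older TensorFlow 1.x package names used by guides of this kind) are:
For the GPU version: pip install tensorflow-gpu
For the CPU-only version: pip install tensorflow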
=> You can test the installation by running this program on shell:
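A minimal check of the installation (an assumed example) is to import TensorFlow and print its version:
import tensorflow as tf
print(tf.__version__)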
2. Keras
Keras is a high-level neural networks API, written in Python and capable of running
on top of TensorFlow, CNTK, or Theano.
=> For installing Keras Open Anaconda Prompt to type the following commands.
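For example (assuming the standard pip package name):
pip install keras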
=> Let’s try running mnist_mlp.py in your prompt. You can use other examples as well.
There are some other famous libraries like PyTorch, Theano, and Caffe2 that you can use as per your choice.
Classification is the process of dividing a set of data into distinct classes. It may be
applied to both organized and unstructured data. Predicting the class of data points is
the first step in the procedure. Target, label, and categories are common terms for the
classes.
Approximating the mapping function from discrete input variables to discrete output
variables is the problem of classification predictive modeling. The basic objective is to
figure out which category or class the new data belongs in.
Binary Classification – This is what we’ll discuss a bit more in-depth here.
Classification problems with two class labels are referred to as binary classification. In
most binary classification problems, one class represents the normal condition and the
other represents the aberrant condition.
Multi-Class Classification – Classification jobs with more than two class labels are
referred to as multi-class classification. Multi-class classification, unlike binary
classification, does not distinguish between normal and pathological results. Instead,
examples are assigned to one of a number of pre-defined classes.
For example, the normal class label would be that a patient has the disease, and the
abnormal class label would be that they do not, or vice-versa.
There are quite a few different algorithms used in binary classification. The two that
are designed with only binary classification in mind (meaning they do not support
more than two class labels) are Logistic Regression and Support Vector Machines. A
few other algorithms are: Nearest Neighbours, Decision Trees, and Naive Bayes.
CNNs consist of multiple layers, including convolutional layers, pooling layers, and
fully connected layers. The input to a CNN is a 3-dimensional array representing an
image, where each dimension corresponds to the width, height, and color channels (red,
green, and blue).
Convolutional layers are the most important component of CNNs. They consist of a set
of filters that slide over the input image, performing a mathematical operation called
convolution. The filters are learned during training and are used to extract features from
the input image, such as edges, corners, and textures.
Pooling layers are used to reduce the dimensionality of the feature maps produced by
the convolutional layers. They do this by downsampling the feature maps using
operations like max pooling or average pooling, which reduces the spatial size of the
feature maps.
Fully connected layers are used to produce the final output of the CNN. They take the
feature maps produced by the convolutional and pooling layers and flatten them into a
1-dimensional array. This array is then passed through a series of fully connected layers
that produce the final output of the network, which is typically a probability distribution
over the possible classes.
During training, the parameters of the CNN, including the weights and biases of the
filters and fully connected layers, are optimized using an algorithm called
backpropagation. The goal of the optimization is to minimize a loss function that
measures the difference between the predicted output of the network and the true output.
CNNs have been very successful in a wide range of computer vision tasks, including
image classification, object detection, and segmentation. They have also been used in
natural language processing and speech recognition tasks.
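A hedged sketch of this layer stack in Keras (the input shape, filter counts, and layer sizes are assumptions chosen only for illustration, not a tuned model):

# Minimal CNN: convolution -> pooling -> flatten -> fully connected layers -> class probabilities
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),                 # width, height, colour channels
    layers.Conv2D(32, (3, 3), activation="relu"),   # learnable filters extract features
    layers.MaxPooling2D((2, 2)),                    # downsampling of the feature maps
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                               # 1-dimensional feature vector
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),         # probability distribution over classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()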
Representation learning
Representation learning is a subfield of machine learning that aims to automatically
discover useful representations of input data. These representations are typically in the
form of feature vectors or embeddings that capture important characteristics of the
input data. The goal of representation learning is to learn a set of features that are both
informative and robust, and that can be used as input to downstream tasks such as
classification, clustering, or retrieval.
Convolutional layers
A convolutional layer is a key building block of convolutional neural networks (CNNs),
which are widely used in image recognition, natural language processing, and other
applications. The purpose of the convolutional layer is to extract features from the input
data using a set of learnable filters called kernels or weights.
There are several types of convolutional layers, each with a different function:
Convolutional layer: This is the most common type of convolutional layer. It applies a
set of filters to the input data, producing a set of output feature maps.
Pooling layer: This layer reduces the size of the input data by downsampling it. The
most common type of pooling is max pooling, which takes the maximum value of each
local region of the input.
Strided convolutional layer: This layer applies convolution with a larger stride than
usual, effectively reducing the resolution of the output feature maps.
Depthwise separable convolutional layer: This layer applies two separate convolutional
operations, one to each channel of the input data, before combining them into a single
output.
Dilated convolutional layer: This layer increases the receptive field of the filters by
inserting gaps between the weights.
Transposed convolutional layer: This layer is used for upsampling the feature maps, and is sometimes called a deconvolutional layer. It applies a set of filters to the input data, but instead of sliding them over the input, it slides them over the output, effectively reversing the convolution operation.
Stride and Padding: The stride and padding in a multi-channel convolution operation
work the same way as in a single-channel convolution operation. The stride determines
the number of pixels by which the filter is shifted each time it is convolved with the
input data. The padding refers to the addition of extra rows and columns of zeros around
the edges of the input data before convolution.
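A quick numeric check of how stride and padding affect the output size, using the standard formula output = (input + 2*padding - kernel)/stride + 1 (the sizes below are illustrative):

# Output size of a convolution along one dimension
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(28, 3, stride=1, padding=0))   # 26: no padding shrinks the map
print(conv_output_size(28, 3, stride=1, padding=1))   # 28: padding of 1 preserves the size
print(conv_output_size(28, 3, stride=2, padding=1))   # 14: stride 2 halves the resolution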
The working of an RNN can be understood with the help of the below example:
Example: Suppose there is a deeper network with one input layer, three hidden layers,
and one output layer. Then like other neural networks, each hidden layer will have its
own set of weights and biases, let’s say, for hidden layer 1 the weights and biases are
(w1, b1), (w2, b2) for the second hidden layer, and (w3, b3) for the third hidden layer.
This means that each of these layers is independent of the other, i.e. they do not
memorize the previous outputs.
Now the RNN will do the following:
• Hence these three layers can be joined together such that the weights and biases of all the hidden layers are the same, in a single recurrent layer.
The recurrent layer computes its current state from the current input and the previous state:
ht = f(ht-1, xt)
Applying the tanh activation function, the current state and the output become:
ht = tanh(whh * ht-1 + wxh * xt)
Yt = Why * ht
where:
whh -> weight at recurrent neuron
wxh -> weight at input neuron
ht -> current state, ht-1 -> previous state, xt -> input
Yt -> output
Why -> weight at output layer
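A single recurrent step following these equations, written out in NumPy (a minimal sketch; the dimensions and random weights are purely illustrative):

# One recurrent step: ht = tanh(Whh @ ht_prev + Wxh @ xt), Yt = Why @ ht
import numpy as np

hidden_size, input_size, output_size = 4, 3, 2
Whh = np.random.randn(hidden_size, hidden_size)   # weight at recurrent neuron
Wxh = np.random.randn(hidden_size, input_size)    # weight at input neuron
Why = np.random.randn(output_size, hidden_size)   # weight at output layer

ht_prev = np.zeros(hidden_size)                   # previous state
xt = np.random.randn(input_size)                  # current input

ht = np.tanh(Whh @ ht_prev + Wxh @ xt)            # current state
Yt = Why @ ht                                     # output
print(ht, Yt)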
1. An RNN remembers each piece of information through time. This is useful in time series prediction because of its ability to remember previous inputs; Long Short-Term Memory (LSTM) networks extend this capability.
2. Recurrent neural networks are even used with convolutional layers to extend the
effective pixel neighbourhood.
PyTorch Tensor:
PyTorch is an open-source machine learning framework that is widely used for
building and training neural networks. One of the fundamental data structures in
PyTorch is the tensor.
A tensor in PyTorch is a multi-dimensional array that can hold scalar values, vectors,
matrices, and higher-dimensional arrays. Tensors can be created in PyTorch using
various methods, such as:
• Directly specifying the values of the tensor using Python lists or NumPy arrays
• Creating tensors filled with zeros or ones using the torch.zeros() or torch.ones()
functions, respectively
• Creating tensors with random values using the torch.randn() function
• Creating identity matrices using the torch.eye() function
Once a tensor is created, you can perform various operations on it, such as:
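A brief sketch of these creation methods and a few common operations (the shapes and values are arbitrary, chosen only for illustration):

# Creating tensors in PyTorch and performing a few basic operations
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])    # from a Python list
z = torch.zeros(2, 2)                         # filled with zeros
o = torch.ones(2, 2)                          # filled with ones
r = torch.randn(2, 2)                         # random values from a normal distribution
eye = torch.eye(2)                            # 2x2 identity matrix

print(a + o)                 # element-wise addition
print(a * r)                 # element-wise multiplication
print(a @ eye)               # matrix multiplication
print(a.reshape(4))          # reshaping
print(a.sum(), a.mean())     # reductions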
Overall, tensors are a critical building block of PyTorch, and understanding how to
create and manipulate them is essential for building and training neural networks using
this framework.
Tensors in PyTorch are similar to NumPy’s n-dimensional arrays which can also be
used with GPUs. Performing operations on these tensors is almost similar to
performing operations on NumPy arrays. This makes PyTorch very user-friendly and
easy to learn.
we built a simple neural network to solve a case study. We got a benchmark accuracy
of around 65% on the test set using our simple model. Now, we will try to improve
this score using Convolutional Neural Networks.
Understanding the Problem Statement:
Let me quickly summarize the problem statement. Our task is to identify the type of
apparel by looking at a variety of apparel images. There are a total of 10 classes in
which we can classify the images of apparels:
Label Description
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot
The dataset contains a total of 70,000 images. 60,000 of these images belong to the
training set and the remaining 10,000 are in the test set. All the images are grayscale
images of size (28*28). The dataset contains two folders – one each for the training set
and the test set. In each folder, there is a .csv file that has the id of the image and its
corresponding label, and a folder containing the images for that particular set.
Loading the dataset
Now, let’s load the dataset, including the train, test and sample submission file:
• The train file contains the id of each image and its corresponding label
• The test file, on the other hand, only has the ids and we have to predict their
corresponding labels
• The sample submission file will tell us the format in which we have to submit
the predictions
We will read all the images one by one and stack them one over the other in an array. We will also divide the pixel values of the images by 255 so that they come in the range [0,1]. This step helps in optimizing the performance of our model.
Load the images:
As you can see, we have 60,000 images, each of size (28,28), in the training set. Since
the images are in grayscale format, we only have a single-channel and hence the shape
(28,28).
Now call this model, and define the optimizer and the loss function for the model:
This is the architecture of the model. We have two Conv2d layers and a Linear layer.
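The model code itself is not reproduced here; a hedged sketch of an architecture matching this description (two Conv2d layers and a Linear layer, for 28x28 grayscale inputs and 10 classes; the channel counts, learning rate, and optimizer are assumptions) could look like this in PyTorch:

# Sketch: CNN with two Conv2d layers and one Linear layer, plus optimizer and loss
import torch
import torch.nn as nn
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=3, padding=1)   # 1 input channel (grayscale)
        self.conv2 = nn.Conv2d(4, 8, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.fc = nn.Linear(8 * 7 * 7, 10)                        # 10 apparel classes

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))   # 28x28 -> 14x14
        x = self.pool(torch.relu(self.conv2(x)))   # 14x14 -> 7x7
        x = x.view(x.size(0), -1)                  # flatten to a 1-D feature vector
        return self.fc(x)

model = Net()
optimizer = optim.Adam(model.parameters(), lr=0.001)   # illustrative learning rate
criterion = nn.CrossEntropyLoss()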
Next, we will define a function to train the model:
Finally, we will train the model for 25 epochs and store the training and validation
losses:
We can see that the validation loss is decreasing as the epochs are increasing. Let’s
visualize the training and validation losses by plotting them:
We can clearly see that the training and validation losses are in sync. It is a good sign
as the model is generalizing well on the validation set.
check the accuracy of the model on the training and validation set:
As we saw with the losses, the accuracy is also in sync here – we got ~72% on the
validation set as well.
We will load all the images in the test set, do the same pre-processing steps as we did
for the training set and finally generate predictions.
So, let’s start by loading the test images:
Now, we will do the pre-processing steps on these images similar to what we did for
the training images earlier:
Replace the labels in the sample submission file with the predictions and finally save
the file and submit it on the leader board:
You will see a file named submission.csv in your current directory. You just have to
upload it on the solution checker of the problem page which will generate the score.
Our CNN model gave us an accuracy of around 71% on the test set.
UNIT-V
Interactive applications of deep learning refer to systems that use deep learning algorithms to enable users to interact with them in a meaningful way. These applications are designed to provide real-time feedback or personalized recommendations based on the user's input, behaviour, or preferences.
Interactive applications are computer programs or systems that allow users to interact with them in real time, providing immediate feedback and responses to user inputs. With the increasing availability of high-speed internet, mobile devices, and advanced software, interactive applications have become a ubiquitous part of our daily lives.
Interactive applications can take many forms, ranging from simple web-based tools to complex
software systems. Some examples of interactive applications include:
1.Gaming: Interactive games are one of the most common types of interactive applications.
These games allow players to interact with virtual environments, objects, and other players in
real-time.
2.Virtual assistants: Virtual assistants like Siri, Alexa, and Google Assistant are interactive
applications that allow users to interact with them using voice commands. These applications
use natural language processing and machine learning techniques to understand user requests
and provide relevant responses.
3. Social media platforms: Social media platforms like Facebook, Twitter, and Instagram are
interactive applications that allow users to interact with each other by sharing messages,
photos, and videos.
4. E-commerce websites: E-commerce websites like Amazon and eBay are interactive
applications that allow users to search for products, compare prices, and make purchases.
5. Data visualization tools: Data visualization tools like Tableau and Power BI are interactive
applications that allow users to explore and analyze data by creating visualizations and
dashboards.
Machine Vision:
Inspection and quality control: Machine vision systems can be used to inspect and
evaluate the quality of products, such as printed circuit boards, automotive parts, and
food products.
Robotics and automation: Machine vision can be used to guide robots and automated
systems, allowing them to accurately identify and manipulate objects.
Object recognition and tracking: Machine vision systems can be used to identify and
track objects in real-time, making them useful for surveillance and security applications.
Medical imaging: Machine vision techniques can be used to analyze medical images,
such as X-rays and MRI scans, to assist in diagnosis and treatment planning.
Natural Language Processing (NLP) is a field of study that focuses on the interaction
between computers and human language.
• Natural language Pre-processing refers to the process of preparing text data for NLP
tasks, such as text classification, sentiment analysis, and machine translation.
1. Tokenization: This involves splitting the text into individual words or tokens, which are
the basic units used in the later steps.
2. Stop-word removal: Stop words are common words such as "and," "the," and "in,"
which do not carry significant meaning in the context of a sentence. Removing these words
can improve the accuracy of NLP models.
3. Stemming and lemmatization: These techniques involve reducing words to their root
form, which can help to reduce the number of unique words in a text dataset and improve
the accuracy of text analysis.
4. Part-of-speech tagging: This involves assigning each word in a text dataset to its
appropriate part of speech, such as noun, verb, or adjective.
5. Named entity recognition: This technique involves identifying and categorizing named
entities, such as people, places, and organizations, within a text dataset.
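The short sketch below illustrates these pre-processing steps with NLTK. The library choice and the example sentence are assumptions, and the NLTK resources listed in the comments must be downloaded once with nltk.download before running:

# Illustrative pre-processing with NLTK: stop-word removal, stemming,
# lemmatization, part-of-speech tagging and named entity recognition.
# Required NLTK resources: 'punkt', 'stopwords', 'wordnet',
# 'averaged_perceptron_tagger', 'maxent_ne_chunker', 'words'
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "Google was founded in California and the company is growing quickly."
tokens = nltk.word_tokenize(text)

filtered = [w for w in tokens if w.lower() not in stopwords.words('english')]  # stop-word removal
stems = [PorterStemmer().stem(w) for w in filtered]                            # stemming
lemmas = [WordNetLemmatizer().lemmatize(w) for w in filtered]                  # lemmatization
pos_tags = nltk.pos_tag(tokens)                                                # part-of-speech tagging
entities = nltk.ne_chunk(pos_tags)                                             # named entity recognition

print(stems)
print(lemmas)
print(pos_tags)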
Spam Filters: One of the most irritating things about email is spam. Gmail uses natural
language processing (NLP) to discern which emails are legitimate and which are spam.
These spam filters look at the text in all the emails you receive and try to figure out what it
means to see if it’s spam or not.
Algorithmic Trading: Algorithmic trading is used for predicting stock market conditions.
Using NLP, this technology examines news headlines about companies and stocks and
attempts to comprehend their meaning in order to determine if you should buy, sell, or hold
certain stocks.
Question Answering: NLP can be seen in action by using Google Search or Siri Services.
A major use of NLP is to make search engines understand the meaning of what we are
asking and generate natural language in return to give us the answers.
Summarizing Information: On the internet, there is a lot of information, and a lot of it comes
in the form of long documents or articles. NLP is used to decipher the meaning of the data
and then provides shorter summaries of the data so that humans can comprehend it more
quickly.
Future Scope:
Bots: Chatbots assist clients to get to the point quickly by answering inquiries and referring
them to relevant resources and products at any time of day or night. To be effective,
chatbots must be fast, smart, and easy to use. To accomplish this, chatbots employ NLP to
understand language, usually over text or voice-recognition interactions.
Supporting Invisible UI: Almost every connection we have with machines involves human
communication, both spoken and written. Amazon’s Echo is only one illustration of the
trend toward putting humans in closer contact with technology in the future. The concept
of an invisible or zero user interface will rely on direct communication between the user
and the machine, whether by voice, text, or a combination of the two. NLP helps to make
this concept a real-world thing.
Smarter Search: NLP’s future also includes improved search, something we’ve been
discussing at Expert System for a long time. Smarter search, which allows a chatbot to understand
a customer's request, can enable "search like you talk" functionality (much like you could
query Siri) rather than focusing on keywords or topics. Google recently announced that
NLP capabilities have been added to Google Drive, allowing users to search for documents
and content using natural language.
Future Enhancements:
Companies like Google are experimenting with Deep Neural Networks (DNNs) to push the
limits of NLP and make it possible for human-to-machine interactions to feel just like
human-to-human interactions.
Basic words can be further subdivided into finer semantic units and used in NLP algorithms.
NLP algorithms can also be extended to languages that are currently under-supported, such
as regional languages or languages spoken in rural areas.
Generative Adversarial Networks (GANs) are a type of deep learning model that consists
of two neural networks, a generator and a discriminator, that are trained together in a game-
like manner.
The generator network takes random noise as input and produces output that is meant to
resemble data from a target distribution, such as images or audio. The discriminator
network takes both real data from the target distribution and generated data from the
generator network as input and outputs a probability score indicating whether the input is
real or fake. The goal of the generator network is to produce output that can fool the
discriminator network into thinking it is real, while the goal of the discriminator network
is to correctly classify whether the input is real or fake.
During training, the generator and discriminator networks are optimized in a minimax
game-like fashion, where the generator seeks to maximize the probability of the
discriminator classifying its generated output as real, while the discriminator seeks to
minimize the probability of misclassifying generated output as real. This game-like
competition between the two networks leads to the generator network learning to produce
realistic output that is similar to the target distribution.
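As a rough illustration of this minimax training loop, here is a minimal PyTorch sketch; the layer sizes, learning rates and placeholder data are assumptions for illustration, not details from the text:

# Minimal GAN sketch: a generator and a discriminator trained adversarially.
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784                       # e.g. flattened 28x28 images (assumed)

generator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, data_dim), nn.Tanh())

discriminator = nn.Sequential(
    nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_batch = torch.rand(32, data_dim) * 2 - 1        # placeholder for real training data

for step in range(100):
    # Discriminator step: push real data towards 1 and generated data towards 0.
    fake = generator(torch.randn(32, latent_dim)).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(32, 1)) + \
             bce(discriminator(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator output 1 on generated data.
    fake = generator(torch.randn(32, latent_dim))
    g_loss = bce(discriminator(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()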
GANs can be used for a variety of applications, such as image synthesis, style transfer, and
data augmentation. One of the main advantages of GANs is that they can generate novel
data that is similar to the target distribution, which can be useful in situations where there
is a limited amount of real data available. However, GANs can be difficult to train and
require careful tuning of hyperparameters and regularization techniques to prevent
overfitting.
Overall, GANs are a powerful and popular deep learning model for generating realistic
data, and have been applied to a wide range of applications in computer vision, natural
language processing, and audio synthesis.
Deep Reinforcement Learning
Deep reinforcement learning is a subfield of machine learning that combines deep learning
with reinforcement learning to enable machines to learn and make decisions in complex
environments.
One of the main challenges in deep reinforcement learning is the problem of exploration-
exploitation trade-off. The agent needs to explore the environment to learn about the
optimal policy, but at the same time it needs to exploit the learned policy to maximize the
reward. Finding the right balance between exploration and exploitation is essential for the
success of the agent in the long term.
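A common and very simple way to manage this trade-off is an epsilon-greedy rule: with a small probability the agent explores a random action, otherwise it exploits the action with the highest estimated value. A minimal sketch (the action values here are placeholders):

# Epsilon-greedy action selection: explore with probability epsilon, otherwise exploit.
import random

def choose_action(q_values, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(len(q_values))                        # explore: random action
    return max(range(len(q_values)), key=lambda a: q_values[a])       # exploit: best-known action

print(choose_action([0.2, 0.8, 0.5]))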
Deep reinforcement learning has been successfully applied in a wide range of applications,
such as game playing, robotics, and autonomous driving. Some of the most famous
applications of deep reinforcement learning include AlphaGo, which defeated the world
champion in the game of Go, and OpenAI Five, which defeated a team of human
professional players in the game of Dota 2.
Deep learning research is a field of study that focuses on developing and improving
algorithms and techniques for training deep neural networks. Deep learning refers to the
subset of machine learning that uses deep neural networks to learn and make predictions
from data. Deep neural networks are composed of multiple layers of interconnected nodes
that process information hierarchically, allowing the network to learn complex
representations of the data.
Deep learning research involves developing new architectures and algorithms for training
deep neural networks, as well as improving existing techniques. It also involves
investigating the theoretical foundations of deep learning, such as understanding the
properties of deep neural networks and their optimization landscapes.
Deep learning research is motivated by the desire to develop more accurate and efficient
machine learning models that can solve a wide range of tasks, such as image recognition,
natural language processing, speech recognition, and robotics. Deep learning has enabled
breakthroughs in these areas and has been applied to many real-world applications, such as
self-driving cars, personalized medicine, and recommendation systems.
Some active directions in deep learning research include:
1. Developing more efficient and scalable deep learning algorithms, such as those that can
work with smaller datasets or require less computing power.
2. Developing more interpretable and transparent deep learning models, which can help
improve trust and accountability in AI systems.
Autoencoders
A typical use of a Neural Network is a case of supervised learning. It involves training data
that contains an output label. The neural network tries to learn the mapping from the given
input to the given output label. But what if the output label is replaced by the input vector
itself? Then the network will try to find the mapping from the input to itself. This would be
the identity function which is a trivial mapping. But if the network is not allowed to simply
copy the input, then the network will be forced to capture only the salient features. This
constraint opens up a different field of applications for Neural Networks. The primary
applications are dimensionality reduction and data-specific
compression. The network is first trained on the given input. The network tries to
reconstruct the given input from the features it picked up and gives an approximation to the
input as the output. The training step involves the computation of the error and
backpropagating the error. The typical architecture of an Auto-encoder resembles a
bottleneck.
The encoder part of the network is used for encoding and sometimes even for data compression
purposes although it is not very effective as compared to other general compression techniques
like JPEG. Encoding is achieved by the encoder part of the network which has a decreasing
number of hidden units in each layer. Thus this part is forced to pick up only the most
significant and representative features of the data. The second half of the network performs
the Decoding function. This part has an increasing number of hidden units in each layer and
thus tries to reconstruct the original input from the encoded data. Thus Auto-encoders are an
unsupervised learning technique.
Example: In the code below, the autoencoder's training data is fitted to itself. That is why,
instead of fitting X_train to a separate Y_train, we use X_train in both places.
Training of an Auto-encoder for data compression: For a data compression procedure, the most
important aspect of the compression is the reliability of the reconstruction of the compressed
data. This requirement dictates the structure of the Auto-encoder as a bottleneck.
Step 1: Encoding the input data – The Auto-encoder first tries to encode the data using the
initialized weights and biases.
Step 2: Decoding the input data – The Auto-encoder tries to reconstruct the original input from
the encoded data to test the reliability of the encoding.
Step 3: Backpropagating the error – After the reconstruction, the loss function is computed to
determine the reliability of the encoding. The error generated is backpropagated.
The above-described training process is reiterated several times until an acceptable level of
reconstruction is reached.
After the training process, only the encoder part of the Auto-encoder is retained to encode a
similar type of data used in the training process. The different ways to constrain the network
are:
Keep small Hidden Layers: If the size of each hidden layer is kept as small as possible, then
the network will be forced to pick up only the representative features of the data thus encoding
the data.
Regularization: In this method, a loss term is added to the cost function which encourages the
network to train in ways other than copying the input.
Denoising: Another way of constraining the network is to add noise to the input and teach the
network how to remove the noise from the data.
Tuning the Activation Functions: This method involves changing the activation functions
of various nodes so that a majority of the nodes are dormant thus effectively reducing the
size of the hidden layers.
Sparse Auto-encoder: This type of auto-encoder typically contains more hidden units than
the input but only a few are allowed to be active at once. This property is called the sparsity
of the network. The sparsity of the network can be controlled by either manually zeroing
the required hidden units, tuning the activation functions or by adding a loss term to the
cost function.
Variational Auto-encoder: This type of auto-encoder makes strong assumptions about the
distribution of latent variables and uses the Stochastic Gradient Variational Bayes estimator
in the training process. It assumes that the data is generated by a Directed Graphical Model
and tries to learn an approximation q_φ(z|x) to the conditional probability p_θ(z|x), where
φ and θ are the parameters of the encoder and the decoder respectively.
Below is the basic intuition code for building the autoencoder model and fitting X_train
to itself.
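The encoder, decoder and autoencoder objects used in the snippet below are not defined in the text; one minimal assumed Keras setup (placeholder data and tiny layer sizes) could look like this:

# Assumed setup for the snippet below: a small stacked autoencoder in Keras.
import numpy as np
from tensorflow import keras

X_train = np.random.rand(1000, 30).astype('float32')    # placeholder training data
encoder = keras.models.Sequential([keras.layers.Dense(2, input_shape=[30])])
decoder = keras.models.Sequential([keras.layers.Dense(30, input_shape=[2])])
autoencoder = keras.models.Sequential([encoder, decoder])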
#The model will contract in the encoder then expand in the decoder.
autoencoder.compile(loss="mse", optimizer=keras.optimizers.SGD(learning_rate=0.1))
autoencoder.fit(X_train, X_train, epochs=10)   # fitting X_train to itself
codings = encoder.predict(X_train)
decodings = decoder.predict(codings)
Two widely used families of deep generative models are:
1. Variational Autoencoders (VAEs): VAEs are a type of autoencoder that learn to encode
data into a latent representation and decode the representation back to the original data
space. However, VAEs differ from traditional autoencoders by enforcing a constraint on
the distribution of the latent representation. VAEs use the constraint to learn a smooth and
continuous latent space that can be used for generating new samples.
2. Generative Adversarial Networks (GANs): GANs are a type of deep neural network
architecture that learn to generate new samples by playing a two-player game. One player,
the generator, learns to generate new samples that are similar to the training data, while the
other player, the discriminator, learns to distinguish between real and generated samples.
The two players are trained together in an adversarial manner, where the generator tries to
fool the discriminator, and the discriminator tries to accurately distinguish between real and
generated samples.
Deep generative models have many applications, such as image synthesis, speech synthesis,
and text generation. They are also used for tasks such as data augmentation, where synthetic
samples can be generated to increase the size of the training data.
Boltzmann Machines
The Boltzmann Distribution gives the probability p_i of the system being in state 'i':

p_i = e^(-ε_i/kT) / Σ_j e^(-ε_j/kT)

where:
ε_i - energy of the system in state 'i'
T - temperature of the system
k - Boltzmann constant
Σ_j e^(-ε_j/kT) - sum of values for all possible states of the system
Boltzmann Distribution describes different states of the system and thus Boltzmann machines
create different states of the machine using this distribution. From the above equation, as the
energy of system increases, the probability for the system to be in state ‘i’ decreases. Thus, the
system is the most stable in its lowest energy state (a gas is most stable when it spreads). Here,
in Boltzmann machines, the energy of the system is defined in terms of the weights of
synapses. Once the system is trained and the weights are set, the system always tries to find
the lowest energy state for itself by adjusting the weights.
In a full Boltzmann machine, each node is connected to every other node, so the number of
connections grows quadratically with the number of nodes. This is the reason we use RBMs
(Restricted Boltzmann Machines). The restrictions on the node connections in RBMs are as
follows – there are no connections between nodes within the same layer; visible nodes
connect only to hidden nodes and hidden nodes connect only to visible nodes.
In an RBM, the probability of a particular state is proportional to e^(-E) for that state, and Z
– the sum of these values over all possible states of the system – normalizes these probabilities.
Suppose that we are using our RBM for building a recommender system that works on six (6)
movies. The RBM learns how to allocate its hidden nodes to certain features, and by the process
of Contrastive Divergence we fit the RBM to our set of movies, that is, to our scenario. For
each user, the training data is either 1 (liked the movie), 0 (disliked the movie) or missing
(did not watch the movie), and from this data the RBM automatically identifies which features
are important during training.
Contrastive Divergence:
RBM adjusts its weights by this method. Using some randomly assigned initial weights, RBM
calculates the hidden nodes, which in turn use the same weights to reconstruct the input nodes.
Each hidden node is constructed from all the visible nodes and each visible node is
reconstructed from all the hidden nodes; hence, the input is different from the reconstructed
input, even though the weights are the same. The process continues until the reconstructed input
matches the previous input; the process is said to have converged at this stage. This entire
procedure is known as Gibbs Sampling.
The gradient formula gives the gradient of the log probability of a certain state of the
system with respect to the weights of the system. For an RBM it takes the form

∂ log P(v) / ∂ w_ij = ⟨v_i h_j⟩_data − ⟨v_i h_j⟩_model

where ⟨·⟩_data and ⟨·⟩_model denote expectations under the data and the model distribution
respectively. This equation tells us how a change in the weights of the system will change the
log probability of the system being in a particular state. The system tries to end up in the lowest
possible energy state (the most stable one). Instead of continuing the weight-adjustment process
until the reconstructed input matches the previous one, we can also consider only the first few
passes. This is sufficient to understand how to adjust the energy curve so as to reach the lowest
energy state. Therefore, we adjust the weights and redesign the system and the energy curve such
that we get the lowest energy for the current position. This is known as Hinton's shortcut.
Consider – Mary watches four movies out of the six available movies and rates four of them.
Say, she watched m1, m3, m4 and m5 and likes m3, m5 (rated 1) and dislikes the other two,
that is m1, m4 (rated 0) whereas the other two movies – m2, m6 are unrated. Now, using our
RBM, we will recommend one of these movies for her to watch next. Say –
• Mary likes m3, m5 and they are of genre 'Drama', so she probably likes 'Drama' movies.
• Mary dislikes m1, m4 and they are of genre 'Action', so she probably dislikes 'Action' movies.
• Mary likes m3, m5 and they have won an 'Oscar', so she probably likes 'Oscar'-winning movies.
• Since 'Dicaprio' acted in m5 and Mary likes it, she will probably like a movie in which
'Dicaprio' acted.
• Mary does not like m4, which is directed by Tarantino, so she probably dislikes any movie
directed by 'Tarantino'.
Therefore, based on the observations and the details of m2, m6; our RBM recommends m6 to
Mary (‘Drama’, ‘Dicaprio’ and ‘Oscar’ matches both Mary’s interests and m6). This is how an
RBM works and hence is used in recommender systems.
Suppose we stack several RBMs on top of each other so that the first RBM outputs are the
input to the second RBM and so on. Such networks are known as Deep Belief Networks. The
connections within each layer are undirected (since each layer is an RBM), while those between
the layers are directed (except for the top two layers – the connection between the top two
layers is undirected). There are two ways to train DBNs –
1. Greedy Layer-wise Training Algorithm – The RBMs are trained layer by layer. Once
the individual RBMs are trained (that is, the parameters – weights, biases are set), the
direction is set up between the DBN layers.
2. Wake-Sleep Algorithm – The DBN is trained all the way up (connections going up –
wake) and then down the network (connections going down — sleep).
Therefore, we stack the RBMs, train them, and once we have the parameters trained, we make
sure that the connections between the layers only work downwards (except for the top two
layers).
DBMs (Deep Boltzmann Machines) are similar to DBNs except that, apart from the connections within layers, the
connections between the layers are also undirected (unlike DBN in which the connections
between layers are directed). DBMs can extract more complex or sophisticated features and
hence can be used for more complex tasks.
Restricted Boltzmann Machines (RBMs) are a type of neural network that belong to the
family of Boltzmann machines. RBMs are used for unsupervised learning and are
particularly useful for dimensionality reduction, feature learning, and collaborative
filtering.
An RBM consists of two layers of neurons, visible and hidden, with connections between
them. The neurons in each layer are binary units that can be in one of two states (on or off).
The connections between the neurons are weighted, with each connection having a
corresponding weight. The RBM learns to adjust the weights of the connections between
the neurons in order to represent the input data in a more compact and meaningful way.
During training, the RBM learns to reconstruct the input data by adjusting the weights of
the connections between the visible and hidden layers. This is done by minimizing the
difference between the input data and the reconstructed data using a process called
contrastive divergence.
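As a small illustration of this training procedure, scikit-learn's BernoulliRBM fits an RBM with contrastive divergence. The binary placeholder data (for example, 100 users by 6 movies, liked = 1) and the hyperparameters below are assumptions:

# Fit an RBM with contrastive divergence and read out the hidden features.
import numpy as np
from sklearn.neural_network import BernoulliRBM

X = (np.random.rand(100, 6) > 0.5).astype(float)     # placeholder binary data

rbm = BernoulliRBM(n_components=2, learning_rate=0.05, n_iter=20)
rbm.fit(X)                                            # weights adjusted via contrastive divergence

hidden = rbm.transform(X)                             # hidden-unit activation probabilities
print(hidden[:3])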
One of the key features of RBMs is that they are "restricted" because there are no
connections between neurons within the same layer. This means that RBMs can learn to
represent complex relationships between the inputs without being affected by the
correlations within the same layer.
RBMs have many applications, including image and audio recognition, natural language
processing, and recommendation systems. They are particularly useful in deep learning
because they can be used as building blocks for larger and more complex architectures,
such as deep belief networks (DBNs) and deep autoencoders.
Deep Belief Networks (DBNs) are a type of neural network that are composed of multiple
layers of Restricted Boltzmann Machines (RBMs). DBNs are used for unsupervised
learning and are particularly effective in modeling high-dimensional data such as images,
speech, and natural language.
DBNs are composed of multiple layers of RBMs, with the first layer being the visible layer
and the subsequent layers being hidden layers. Each RBM in the DBN learns to extract
higher-level features from the output of the previous RBM. The output of the last RBM is
used as input to a supervised learning algorithm, such as a classifier, to produce a final
prediction.
The RBMs in a DBN are trained in an unsupervised manner using Contrastive Divergence,
a type of stochastic gradient descent algorithm that iteratively adjusts the weights of the
connections between the neurons. Once the RBMs are trained, the entire DBN is fine-tuned
using backpropagation, which is a supervised learning algorithm that adjusts the weights
based on the error between the predicted output and the true output.
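As a rough sketch of this layer-wise idea, two RBMs can be stacked in front of a classifier with scikit-learn; note that this omits the backpropagation fine-tuning pass described above, and the layer sizes and placeholder data are assumptions:

# DBN-style sketch: RBMs trained layer by layer, then a classifier on the final features.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X = (np.random.rand(200, 64) > 0.5).astype(float)    # placeholder binary data
y = np.random.randint(0, 2, size=200)                 # placeholder labels

dbn_like = Pipeline([
    ('rbm1', BernoulliRBM(n_components=32, n_iter=10)),
    ('rbm2', BernoulliRBM(n_components=16, n_iter=10)),
    ('clf', LogisticRegression(max_iter=1000)),
])
dbn_like.fit(X, y)
print(dbn_like.score(X, y))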
DBNs have been used in a variety of applications, including speech recognition, image
recognition, natural language processing, and recommender systems. One of the key
advantages of DBNs is that they can learn to extract hierarchical representations of the
input data, which makes them particularly useful in modeling complex relationships in
high-dimensional data.