
UNIT I

1. FUNDAMENTALS OF DEEP LEARNING


Fundamentals of deep learning refer to the basic concepts and principles that
underlie the field of deep learning. These include:

Neural Networks: Deep Learning is a type of Machine Learning that involves training artificial neural networks to perform tasks such as image classification, natural language processing, and speech recognition.

Training: Deep Learning models are trained by feeding large amounts of data into
a neural network, and adjusting the weights of the network through a process
known as backpropagation.

Layers: Deep neural networks are composed of multiple layers of interconnected nodes, each performing a specific function. The input layer receives the raw data, and the output layer produces the final prediction.

Activation Functions: Each node in a neural network applies an activation function to its inputs, transforming the output to a desired range. Common activation functions include sigmoid, ReLU, and softmax.
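To make these functions concrete, here is a minimal NumPy sketch of sigmoid, ReLU, and softmax; the input scores are arbitrary illustrative values, not taken from this text.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Passes positive values through unchanged and zeroes out negatives.
    return np.maximum(0.0, x)

def softmax(x):
    # Converts a vector of scores into probabilities that sum to 1.
    shifted = x - np.max(x)      # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, -1.0, 0.5])
print(sigmoid(scores))   # element-wise values in (0, 1)
print(relu(scores))      # [2.0, 0.0, 0.5]
print(softmax(scores))   # probabilities summing to 1
```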

Convolutional Neural Networks (CNNs): CNNs are a type of neural network specifically designed for image processing tasks. They use convolutional layers to extract features from the input image, and pooling layers to downsample the features.
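As a rough illustration (not a prescribed architecture from this text), the following Keras sketch stacks convolutional and pooling layers before a dense classifier. It assumes TensorFlow/Keras is available; the 28x28 grayscale input shape and the 10 output classes are placeholder choices.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # convolutional layer extracts local features
    layers.MaxPooling2D((2, 2)),                    # pooling layer downsamples the feature maps
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),         # output layer produces class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```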

Recurrent Neural Networks (RNNs): RNNs are a type of neural network designed for processing sequences of data, such as time-series data or natural language text. They use feedback connections to store information about previous inputs and use this to make predictions.

Backpropagation: Backpropagation is the process of adjusting the weights of a neural network in order to minimize the error between the predicted output and the actual output.
Overfitting: Overfitting occurs when a model is trained too well on the training
data and begins to perform poorly on new, unseen data. Regularization techniques
such as dropout and early stopping can help prevent overfitting.
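A minimal sketch of the two regularization techniques just mentioned, using Keras dropout and an early-stopping callback. The layer sizes, dropout rate, and patience value are illustrative assumptions, and the training call is commented out because X_train and y_train are hypothetical arrays.

```python
from tensorflow.keras import layers, models, callbacks

model = models.Sequential([
    layers.Input(shape=(100,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),                  # dropout randomly disables units during training
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

early_stop = callbacks.EarlyStopping(
    monitor="val_loss",                   # stop when validation loss stops improving
    patience=3,
    restore_best_weights=True,
)

# Hypothetical training data; in practice X_train and y_train come from your dataset.
# model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])
```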

Transfer Learning: Transfer Learning is a technique where a pre-trained model is used as a starting point for a new task, rather than training a new model from scratch. This can greatly reduce the amount of data and computing resources required for training.
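As one possible illustration of transfer learning (assuming TensorFlow/Keras and the pre-trained MobileNetV2 weights are available), the sketch below freezes a pre-trained image model and trains only a small new classification head; the 5-class output is an arbitrary placeholder.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

# Pre-trained feature extractor; the input size follows MobileNetV2's default.
base = MobileNetV2(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                       # freeze the pre-trained weights

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),   # new head for the (hypothetical) target task
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```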

GPU Acceleration: Deep Learning models require a lot of computational power, and GPUs are well-suited for this type of work. GPUs can accelerate the training process by several orders of magnitude compared to CPUs.

2. ARTIFICIAL INTELLIGENCE

Artificial Intelligence is composed of two words, Artificial and Intelligence, where Artificial means "man-made" and Intelligence means "thinking power"; hence AI means "man-made thinking power."

So, we can define AI as:

"It is a branch of computer science by which we can create intelligent machines


which can behave like a human, think like humans, and able to make decisions."

Artificial Intelligence exists when a machine has human-based skills such as learning, reasoning, and problem solving.

With Artificial Intelligence, you do not need to preprogram a machine for every task; instead, you can create a machine with programmed algorithms that can work with its own intelligence, and that is the strength of AI.

AI is not a new idea; some people say that, according to Greek myth, there were mechanical men in early days that could work and behave like humans.
Why Artificial Intelligence?
Before learning about Artificial Intelligence, we should know why AI is important and why we should learn it. Following are some main reasons to learn about AI:

o With the help of AI, you can create such software or devices which can
solve real-world problems very easily and with accuracy such as health
issues, marketing, traffic issues, etc.
o With the help of AI, you can create your personal virtual Assistant, such as
Cortana, Google Assistant, Siri, etc.
o With the help of AI, you can build such Robots which can work in an
environment where survival of humans can be at risk.
o AI opens a path for other new technologies, new devices, and new
Opportunities.

Types of Artificial Intelligence:


Artificial Intelligence can be divided into various types. There are two main categorizations: one based on capabilities and one based on functionality. The following sections explain the types of AI.
AI type-1: Based on Capabilities
1. Weak AI or Narrow AI:
o Narrow AI is a type of AI which is able to perform a dedicated task with intelligence. The most common and currently available AI is Narrow AI in the world of Artificial Intelligence.
o Narrow AI cannot perform beyond its field or limitations, as it is only
trained for one specific task. Hence it is also termed as weak AI. Narrow AI
can fail in unpredictable ways if it goes beyond its limits.
o Apple Siri is a good example of Narrow AI, but it operates with a limited pre-defined range of functions.
o IBM's Watson supercomputer also comes under Narrow AI, as it uses an
Expert system approach combined with Machine learning and natural
language processing.
o Some Examples of Narrow AI are playing chess, purchasing suggestions on
e-commerce site, self-driving cars, speech recognition, and image
recognition.

2. General AI:
o General AI is a type of intelligence which could perform any intellectual task with efficiency like a human.
o The idea behind general AI is to make a system which could be smart and think like a human on its own.
o Currently, no system exists which qualifies as general AI and can perform any task as well as a human.
o Researchers worldwide are now focused on developing machines with General AI.
o Systems with general AI are still under research, and it will take a lot of effort and time to develop such systems.
3. Super AI:
o Super AI is a level of system intelligence at which machines could surpass human intelligence and perform any task better than a human, with cognitive abilities of their own. It is an outcome of general AI.
o Some key characteristics of super AI include the ability to think, reason, solve puzzles, make judgments, plan, learn, and communicate on its own.
o Super AI is still a hypothetical concept of Artificial Intelligence; developing such systems in reality remains a world-changing task.

Artificial Intelligence type-2: Based on functionality


1. Reactive Machines
o Purely reactive machines are the most basic types of Artificial Intelligence.
o Such AI systems do not store memories or past experiences for future
actions.
o These machines only focus on the current scenario and react to it with the best possible action.
o IBM's Deep Blue system is an example of reactive machines.
o Google's AlphaGo is also an example of reactive machines.
2. Limited Memory
o Limited memory machines can store past experiences or some data for a
short period of time.
o These machines can use stored data for a limited time period only.
o Self-driving cars are one of the best examples of Limited Memory systems.
These cars can store recent speed of nearby cars, the distance of other cars,
speed limit, and other information to navigate the road.

3. Theory of Mind
o Theory of Mind AI should understand human emotions, people, and beliefs, and be able to interact socially like humans.
o This type of AI machine has not yet been developed, but researchers are making significant efforts and improvements toward developing such machines.

4. Self-Awareness
o Self-awareness AI is the future of Artificial Intelligence. These machines
will be super intelligent, and will have their own consciousness, sentiments,
and self-awareness.
o These machines will be smarter than the human mind.
o Self-aware AI does not yet exist in reality; it is still a hypothetical concept.

HISTORY OF MACHINE LEARNING

Machine learning is a growing technology which enables computers to learn automatically from past data. Machine learning uses various algorithms for building mathematical models and making predictions using historical data or information. Currently, it is being used for various tasks such as image recognition, speech recognition, email filtering, Facebook auto-tagging, recommender systems, and many more.
This machine learning tutorial gives you an introduction to machine learning along
with the wide range of machine learning techniques such
as Supervised, Unsupervised, and Reinforcement learning. You will learn about
regression and classification models, clustering methods, hidden Markov models,
and various sequential models.

The history of machine learning dates back to the mid-20th century when
researchers started exploring the idea of creating machines that could learn from
data. Here is a brief overview of the major milestones in the history of machine
learning:

➢ 1940s-1950s: The earliest work on artificial intelligence (AI) and machine


learning (ML) began in the 1940s and 1950s. Researchers, such as Alan
Turing, began to develop early computer models that could simulate human
intelligence.

➢ 1950s-1960s: In the late 1950s and 1960s, a group of researchers led by


Arthur Samuel at IBM developed one of the first machine learning
programs, which was able to play checkers at a novice level.

➢ 1970s-1980s: During the 1970s and 1980s, there was significant progress in
the field of machine learning. Researchers developed a range of algorithms
and techniques, such as decision trees, linear regression, and clustering,
which are still used today.

➢ 1990s-2000s: The development of neural networks in the 1990s allowed


machine learning algorithms to model complex patterns in data. Support
vector machines (SVMs) were also developed during this time, which
improved the ability to classify data into different categories.

➢ 2010s: The 2010s saw a rapid increase in the use of deep learning, which
uses neural networks with many layers to model complex patterns in data.
Deep learning has been applied successfully in a wide range of applications,
such as image recognition, speech recognition, and natural language
processing.

➢ Today: Machine learning is a rapidly growing field, with applications in


many areas, such as healthcare, finance, and self-driving cars. Advances in
machine learning algorithms and hardware have led to significant
improvements in the accuracy and speed of machine learning models, and
the field is likely to continue to grow and evolve in the coming years.

Overall, the history of machine learning is one of steady progress and innovation,
driven by advances in computing power, algorithmic development, and a growing
understanding of the principles of artificial intelligence.

PROBABILISTIC MODELLING IN ML

• Probabilistic modeling is a powerful technique used in machine learning to


model uncertain relationships between variables. In probabilistic modeling,
we make use of probability theory to represent and manipulate uncertainty.

• Probabilistic models are used to model the uncertainty of events or variables


in a given problem.

• They allow us to model and predict the likelihood of different outcomes


based on available data.

• They are particularly useful in scenarios where the outcome is not


deterministic, such as in natural language processing or image recognition.
• In probabilistic models, we use probability distributions to model the
likelihood of different outcomes.
• They provide a measure of uncertainty in predictions, which can be used to
make more informed decisions.

• Some popular probabilistic models in machine learning include Bayesian


networks, Gaussian processes, and hidden Markov models.

• Bayesian networks are graphical models that use Bayesian probability to


model relationships between variables.

• Gaussian processes are used in regression problems to model the distribution


of data points.

• Hidden Markov models are used in sequential data modeling, such as speech
recognition or language modeling.

• Probabilistic models can be combined with other machine learning


techniques, such as deep learning, to improve their accuracy and usefulness
in real-world applications

EARLY NEURAL NETWORKS


• Early neural networks in machine learning refer to the initial attempts to
build artificial neural networks inspired by the biological neurons in the
brain. These early networks were first proposed in the 1940s and 1950s by
researchers such as Warren McCulloch and Walter Pitts, and later refined by
Frank Rosenblatt in the form of the perceptron.
• The basic idea behind these early neural networks was to create
computational models that could learn from data and make predictions based
on that learning. They were composed of interconnected nodes, called
neurons, that were organized into layers. Each neuron took input from the
neurons in the previous layer and produced output that was fed into the
neurons in the next layer.

• Early neural networks were trained using a technique called supervised


learning, in which the network was presented with input-output pairs and
adjusted its internal weights to minimize the error between its predicted
output and the actual output. This training was typically done using a
process called backpropagation, which involves propagating the error
backwards through the network and adjusting the weights to reduce the
error.

• One limitation of early neural networks was that they could only learn
linearly separable functions, meaning that they could only classify data that
could be separated by a single straight line. This limitation was addressed in
the 1980s with the development of more sophisticated neural network
architectures, such as multi-layer perceptrons and convolutional neural
networks, which allowed for more complex non-linear functions to be
learned.

• Despite their limitations, early neural networks represented an important


milestone in the development of machine learning and paved the way for the
development of more powerful and sophisticated neural network models.
KERNEL METHODS

Kernels or kernel methods (also called kernel functions) are families of algorithms used for pattern analysis. They make it possible to solve a non-linear problem by using a linear classifier.

Kernel methods in machine learning

These are some of the many techniques of the kernel:

1. Support Vector Machine (SVM)
2. Adaptive Filter
3. Kernel Perceptron
4. Principal Component Analysis
5. Spectral Clustering

1. Support Vector Machine (SVM)

A support vector machine can be defined as a classifier that separates classes with a hyperplane, where a hyperplane is a subspace with one dimension less than the ambient space. In higher dimensions, support vector machines become much more challenging to interpret.

It is harder to visualize how the data can be separated linearly and what the decision boundary looks like. In p dimensions, a hyperplane is a (p-1)-dimensional "flat" subspace within the larger p-dimensional space; in two dimensions, the hyperplane is simply a line.
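A small scikit-learn sketch of the idea that a kernel lets a linear classifier handle a non-linear problem: on a synthetic two-moons dataset, an SVM with an RBF kernel typically separates the classes better than one with a linear kernel. The dataset and parameters are illustrative assumptions, not taken from this text.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, non-linearly separable data.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", gamma=1.0).fit(X_train, y_train)  # RBF kernel handles the curved boundary

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))
```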
2. Adaptive Filter

An adaptive filter uses a linear filter whose transfer function is controlled by a set of parameters, and those parameters are fine-tuned by an adaptation algorithm. Because of the complexity of the optimization algorithm, almost every adaptive filter is a digital filter.

An adaptive filter is required for applications where there is no prior knowledge of the desired performance or where the conditions change over time. A cost function is applied to the closed-loop filter as a measure of optimal operation; it determines how to alter the filter transfer function so as to reduce the cost on the next iteration.

3. Kernel Perceptron

In machine learning, the kernel perceptron is a variant of the popular perceptron learning algorithm used to train kernel machines. It covers non-linear classifiers that use a kernel function to compute the similarity of unseen samples to training samples.

Most of the kernel algorithms discussed here are based on convex optimization or eigenproblems, so statistical learning theory is used to analyze their statistical properties.

Kernel methods have a wide range of applications:

3D reconstruction
Bioinformatics

Geostatistics

Chemoinformatics

Handwriting recognition

Inverse distance weighting

Information extraction

4. Principal Component Analysis (PCA)

Principal component analysis is a tool used for dimensionality reduction. It allows us to reduce the size of the data without losing much of the information. PCA does this by finding a set of orthogonal directions (the principal components) along which the data shows the largest variation. The first principal component captures most of the variability in the data.

The second principal component is orthogonal to the first and captures the largest share of the remaining variation, and so on. The principal components are uncorrelated and ordered so that the first few components explain most of the actual variation in the data. Kernel principal component analysis extends PCA using kernel methods; in contrast to standard linear PCA, the kernel variant works well for a large number of attributes but becomes slow for a large number of examples.
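The following scikit-learn sketch contrasts standard (linear) PCA with kernel PCA on a synthetic concentric-circles dataset; the RBF kernel and the gamma value are illustrative assumptions.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: not linearly separable in the original space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear_pca = PCA(n_components=2).fit_transform(X)
kernel_pca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

# In the kernel PCA projection the two circles become (approximately) linearly
# separable, while linear PCA only rotates the original coordinates.
print(linear_pca[:3])
print(kernel_pca[:3])
```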

5. Spectral clustering
In the context of image classification, spectral clustering is known as segmentation-based object categorization. In spectral clustering, dimensionality reduction is performed before clustering in fewer dimensions, and this is accomplished by using the eigenvalues and eigenvectors of a similarity matrix of the data.

Its roots can be traced back to graph theory, where the method is used to identify communities of nodes in a graph based on the edges that connect them. The method is flexible enough to be applied to non-graph data as well.

Soft kernel spectral clustering (SKSC) first uses the KSC algorithm to compute a crisp (hard) initial partition of the training data. Next, soft cluster assignments are obtained by computing the cosine distance between each point and the cluster prototypes in the projection space e^(l). In particular, given the projections of the training points

e_i = [e_i^(1), ..., e_i^(k-1)],   i = 1, ..., N,

and the corresponding hard assignments, the cluster prototypes

s_1, ..., s_p, ..., s_k,   s_p ∈ R^(k-1),

can be computed as

s_p = (1/n_p) Σ_{i=1}^{n_p} e_i        (1.6)

where n_p is the number of points assigned to cluster p during the initialization step by KSC.
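A minimal scikit-learn sketch of (plain) spectral clustering on a synthetic dataset, where the similarity graph is built from nearest neighbours; this illustrates the general graph-based idea rather than the SKSC variant above, and all parameters are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.cluster import SpectralClustering

# Two "moon" shaped clusters that a centroid-based method would split incorrectly.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

labels = SpectralClustering(
    n_clusters=2,
    affinity="nearest_neighbors",   # build the similarity graph from nearest neighbours
    random_state=0,
).fit_predict(X)

print(labels[:20])   # cluster assignments for the first few points
```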

Decision Tree Classification Algorithm


o Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal
nodes represent the features of a dataset, branches represent the
decision rules and each leaf node represents the outcome.
o In a Decision tree, there are two nodes, which are the Decision
Node and Leaf Node. Decision nodes are used to make any decision and
have multiple branches, whereas Leaf nodes are the output of those
decisions and do not contain any further branches.
o The decisions or the test are performed on the basis of features of the given
dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root
node, which expands on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
o Below diagram explains the general structure of a decision tree:
Why use Decision Trees?
There are various algorithms in Machine learning, so choosing the best algorithm
for the given dataset and problem is the main point to remember while creating a
machine learning model. Below are the two reasons for using the Decision tree:

o Decision Trees usually mimic human thinking ability while making a


decision, so it is easy to understand.
o The logic behind the decision tree can be easily understood because it shows
a tree-like structure.
Decision Tree Terminologies
o Root Node: Root node is from where the decision tree starts. It
represents the entire dataset, which further gets divided into two or more
homogeneous sets.
o Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.
o Splitting: Splitting is the process of dividing the decision node/root node
into sub-nodes according to the given conditions.
o Branch/Sub Tree: A tree formed by splitting the tree.
o Pruning: Pruning is the process of removing the unwanted branches
from the tree.
o Parent/Child node: The root node of the tree is called the parent node,
and other nodes are called the child nodes.

How does the Decision Tree algorithm Work?

o In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record and, based on the comparison, follows a branch and jumps to the next node.
o For the next node, the algorithm again compares the attribute value with the sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:

o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values for the best attribute.
o Step-4: Generate the decision tree node which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step 3. Continue this process until a stage is reached where the nodes cannot be classified further; such final nodes are the leaf nodes. (A code sketch of these steps is shown below.)
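The scikit-learn sketch below follows these steps on the built-in iris dataset (an illustrative choice): splits are selected using the Gini measure, and the learned decision rules can be printed as text.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Splits are chosen by an attribute selection measure (here, Gini impurity).
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print("test accuracy:", tree.score(X_test, y_test))
# Print the learned decision rules (root node, branches, leaf nodes).
print(export_text(tree, feature_names=load_iris().feature_names))
```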
Random Forest Algorithm
Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and
Regression problems in ML. It is based on the concept of ensemble
learning, which is a process of combining multiple classifiers to solve a complex
problem and to improve the performance of the model.

As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of the predictions, predicts the final output.

The greater number of trees in the forest leads to higher accuracy and
prevents the problem of overfitting.

The below diagram explains the working of the Random Forest algorithm:
Assumptions for Random Forest
Since the random forest combines multiple trees to predict the class of the
dataset, it is possible that some decision trees may predict the correct
output, while others may not. But together, all the trees predict the correct
output. Therefore, below are two assumptions for a better Random forest
classifier:

o There should be some actual values in the feature variable of the


dataset so that the classifier can predict accurate results rather than a
guessed result.
o The predictions from each tree must have very low correlations.

Why use Random Forest?


Below are some points that explain why we should use the Random Forest
algorithm:

o It takes less training time as compared to other algorithms.


o It predicts output with high accuracy, and even for a large dataset it runs efficiently.
o It can also maintain accuracy when a large proportion of data is missing.

How does Random Forest algorithm work?


Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions with each tree created in the first phase.

The Working process can be explained in the below steps and diagram:

Step-1: Select random K data points from the training set.

Step-2: Build the decision trees associated with the selected data points (Subsets).

Step-3: Choose the number N for decision trees that you want to build.

Step-4: Repeat Step 1 & 2.


Step-5: For new data points, find the predictions of each decision tree, and assign
the new data points to the category that wins the majority votes.
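A minimal scikit-learn sketch of this workflow: N decision trees are grown on random subsets of the data and features, and their majority vote gives the prediction. The dataset and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # the number N of decision trees in the forest
    max_features="sqrt",   # each split considers a random subset of features
    random_state=0,
)
forest.fit(X_train, y_train)                       # each tree is built on a bootstrap sample
print("majority-vote accuracy:", forest.score(X_test, y_test))
```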

GBM in Machine Learning


Machine learning is one of the most popular technologies to build predictive
models for various complex regression and classification tasks. Gradient Boosting
Machine (GBM) is considered one of the most powerful boosting algorithms.

Although there are many algorithms used in machine learning, boosting algorithms have become mainstream in the machine learning community across the world. The boosting technique follows the concept of ensemble learning, and hence it combines multiple simple models (weak learners or base estimators) to generate the final output. GBM is also used as an ensemble method in machine learning which converts weak learners into strong learners. In this topic, "GBM in Machine Learning", we will discuss gradient boosting machine learning algorithms, the various boosting algorithms in machine learning, the history of GBM, how it works, various terminologies used in GBM, etc. But before starting, let us first understand the boosting concept and the various boosting algorithms in machine learning.

What is GBM in Machine Learning?


Gradient Boosting Machine (GBM) is one of the most popular forward learning
ensemble methods in machine learning. It is a powerful technique for building
predictive models for regression and classification tasks.

GBM helps us to get a predictive model in the form of an ensemble of weak prediction models such as decision trees. When a decision tree is used as the weak learner, the resulting algorithm is called gradient-boosted trees.

It enables us to combine the predictions from various learner models and build a final predictive model with the correct prediction.
But here one question may arise if we are applying the same algorithm then how
multiple decision trees can give better predictions than a single decision tree?
Moreover, how does each decision tree capture different information from the
same data?

So, the answer to these questions is that a different subset of features is taken by the nodes of each decision tree to select the best split. This means that each tree behaves differently, and hence captures different signals from the same data.

How does GBM work?


Generally, most supervised learning algorithms are based on a single predictive model such as linear regression, a penalized regression model, decision trees, etc. But there are some supervised algorithms in ML that depend on a combination of various models in an ensemble. In other words, multiple base models contribute their predictions, and the boosting algorithm combines them into a single aggregated prediction.

Gradient boosting machines consist of 3 elements as follows:

o Loss function
o Weak learners
o Additive model

Let's understand these three elements in detail.


1. Loss function:
There is a large family of loss functions in machine learning that can be used depending on the type of task being solved. The choice of loss function is driven by the desired characteristics of the conditional distribution, such as robustness. When using a loss function for our task, we must specify both the loss function and the function that calculates the corresponding negative gradient. Once we have these two functions, they can be plugged into gradient boosting machines easily. Several loss functions have already been proposed for GBM algorithms.

Classification of loss function:


Based on the type of response variable y, loss function can be classified into
different types as follows:

1. Continuous response, y ∈ R:
o Gaussian L2 loss function
o Laplace L1 loss function
o Huber loss function, δ specified
o Quantile loss function, α specified
2. Categorical response, y ∈ {0, 1}:
o Binomial loss function
o Adaboost loss function
3. Other families of response variables:
o Loss functions for survival models
o Loss functions count data
o Custom loss functions

2. Weak Learner:
Weak learners are the base learner models that learn from past errors and help in building a strong predictive model in boosting algorithms. Generally, decision trees are used as the weak learners in boosting. Boosting is defined as the framework that continuously works to improve the output of the base models. Many gradient boosting implementations allow you to "plug in" various classes of weak learners, but decision trees are most often used as the weak (base) learners.

How to train weak learners:


Training datasets are used to train the base learners, and based on the predictions of the previous learner, performance is improved by focusing on the rows of the training data where the previous tree had the largest errors or residuals. For example, shallow decision trees are considered weak learners because they contain only a few splits. Generally, in boosting algorithms, trees with up to 6 splits are most common.

Below is the sequence for training the weak learners to improve their performance, where each tree in the sequence is fit on the previous tree's residuals. Each new tree is introduced so that it can learn from the previous tree's errors. The steps are as follows:

1. Consider a data set and fit a decision tree to it:
   F1(x) = y
2. Fit the next decision tree to the residuals (the largest errors) of the previous tree:
   h1(x) = y − F1(x)
3. Add this new tree to the model by combining the trees from steps 1 and 2:
   F2(x) = F1(x) + h1(x)
4. Again fit the next decision tree to the residuals of the previous tree:
   h2(x) = y − F2(x)
5. Repeat step 3 with this new tree:
   F3(x) = F2(x) + h2(x)

Continue this process until some mechanism (e.g. cross-validation) tells us to stop. The final model is a stagewise additive model of B individual trees:

   f(x) = Σ_{b=1}^{B} f_b(x)

Hence, trees are constructed greedily, choosing the best split points based on purity
scores like Gini or minimizing the loss.
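A hedged from-scratch sketch of this residual-fitting loop: each new shallow regression tree is fit to the residuals of the current ensemble and added with a small learning rate. The synthetic data, tree depth, number of iterations, and learning rate are illustrative assumptions, not values from this text.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)   # noisy synthetic target

prediction = np.full_like(y, y.mean())   # F_0(x): start from a constant model
learning_rate = 0.1
trees = []

for b in range(100):
    residuals = y - prediction                        # h_b(x) targets: y - F_b(x)
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    trees.append(tree)
    prediction += learning_rate * tree.predict(X)     # F_{b+1}(x) = F_b(x) + lr * h_b(x)

print("training MSE:", np.mean((y - prediction) ** 2))
```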
3. Additive Model:
The additive model refers to adding trees to the model one at a time. We should not add multiple trees at once; only a single tree is added at each step so that the existing trees in the model are not changed. Gradient descent can be used to choose each added tree so that it reduces the loss.

Traditionally, the gradient descent method was used to minimize a set of parameters, such as the coefficients of a regression equation or the weights in a neural network: after calculating the error or loss, the parameters are updated to reduce that error. In gradient boosting, the same idea is applied with weak learner sub-models, i.e. decision trees, in place of those parameters: a tree is added to the model to reduce the error and improve its performance. In this way, the prediction from the newly added tree is combined with the predictions from the existing series of trees to get a final prediction. This process continues until the loss reaches an acceptable level or no further improvement is possible.

EXTREME GRADIENT BOOSTING MACHINE (XGBM)


XGBM is a more recent version of gradient boosting machines and works very similarly to GBM. In XGBM, trees are added sequentially (one at a time), learning from the errors of the previous trees and improving on them. Although the XGBM and GBM algorithms are similar in look and feel, there are a few differences between them, as follows:

o XGBM uses various regularization techniques to reduce under-fitting or over-fitting of the model, which also increases model performance beyond plain gradient boosting machines.
o XGBM uses parallel processing when evaluating each node, while GBM does not, which makes it faster than gradient boosting machines.
o XGBM helps us avoid imputation of missing values because the model handles them by default; it learns on its own whether these values should go to the right or left node.
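Assuming the separate xgboost package is installed, a minimal sketch using its scikit-learn-style interface looks like the following; the dataset and hyperparameters (including the L2 regularization term) are illustrative assumptions.

```python
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    reg_lambda=1.0,    # L2 regularization, one of the additions XGBoost makes over plain GBM
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```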

Light Gradient Boosting Machines (Light GBM)


Light GBM is an upgraded version of the gradient boosting machine, notable for its efficiency and speed. Unlike GBM and XGBM, it can handle a huge amount of data without any complexity. On the other hand, it is not suitable for datasets with a small number of data points.

Instead of level-wise growth, Light GBM prefers leaf-wise growth of the nodes of the tree. In Light GBM, the primary node is split into two secondary nodes, and then one secondary node is chosen to be split next. The choice of which secondary node to split depends on which of the two nodes has the higher loss.

Hence, due to its leaf-wise splits, the Light Gradient Boosting Machine (LGBM) algorithm is generally preferred over the others when a large amount of data is given.

CATBOOST
The CatBoost algorithm is primarily used to handle the categorical features in a dataset. While GBM, XGBM, and Light GBM are suited to numeric datasets, CatBoost is designed to convert categorical variables into numeric data. Hence, the CatBoost algorithm includes an essential preprocessing step, not present in the other algorithms, that converts categorical features into numerical variables.

Advantages of Boosting Algorithms:


o Boosting algorithms follow ensemble learning, which enables a model to give more accurate predictions than a single base model.
o Boosting algorithms are much more flexible than many other algorithms, as they can optimize different loss functions and provide several hyperparameter tuning options.
o They require little data pre-processing because they are suitable for both numeric and categorical variables.
o They do not require imputation of missing values in the dataset; missing data is handled automatically.

FUNDAMENTALS IN MACHINE LEARNING:


Machine learning is a field of computer science and artificial intelligence that
enables computer systems to automatically learn and improve from experience
without being explicitly programmed. The core concepts and fundamentals of
machine learning are as follows:

• Data: Machine learning requires a significant amount of data to train


models. This data must be accurately labeled and relevant to the problem at
hand.

• Algorithms: Machine learning algorithms are the set of rules and


instructions that enable computers to learn from the data. There are three
types of algorithms: supervised learning, unsupervised learning, and
reinforcement learning.

• Model: A machine learning model is the output generated by an algorithm


that learns from the data. The model is used to make predictions or decisions
based on new data.

• Training: The process of training a machine learning model involves


feeding large amounts of data into an algorithm to enable it to learn and
improve its accuracy over time.
• Validation: Once the model is trained, it needs to be validated to ensure its
accuracy and reliability in making predictions on new data.

• Testing: Testing a machine learning model involves evaluating its


performance on a separate dataset, which was not used in training or
validation.

• Feature Engineering: Feature engineering is the process of selecting and


extracting relevant features or variables from the data that can help improve
the accuracy of the model.

• Overfitting and Underfitting: Overfitting occurs when the model is too


complex and captures noise or random variations in the training data,
leading to poor generalization on new data. Underfitting occurs when the
model is too simple and fails to capture the underlying patterns in the data.

• Hyperparameter Tuning: Machine learning algorithms have several


hyperparameters that need to be optimized to achieve the best performance.
Hyperparameter tuning involves adjusting these parameters to improve the
model's accuracy.

• Deployment: Finally, the trained machine learning model needs to be


deployed in a real-world application, where it can make predictions or
decisions based on new data.

FOUR BRANCHES OF MACHINE LEARNING:

• Supervised Learning: In supervised learning, the machine learning


algorithm is trained on labeled data, which means that the data has been pre-
classified or pre-categorized. The algorithm learns to map input data to the
correct output by being trained on examples of input/output pairs. For
instance, given a set of labeled images of cats and dogs, a supervised
learning algorithm can be trained to classify new images as either a cat or a
dog.

• Unsupervised Learning: In unsupervised learning, the machine learning


algorithm is trained on unlabeled data, which means that the data is not pre-
categorized or pre-classified. The algorithm tries to identify patterns or
relationships in the data without any prior knowledge of what the data
represents. Unsupervised learning is often used for tasks such as clustering
or anomaly detection, where the algorithm attempts to group similar data
points together or identify unusual data points.

• Reinforcement Learning: Reinforcement learning is a type of machine


learning where the algorithm learns through trial and error by interacting
with an environment. The algorithm receives feedback in the form of
rewards or penalties based on its actions, and its goal is to learn a policy that
maximizes the long-term reward. Reinforcement learning is often used for
tasks such as game playing, robotics, and control systems.

• Semi-Supervised Learning: Semi-supervised learning is a type of machine


learning where the algorithm is trained on a combination of labeled and
unlabeled data. The idea is that by leveraging the additional unlabeled data,
the algorithm can improve its performance on the labeled data. Semi-
supervised learning is often used when labeled data is expensive or difficult
to obtain, as it can reduce the amount of labeled data required for training a
model.
1.Supervised Machine Learning
As its name suggests, Supervised machine learning is based on supervision. It
means in the supervised learning technique, we train the machines using the
"labelled" dataset, and based on the training, the machine predicts the output. Here,
the labelled data specifies that some of the inputs are already mapped to the output.
More precisely, we can say that first we train the machine with the input and corresponding output, and then we ask the machine to predict the output using the test dataset.

Let's understand supervised learning with an example. Suppose we have an input


dataset of cats and dog images. So, first, we will provide the training to the
machine to understand the images, such as the shape & size of the tail of cat and
dog, Shape of eyes, colour, height (dogs are taller, cats are smaller), etc. After
completion of training, we input the picture of a cat and ask the machine to identify
the object and predict the output. Now, the machine is well trained, so it will check
all the features of the object, such as height, shape, colour, eyes, ears, tail, etc., and
find that it's a cat. So, it will put it in the Cat category. This is the process of how
the machine identifies the objects in Supervised Learning.

The main goal of the supervised learning technique is to map the input
variable(x) with the output variable(y). Some real-world applications of
supervised learning are Risk Assessment, Fraud Detection, Spam filtering, etc.
Categories of Supervised Machine Learning
Supervised machine learning can be classified into two types of problems, which
are given below:

o Classification
o Regression

a) Classification

Classification algorithms are used to solve the classification problems in which the
output variable is categorical, such as "Yes" or No, Male or Female, Red or
Blue, etc. The classification algorithms predict the categories present in the
dataset. Some real-world examples of classification algorithms are Spam
Detection, Email filtering, etc.

Some popular classification algorithms are given below:

o Random Forest Algorithm


o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm

b) Regression

Regression algorithms are used to solve regression problems in which there is a


linear relationship between input and output variables. These are used to predict
continuous output variables, such as market trends, weather prediction, etc.

Some popular Regression algorithms are given below:

o Simple Linear Regression Algorithm


o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression
Advantages of Supervised Learning
Advantages:

o Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior
experience.

2. Unsupervised Machine Learning


Unsupervised learning is different from the Supervised learning technique; as its
name suggests, there is no need for supervision. It means, in unsupervised machine
learning, the machine is trained using the unlabeled dataset, and the machine
predicts the output without any supervision.

In unsupervised learning, the models are trained with the data that is neither
classified nor labelled, and the model acts on that data without any supervision.

The main aim of the unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences. Machines are instructed to find the hidden patterns in the input dataset.

Let's take an example to understand it more precisely; suppose there is a basket of fruit images, and we input it into the machine learning model. The images are totally unknown to the model, and the task of the machine is to find the patterns and categories of the objects.

So, the machine will discover the patterns and differences, such as colour and shape differences, and predict the output when it is tested with the test dataset.

Categories of Unsupervised Machine Learning


Unsupervised Learning can be further classified into two types, which are given
below:

o Clustering
o Association
1) Clustering

The clustering technique is used when we want to find the inherent groups from
the data. It is a way to group the objects into a cluster such that the objects with the
most similarities remain in one group and have fewer or no similarities with the
objects of other groups. An example of the clustering algorithm is grouping the
customers by their purchasing behaviour.

Some of the popular clustering algorithms are given below:

o K-Means Clustering algorithm


o Mean-shift algorithm
o DBSCAN Algorithm
o Principal Component Analysis
o Independent Component Analysis
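As a small illustration of the clustering idea (e.g. grouping customers by purchasing behaviour, as mentioned above), the k-means sketch below groups synthetic "customers" described by two made-up features into two clusters; all numbers are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic "customers": annual spend and number of purchases (arbitrary units).
customers = np.vstack([
    rng.normal([20, 5], 2, size=(50, 2)),    # low spenders
    rng.normal([80, 30], 5, size=(50, 2)),   # high spenders
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print("cluster labels:", kmeans.labels_[:10])
print("cluster centres:", kmeans.cluster_centers_)
```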

2) Association

Association rule learning is an unsupervised learning technique, which finds


interesting relations among variables within a large dataset. The main aim of this
learning algorithm is to find the dependency of one data item on another data item
and map those variables accordingly so that it can generate maximum profit. This
algorithm is mainly applied in Market Basket analysis, Web usage mining,
continuous production, etc.

Some popular algorithms of Association rule learning are Apriori Algorithm,


Eclat, FP-growth algorithm.

Advantages of Unsupervised Learning Algorithm


Advantages:

o These algorithms can be used for complicated tasks compared to the


supervised ones because these algorithms work on the unlabeled dataset.
o Unsupervised algorithms are preferable for various tasks as getting the
unlabeled dataset is easier as compared to the labelled dataset.
3. Semi-Supervised Learning

Semi-Supervised learning is a type of Machine Learning algorithm that lies


between Supervised and Unsupervised machine learning. It represents the
intermediate ground between Supervised (With Labelled training data) and
Unsupervised learning (with no labelled training data) algorithms and uses the
combination of labelled and unlabeled datasets during the training period.

Semi-supervised learning is the middle ground between supervised and unsupervised learning: it operates on data that contains a few labels but mostly consists of unlabeled data. Labels are costly to obtain, so for corporate purposes an organization may have only a few of them. This sets it apart from supervised and unsupervised learning, which are defined by the presence or absence of labels, respectively.

To overcome the drawbacks of supervised learning and unsupervised learning


algorithms, the concept of Semi-supervised learning is introduced. The main
aim of semi-supervised learning is to effectively use all the available data, rather
than only labelled data like in supervised learning. Initially, similar data is
clustered along with an unsupervised learning algorithm, and further, it helps to
label the unlabeled data into labelled data. It is because labelled data is a
comparatively more expensive acquisition than unlabeled data.

We can imagine these algorithms with an example. Supervised learning is where a


student is under the supervision of an instructor at home and college. Further, if
that student is self-analysing the same concept without any help from the
instructor, it comes under unsupervised learning. Under semi-supervised learning,
the student has to revise himself after analyzing the same concept under the
guidance of an instructor at college.

Advantages of Semi-supervised Learning


Advantages:

o It is simple and easy to understand the algorithm.


o It is highly efficient.
o It is used to solve drawbacks of Supervised and Unsupervised Learning
algorithms.
4. Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent (a software component) automatically explores its surroundings by hit and trial, taking actions, learning from experiences, and improving its performance. The agent gets rewarded for each good action and punished for each bad action; hence the goal of a reinforcement learning agent is to maximize the rewards.

In reinforcement learning, there is no labelled data like supervised learning, and


agents learn from their experiences only.

The reinforcement learning process is similar to a human being; for example, a


child learns various things by experiences in his day-to-day life. An example of
reinforcement learning is to play a game, where the Game is the environment,
moves of an agent at each step define states, and the goal of the agent is to get a
high score. Agent receives feedback in terms of punishment and rewards.

Due to its way of working, reinforcement learning is employed in different fields


such as Game theory, Operation Research, Information theory, multi-agent
systems.

A reinforcement learning problem can be formalized using Markov Decision


Process(MDP). In MDP, the agent constantly interacts with the environment and
performs actions; at each action, the environment responds and generates a new
state.
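A hedged, minimal sketch of this reward-driven loop: tabular Q-learning on a tiny made-up chain environment (states 0 to 4, with a reward for reaching state 4). The environment, learning rate, discount factor, and exploration rate are all illustrative assumptions.

```python
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    # Move left or right along the chain; reaching the last state gives a reward of 1.
    next_state = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection: mostly exploit, sometimes explore.
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q towards reward + discounted best future value.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(Q)   # the learned action values favour moving right towards the goal
```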

Categories of Reinforcement Learning


Reinforcement learning is categorized mainly into two types of
methods/algorithms:

o Positive Reinforcement Learning: Positive reinforcement learning


specifies increasing the tendency that the required behaviour would occur
again by adding something. It enhances the strength of the behaviour of the
agent and positively impacts it.
o Negative Reinforcement Learning: Negative reinforcement learning works
exactly opposite to the positive RL. It increases the tendency that the
specific behaviour would occur again by avoiding the negative condition.
Real-world Use cases of Reinforcement Learning
o Video Games:
RL algorithms are very popular in gaming applications, where they are used to achieve super-human performance. Some popular game-playing systems that use RL algorithms are AlphaGO and AlphaGO Zero.
o Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed how to use RL in computer systems to automatically learn to schedule resources among waiting jobs in order to minimize average job slowdown.
o Robotics:
RL is widely being used in Robotics applications. Robots are used in the
industrial and manufacturing area, and these robots are made more powerful
with reinforcement learning. There are different industries that have their
vision of building intelligent robots using AI and Machine learning
technology.
o Text Mining:
Text mining, one of the great applications of NLP, is now being implemented with the help of Reinforcement Learning by Salesforce.

Overfitting and Underfitting in Machine Learning


Overfitting and Underfitting are the two main problems that occur in machine
learning and degrade the performance of the machine learning models.

The main goal of each machine learning model is to generalize well.


Here generalization defines the ability of an ML model to provide a suitable
output by adapting the given set of unknown input. It means after providing
training on the dataset, it can produce reliable and accurate output. Hence, the
underfitting and overfitting are the two terms that need to be checked for the
performance of the model and whether the model is generalizing well or not.
Before understanding the overfitting and underfitting, let's understand some basic
term that will help to understand this topic well:

o Signal: It refers to the true underlying pattern of the data that helps the
machine learning model to learn from the data.
o Noise: Noise is unnecessary and irrelevant data that reduces the performance
of the model.
o Bias: Bias is a prediction error that is introduced in the model due to
oversimplifying the machine learning algorithms. Or it is the difference
between the predicted values and the actual values.
o Variance: If the machine learning model performs well with the training
dataset, but does not perform well with the test dataset, then variance occurs.

Overfitting
Overfitting occurs when our machine learning model tries to cover all the data points, or more than the required data points, present in the given dataset. Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and all of these factors reduce the efficiency and accuracy of the model. The overfitted model has low bias and high variance.

The chances of overfitting increase the more we train our model: the longer the model is trained, the more likely it is to become overfitted.

Overfitting is the main problem that occurs in supervised learning.

Example: The concept of overfitting can be understood from the graph of a linear regression output. In such a graph, the model tries to cover all the data points present in the scatter plot. It may look efficient, but in reality it is not, because the goal of the regression model is to find the best-fit line; without a good general fit, the model will generate prediction errors on new data.

How to avoid the Overfitting in Model


Both overfitting and underfitting cause the degraded performance of the machine
learning model. But the main cause is overfitting, so there are some ways by which
we can reduce the occurrence of overfitting in our model.

o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling

Underfitting
Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data. If, to avoid overfitting, the feeding of training data is stopped at too early a stage, the model may not learn enough from the training data. As a result, it may fail to capture the dominant trend in the data.

In the case of underfitting, the model is not able to learn enough from the training
data, and hence it reduces the accuracy and produces unreliable predictions.

An underfitted model has high bias and low variance.

Example: We can understand the underfitting using below output of the linear
regression model:

As we can see from the above diagram, the model is unable to capture the data
points present in the plot.

How to avoid underfitting:


o By increasing the training time of the model.
o By increasing the number of features.

Goodness of Fit
The "Goodness of fit" term is taken from the statistics, and the goal of the machine
learning models to achieve the goodness of fit. In statistics modeling, it defines
how closely the result or predicted values match the true values of the dataset.
The model with a good fit is between the underfitted and overfitted model, and
ideally, it makes predictions with 0 errors, but in practice, it is difficult to achieve
it.

As when we train our model for a time, the errors in the training data go down, and
the same happens with test data. But if we train the model for a long duration, then
the performance of the model may decrease due to the overfitting, as the model
also learn the noise present in the dataset. The errors in the test dataset start
increasing, so the point, just before the raising of errors, is the good point, and we
can stop here for achieving a good model.
UNIT – II

Introduction to Deep Learning:

Deep learning is a subfield of machine learning that is inspired by the structure and
function of the human brain, and it has become increasingly popular in recent years due
to its impressive ability to solve complex problems in a wide range of domains. Deep
learning algorithms are capable of learning from large amounts of data, using artificial
neural networks that are composed of many layers of interconnected nodes or neurons.

The term "deep" refers to the depth of the neural network, meaning that it has many
layers, with each layer learning increasingly complex features of the input data. These
layers are typically trained using a form of supervised learning, where the network is
presented with input data and corresponding output data, and the weights between the
neurons are adjusted in order to minimize the difference between the predicted output
and the actual output.

Some of the most common applications of deep learning include image recognition,
speech recognition, natural language processing, and autonomous driving. Deep
learning has also been applied to many other fields, such as drug discovery, financial
modeling, and predictive maintenance.

Biological Vision:

 Biological vision refers to the study of how the human visual system works.
The human visual system is responsible for processing and interpreting visual
information from the environment.
 The human visual system is highly complex and has inspired the development of
deep learning models that attempt to simulate its structure and function.
 The visual cortex in the brain is responsible for processing visual information,
and has a layered structure similar to deep learning neural networks.
 Deep learning models, such as convolutional neural networks (CNNs), are
modeled after the structure of the visual cortex.
 These models can learn to recognize complex patterns in images and video by
extracting increasingly complex features from the input data.
 Biological vision has inspired the development of deep learning models for tasks
such as object recognition, image classification, and image segmentation.
 One of the challenges of deep learning models inspired by biological vision is the
need for large amounts of labeled data to train the neural network.
 Another challenge is the potential for overfitting, which occurs when the model
is trained too well on the training data and fails to generalize to new, unseen data.
 The human visual system is highly adaptable and can learn to recognize new
objects and patterns with minimal training data, which is an area of active
research in deep learning.
 The human visual system is also highly sensitive to context and can recognize
objects in a variety of different settings, which is another area of active research
in deep learning.
 Deep learning models inspired by biological vision have been used for
applications such as medical diagnosis, autonomous driving, and even artistic
style transfer.
 Biological vision research is continuing to inspire the development of new deep
learning models and architectures, such as attention mechanisms and capsule
networks.
 By combining insights from biological and machine vision, researchers can
continue to develop more sophisticated and powerful deep learning algorithms
for a wide range of applications.
 Overall, biological vision has played a significant role in the development of deep
learning and will continue to do so in the future.
Machine Vision:

 Machine vision is the study of how computers can process and analyze visual
information.
 It is a subfield of computer vision, which focuses on developing algorithms that
can interpret visual information.
 Deep learning has enabled significant advances in machine vision, particularly in
areas such as image recognition, object detection, and autonomous driving.
 Deep learning models, such as convolutional neural networks (CNNs), are
capable of analyzing images and video and identifying objects and patterns with
high accuracy.
 Machine vision has numerous real-world applications, such as industrial quality
control, robotics, and security systems.
 One of the challenges in machine vision is dealing with noisy or incomplete data,
which can be mitigated by using deep learning models that can learn to extract
useful features from the data.
 Another challenge is the need for large amounts of labeled data to train deep
learning models, which can be time-consuming and expensive to obtain.
 Machine vision algorithms are often designed to be robust to changes in lighting,
angle, and other environmental factors that can affect image quality.
 Machine vision is a rapidly evolving field, with new applications and techniques
emerging on a regular basis.
 One area of active research in machine vision is developing deep learning models
that can reason and make decisions based on visual input, such as autonomous
vehicles navigating complex environments.
 Machine vision algorithms have been used to solve a wide range of problems,
including facial recognition, medical diagnosis, and object tracking.
 Deep learning models used in machine vision can be fine-tuned to improve their
performance on specific tasks, such as identifying specific types of objects or
detecting subtle changes in images over time.
 Overall, machine vision is a powerful tool that is enabling new applications and
driving innovation in a wide range of industries.

Human and Machine Language:

 Human language refers to the natural language used by humans to communicate,
while machine language refers to the language used by machines to perform
computations.
 In deep learning, both human and machine language play important roles in
training and using neural networks.
 Human language is used to provide input data to the neural network, such as text
or speech, which the network processes and learns from.
 Machine language, on the other hand, is used to represent the internal workings
of the neural network, including the weights and biases of the connections
between neurons.
 Machine language is also used to represent the output of the neural network,
which can be in the form of text, speech, or other types of data.
 Deep learning models can be trained using both supervised and unsupervised
learning techniques, with human language being used as input data in both cases.
 In supervised learning, human language is used to provide labeled training data,
which the network uses to learn how to predict labels for new data.
 In unsupervised learning, human language is used to provide unstructured data,
such as text or speech, which the network learns to cluster or classify based on
similarities and patterns.
 Machine language is used extensively in the optimization and training of neural
networks, as it allows for efficient computation and manipulation of the network
parameters.
 Deep learning models can be used for a wide range of tasks involving human
language, including natural language processing, speech recognition, sentiment
analysis, and machine translation.
 The performance of deep learning models on language tasks can be improved
through the use of techniques such as attention mechanisms, transfer learning,
and pretraining on large datasets.
 Despite the impressive progress made in deep learning for language tasks, there
are still many challenges to be addressed, such as handling rare or out-of-
vocabulary words, understanding context and nuance, and addressing issues of
bias and fairness.
 Deep learning models can also be used to generate human-like language, such as
in the case of natural language generation or dialogue systems.
 The development of advanced deep learning models for language tasks has been
facilitated by the availability of large datasets, such as the Common Crawl or
Wikipedia, and powerful hardware such as GPUs.
 The continued development of deep learning models for language tasks is likely
to have a significant impact on a wide range of industries and applications, from
healthcare to finance to entertainment.

Artificial Neural Network(ANN):

Artificial Neural Networks contain artificial neurons which are called units. These units
are arranged in a series of layers that together constitute the whole Artificial Neural
Networks in a system. A layer can have only a dozen units or millions of units as this
depends on the complexity of the system. Commonly, Artificial Neural Network has an
input layer, output layer as well as hidden layers. The input layer receives data from the
outside world which the neural network needs to analyze or learn about. Then this data
passes through one or multiple hidden layers that transform the input into data that is
valuable for the output layer. Finally, the output layer provides an output in the form of
a response of the Artificial Neural Networks to input data provided.

In the majority of neural networks, units are interconnected from one layer to another.
Each of these connections has weights that determine the influence of one unit on
another unit. As the data transfers from one unit to another, the neural network learns
more and more about the data which eventually results in an output from the output
layer.

How do Artificial Neural Networks learn?

Artificial neural networks are trained using a training set. For example, suppose you
want to teach an ANN to recognize a cat. Then it is shown thousands of different images
of cats so that the network can learn to identify a cat. Once the neural network has been
trained enough using images of cats, then you need to check if it can identify cat images
correctly. This is done by making the ANN classify the images it is provided by deciding
whether they are cat images or not. The output obtained by the ANN is corroborated by
a human-provided description of whether the image is a cat image or not. If the ANN
identifies incorrectly then back-propagation is used to adjust whatever it has learned
during training. Back-propagation is done by fine-tuning the weights of the connections
in ANN units based on the error rate obtained. This process continues until the artificial
neural network can correctly recognize a cat in an image with minimal possible error
rates.

What are the types of Artificial Neural Networks?

1. Feedforward Neural Network : The feedforward neural network is one of the most
basic artificial neural networks. In this ANN, the data or the input provided travels in a
single direction. It enters into the ANN through the input layer and exits through the
output layer, while hidden layers may or may not exist. So the feedforward neural
network propagates signals in the forward direction only and has no feedback loops.
2. Recurrent Neural Network : The Recurrent Neural Network saves the output of a
layer and feeds this output back to the input to better predict the outcome of the layer.
The first layer in the RNN is quite similar to the feed-forward neural network and the
recurrent neural network starts once the output of the first layer is computed. After this
layer, each unit will remember some information from the previous step so that it can
act as a memory cell in performing computations.

3. Convolutional Neural Network : A Convolutional neural network has some
similarities to the feed-forward neural network, where the connections between units
have weights that determine the influence of one unit on another unit. But a CNN has
one or more than one convolutional layers that use a convolution operation on the input
and then pass the result obtained in the form of output to the next layer. CNN has
applications in speech and image processing which is particularly useful in computer
vision.

4. Modular Neural Network : A Modular Neural Network contains a collection of
different neural networks that work independently towards obtaining the output with no
interaction between them. Each of the different neural networks performs a different
sub-task by obtaining unique inputs compared to other networks. The advantage of this
modular neural network is that it breaks down a large and complex computational
process into smaller components, thus decreasing its complexity while still obtaining
the required output.

5. Radial basis function Neural Network : Radial basis functions are functions that
consider the distance of a point with respect to a centre. An RBF network has two
layers: in the first layer, the input is mapped onto the radial basis functions in the
hidden layer, and the output layer then computes the output in the next step. Radial
basis function networks are normally used to model data that represents an underlying
trend or function.
Applications of Artificial Neural Networks

1. Social Media : Artificial Neural Networks are used heavily in Social Media. For
example, let’s take the ‘People you may know’ feature on Facebook that suggests you
people that you might know in real life so that you can send them friend requests. Well,
this magical effect is achieved by using Artificial Neural Networks that analyze your
profile, your interests, your current friends, and also their friends and various other
factors to calculate the people you might potentially know. Another common application
of Machine Learning in social media is facial recognition. This is done by finding
around 100 reference points on the person’s face and then matching them with those
already available in the database using convolutional neural networks.

2. Marketing and Sales : When you log onto E-commerce sites like Amazon and
Flipkart, they will recommend your products to buy based on your previous browsing
history. Similarly, suppose you love Pasta, then Zomato, Swiggy, etc. will show you
restaurant recommendations based on your tastes and previous order history. This is true
across all new-age marketing segments like Book sites, Movie services, Hospitality
sites, etc. and it is done by implementing personalized marketing. This uses Artificial
Neural Networks to identify the customer likes, dislikes, previous shopping history, etc.
and then tailor the marketing campaigns accordingly.

3. Healthcare : Artificial Neural
Networks are used in Oncology to train algorithms that can identify cancerous tissue at
the microscopic level at the same accuracy as trained physicians. Various rare diseases
may manifest in physical characteristics and can be identified in their premature stages
by using Facial Analysis on the patient photos. So the full-scale implementation of
Artificial Neural Networks in the healthcare environment can only enhance the
diagnostic abilities of medical experts and ultimately lead to the overall improvement
in the quality of medical care all over the world.

4. Personal Assistants : I am sure you have all heard of Siri, Alexa, Cortana, etc., and
probably even used them on your own phones! These are personal assistants and an
example of speech recognition that uses Natural Language Processing to interact with
the users and formulate a response accordingly. Natural Language Processing uses
artificial neural networks that are made to handle many tasks of these personal assistants
such as managing the language syntax, semantics, correct speech, the conversation that
is going on, etc.

Advantages of Artificial Neural Network (ANN)

Parallel processing capability: Because computation is spread across many units,
artificial neural networks can perform more than one task simultaneously.

Storing data on the entire network: Unlike traditional programming, where data is stored
in a database, in an ANN the information is distributed across the whole network. The
loss of a few pieces of data in one place does not prevent the network from working.

Capability to work with incomplete knowledge: After training, an ANN can produce output
even from incomplete input data. How much performance is lost depends on how important
the missing data is.

Having a memory distribution: For an ANN to be able to adapt, it is important to select
representative examples and to guide the network toward the desired output by showing
these examples to it. The success of the network is directly proportional to the chosen
instances; if the problem is not presented to the network in all its aspects, the
network can produce false output.

Having fault tolerance: Corruption of one or more cells of an ANN does not prevent it
from generating output, and this feature makes the network fault-tolerant.

Disadvantages of Artificial Neural Network:

Assurance of proper network structure: There is no particular guideline for determining
the structure of artificial neural networks. The appropriate network structure is found
through experience, trial, and error.

Unrecognized behavior of the network: This is the most significant issue with ANNs.
When an ANN produces a solution, it does not provide insight into why and how it
arrived at it, which decreases trust in the network.

Hardware dependence: Artificial neural networks require processors with parallel
processing power, so the approach depends on suitable hardware being available.

Difficulty of showing the problem to the network: ANNs can only work with numerical
data, so problems must be converted into numerical values before being introduced to
the ANN. The representation chosen here directly impacts the performance of the
network and relies on the user's abilities.

The duration of training is unknown: The network is trained until the error falls to a
specific value, but this value does not guarantee optimal results.

Training Deep Networks :

 Deep learning is a subfield of machine learning that involves the training of deep
neural networks, which are composed of multiple layers of interconnected nodes.
 The training of deep neural networks involves feeding large amounts of data into
the network, which then learns to recognize patterns and features in the data.
 The data is typically divided into training, validation, and test sets. The training
set is used to train the network, the validation set is used to optimize
hyperparameters, and the test set is used to evaluate the performance of the
network.
 During training, the network adjusts its weights and biases in order to minimize
a cost function, which measures the difference between the predicted output and
the true output.
 Gradient descent is a common optimization algorithm used in deep learning,
which involves iteratively adjusting the weights and biases of the network in the
direction of the negative gradient of the cost function (a small sketch follows this list).
 Backpropagation is a technique used to compute the gradient of the cost function
with respect to the weights and biases of the network, which is necessary for
gradient descent.
 Stochastic gradient descent is a variant of gradient descent that randomly selects
a small batch of samples from the training set at each iteration, which can improve
convergence and reduce computation time.
 Dropout is a regularization technique used to prevent overfitting in deep neural
networks, which involves randomly setting some of the nodes in the network to
zero during training.
 Batch normalization is a technique used to improve the stability and performance
of deep neural networks, which involves normalizing the inputs to each layer of
the network.
 Convolutional neural networks (CNNs) are a type of deep neural network
commonly used for image and video recognition, which exploit the local spatial
correlation of the data.
 Recurrent neural networks (RNNs) are a type of deep neural network commonly
used for sequential data processing, which can maintain an internal memory of
previous inputs.
 Long short-term memory (LSTM) networks are a variant of RNNs that can
selectively forget or remember previous inputs, which can improve their ability
to handle long-term dependencies.
 Transfer learning is another technique that can be used to improve the
performance of deep networks, by using pre-trained models on similar tasks and
fine-tuning them on new tasks.
 Deep networks can be trained on a variety of tasks, including image classification,
object detection, natural language processing, speech recognition, and generative
modeling.
 The architecture of the network, including the number of layers, the size of the
layers, and the connectivity between layers, can have a significant impact on the
performance of the network.
 Hyperparameter tuning is an important part of training deep networks, and
involves selecting the optimal values for various parameters such as learning rate,
batch size, and regularization strength.
 Deep networks can require a significant amount of computational resources to
train, particularly for large datasets and complex architectures.
 To accelerate training, techniques such as distributed training, parallel
computing, and hardware accelerators such as GPUs and TPUs can be used.
 Visualization techniques such as t-SNE and PCA can be used to visualize the
high-dimensional representations learned by deep networks, and gain insights
into the structure of the data.
 Deep networks have achieved state-of-the-art performance on a wide range of
tasks, and have been used to develop a variety of real-world applications in fields
such as healthcare, finance, and autonomous driving.
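As a minimal sketch of the gradient descent and backpropagation steps listed above, the following NumPy example fits a single linear neuron y = w*x + b by repeatedly stepping in the direction of the negative gradient of the mean squared error. The data values and learning rate are made up purely for illustration:

import numpy as np

# Toy data: y is roughly 2*x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

w, b = 0.0, 0.0          # initial parameters
lr = 0.01                # learning rate

for step in range(1000):
    y_pred = w * x + b                   # forward pass
    error = y_pred - y
    loss = np.mean(error ** 2)           # mean squared error (cost function)
    # Gradients of the loss with respect to w and b (the "backward pass")
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Update parameters in the direction of the negative gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # should approach roughly 2 and 1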

Improving Deep Networks:

 More data: One way to improve deep networks is to use more data to train them.
Larger datasets help the model learn more robust and generalized features that
can be applied to unseen data.
 Better initialization: Initializing the weights and biases in a deep network can
have a big impact on its performance. Techniques such as Xavier initialization
can improve the convergence and accuracy of the network.
 Regularization: Deep networks are prone to overfitting, especially when the
number of parameters is high. Regularization techniques such as Dropout or
L1/L2 regularization can help reduce overfitting and improve generalization (see the sketch after this list).
 Improved activation functions: The choice of activation function can have a big
impact on the performance of a deep network. Recent advances in activation
functions such as ReLU, LeakyReLU, and Swish have shown improved
performance over traditional functions such as sigmoid and tanh.
 Advanced optimization algorithms: Gradient descent is the most common
optimization algorithm used in deep learning. Advanced optimization algorithms
such as Adam, Adagrad, and RMSprop can help speed up the convergence and
improve accuracy.
 Network architecture: The choice of network architecture can have a significant
impact on the performance of a deep network. Popular architectures such as
ResNet, VGG, and Inception have shown improved performance on various
tasks.
 Transfer learning: Transfer learning is a technique where a pre-trained network
is used as a starting point for a new task. This can help improve performance and
reduce the amount of data required for training.
 Data augmentation: Data augmentation involves generating new training
examples by applying various transformations to the existing data. This can help
improve the performance of the network by reducing overfitting and improving
generalization.
 Hyperparameter tuning: The performance of a deep network can be highly
dependent on the choice of hyperparameters such as learning rate, batch size, and
regularization strength. Careful tuning of these hyperparameters can help
improve the performance of the network.
 Ensemble methods: Ensemble methods involve combining the predictions of
multiple deep networks to improve performance.
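A minimal sketch of how dropout and L2 regularization might be added to a small Keras model (Keras is introduced in the next unit); the layer sizes, input shape and regularization strength are illustrative assumptions, not prescribed values:

from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(784,),
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight penalty
    layers.Dropout(0.5),        # randomly drop 50% of the units during training
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])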
UNIT III
Anatomy of Neural networks:
Neural networks are a type of machine learning model that are inspired by the
structure and function of the human brain. They consist of layers of
interconnected nodes, called neurons, which process and transmit information.

ANNs are thus composed of multiple nodes that imitate the biological neurons of
the human brain. These neurons are connected by links and interact with each
other. The nodes take input data and perform simple operations on it; the results
of these operations are passed on to other neurons. The output at each node is
called its activation or node value.

The basic anatomy of a neural network includes:

Input layer: This is the first layer of the network where the input data is received.
Each neuron in this layer corresponds to one input feature.

Hidden layer(s): These are the layers that come between the input and output
layers. They process the input data and learn to recognize patterns in the data. A
neural network can have one or more hidden layers.

Output layer: This is the final layer of the network where the output data is
produced. Each neuron in this layer corresponds to one output feature.

Neurons: These are the basic units of a neural network. They receive input from
other neurons and perform a mathematical operation on the input to produce an
output.

Weights: These are the values that determine the strength of the connections
between neurons. They are adjusted during training to optimize the performance
of the network.

Bias: Bias is an additional parameter that is added to the input of each neuron. It
allows the neuron to adjust the output based on a specific threshold.
Activation function: This is a mathematical function that is applied to the output
of each neuron. It allows the neuron to produce a non-linear output that can model
complex relationships between input and output data.

Backpropagation: This is the process of adjusting the weights and biases of the
network during training to minimize the difference between the predicted output
and the actual output. It is an iterative process that continues until the network
produces accurate predictions.

Each link is associated with a weight, and the network learns by altering these weight
values. The following illustration shows a simple ANN.

Figure 1: Neural network structure

Types of Artificial Neural Networks


Generally, there are two types of ANN. Such as Feedforward and Feedback.

a. Feedforward ANN
In this network, the flow of information is unidirectional: a unit sends
information to other units but does not receive information back, and no
feedback loops are present. Feedforward networks are used in pattern
recognition, as they have fixed inputs and outputs.

Figure 2: Feedforward neural network structure

b. Feedback ANN: This type of Artificial Neural Network allows feedback loops and is
used, for example, in content-addressable memories.

Figure 3: Feedback neural network structure

Introduction to keras:

Deep learning is becoming more popular in data science fields like robotics, artificial
intelligence (AI), audio and video recognition, and image recognition. Artificial neural
networks are the core of deep learning methodologies. Deep learning is supported by
various libraries such as Theano, TensorFlow, Caffe, MXNet, etc. Keras is one of the
most powerful and easy-to-use Python libraries for creating deep learning models; it is
built on top of popular deep learning libraries like TensorFlow and Theano.

Overview of Keras:

Keras runs on top of open-source machine learning libraries like TensorFlow, Theano or
the Cognitive Toolkit (CNTK). Theano is a Python library used for fast numerical
computation tasks. TensorFlow is the most famous symbolic math library used for
creating neural networks and deep learning models; it is very flexible and its primary
benefit is distributed computing. CNTK is a deep learning framework developed by
Microsoft; it can be used from Python, C#, or C++, or as a standalone machine learning
toolkit. Theano and TensorFlow are very powerful libraries but are difficult to use
directly for creating neural networks. Keras is based on a minimal structure that
provides a clean and easy way to create deep learning models on top of TensorFlow or
Theano, and it is designed to let you define deep learning models quickly. This makes
Keras an optimal choice for deep learning applications.

Features:

Keras leverages various optimization techniques to make its high-level neural
network API easier to use and more performant. It supports the following features:
• Consistent, simple and extensible API.
• Minimal structure - easy to achieve the result without any frills.
• Support for multiple platforms and backends.
• A user-friendly framework that runs on both CPU and GPU.
• Highly scalable computation.

Benefits:

Keras is a highly powerful and dynamic framework and comes with the following
advantages:
• Large community support.
• Easy to test.
• Keras neural networks are written in Python, which makes things simpler.
• Keras supports both convolutional and recurrent networks.
TensorFlow :
TensorFlow is an open-source machine learning library for numerical
computation developed by Google. Keras is a high-level API built on top
of TensorFlow or Theano. We already know how to install TensorFlow using pip.
If it is not installed, you can install it using the command below:

pip install tensorflow

Once Keras has been run at least once, its configuration file can be found in your
home directory at .keras/keras.json.
keras.json:

{
  "image_data_format": "channels_last",
  "epsilon": 1e-07,
  "floatx": "float32",
  "backend": "tensorflow"
}

Here,
• image_data_format represents the data format.
• epsilon represents a numeric constant used to avoid divide-by-zero errors.
• floatx represents the default data type, float32. You can also change it to
float16 or float64 using the set_floatx() method.
• backend denotes the current backend.
Theano:
Theano is an open-source deep learning library that allows you to evaluate expressions
on multidimensional arrays efficiently. We can easily install it using the command
below:

pip install Theano

By default, Keras uses the TensorFlow backend. If you want to change the backend
configuration from TensorFlow to Theano, just change the backend entry to theano in
the keras.json file, as described below:

keras.json

{
  "image_data_format": "channels_last",
  "epsilon": 1e-07,
  "floatx": "float32",
  "backend": "theano"
}

Now save the file, restart your terminal and start Keras; the backend will be changed.

>>> import keras as k
Using Theano backend.
CNTK
CNTK (Microsoft Cognitive Toolkit) is an open-source deep learning framework
developed by Microsoft that allows users to build, train, and deploy deep neural
networks. It supports a variety of machine learning algorithms and architectures,
including convolutional neural networks (CNNs), recurrent neural networks (RNNs),
and deep belief networks (DBNs). Here are some key features of CNTK:

• Distributed training: CNTK supports distributed training across multiple


GPUs and multiple machines, which allows users to train large-scale models
more efficiently.
• Python and C++ APIs: CNTK provides APIs in both Python and C++, which
makes it easy to integrate with other programming languages and tools.
• GPU acceleration: CNTK supports GPU acceleration for training and
inference, which allows users to train models much faster than on CPU-only
systems.
• Pre-trained models: CNTK includes a number of pre-trained models that can
be used for tasks such as image classification and speech recognition. These
models can be fine-tuned on a user's own dataset for improved performance.
• Customizable neural network architecture: CNTK allows users to define and
customize their own neural network architecture using a high-level language called
BrainScript.
• Cross-platform support: CNTK runs on Windows, Linux, and macOS, which
makes it easy to deploy models across a variety of platforms.
• Interactive training: CNTK includes a powerful visualization tool called the
Network Learner, which allows users to monitor the training process in real-
time and make adjustments to the model as needed.
Overall, CNTK is a powerful and flexible deep learning framework that is particularly
well-suited for large-scale distributed training. Its support for multiple programming
languages and platforms makes it easy to integrate with other tools and systems, and
its pre-trained models and customizable neural network architecture make it a great
choice for a wide variety of machine learning tasks.

Setting up Deep learning workstation

PC Hardware Setup:

First of all to perform machine learning and deep learning on any dataset, the
software/program requires a computer system powerful enough to handle the
computing power necessary. So the following is required:

• Central Processing Unit (CPU) — Intel Core i5 6th Generation processor or
higher. An AMD equivalent processor will also work well.
• RAM — 8 GB minimum, 16 GB or higher is recommended.
• Graphics Processing Unit (GPU) — NVIDIA GeForce GTX 960 or higher.
AMD GPUs are not supported by most mainstream deep learning frameworks, which
rely on NVIDIA's CUDA. For more information on NVIDIA GPUs for deep learning
please visit https://developer.nvidia.com/cuda-gpus.
• Operating System — Ubuntu or Microsoft Windows 10. I recommend updating
Windows 10 to the latest version before proceeding forward.

Note: In the case of laptops, the ideal option would be to purchase a gaming laptop
from any vendor deemed suitable such as Alienware, ASUS, Lenovo Legion, Acer
Predator, etc.
Nvidia GeForce Experience

This tool is designed to keep your NVIDIA GPU drivers up to date; it makes updating
far easier and is highly recommended if you have an NVIDIA GPU.

Download NVIDIA GeForce Experience

Table of Contents

In this tutorial, we will cover the following steps:

1. Download Anaconda
2. Install Anaconda & Python
3. Start and Update Anaconda
4. Install CUDA Toolkit & cuDNN
5. Create an Anaconda Environment
6. Install Deep Learning API’s (TensorFlow & Keras)

Step 1: Download Anaconda

In this step, we will download the Anaconda Python package for your platform.

Anaconda is a free and easy-to-use environment for scientific Python.

1.Install Anaconda (Python 3.6 version) Download


Step 2: Install Anaconda

In this step, we will install the Anaconda Python software on your system.

Installation is very easy and quick once you download the setup. Open the setup and
follow the wizard instructions.

#Note: It will automatically install Python and some basic libraries with it.

It might take 5 to 10 minutes or some more time according to your system.

Step 3: Update Anaconda

Open Anaconda Prompt to type the following command(s). Don’t worry Anaconda
Prompt works the same as cmd.

conda update conda


conda update --all
Step 4: Install CUDA Toolkit & cuDNN

1. Install CUDA Toolkit 9.0 or 8.0 Download

Choose your version depending on your Operating System and GPU.

#Version Support: Here is a guide to check whether your CUDA version supports your
Nvidia graphics card
2. Download cuDNN Download

Download the latest version of cuDNN. Choose your version depending on your
Operating System and CUDA. Membership registration is required. Don’t worry you
can easily create an account using your email.

Put your unzipped folder in C drive as follows:

C:\cudnn-9.0-windows10-x64-v7

Step 5: Add cuDNN into Environment Path

1. Open Run dialogue using (Win + R) and run the command sysdm.cpl
2. In the Windows 10 System Properties, select the Advanced tab.
3. Select Environment Variables
4. Add the following path to your Environment.
C:\cudnn-9.0-windows10-x64-v7\cuda\bin
Step 6: Create an Anaconda Environment

Here we will create a new anaconda environment for our specific usage so that it will
not affect the root of Anaconda.

Open Anaconda Prompt to type the following commands.

1. Create a conda environment named “tensorflow” (you can change the name) by
invoking the following command:

conda create -n tensorflow pip python=3.6

2. Activate the conda environment by issuing the following command:

activate tensorflow
(tensorflow)C:> # Your prompt should change

Step 7: Install Deep Learning Libraries

In this step, we will install Python libraries used for deep learning, specifically:
TensorFlow, and Keras.

1. TensorFlow

TensorFlow is a tool for machine learning. While it contains a wide range of
functionality, TensorFlow is mainly designed for deep neural network models.

=> For installing TensorFlow, Open Anaconda Prompt to type the following
commands.

To install the GPU version of TensorFlow:


C:\> pip install tensorflow-gpu

To install the CPU-only version of TensorFlow:

C:\> pip install tensorflow

If your machine only supports CPU, you can install the CPU-only version for basic
learning and practice.

=> You can test the installation by running this program on shell:

>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()   # note: this Session API applies to TensorFlow 1.x
>>> print(sess.run(hello))
For getting started and documentation you can visit
TensorFlow website.

2. Keras

Keras is a high-level neural networks API, written in Python and capable of running
on top of TensorFlow, CNTK, or Theano.

=> For installing Keras Open Anaconda Prompt to type the following commands.

pip install keras

=> Let’s try running mnist_mlp.py from your prompt. You can use other examples as
well.

Open Anaconda Prompt to type the following commands.


activate tensorflow
python mnist_mlp.py
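If you do not have the Keras examples downloaded, a minimal MLP in the same spirit as mnist_mlp.py might look like the sketch below. The layer sizes, dropout rate, batch size and number of epochs are illustrative choices, not the exact contents of that script:

from tensorflow.keras.datasets import mnist
from tensorflow.keras import layers, models, utils

# Load MNIST and flatten the 28x28 images into 784-dimensional vectors in [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
y_train = utils.to_categorical(y_train, 10)
y_test = utils.to_categorical(y_test, 10)

model = models.Sequential([
    layers.Dense(512, activation="relu", input_shape=(784,)),
    layers.Dropout(0.2),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=5,
          validation_data=(x_test, y_test))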

There are other well-known libraries such as PyTorch, Theano, and Caffe2 that you can
use as per your choice.

Classification in Machine Learning


In machine learning and statistics, classification is a supervised learning method in
which a computer program learns from data and assigns new observations to classes.

Classification is the process of dividing a set of data into distinct classes. It may be
applied to both structured and unstructured data. Predicting the class of data points is
the first step in the procedure. Target, label, and category are common terms for the
classes.

Approximating the mapping function from discrete input variables to discrete output
variables is the problem of classification predictive modeling. The basic objective is to
figure out which category or class the new data belongs in.

There are a couple of different types of classification tasks in machine learning,
namely:

Binary Classification – This is what we’ll discuss a bit more in-depth here.
Classification problems with two class labels are referred to as binary classification. In
most binary classification problems, one class represents the normal condition and the
other represents the aberrant condition.
Multi-Class Classification – Classification jobs with more than two class labels are
referred to as multi-class classification. Multi-class classification, unlike binary
classification, does not distinguish between normal and pathological results. Instead,
examples are assigned to one of a number of pre-defined classes.

A Closer Look at Binary Classification. As we have already discussed, and as its
name implies, binary classification in deep learning refers to the type of classification
where we have two class labels – one normal and one abnormal. Some examples of
binary classification use:

1. To detect whether email is spam or not


2. To determine whether or not a patient has a certain disease in medicine.
3. To determine whether or not quality specifications were met when it comes to
QA (Quality Assurance).

For example, the normal class label might be that a patient does not have the disease,
and the abnormal class label that they do.

As with every other type of classification, a binary classifier is only as good as the
dataset it is trained on – in other words, the more training data it has, the better it
performs.

Binary Classification Algorithms.

There are quite a few different algorithms used in binary classification. The two that
are designed with only binary classification in mind (meaning they do not support
more than two class labels) are Logistic Regression and Support Vector Machines. A
few other algorithms are: Nearest Neighbours, Decision Trees, and Naive Bayes.

Logistic Regression – It is a machine learning classification algorithm that employs
one or more independent variables to produce a result. The result is measured with a
dichotomous variable, meaning there are only two possible outcomes. The purpose of
logistic regression is to find the best fit between a dependent variable and a collection
of independent variables. It has an advantage over algorithms such as nearest
neighbours because it also quantifies the factors that lead to a classification.

Support Vector Machine – The support vector machine is a classification algorithm
that represents training data as points in space, separated into categories by as large a
margin as possible. New points are then mapped into the same space and assigned to a
category based on which side of the margin they fall on. Its decision function uses only
a subset of the training points (the support vectors), making it memory-efficient and
very effective in high-dimensional spaces. The main drawback of the support vector
machine is that the approach does not directly provide probability estimates.
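As an illustrative sketch (not part of the original notes), the following uses scikit-learn, assuming it is installed, to train and compare the two algorithms above on a synthetic binary classification dataset:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic two-class dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
svm = SVC(kernel="rbf").fit(X_train, y_train)

print("Logistic regression accuracy:", log_reg.score(X_test, y_test))
print("SVM accuracy:", svm.score(X_test, y_test))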
UNIT IV

Convolutional Neural Networks


Neural networks are a type of machine learning algorithm modeled after the structure
and function of the human brain. They are composed of layers of interconnected nodes,
or neurons, that process and transform input data to produce an output.

A Convolutional Neural Network (CNN) is a type of neural network that is commonly


used for image classification and computer vision tasks. CNNs are inspired by the
biological structure of the visual cortex in the human brain and are designed to process
visual data, such as images and videos.

CNNs consist of multiple layers, including convolutional layers, pooling layers, and
fully connected layers. The input to a CNN is a 3-dimensional array representing an
image, where each dimension corresponds to the width, height, and color channels (red,
green, and blue).

Convolutional layers are the most important component of CNNs. They consist of a set
of filters that slide over the input image, performing a mathematical operation called
convolution. The filters are learned during training and are used to extract features from
the input image, such as edges, corners, and textures.

Pooling layers are used to reduce the dimensionality of the feature maps produced by
the convolutional layers. They do this by downsampling the feature maps using
operations like max pooling or average pooling, which reduces the spatial size of the
feature maps.

Fully connected layers are used to produce the final output of the CNN. They take the
feature maps produced by the convolutional and pooling layers and flatten them into a
1-dimensional array. This array is then passed through a series of fully connected layers
that produce the final output of the network, which is typically a probability distribution
over the possible classes.
During training, the parameters of the CNN, including the weights and biases of the
filters and fully connected layers, are optimized using an algorithm called
backpropagation. The goal of the optimization is to minimize a loss function that
measures the difference between the predicted output of the network and the true output.

CNNs have been very successful in a wide range of computer vision tasks, including
image classification, object detection, and segmentation. They have also been used in
natural language processing and speech recognition tasks.

Representation learning
Representation learning is a subfield of machine learning that aims to automatically
discover useful representations of input data. These representations are typically in the
form of feature vectors or embeddings that capture important characteristics of the
input data. The goal of representation learning is to learn a set of features that are both
informative and robust, and that can be used as input to downstream tasks such as
classification, clustering, or retrieval.

The traditional approach to machine learning involves manually engineering features
that are relevant to a particular task. However, this approach is often labor-intensive
and requires domain expertise. In contrast, representation learning automates the
process of feature engineering by learning features directly from the data.
There are several approaches to representation learning, including unsupervised
learning, supervised learning, and semi-supervised learning. Unsupervised learning
involves learning representations from unlabelled data, while supervised learning
involves learning representations from labeled data. Semi-supervised learning
combines both labelled and unlabelled data to learn representations.

One popular approach to unsupervised representation learning is through the use of
autoencoders. An autoencoder is a neural network that is trained to reconstruct its
input data. During training, the network learns a compressed representation of the
input data in the form of a bottleneck layer. This compressed representation can be
used as a feature vector for downstream tasks.

Another approach to unsupervised representation learning is through the use of
generative models such as variational autoencoders and generative adversarial
networks (GANs). These models learn to generate new samples that are similar to the
input data, and the learned representations can be used as features for downstream
tasks.

Supervised representation learning involves learning representations from labelled
data. This approach is commonly used in computer vision, where convolutional neural
networks (CNNs) are trained on large datasets such as ImageNet. The learned
representations can then be used as features for image classification, object detection,
and other computer vision tasks.

Representation learning has proven to be a powerful approach to machine learning,
with applications in a wide range of domains including computer vision, natural
language processing, and speech recognition. By automatically learning useful
representations of input data, representation learning has the potential to significantly
improve the performance of machine learning systems.
Figure 4: Representation learning

Convolutional layers
A convolutional layer is a key building block of convolutional neural networks (CNNs),
which are widely used in image recognition, natural language processing, and other
applications. The purpose of the convolutional layer is to extract features from the input
data using a set of learnable filters called kernels or weights.

The convolutional layer takes as input a multi-dimensional array of data, such as an
image, and applies a set of filters to it. Each filter is a small matrix that slides over the
input data, performing a dot product at each location to produce an output feature map.
The dot product sums the element-wise products of the filter weights and the input
values, and the resulting scalar is placed in the output feature map at the corresponding
location.

There are several types of convolutional layers, each with a different function:

Convolutional layer: This is the most common type of convolutional layer. It applies a
set of filters to the input data, producing a set of output feature maps.

Pooling layer: This layer reduces the size of the input data by downsampling it. The
most common type of pooling is max pooling, which takes the maximum value of each
local region of the input.
Strided convolutional layer: This layer applies convolution with a larger stride than
usual, effectively reducing the resolution of the output feature maps.

Depthwise separable convolutional layer: This layer applies two separate convolutional
operations, one to each channel of the input data, before combining them into a single
output.

Dilated convolutional layer: This layer increases the receptive field of the filters by
inserting gaps between the weights.

Transposed convolutional layer: This layer is used for up sampling the feature maps,
and is sometimes called a deconvolutional layer. It applies a set of filters to the input
data, but instead of sliding them over the input, it slides them over the output, effectively
reversing the convolution operation.

Overall, convolutional layers are an important component of CNNs, enabling the
models to learn and extract relevant features from complex data such as images, text,
and audio.

Figure 5: Convolutional layers
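A minimal Keras sketch of such a layer stack; the filter counts and the 28x28 grayscale input shape are illustrative assumptions. The comments show how the convolution and pooling layers change the spatial size of the feature maps:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, kernel_size=3, activation="relu",
                  input_shape=(28, 28, 1)),              # -> (26, 26, 16)
    layers.MaxPooling2D(pool_size=2),                    # -> (13, 13, 16)
    layers.Conv2D(32, kernel_size=3, activation="relu"), # -> (11, 11, 32)
    layers.MaxPooling2D(pool_size=2),                    # -> (5, 5, 32)
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.summary()   # prints the output shape of each layer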


Multichannel convolutional operation
In convolutional neural networks (CNNs), a multichannel convolutional operation
refers to convolving filters with input data that has multiple channels. In a
single-channel convolutional operation, a filter is convolved with one channel of input
data to produce one channel of output data. In multichannel convolution, each filter has
one kernel per input channel; the per-channel results are summed to produce a single
output channel, and applying several such filters produces a set of output channels that
are stacked to create the final output.
For example, consider an RGB image with three input channels (red, green, and blue).
In a multichannel convolutional operation, each filter is convolved with all three
channels at once and produces one output channel, and the final output is the stack of
the channels produced by all the filters.
Multichannel convolutional operations are commonly used in CNNs for image and
video processing, where input data typically has multiple channels. By convolving
multiple filters with multiple channels, CNNs can learn complex spatial features and
representations that are critical for accurate image and video recognition and
classification.
Input and output feature maps: In a multi-channel convolution operation, the input data
consists of multiple feature maps or channels, each corresponding to a different color
channel or a learned feature of the input image. The output of the convolution operation
is also a set of feature maps, where each output feature map corresponds to a learned
feature of the input data.

key concepts related to multi-channel convolution operations:

Filters or Kernels: In a multi-channel convolution operation, each filter is a 3D tensor
that is convolved with the input feature maps. The first two dimensions of the filter
correspond to the width and height of the filter, and the third dimension corresponds to
the number of input channels. During training, the values of the weights in the filter are
learned so that the filter can extract meaningful features from the input data.

Stride and Padding: The stride and padding in a multi-channel convolution operation
work the same way as in a single-channel convolution operation. The stride determines
the number of pixels by which the filter is shifted each time it is convolved with the
input data. The padding refers to the addition of extra rows and columns of zeros around
the edges of the input data before convolution.

Convolution operation: The multi-channel convolution operation is performed by
convolving each filter with the input feature maps, summing the results, and applying
an activation function. This process produces an output feature map for each filter, and
the set of output feature maps is then stacked together to produce the output tensor.

Multiple Filters: In a multi-channel convolution operation, multiple filters are typically
used to learn a variety of features from the input data. Each filter produces a different
set of output feature maps, which are then concatenated together to form the final output
tensor.

In summary, multi-channel convolution operations are a key component of CNNs used
for image recognition and segmentation tasks. They allow the network to learn complex
features from input data with multiple channels, such as color images, by convolving
multiple filters with the input feature maps and stacking the output feature maps together
to produce the final output tensor.
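A short sketch of these shapes in PyTorch (the channel counts are chosen only for illustration): a 3-channel RGB input convolved with 8 filters, where each filter spans all 3 input channels:

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
x = torch.randn(1, 3, 32, 32)    # (batch, channels, height, width)

out = conv(x)
print(out.shape)                 # torch.Size([1, 8, 32, 32]): 8 output feature maps
print(conv.weight.shape)         # torch.Size([8, 3, 3, 3]): 8 filters, each 3x3 over 3 channels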

Introduction to Recurrent Neural Network


Recurrent Neural Network (RNN) is a type of Neural Network where the output from
the previous step are fed as input to the current step. In traditional neural networks, all
the inputs and outputs are independent of each other, but in cases like when it is
required to predict the next word of a sentence, the previous words are required and
hence there is a need to remember the previous words. Thus RNN came into existence,
which solved this issue with the help of a Hidden Layer. The main and most important
feature of RNN is Hidden state, which remembers some information about a sequence.
RNNs have a “memory” which retains information about what has been
calculated. They use the same parameters for each input, as they perform the same task on
all the inputs or hidden layers to produce the output. This reduces the complexity of
parameters, unlike other neural networks.

How RNN works

The working of an RNN can be understood with the help of the below example:

Example: Suppose there is a deeper network with one input layer, three hidden layers,
and one output layer. Then like other neural networks, each hidden layer will have its
own set of weights and biases, let’s say, for hidden layer 1 the weights and biases are
(w1, b1), (w2, b2) for the second hidden layer, and (w3, b3) for the third hidden layer.
This means that each of these layers is independent of the other, i.e. they do not
memorize the previous outputs.
Now the RNN will do the following:

• RNN converts the independent activations into dependent activations by
providing the same weights and biases to all the layers, thus reducing the
complexity of increasing parameters and memorizing each previous output by
giving each output as input to the next hidden layer.

• Hence these three layers can be joined together such that the weights and bias of
all the hidden layers are the same, in a single recurrent layer.

The formula for calculating the current state:

ht = f(ht-1, xt)

where:
ht -> current state
ht-1 -> previous state
xt -> input state

Formula for applying the activation function (tanh):

ht = tanh(whh * ht-1 + wxh * xt)

where:
whh -> weight at recurrent neuron
wxh -> weight at input neuron

The formula for calculating the output:

Yt = Why * ht

where:
Yt -> output
Why -> weight at output layer
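A minimal NumPy sketch of one recurrent step using these formulas; the dimensions and the random values are illustrative assumptions:

import numpy as np

hidden_size, input_size = 4, 3
w_hh = np.random.randn(hidden_size, hidden_size)   # recurrent weights (whh)
w_xh = np.random.randn(hidden_size, input_size)    # input weights (wxh)
w_hy = np.random.randn(2, hidden_size)             # output weights (Why), 2 output units

h_prev = np.zeros(hidden_size)                     # previous state ht-1
x_t = np.random.randn(input_size)                  # current input xt

h_t = np.tanh(w_hh @ h_prev + w_xh @ x_t)          # current state ht
y_t = w_hy @ h_t                                   # output Yt
print(h_t, y_t)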

Training through RNN

1. A single-time step of the input is provided to the network.


2. Then calculate its current state using a set of current input and the previous state.
3. The current ht becomes ht-1 for the next time step.
4. One can go as many time steps according to the problem and join the information
from all the previous states.
5. Once all the time steps are completed the final current state is used to calculate
the output.
6. The output is then compared to the actual output i.e the target output and the error
is generated.
7. The error is then back-propagated to the network to update the weights and hence
the network (RNN) is trained.

Advantages of Recurrent Neural Network

1. An RNN remembers information through time, which makes it useful in time-series
prediction because it can take previous inputs into account. The Long Short-Term
Memory (LSTM) architecture extends this ability to longer sequences.
2. Recurrent neural networks are even used with convolutional layers to extend the
effective pixel neighbourhood.

Disadvantages of Recurrent Neural Network

1. Gradient vanishing and exploding problems.


2. Training an RNN is a very difficult task.
3. It cannot process very long sequences if using tanh or relu as an activation
function.

Applications of Recurrent Neural Network

1. Language Modelling and Generating Text


2. Speech Recognition
3. Machine Translation
4. Image Recognition, Face detection
5. Time series Forecasting.

PyTorch Tensor:
PyTorch is an open-source machine learning framework that is widely used for
building and training neural networks. One of the fundamental data structures in
PyTorch is the tensor.

A tensor in PyTorch is a multi-dimensional array that can hold scalar values, vectors,
matrices, and higher-dimensional arrays. Tensors can be created in PyTorch using
various methods, such as:

• Directly specifying the values of the tensor using Python lists or NumPy arrays
• Creating tensors filled with zeros or ones using the torch.zeros() or torch.ones()
functions, respectively
• Creating tensors with random values using the torch.randn() function
• Creating identity matrices using the torch.eye() function

Once a tensor is created, you can perform various operations on it, such as:

• Basic mathematical operations, such as addition, subtraction, multiplication, and


division
• Matrix operations, such as dot product and matrix multiplication
• Reshaping, slicing, and indexing operations
• Broadcasting, which allows you to perform operations on tensors of different
shapes

Overall, tensors are a critical building block of PyTorch, and understanding how to
create and manipulate them is essential for building and training neural networks using
this framework.
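A short sketch of creating and manipulating tensors with the methods listed above (the values are arbitrary):

import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # from a Python list
b = torch.zeros(2, 2)                         # filled with zeros
c = torch.randn(2, 2)                         # random values
d = torch.eye(2)                              # identity matrix

print(a + c)                                  # element-wise addition
print(a @ d)                                  # matrix multiplication
print(a.reshape(4))                           # reshaping
print(a[:, 0])                                # slicing: first column
print(a * torch.tensor([10.0, 100.0]))        # broadcasting over rows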

PyTorch is a Python-based library that provides functionalities such as:

• Torch Script for creating serializable and optimizable models

• Distributed training to parallelize computations

• Dynamic Computation graphs which enable to make the computation graphs on


the go, and many more

Tensors in PyTorch are similar to NumPy’s n-dimensional arrays which can also be
used with GPUs. Performing operations on these tensors is almost similar to
performing operations on NumPy arrays. This makes PyTorch very user-friendly and
easy to learn.

Previously, we built a simple neural network to solve a case study and got a benchmark
accuracy of around 65% on the test set using that simple model. Now, we will try to
improve this score using Convolutional Neural Networks.
Understanding the Problem Statement:

Let me quickly summarize the problem statement. Our task is to identify the type of
apparel by looking at a variety of apparel images. There are a total of 10 classes in
which we can classify the images of apparels:

Label Description

0 T-shirt/top

1 Trouser

2 Pullover

3 Dress

4 Coat

5 Sandal

6 Shirt

7 Sneaker

8 Bag

9 Ankle boot

The dataset contains a total of 70,000 images. 60,000 of these images belong to the
training set and the remaining 10,000 are in the test set. All the images are grayscale
images of size (28*28). The dataset contains two folders – one each for the training set
and the test set. In each folder, there is a .csv file that has the id of the image and its
corresponding label, and a folder containing the images for that particular set.
Loading the dataset
Now, let’s load the dataset, including the train, test and sample submission file:

• The train file contains the id of each image and its corresponding label
• The test file, on the other hand, only has the ids and we have to predict their
corresponding labels
• The sample submission file will tell us the format in which we have to submit
the predictions

We will read all the images one by one and stack them one over the other in an array.
We will also divide the image pixels by 255 so that the pixel values come into the
range [0,1]. This step helps in optimizing the performance of our model.
Load the images:
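A hedged sketch of this loading step; the file name train.csv, the column names id and label, and the train/ image folder are assumptions about how the dataset is laid out, not guaranteed paths:

import numpy as np
import pandas as pd
from PIL import Image

train = pd.read_csv('train.csv')        # assumed file with image ids and labels

train_img = []
for img_id in train['id']:
    # read each grayscale image and scale the pixel values to [0, 1]
    img = Image.open('train/' + str(img_id) + '.png').convert('L')
    train_img.append(np.array(img).astype('float32') / 255.0)

train_x = np.array(train_img)
train_y = train['label'].values
print(train_x.shape)                    # expected: (60000, 28, 28)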

As you can see, we have 60,000 images, each of size (28,28), in the training set. Since
the images are in grayscale format, we only have a single-channel and hence the shape
(28,28).

Let’s now explore the data and visualize a few images:


We have kept 10% of the data in the validation set and the remaining in the training set.
Next, let’s convert the images and the targets into torch format:
Similarly, we will convert the validation images:
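A minimal sketch of the split and conversion, assuming train_x and train_y were built as in the loading sketch above:

import torch
from sklearn.model_selection import train_test_split

# Keep 10% of the data aside for validation
train_x, val_x, train_y, val_y = train_test_split(
    train_x, train_y, test_size=0.1, random_state=13)

# Convert to torch format: add a channel dimension and use the dtypes PyTorch expects
train_x = torch.from_numpy(train_x).reshape(-1, 1, 28, 28).float()
train_y = torch.from_numpy(train_y).long()

# Same conversion for the validation images
val_x = torch.from_numpy(val_x).reshape(-1, 1, 28, 28).float()
val_y = torch.from_numpy(val_y).long()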

Implementing CNNs using PyTorch:

First, we define the model architecture: it has two Conv2d layers and a Linear layer. We then instantiate this model and define the optimizer and the loss function. Next, we define a function to train the model. Finally, we train the model for 25 epochs and store the training and validation losses, as in the sketch below:
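The following is a minimal sketch of such an architecture and training loop. The channel sizes, learning rate and the use of full-batch updates are assumptions made for illustration; in practice a DataLoader with mini-batches would be preferable.

import torch
import torch.nn as nn
from torch.optim import Adam

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Two Conv2d layers, each followed by batch norm, ReLU and 2x2 max pooling
        self.cnn_layers = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(4), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(4), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        # One Linear layer mapping the flattened 4 x 7 x 7 feature map to the 10 classes
        self.linear_layers = nn.Sequential(nn.Linear(4 * 7 * 7, 10))

    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        return self.linear_layers(x)

model = Net()
optimizer = Adam(model.parameters(), lr=0.07)
criterion = nn.CrossEntropyLoss()

train_losses, val_losses = [], []
for epoch in range(25):
    model.train()
    optimizer.zero_grad()
    loss_train = criterion(model(train_x), train_y)   # full-batch update for simplicity
    loss_train.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        loss_val = criterion(model(val_x), val_y)

    train_losses.append(loss_train.item())
    val_losses.append(loss_val.item())
    print(f'epoch {epoch + 1}: train loss {loss_train.item():.4f}, '
          f'val loss {loss_val.item():.4f}')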

We can see that the validation loss is decreasing as the epochs are increasing. Let’s
visualize the training and validation losses by plotting them:
We can clearly see that the training and validation losses are in sync. It is a good sign
as the model is generalizing well on the validation set.

Next, let's check the accuracy of the model on the training set:

Let’s check the accuracy for the validation set as well:

As we saw with the losses, the accuracy is also in sync here – we got ~72% on the
validation set as well.

Generating predictions for the test set

We will load all the images in the test set, do the same pre-processing steps as we did
for the training set and finally generate predictions.
So, let’s start by loading the test images:

Now, we will do the pre-processing steps on these images similar to what we did for
the training images earlier:

Finally, we will generate predictions for the test set:

Replace the labels in the sample submission file with the predictions, and finally save the file and submit it on the leaderboard:
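A minimal hypothetical sketch of these final steps, assuming test_x holds the pre-processed test images as a tensor of shape (10000, 1, 28, 28) and that the sample submission file is named 'sample_submission.csv':

import pandas as pd
import torch

model.eval()
with torch.no_grad():
    output = model(test_x)                      # logits for each test image
predictions = torch.argmax(output, dim=1).numpy()

sample = pd.read_csv('sample_submission.csv')   # placeholder file name
sample['label'] = predictions
sample.to_csv('submission.csv', index=False)    # file to upload to the solution checker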
You will see a file named submission.csv in your current directory. You just have to
upload it on the solution checker of the problem page which will generate the score.

Our CNN model gave us an accuracy of around 71% on the test set.
UNIT-V

Interactive Application of Deep Learning:

Interactive applications of deep learning refer to systems that use deep learning algorithms to
enable users to interact with them in a meaningful way. These applications are designed to
provide real-time feedback or personalized recommendations based on the user's input,
behaviour, or preferences.

These refer to computer programs or systems that allow users to interact with them in real-time,
providing immediate feedback and responses to user inputs. With the increasing availability
of high-speed internet, mobile devices, and advanced software, interactive applications have
become a ubiquitous part of our daily lives.

Interactive applications can take many forms, ranging from simple web-based tools to complex
software systems. Some examples of interactive applications include:

1.Gaming: Interactive games are one of the most common types of interactive applications.
These games allow players to interact with virtual environments, objects, and other players in
real-time.

2.Virtual assistants: Virtual assistants like Siri, Alexa, and Google Assistant are interactive
applications that allow users to interact with them using voice commands. These applications
use natural language processing and machine learning techniques to understand user requests
and provide relevant responses.

3.Social media platforms: Social media platforms like Facebook, Twitter, and Instagram are
interactive applications that allow users to interact with each other by sharing messages,
photos, and videos.

4.E-commerce websites: E-commerce websites like Amazon and eBay are interactive
applications that allow users to search for products, compare prices, and make purchases.

5.Data visualization tools: Data visualization tools like Tableau and Power BI are interactive
applications that allow users to explore and analyze data by creating visualizations and
dashboards.
Machine Vision:

 Machine vision is a technology that enables machines or computers to interpret, analyze, and understand images and videos.
 It is a subset of computer vision, which focuses on the automatic extraction, analysis,
and understanding of useful information from digital images and videos.
 Machine vision systems use various techniques to process and analyze images,
including image filtering, feature detection, pattern recognition, and machine learning
algorithms.
 These systems are commonly used in industrial automation, robotics, quality control,
and surveillance.

Some examples of applications of machine vision include:

 Inspection and quality control: Machine vision systems can be used to inspect and
evaluate the quality of products, such as printed circuit boards, automotive parts, and
food products.
 Robotics and automation: Machine vision can be used to guide robots and automated
systems, allowing them to accurately identify and manipulate objects.
 Object recognition and tracking: Machine vision systems can be used to identify and
track objects in real-time, making them useful for surveillance and security applications.
 Medical imaging: Machine vision techniques can be used to analyze medical images,
such as X-rays and MRI scans, to assist in diagnosis and treatment planning

Natural Language Pre-processing

Natural Language Processing (NLP) is a field of study that focuses on the interaction
between computers and human language.

• Natural language Pre-processing refers to the process of preparing text data for NLP
tasks, such as text classification, sentiment analysis, and machine translation.

• The main objective of natural language pre-processing is to transform unstructured text data into a structured format that can be easily analyzed and processed by computers.

Some common techniques used in natural language pre-processing include:


1. Tokenization: This process involves breaking a text into individual words or tokens,
which are then used as the building blocks for further analysis.

2. Stop-word removal: Stop words are common words such as "and," "the," and "in,"
which do not carry significant meaning in the context of a sentence. Removing these words
can improve the accuracy of NLP models.

3. Stemming and lemmatization: These techniques involve reducing words to their root
form, which can help to reduce the number of unique words in a text dataset and improve
the accuracy of text analysis.

4. Part-of-speech tagging: This involves assigning each word in a text dataset to its
appropriate part of speech, such as noun, verb, or adjective.

5. Named entity recognition: This technique involves identifying and categorizing named
entities, such as people, places, and organizations, within a text dataset.
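As an illustrative sketch of the first four techniques using the NLTK library (the exact resource names to download can vary slightly between NLTK versions):

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the required NLTK resources
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

text = "The quick brown foxes are jumping over the lazy dogs in the park."

# 1. Tokenization
tokens = word_tokenize(text.lower())

# 2. Stop-word removal
stop_words = set(stopwords.words('english'))
filtered = [t for t in tokens if t.isalpha() and t not in stop_words]

# 3. Stemming and lemmatization
stems = [PorterStemmer().stem(t) for t in filtered]
lemmas = [WordNetLemmatizer().lemmatize(t) for t in filtered]

# 4. Part-of-speech tagging
pos_tags = nltk.pos_tag(filtered)

print(filtered)   # e.g. ['quick', 'brown', 'foxes', 'jumping', 'lazy', 'dogs', 'park']
print(lemmas)     # e.g. 'foxes' -> 'fox'
print(pos_tags)   # (word, tag) pairs such as ('quick', 'JJ')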

Applications of Natural Language Processing (NLP):

Spam Filters: One of the most irritating things about email is spam. Gmail uses natural
language processing (NLP) to discern which emails are legitimate and which are spam.
These spam filters look at the text in all the emails you receive and try to figure out what it
means to see if it’s spam or not.

Algorithmic Trading: Algorithmic trading is used for predicting stock market conditions.
Using NLP, this technology examines news headlines about companies and stocks and
attempts to comprehend their meaning in order to determine if you should buy, sell, or hold
certain stocks.

Question Answering: NLP can be seen in action when using Google Search or Siri services.
A major use of NLP is to make search engines understand the meaning of what we are
asking and generate natural language in return to give us the answers.

Summarizing Information: On the internet, there is a lot of information, and a lot of it comes
in the form of long documents or articles. NLP is used to decipher the meaning of the data
and then provides shorter summaries of the data so that humans can comprehend it more
quickly.
Future Scope:

Bots: Chatbots assist clients to get to the point quickly by answering inquiries and referring
them to relevant resources and products at any time of day or night. To be effective,
chatbots must be fast, smart, and easy to use. To accomplish this, chatbots employ NLP to understand language, usually over text or voice-recognition interactions.

Supporting Invisible UI: Almost every connection we have with machines involves human
communication, both spoken and written. Amazon’s Echo is only one illustration of the
trend toward putting humans in closer contact with technology in the future. The concept
of an invisible or zero user interface will rely on direct communication between the user
and the machine, whether by voice, text, or a combination of the two. NLP helps to make
this concept a real-world thing.

Smarter Search: NLP’s future also includes improved search, something we’ve been
discussing at Expert System for a long time. Smarter search, which allows a chatbot to understand a customer’s request, can enable “search like you talk” functionality (much like you could
query Siri) rather than focusing on keywords or topics. Google recently announced that
NLP capabilities have been added to Google Drive, allowing users to search for documents
and content using natural language.

Future Enhancements:

Companies like Google are experimenting with Deep Neural Networks (DNNs) to push the
limits of NLP and make it possible for human-to-machine interactions to feel just like
human-to-human interactions.

Basic words can be further subdivided by their semantics and used in NLP algorithms.

NLP algorithms can be extended to languages that are currently not well supported, such as regional languages or languages spoken in rural areas.

Translation of a sentence in one language into the same sentence in another language can be handled at a broader scope.

Overall, natural language pre-processing is an important step in NLP tasks, as it helps to ensure that text data is structured and cleaned, making it easier for computers to analyze and process.
GENERATIVE ADVERSARIAL LEARNING

Generative Adversarial Networks (GANs) are a type of deep learning model that consists
of two neural networks, a generator and a discriminator, that are trained together in a game-
like manner.

The generator network takes random noise as input and produces output that is meant to
resemble data from a target distribution, such as images or audio. The discriminator
network takes both real data from the target distribution and generated data from the
generator network as input and outputs a probability score indicating whether the input is
real or fake. The goal of the generator network is to produce output that can fool the
discriminator network into thinking it is real, while the goal of the discriminator network
is to correctly classify whether the input is real or fake.

During training, the generator and discriminator networks are optimized in a minimax
game-like fashion, where the generator seeks to maximize the probability of the
discriminator classifying its generated output as real, while the discriminator seeks to
minimize the probability of misclassifying generated output as real. This game-like
competition between the two networks leads to the generator network learning to produce
realistic output that is similar to the target distribution.
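A minimal PyTorch sketch of one training step of this minimax game, using small fully connected networks; the sizes latent_dim and data_dim are assumptions made purely for illustration:

import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64   # assumed sizes for illustration

# Generator: noise -> fake sample; Discriminator: sample -> probability of being real
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_step(real_batch):
    batch_size = real_batch.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # Discriminator step: classify real data as real and generated data as fake
    fake_batch = G(torch.randn(batch_size, latent_dim)).detach()
    d_loss = bce(D(real_batch), real_labels) + bce(D(fake_batch), fake_labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator label generated data as real
    g_loss = bce(D(G(torch.randn(batch_size, latent_dim))), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# Example call with a batch of stand-in "real" data drawn from a Gaussian
d_loss, g_loss = gan_step(torch.randn(32, data_dim))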

GANs can be used for a variety of applications, such as image synthesis, style transfer, and
data augmentation. One of the main advantages of GANs is that they can generate novel
data that is similar to the target distribution, which can be useful in situations where there
is a limited amount of real data available. However, GANs can be difficult to train and
require careful tuning of hyperparameters and regularization techniques to prevent
overfitting.

Overall, GANs are a powerful and popular deep learning model for generating realistic
data, and have been applied to a wide range of applications in computer vision, natural
language processing, and audio synthesis.
Deep Reinforcement Learning

Deep reinforcement learning is a subfield of machine learning that combines deep learning
with reinforcement learning to enable machines to learn and make decisions in complex
environments.

Reinforcement learning involves learning by trial-and-error through interactions with an environment to maximize a reward signal. In reinforcement learning, an agent interacts
with an environment and receives feedback in the form of a reward signal. The goal of the
agent is to learn an optimal policy, or a set of actions, that maximizes the long-term
cumulative reward.

Deep reinforcement learning uses deep neural networks as function approximators to estimate the value function or policy. The deep neural networks allow the agent to
generalize its learning from observed experiences to unseen experiences. The combination
of deep neural networks and reinforcement learning enables the agent to learn complex
tasks, such as playing games, controlling robots, or making decisions in complex
environments.

One of the main challenges in deep reinforcement learning is the problem of exploration-
exploitation trade-off. The agent needs to explore the environment to learn about the
optimal policy, but at the same time it needs to exploit the learned policy to maximize the
reward. Finding the right balance between exploration and exploitation is essential for the
success of the agent in the long term.
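A common and simple way to manage this trade-off is the epsilon-greedy rule, sketched below: with a small probability the agent explores a random action, and otherwise it exploits the action with the highest estimated value.

import random

def epsilon_greedy_action(q_values, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                    # explore: random action
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit: greedy action

# Example: estimated action values for 4 actions in some state
q_values = [0.1, 0.5, 0.2, 0.4]
action = epsilon_greedy_action(q_values)  # usually picks action 1, occasionally explores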

Deep reinforcement learning has been successfully applied in a wide range of applications,
such as game playing, robotics, and autonomous driving. Some of the most famous
applications of deep reinforcement learning include AlphaGo, which defeated the world
champion in the game of Go, and OpenAI Five, which defeated a team of human
professional players in the game of Dota 2.

Deep Learning Research

Deep learning research is a field of study that focuses on developing and improving
algorithms and techniques for training deep neural networks. Deep learning refers to the
subset of machine learning that uses deep neural networks to learn and make predictions
from data. Deep neural networks are composed of multiple layers of interconnected nodes
that process information hierarchically, allowing the network to learn complex
representations of the data.

Deep learning research involves developing new architectures and algorithms for training
deep neural networks, as well as improving existing techniques. It also involves
investigating the theoretical foundations of deep learning, such as understanding the
properties of deep neural networks and their optimization landscapes.

Deep learning research is motivated by the desire to develop more accurate and efficient
machine learning models that can solve a wide range of tasks, such as image recognition,
natural language processing, speech recognition, and robotics. Deep learning has enabled
breakthroughs in these areas and has been applied to many real-world applications, such as
self-driving cars, personalized medicine, and recommendation systems.

Some of the current research directions in deep learning include:

1. Developing more efficient and scalable deep learning algorithms, such as those that can
work with smaller datasets or require less computing power.

2. Developing more interpretable and transparent deep learning models, which can help
improve trust and accountability in AI systems.

3. Investigating the robustness and reliability of deep learning models, particularly in adversarial settings where the model is intentionally fooled by malicious inputs.

4. Exploring new architectures and approaches to deep learning, such as attention mechanisms, transformers, and meta-learning.

Autoencoders

A typical use of a Neural Network is a case of supervised learning. It involves training data
that contains an output label. The neural network tries to learn the mapping from the given
input to the given output label. But what if the output label is replaced by the input vector
itself? Then the network will try to find the mapping from the input to itself. This would be
the identity function which is a trivial mapping. But if the network is not allowed to simply
copy the input, then the network will be forced to capture only the salient features. This
constraint opens up a different field of applications for neural networks that was previously
unexplored. The primary applications are dimensionality reduction and specific data
compression. The network is first trained on the given input. The network tries to
reconstruct the given input from the features it picked up and gives an approximation to the
input as the output. The training step involves the computation of the error and
backpropagating the error. The typical architecture of an Auto-encoder resembles a
bottleneck.

The encoder part of the network is used for encoding and sometimes even for data compression
purposes although it is not very effective as compared to other general compression techniques
like JPEG. Encoding is achieved by the encoder part of the network which has a decreasing
number of hidden units in each layer. Thus this part is forced to pick up only the most
significant and representative features of the data. The second half of the network performs
the Decoding function. This part has an increasing number of hidden units in each layer and
thus tries to reconstruct the original input from the encoded data. Thus Auto-encoders are an
unsupervised learning technique.

Example: In the code below, the autoencoder's training data is fitted to itself. That’s why, instead of fitting X_train to Y_train, we have used X_train in both places.

autoencoder.fit(X_train, X_train, epochs=200)

Training of an Auto-encoder for data compression: For a data compression procedure, the most important aspect of the compression is the reliability of the reconstruction of the compressed data. This requirement dictates the structure of the Auto-encoder as a bottleneck.

Step 1: Encoding the input data – The Auto-encoder first tries to encode the data using the initialized weights and biases.

Step 2: Decoding the input data – The Auto-encoder tries to reconstruct the original input from the encoded data to test the reliability of the encoding.

Step 3: Backpropagating the error – After the reconstruction, the loss function is computed to determine the reliability of the encoding. The error generated is backpropagated.

The above-described training process is reiterated several times until an acceptable level of
reconstruction is reached.
After the training process, only the encoder part of the Auto-encoder is retained to encode a
similar type of data used in the training process. The different ways to constrain the network
are:-

Keep small Hidden Layers: If the size of each hidden layer is kept as small as possible, then
the network will be forced to pick up only the representative features of the data thus encoding
the data.

Regularization: In this method, a loss term is added to the cost function which encourages the
network to train in ways other than copying the input.

Denoising: Another way of constraining the network is to add noise to the input and teach the
network how to remove the noise from the data.

Tuning the Activation Functions: This method involves changing the activation functions
of various nodes so that a majority of the nodes are dormant thus effectively reducing the
size of the hidden layers.

The different variations of Auto-encoders are:-

Denoising Auto-encoder: This type of auto-encoder works on a partially corrupted input and trains to recover the original undistorted image. As mentioned above, this method is an
effective way to constrain the network from simply copying the input.

Sparse Auto-encoder: This type of auto-encoder typically contains more hidden units than
the input but only a few are allowed to be active at once. This property is called the sparsity
of the network. The sparsity of the network can be controlled by either manually zeroing
the required hidden units, tuning the activation functions or by adding a loss term to the
cost function.
Variational Auto-encoder: This type of auto-encoder makes strong assumptions about the
distribution of latent variables and uses the Stochastic Gradient Variational Bayes estimator
in the training process. It assumes that the data is generated by a Directed Graphical Model
and tries to learn an approximation q_{\phi}(z|x) to the conditional probability
p_{\theta}(z|x), where \phi and \theta are the parameters of the encoder and the decoder
respectively.

Below is basic, intuitive code showing how to build the autoencoder model and fit X_train to itself.

from tensorflow import keras

# Build the simple encoder-decoder model.
# Notice the number of neurons in each Dense layer:
# the model contracts in the encoder (3 -> 2) and expands in the decoder (2 -> 3).
encoder = keras.models.Sequential([keras.layers.Dense(2, input_shape=[3])])
decoder = keras.models.Sequential([keras.layers.Dense(3, input_shape=[2])])
autoencoder = keras.models.Sequential([encoder, decoder])

# Compile the model (newer Keras versions use learning_rate instead of lr)
autoencoder.compile(loss="mse", optimizer=keras.optimizers.SGD(learning_rate=0.1))

# Train the model; X_train is assumed to be an array with 3 features per sample
history = autoencoder.fit(X_train, X_train, epochs=200)

# Encode the data
codings = encoder.predict(X_train)

# Decode the encoder output
decodings = decoder.predict(codings)

Deep Generative Models


Deep generative models are a type of deep learning architecture that can generate new
samples from a given distribution, such as images, audio, or text. Deep generative models
learn to capture the underlying structure of the data distribution by learning a latent
representation of the data.

There are several types of deep generative models, including:

1. Variational Autoencoders (VAEs): VAEs are a type of autoencoder that learn to encode
data into a latent representation and decode the representation back to the original data
space. However, VAEs differ from traditional autoencoders by enforcing a constraint on
the distribution of the latent representation. VAEs use the constraint to learn a smooth and
continuous latent space that can be used for generating new samples.

2. Generative Adversarial Networks (GANs): GANs are a type of deep neural network
architecture that learn to generate new samples by playing a two-player game. One player,
the generator, learns to generate new samples that are similar to the training data, while the
other player, the discriminator, learns to distinguish between real and generated samples.
The two players are trained together in an adversarial manner, where the generator tries to
fool the discriminator, and the discriminator tries to accurately distinguish between real and
generated samples.

3. Autoregressive Models: Autoregressive models generate new samples by modeling the conditional distribution of the next value in a sequence, given the previous values. These
models are often used for text generation, where the model generates one word at a time
based on the previous words.

4. Flow-based Models: A flow-based model is a type of generative model that transforms a simple random variable into a more complex distribution using a series of invertible
transformations. The model is trained to learn these transformations by minimizing the
difference between the true data distribution and the distribution of the transformed random
variable.

Deep generative models have many applications, such as image synthesis, speech synthesis,
and text generation. They are also used for tasks such as data augmentation, where synthetic
samples can be generated to increase the size of the training data.
Boltzman Machines

Boltzmann Machines are an unsupervised DL model in which every node is connected to every other node. That is, unlike ANNs, CNNs, RNNs and SOMs, Boltzmann
Machines are undirected (or the connections are bidirectional). Boltzmann Machine is not
a deterministic DL model but a stochastic or generative DL model. It is rather a
representation of a certain system. There are two types of nodes in the Boltzmann Machine
— Visible nodes — those nodes which we can and do measure, and the Hidden nodes –
those nodes which we cannot or do not measure. Although the node types are different, the
Boltzmann machine considers them as the same and everything works as one single system.
The training data is fed into the Boltzmann Machine and the weights of the system are
adjusted accordingly. Boltzmann machines help us understand abnormalities by learning
about the working of the system in normal conditions.

Boltzmann Machine Energy-Based Models: The Boltzmann distribution is used as the sampling distribution of the Boltzmann Machine. It is governed by the equation –

P_i = e^(-E_i/kT) / Σ_j e^(-E_j/kT)

P_i - probability of the system being in state i
E_i - energy of the system in state i
T - temperature of the system
k - Boltzmann constant
Σ_j e^(-E_j/kT) - sum over all possible states of the system

Boltzmann Distribution describes different states of the system and thus Boltzmann machines
create different states of the machine using this distribution. From the above equation, as the
energy of a state increases, the probability of the system being in that state decreases. Thus, the
system is most stable in its lowest energy state (a gas is most stable when it spreads). Here,
in Boltzmann machines, the energy of the system is defined in terms of the weights of
synapses. Once the system is trained and the weights are set, the system always tries to find
the lowest energy state for itself by adjusting the weights.
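A small numerical sketch of this distribution makes the point concrete: lower-energy states receive higher probability, and raising the temperature flattens the distribution.

import numpy as np

def boltzmann_probabilities(energies, kT=1.0):
    # P_i = exp(-E_i / kT) / sum_j exp(-E_j / kT)
    weights = np.exp(-np.array(energies) / kT)
    return weights / weights.sum()

energies = [1.0, 2.0, 3.0]
print(boltzmann_probabilities(energies))           # approx. [0.665, 0.245, 0.090]
print(boltzmann_probabilities(energies, kT=10.0))  # higher temperature -> flatter distribution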

Types of Boltzmann Machines:

 Restricted Boltzmann Machines (RBMs)
 Deep Belief Networks (DBNs)
 Deep Boltzmann Machines (DBMs)

Restricted Boltzmann Machines (RBMs):

In a full Boltzmann machine, each node is connected to every other node and hence the
connections grow exponentially. This is the reason we use RBMs. The restrictions in the node
connections in RBMs are as follows –

 Hidden nodes cannot be connected to one another.
 Visible nodes cannot be connected to one another.

Energy function example for a Restricted Boltzmann Machine –

E(v, h) = -∑_i a_i v_i - ∑_j b_j h_j - ∑_i ∑_j v_i w_{i,j} h_j

a_i, b_j - biases of the visible and hidden units (constants)
v_i, h_j - visible node, hidden node
P(v, h) - probability of the system being in a certain state
P(v, h) = e^(-E(v, h)) / Z
Z - the partition function, i.e. the sum of e^(-E(v, h)) over all possible states of the system

Suppose that we are using our RBM to build a recommender system that works on six (6) movies. The RBM learns how to allocate the hidden nodes to certain features. Through the process of Contrastive Divergence, we fit the RBM to our set of movies, that is, to our case or scenario. The RBM identifies which features are important through the training process. The training data is either 0, 1, or missing, based on whether a user liked the movie (1), disliked the movie (0), or did not watch the movie (missing data). The RBM automatically identifies important features.

Contrastive Divergence:

The RBM adjusts its weights by this method. Using some randomly assigned initial weights, the RBM calculates the hidden nodes, which in turn use the same weights to reconstruct the input nodes. Each hidden node is constructed from all the visible nodes and each visible node is reconstructed from all the hidden nodes; hence, the reconstructed input differs from the original input even though the weights are the same. The process continues until the reconstructed input matches the previous input, at which point the process is said to have converged. This entire procedure is known as Gibbs Sampling.
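A minimal NumPy sketch of one CD-1 update (a single Gibbs step) for a binary RBM is shown below, with sizes chosen to match the six-movie recommender example discussed in this section; the learning rate and initialisation are assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr=0.1):
    """One CD-1 update. v0: batch of visible vectors, W: weights,
    a: visible biases, b: hidden biases."""
    # Positive phase: hidden activations driven by the data
    h0_prob = sigmoid(v0 @ W + b)
    h0 = (np.random.rand(*h0_prob.shape) < h0_prob).astype(float)

    # Negative phase: reconstruct the visible units, then recompute the hidden units
    v1_prob = sigmoid(h0 @ W.T + a)
    h1_prob = sigmoid(v1_prob @ W + b)

    # Update with the difference of correlations <v h>_data - <v h>_reconstruction
    batch = v0.shape[0]
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
    a += lr * (v0 - v1_prob).mean(axis=0)
    b += lr * (h0_prob - h1_prob).mean(axis=0)
    return W, a, b

# Toy example: 6 visible units (movies), 2 hidden units (latent features)
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((6, 2))
a, b = np.zeros(6), np.zeros(2)
v0 = np.array([[0.0, 1.0, 1.0, 0.0, 1.0, 0.0]])   # one user's likes/dislikes
W, a, b = cd1_step(v0, W, a, b)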

The gradient formula gives the gradient of the log probability of a certain state of the system with respect to the weights of the system. It is given as follows –

d/dw_ij (log P(v^0)) = <v_i^0 * h_j^0> - <v_i^∞ * h_j^∞>

v - visible state, h - hidden state
<v_i^0 * h_j^0> - correlation measured in the initial state of the system
<v_i^∞ * h_j^∞> - correlation measured in the final (equilibrium) state of the system
P(v^0) - probability that the system is in state v^0
w_ij - weights of the system

The above equation tells us how a change in the weights of the system will change the log probability of the system being in a particular state. The system tries to end up in the lowest possible energy state (the most stable one). Instead of continuing to adjust the weights until the reconstructed input matches the previous one, we can also consider only the first few passes. It is sufficient to understand how to adjust the curve so as to reach the lowest energy state. Therefore, we adjust the weights and redesign the system and the energy curve such that we get the lowest energy for the current position. This is known as Hinton’s shortcut.

Working of an RBM – Illustrative Example –

Consider – Mary watches four movies out of the six available movies and rates four of them.
Say, she watched m1, m3, m4 and m5 and likes m3, m5 (rated 1) and dislikes the other two,
that is m1, m4 (rated 0) whereas the other two movies – m2, m6 are unrated. Now, using our
RBM, we will recommend one of these movies for her to watch next. Say –

 m3, m5 are of ‘Drama’ genre.
 m1, m4 are of ‘Action’ genre.
 ‘Dicaprio’ played a role in m5.
 m3, m5 have won ‘Oscar.’
 ‘Tarantino’ directed m4.
 m2 is of the ‘Action’ genre.
 m6 is of both the genres ‘Action’ and ‘Drama’, ‘Dicaprio’ acted in it and it has won an
‘Oscar’.

We have the following observations –

 Mary likes m3, m5 and they are of genre ‘Drama,’ she probably likes ‘Drama’ movies.
 Mary dislikes m1, m4 and they are of action genre, she probably dislikes ‘Action’
movies.
 Mary likes m3, m5 and they have won an ‘Oscar’, she probably likes an ‘Oscar’ movie.
 Since ‘Dicaprio’ acted in m5 and Mary likes it, she will probably like a movie in which
‘Dicaprio’ acted.
 Mary does not like m4 which is directed by Tarantino, she probably dislikes any movie
directed by ‘Tarantino’.

Therefore, based on the observations and the details of m2, m6; our RBM recommends m6 to
Mary (‘Drama’, ‘Dicaprio’ and ‘Oscar’ matches both Mary’s interests and m6). This is how an
RBM works and hence is used in recommender systems.

RBMs are used to build Recommender Systems.

Deep Belief Networks (DBNs):

Suppose we stack several RBMs on top of each other so that the first RBM outputs are the
input to the second RBM and so on. Such networks are known as Deep Belief Networks. The
connections within each layer are undirected (since each layer is an RBM). Simultaneously,
those in between the layers are directed (except the top two layers – the connection between
the top two layers is undirected). There are two ways to train the DBNs-

1. Greedy Layer-wise Training Algorithm – The RBMs are trained layer by layer. Once
the individual RBMs are trained (that is, the parameters – weights, biases are set), the
direction is set up between the DBN layers.
2. Wake-Sleep Algorithm – The DBN is trained all the way up (connections going up –
wake) and then down the network (connections going down — sleep).
Therefore, we stack the RBMs, train them, and once we have the parameters trained, we make
sure that the connections between the layers only work downwards (except for the top two
layers).
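As a rough sketch of greedy layer-wise training, two RBMs can be stacked with a classifier on top, for example using scikit-learn's BernoulliRBM. Note that this sketch trains each layer in turn but does not include fine-tuning of the whole stack; the layer sizes and learning rates are assumptions.

from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Each RBM is trained on the transformed output of the layer below it,
# and a supervised classifier is trained on the final hidden representation.
dbn_like = Pipeline([
    ('rbm1', BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=10, random_state=0)),
    ('rbm2', BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=10, random_state=0)),
    ('clf', LogisticRegression(max_iter=1000)),
])

# X_train should contain features scaled to [0, 1], y_train the class labels
# dbn_like.fit(X_train, y_train)
# predictions = dbn_like.predict(X_test)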

Deep Boltzmann Machines (DBMs):

DBMs are similar to DBNs except that apart from the connections within layers, the
connections between the layers are also undirected (unlike DBN in which the connections
between layers are directed). DBMs can extract more complex or sophisticated features and
hence can be used for more complex tasks.

Restricted Boltzman Machines

Restricted Boltzmann Machines (RBMs) are a type of neural network that belong to the
family of Boltzmann machines. RBMs are used for unsupervised learning and are
particularly useful for dimensionality reduction, feature learning, and collaborative
filtering.

An RBM consists of two layers of neurons, visible and hidden, with connections between
them. The neurons in each layer are binary units that can be in one of two states (on or off).
The connections between the neurons are weighted, with each connection having a
corresponding weight. The RBM learns to adjust the weights of the connections between
the neurons in order to represent the input data in a more compact and meaningful way.

During training, the RBM learns to reconstruct the input data by adjusting the weights of
the connections between the visible and hidden layers. This is done by minimizing the
difference between the input data and the reconstructed data using a process called
contrastive divergence.

One of the key features of RBMs is that they are "restricted" because there are no
connections between neurons within the same layer. This means that RBMs can learn to
represent complex relationships between the inputs without being affected by the
correlations within the same layer.

RBMs have many applications, including image and audio recognition, natural language
processing, and recommendation systems. They are particularly useful in deep learning
because they can be used as building blocks for larger and more complex architectures,
such as deep belief networks (DBNs) and deep autoencoders.

Deep Belief Networks

Deep Belief Networks (DBNs) are a type of neural network that are composed of multiple
layers of Restricted Boltzmann Machines (RBMs). DBNs are used for unsupervised
learning and are particularly effective in modeling high-dimensional data such as images,
speech, and natural language.

DBNs are composed of multiple layers of RBMs, with the first layer being the visible layer
and the subsequent layers being hidden layers. Each RBM in the DBN learns to extract
higher-level features from the output of the previous RBM. The output of the last RBM is
used as input to a supervised learning algorithm, such as a classifier, to produce a final
prediction.

The RBMs in a DBN are trained in an unsupervised manner using Contrastive Divergence,
a type of stochastic gradient descent algorithm that iteratively adjusts the weights of the
connections between the neurons. Once the RBMs are trained, the entire DBN is fine-tuned
using backpropagation, which is a supervised learning algorithm that adjusts the weights
based on the error between the predicted output and the true output.

DBNs have been used in a variety of applications, including speech recognition, image
recognition, natural language processing, and recommender systems. One of the key
advantages of DBNs is that they can learn to extract hierarchical representations of the
input data, which makes them particularly useful in modeling complex relationships in
high-dimensional data.
