
BAPATLA ENGINEERING COLLEGE :: BAPATLA

(Autonomous)

Deep Learning (20ECJ44)


By
Dr. Naga Raju Challa
Assistant Professor,
Department of ECE,
Bapatla Engineering College,
(Autonomous)
Bapatla.
UNIT- I
Introduction to Deep Learning
&
Architectures
Machine Learning Vs. Deep Learning

 Machine learning and deep learning are both subfields of artificial intelligence (AI) that focus on training algorithms to make predictions or decisions based on data.
 However, they differ in their approaches, architectures, and applications.
Machine Learning Vs. Deep Learning

Architecture
 Machine Learning: ML encompasses a broad range of techniques and algorithms that can be categorized into supervised, unsupervised, and reinforcement learning. These algorithms often rely on handcrafted features extracted from data.
 Deep Learning: DL is a subset of ML that specifically deals with neural networks consisting of multiple layers. These networks, known as artificial neural networks, can automatically learn features from data.

Data Size
 Machine Learning: ML models can perform well with smaller datasets, making them suitable for applications with limited data availability.
 Deep Learning: DL models typically require large amounts of data to generalize effectively. They thrive when trained on big datasets, making them suitable for data-intensive tasks.

Computations
 Machine Learning: ML models are generally less computationally intensive than deep learning models. They can run on standard hardware and may not require specialized GPUs.
 Deep Learning: DL models often demand significant computational resources, including powerful GPUs or TPUs, due to the complexity of neural networks with many layers.
Machine Learning Vs. Deep Learning

Training Time
 Machine Learning: ML models often train faster than DL models.
 Deep Learning: DL models are slower to train than ML models and can require prolonged training times, especially on large datasets.

Interpretability
 Machine Learning: Traditional ML models are often more interpretable because they rely on human-engineered features and simpler algorithms.
 Deep Learning: DL models, particularly DNNs, are considered black boxes because understanding their decision-making processes can be challenging.

Feature Engineering
 Machine Learning: In traditional ML, a significant amount of time is spent on feature engineering, which involves selecting, transforming, and engineering relevant features from the raw data to improve model performance.
 Deep Learning: DL models can automatically learn features from the raw data, reducing the need for extensive feature engineering. This is one of the key advantages of deep learning.

Applications
 Machine Learning: Fraud detection, recommendation systems, natural language processing, and more.
 Deep Learning: Image recognition, speech recognition, NLP, autonomous driving, and game playing.
Representation learning

 Definition: The process of automatically discovering and extracting meaningful patterns or features from raw data is known as representation learning.
 It is also known as feature learning or feature extraction.
 It is a fundamental concept in deep learning and machine learning.
 The learned representations are more informative and relevant for solving a specific task, such as classification, regression, or clustering.
 The aim of representation learning is to transform data into a different, more compact, and more useful format.
 In traditional machine learning, feature engineering often involves manually selecting or designing features based on domain knowledge.
 Representation learning, however, aims to automate this process by allowing the model to learn the most relevant features directly from the data.
Representation learning Models
 Convolutional Neural Networks (CNNs): In computer vision tasks, CNNs are designed to
automatically learn hierarchical representations of images. They use convolutional layers to capture
local patterns and features, followed by fully connected layers for higher-level abstractions.
 Recurrent Neural Networks (RNNs): RNNs are used for sequential data, such as natural language
text or time series data. They learn to capture temporal dependencies and can be used for tasks like
sentiment analysis, machine translation, and speech recognition.
 Transfer Learning: Transfer learning involves pre-training a deep neural network on a large dataset
and then fine-tuning it on a smaller, task-specific dataset. The pre-trained network serves as a feature
extractor, and its learned representations are often useful for various downstream tasks.
 Auto-encoders: Auto-encoders are neural networks that are trained to reconstruct their input data. The
hidden layers of the auto-encoder learn to capture essential features or representations of the input data
during the training process. Variations like de-noising auto-encoders and variational auto-encoders
(VAEs) are commonly used for representation learning.
Representation learning Models
 Self-Supervised Learning: Self-supervised learning is a type of representation learning where a model
learns from data with automatically generated labels or annotations. For example, predicting missing
parts of an image or missing words in a sentence can be used to train self-supervised models.
 Representation learning has played a crucial role in improving the performance of deep learning
models across various domains, including computer vision, NLP, speech recognition, etc.
 By learning informative representations, models can generalize better to new and unseen data, making
them more effective and efficient for a wide range of tasks.
Width Vs Depth of Neural Networks

 Width and Depth are two important architectural aspects of neural networks that affect their
capacity and performance.
 The number of neurons (or units) in each layer of a neural network is known as Width.
 Increasing the width of a neural network can increase its capacity to learn complex patterns in
the data.
 However, a very wide network may also require more training data and computational
resources and may be prone to overfitting.
Source: NPTEL IIT KGP
Width Vs. Depth of Neural Networks

 The number of layers present in a neural network is called Depth.
 A deep neural network has many hidden layers between the input and output layers.
 Deeper networks are capable of capturing hierarchical features in the data, where lower layers learn simple features and higher layers learn more abstract and complex features.
 Deep networks are widely used in tasks such as image processing and natural language understanding.
Source: NPTEL IIT KGP
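To make the width/depth distinction concrete, below is a minimal PyTorch sketch (not from the slides): a wide, shallow network and a deep, narrow network for the same input. The 784-dimensional input and the specific layer sizes are illustrative assumptions only.

```python
# Illustrative sketch: "wide" vs. "deep" MLPs for the same (assumed) 784-D input.
import torch.nn as nn

# Wide, shallow network: one hidden layer with many units (large width).
wide_net = nn.Sequential(
    nn.Linear(784, 2048),
    nn.ReLU(),
    nn.Linear(2048, 10),
)

# Deep, narrow network: several hidden layers with fewer units each (large depth).
deep_net = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# Width = units per layer; depth = number of layers.
print(sum(p.numel() for p in wide_net.parameters()),
      sum(p.numel() for p in deep_net.parameters()))
```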
Activation Functions

Sigmoid
 The sigmoid function compresses all inputs to the range (0, 1).
 The gradient of the function is ∂σ(x)/∂x = σ(x)(1 − σ(x)).
 Cons:
 Saturation region: a sigmoid neuron is saturated when σ(x) = 1 or σ(x) = 0.
 From the graph, the gradient at saturation is 0, so the update w = w − η∇w leaves the weights unchanged.
 A saturated neuron therefore causes the gradient to vanish.
 Sigmoid outputs are not zero-centered.

Tanh
 The tanh function compresses all inputs to the range (−1, 1).
 The gradient of the function is ∂tanh(x)/∂x = 1 − tanh²(x).
 Pros:
 It is zero-centered.
 Cons:
 The vanishing-gradient problem is still present.
 It is computationally expensive.
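To see the saturation behaviour numerically, here is a small NumPy sketch (illustrative, not from the slides) that evaluates the two gradients given above at a few sample points; the sample points are arbitrary.

```python
# Sigmoid and tanh gradients: both shrink towards 0 as |x| grows (saturation).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # dσ/dx = σ(x)(1 − σ(x))

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # d tanh/dx = 1 − tanh²(x)

for x in [0.0, 2.0, 10.0]:
    print(x, sigmoid_grad(x), tanh_grad(x))
# At x = 10 both gradients are ~0, so w = w − η∇w barely changes: the gradient vanishes.
```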
Rectified Linear Unit (ReLU)
 ReLU stands for Rectified Linear Unit.
 It is a non-linear activation function: f(x) = max(0, x).
 Pros:
 It doesn’t saturate in the positive region.
 Computationally efficient.
 Much faster than sigmoid/tanh.
 The derivative of ReLU is 1 for x > 0 and 0 for x < 0.
 Cons:
 The zero gradient for x < 0 causes the dead-neuron problem.
Leaky ReLU & Exponential ReLU
 Leaky ReLU: f(x) = max(0.01x, x)
 Parametric ReLU (PReLU): f(x) = max(αx, x), where α is a learnable parameter
 Exponential ReLU (ELU): f(x) = x if x > 0, and f(x) = a(eˣ − 1) if x ≤ 0
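The piecewise definitions above can be written directly in code. Below is a minimal NumPy sketch (illustrative); the values of α and a are arbitrary choices for demonstration, and in PReLU α would normally be learned during training.

```python
# The ReLU family of activations, evaluated elementwise on a small sample vector.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)                       # f(x) = max(0, x)

def leaky_relu(x):
    return np.maximum(0.01 * x, x)                  # f(x) = max(0.01x, x)

def prelu(x, alpha=0.25):
    return np.maximum(alpha * x, x)                 # f(x) = max(αx, x)

def elu(x, a=1.0):
    return np.where(x > 0, x, a * (np.exp(x) - 1))  # x if x > 0 else a(e^x − 1)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), leaky_relu(x), prelu(x), elu(x))
```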
Learning Models
 Supervised Learning Models: In supervised learning, models are trained on labeled data, where each
input is associated with a corresponding target or output. Common algorithms include linear regression,
decision trees, support vector machines, and deep neural networks. These models learn to map inputs to
outputs and can be used for tasks like classification and regression.
 Unsupervised Learning Models: Unsupervised learning models work with unlabeled data and aim to
discover patterns, structures, or relationships within the data. Clustering algorithms like k-means and
hierarchical clustering, as well as dimensionality reduction techniques like Principal Component
Analysis (PCA), are examples of unsupervised learning models.
 Reinforcement Learning Models: In reinforcement learning, agents learn to make sequential
decisions in an environment to maximize a reward signal. These models are used in applications such
as game playing, robotics, and autonomous systems. Popular reinforcement learning algorithms include
Q-learning and deep reinforcement learning algorithms like DQN and A3C.
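To make the Q-learning bullet above concrete, here is a minimal tabular Q-learning sketch (illustrative, not from the slides); the two-state toy environment and all hyperparameter values are assumptions made only for demonstration.

```python
# Tabular Q-learning on a made-up 2-state, 2-action environment.
import numpy as np

n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

def step(state, action):
    # Toy dynamics: action 1 moves to the other state and gives reward 1.
    next_state = 1 - state if action == 1 else state
    reward = 1.0 if action == 1 else 0.0
    return next_state, reward

state = 0
for _ in range(1000):
    # epsilon-greedy action selection
    action = rng.integers(n_actions) if rng.random() < epsilon else int(Q[state].argmax())
    next_state, reward = step(state, action)
    # Q-learning update: Q(s,a) += α [r + γ max_a' Q(s',a') − Q(s,a)]
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print(Q)  # action 1 ends up with the higher value in both states
```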
Learning Models
 Semi-Supervised Learning Models: Semi-supervised learning combines elements of both supervised
and unsupervised learning. These models use a small amount of labeled data and a larger amount of
unlabeled data to improve learning performance.
 Self-Supervised Learning Models: Self-supervised learning is a type of unsupervised learning where
models generate their own labels from the data itself. For example, in natural language processing,
models might learn to predict missing words in a sentence or generate contextually relevant
representations of words or phrases.
 Transfer Learning Models: Transfer learning involves pre-training a model on one task or dataset and
then fine-tuning it for another related task. This approach can save time and resources when training
models and is commonly used in deep learning, such as with pre-trained convolutional neural networks
(CNNs) for image classification.
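A hedged sketch of the transfer-learning workflow described above, assuming a recent torchvision: a ResNet-18 pre-trained on ImageNet is frozen and used as a feature extractor, and only a new 10-class head is trained on the smaller dataset. The backbone and class count are illustrative choices, not something fixed by the slides.

```python
# Pre-trained CNN as a frozen feature extractor with a new task-specific head.
import torch.nn as nn
from torchvision import models

# 1. Load a network pre-trained on a large dataset (ImageNet).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# 2. Freeze the pre-trained layers so they act as a fixed feature extractor.
for param in backbone.parameters():
    param.requires_grad = False

# 3. Replace the final layer and fine-tune only this new head on the
#    smaller, task-specific dataset (an assumed 10-class problem).
backbone.fc = nn.Linear(backbone.fc.in_features, 10)
```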
Learning Models
 Neural Network Architectures: Deep learning models, which are a subset of neural networks, have
gained prominence in recent years. These models consist of multiple layers of interconnected artificial
neurons and are particularly well-suited for tasks involving large amounts of data, such as image and
speech recognition. Popular architectures include convolutional neural networks (CNNs) for computer
vision and recurrent neural networks (RNNs) for sequence data.
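As an illustration of such an architecture, here is a minimal PyTorch sketch of a small CNN; the 28×28 grayscale input and 10 output classes are assumptions made for the example.

```python
# A small CNN: stacked convolution + pooling layers followed by a classifier head.
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer: local patterns
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # fully connected classifier head
)
```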
Unsupervised Training of Neural Networks
 Unsupervised training of neural networks is a type of machine learning approach where a neural
network learns patterns, representations, or structures in data without explicit supervision or labeled
target outputs.
 In contrast to supervised learning, where the network is provided with labeled examples and aims to minimize the prediction error, unsupervised learning focuses on extracting useful information from the data itself.
 Techniques:
 Autoencoder
 Restricted Boltzmann Machines (RBMs)
 Clustering
 Generative Adversarial Networks (GANs)
 Variational Autoencoders (VAEs)
 Dimensionality Reduction
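Two of the techniques listed above, clustering and dimensionality reduction, can be sketched in a few lines of scikit-learn (illustrative only; the synthetic blob data is made up for the demonstration):

```python
# Unsupervised learning on unlabeled data: k-means clustering and PCA.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0, size=(50, 5)),
               rng.normal(loc=5, size=(50, 5))])   # unlabeled data, two hidden groups

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)  # discovered clusters
X_2d = PCA(n_components=2).fit_transform(X)                              # compressed representation

print(labels[:5], X_2d.shape)  # no target labels were used anywhere
```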
Autoencoder
 An autoencoder is a type of artificial neural network used in unsupervised machine learning and dimensionality reduction tasks.
 It is used for data compression and feature learning.
 Autoencoders consist of an encoder and a decoder, both of which are neural networks.
 The main idea behind autoencoders is to learn a compressed representation (encoding) of the input data and then decode it to reconstruct the original data.
 Assumptions:
 1. A high degree of correlation/structure exists in the data.
 2. If the data is uncorrelated (the input features are independent), compression and subsequent reconstruction would be difficult.
Autoencoder

 The number of nodes in the bottleneck (hidden) layer is much smaller than the number of nodes in the input layer.
 The number of nodes in the input layer is equal to the number of nodes in the output layer.
 The number of nodes in the input layer is M × N + 1.
 The number of nodes in the output layer is M × N.
 If the number of nodes in the hidden layer is d, then the condition on the encoder side is d ≪ M × N.
 The condition on the decoder side is d ≪ M × N + 1.
Fig: Basic Structure of an Autoencoder
Autoencoder

Fig: Block Diagram of an Autoencoder


Stacked Autoencoder

Fig: Block Diagram of a Stacked Autoencoder


Autoencoder

 Expectations:
 Sensitive enough to input for accurate
reconstruction.
 Insensitive enough that it doesn’t memorize or
overfit the training data
 Loss function ⇒ L(X, X̂) + Regularizer

Fig: Basic Structure of an Autoencoder
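A hedged PyTorch sketch of this loss: reconstruction error plus a regularizer. The slides only state the general form L(X, X̂) + Regularizer; using mean-squared error and an L2 weight penalty here is an illustrative assumption.

```python
# Reconstruction loss L(X, X_hat) plus a regularizer (here: L2 penalty on weights).
import torch

def autoencoder_loss(x, x_hat, model, lam=1e-4):
    reconstruction = torch.nn.functional.mse_loss(x_hat, x)           # L(X, X_hat)
    regularizer = sum(p.pow(2).sum() for p in model.parameters())      # discourages memorizing
    return reconstruction + lam * regularizer
```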


Autoencoder
Under Complete Autoencoder

Fig: Basic Structure of an Autoencoder

 In order to minimize the error, we use the backpropagation algorithm.


Under Complete Autoencoder

In this case length(h) < length(Xᵢ)

h = g(W Xᵢ + b)
X̂ᵢ = f(W* h + c)
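A minimal NumPy sketch of this undercomplete forward pass (illustrative only): the 6-dimensional input, 2-dimensional code, and the choice g = f = sigmoid are assumptions for the example.

```python
# Undercomplete autoencoder forward pass: h = g(W x_i + b), x_hat_i = f(W* h + c).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, d = 6, 2                       # dim(h) = d < dim(x_i) = n_in  (undercomplete)

W, b = rng.normal(size=(d, n_in)), np.zeros(d)           # encoder parameters
W_star, c = rng.normal(size=(n_in, d)), np.zeros(n_in)   # decoder parameters

x_i = rng.random(n_in)
h = sigmoid(W @ x_i + b)             # encoder: compressed representation
x_hat_i = sigmoid(W_star @ h + c)    # decoder: reconstruction

print(np.mean((x_i - x_hat_i) ** 2))  # reconstruction error, minimized via backpropagation
```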

Over Complete Autoencoder

In this case length(h) ≥ length(Xᵢ)

h = g(W Xᵢ + b)
X̂ᵢ = f(W* h + c)

Fig: Over Complete Autoencoder


Autoencoder: Examples

 Suppose all the inputs are binary, i.e., each xᵢⱼ ∈ {0, 1}.
 The most suitable activation function for the decoder is the logistic function:
X̂ᵢ = logistic(W* h + c)
 The logistic function restricts all output values to the range (0, 1).
 However, at the encoder side tanh, linear, or sigmoid activation functions can be used.
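A hedged PyTorch sketch of this binary-input case: a logistic (sigmoid) output layer paired with binary cross-entropy as the reconstruction loss. The 784-dimensional input, 32-dimensional code, and tanh encoder are illustrative assumptions.

```python
# Autoencoder for binary inputs: tanh encoder, logistic (sigmoid) decoder, BCE loss.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 32), nn.Tanh())      # h = g(W x + b), with g = tanh
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())   # x_hat = logistic(W* h + c)

x = torch.randint(0, 2, (16, 784)).float()   # a batch of binary inputs, x_ij in {0, 1}
x_hat = decoder(encoder(x))

# Binary cross-entropy pairs naturally with the logistic output for {0, 1} targets.
loss = nn.functional.binary_cross_entropy(x_hat, x)
loss.backward()
print(loss.item())
```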
