
UNIT – III

INTRODUCTION TO DEEP LEARNING


Deep Learning is a subset of machine learning that focuses on using artificial neural
networks with many layers (hence "deep") to model and solve complex problems. It mimics
the way the human brain processes information to recognize patterns, make decisions,
and learn from data.

Key Features of Deep Learning:

1. Multi-layered Networks:

o Deep learning models consist of multiple layers of neurons that process data
hierarchically, extracting features from simple to complex levels.

2. Representation Learning:

o Unlike traditional machine learning, where feature extraction is manual, deep learning automatically learns features from raw data.

3. Large-scale Data:

o Deep learning thrives on massive datasets and can identify patterns that are
difficult for humans or traditional algorithms to spot.

4. High Computational Power:

o It leverages modern hardware like GPUs and TPUs to process data efficiently.

How Deep Learning Works:

• Deep learning uses neural networks that consist of:

1. Input Layer: Takes raw data (e.g., images, text, audio).

2. Hidden Layers: These intermediate layers learn to extract features from data.

• Each layer applies transformations using weights, biases, and activation functions.

3. Output Layer: Produces predictions or classifications.

Applications of Deep Learning:

1. Computer Vision:

o Object detection, facial recognition, autonomous driving.

2. Natural Language Processing (NLP):

o Machine translation, chatbots, sentiment analysis.

3. Speech Recognition:
o Voice assistants like Alexa, Siri.

4. Healthcare:

o Disease diagnosis, drug discovery.

5. Gaming and AI:

o Intelligent agents in games, AlphaGo.

Popular Architectures:

1. Convolutional Neural Networks (CNNs):

o Best for image and video data.

2. Recurrent Neural Networks (RNNs):

o Useful for sequential data like text and time-series.

3. Transformers:

o Backbone of modern NLP models like GPT and BERT.

Key Benefits:

• Automates feature extraction.

• Achieves state-of-the-art performance in many domains.

• Scales well with data and compute power.

Deep learning has revolutionized AI, enabling machines to outperform humans in certain
tasks like image recognition and language understanding.

HISTORICAL TRENDS IN DEEP LEARNING


Deep learning's evolution is tied to breakthroughs in artificial neural networks,
computation, and data availability. Here's a detailed timeline with key milestones and
examples that shaped the field:

1. Early Foundations (1940s–1980s)

Deep learning’s roots go back to foundational concepts in artificial intelligence and neuroscience.

• 1943: McCulloch-Pitts Neuron

o Development: Warren McCulloch and Walter Pitts introduced a mathematical model of artificial neurons.

o Significance: Laid the theoretical groundwork for neural networks.

• 1958: Perceptron

o Development: Frank Rosenblatt developed the Perceptron, the first algorithm to learn weights using input-output examples.

o Limitation: Could only solve linearly separable problems (highlighted in 1969 by Minsky and Papert).

• 1980: Neocognitron

o Development: Kunihiko Fukushima introduced the Neocognitron, an early model resembling today’s Convolutional Neural Networks (CNNs) for image recognition.

o Significance: Introduced hierarchical feature extraction.

2. Neural Network Revival (1980s–1990s)

Interest in neural networks resurfaced due to new techniques and computational advances.

• 1986: Backpropagation Algorithm

o Development: David Rumelhart, Geoffrey Hinton, and Ronald Williams popularized backpropagation for training neural networks.

o Significance: Enabled multi-layer networks to learn efficiently.

• 1989: LeNet (Yann LeCun)

o Example: LeNet, an early CNN later refined into LeNet-5, was developed for digit recognition (e.g., reading zip codes on mail).

o Impact: Demonstrated neural networks' capability in computer vision.

3. Winter and Slow Growth (1990s–2000s)

Neural networks stagnated due to:

1. Limited Computational Power: Training large networks was impractical.

2. Lack of Data: Insufficient labeled data for large-scale learning.

3. Competing Methods: Algorithms like Support Vector Machines (SVMs) often outperformed neural networks.

4. Deep Learning Renaissance (2006–2012)

Deep learning gained traction with breakthroughs in computation, algorithms, and data
availability.

• 2006: Deep Belief Networks

o Development: Geoffrey Hinton introduced unsupervised pre-training with Deep Belief Networks (DBNs).

o Significance: Demonstrated the effectiveness of training deep architectures.

• 2012: AlexNet

o Development: Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever designed AlexNet, a CNN that won the ImageNet competition by a large margin.

o Significance: Showed the power of GPUs for large-scale neural networks and sparked widespread interest in deep learning.

5. Modern Deep Learning Era (2012–Present)

Deep learning became the dominant paradigm for AI research, thanks to abundant data,
scalable architectures, and hardware.

Key Trends and Examples:

1. Convolutional Neural Networks (CNNs):

o 2015: ResNet (Residual Networks):

• Introduced by Kaiming He and colleagues, ResNet overcame the vanishing gradient problem by using residual (skip) connections.

• Application: Advanced image recognition tasks.

o Impact: CNNs dominated computer vision tasks like object detection (e.g., YOLO, Mask R-CNN).

2. Recurrent Neural Networks (RNNs) and LSTMs:

o Development: Long Short-Term Memory (LSTM) networks (introduced in 1997 by Hochreiter and Schmidhuber) were popularized for sequential data tasks like speech recognition and time-series forecasting.

o Example: Google Translate's early deep learning models.

3. Transformers and NLP Revolution:

o 2017: Attention is All You Need:

• Introduced Transformers, which largely replaced RNNs in NLP tasks.

• Example: GPT models (e.g., GPT-3 in 2020) for language generation.

o Impact: Models like BERT and GPT dominated NLP tasks like text generation,
summarization, and translation.

4. Reinforcement Learning:

o 2016: AlphaGo (DeepMind):

• Used deep reinforcement learning to defeat a world champion in Go.

o Impact: Demonstrated the potential of deep learning in decision-making.

5. Generative Models:
o 2014: Generative Adversarial Networks (GANs):

• Introduced by Ian Goodfellow, GANs generated realistic images and videos.

o Example: Deepfake technologies, art generation.

o 2021: DALL-E:

• OpenAI's model for generating creative images from textual descriptions.

6. Scalability and Foundation Models:

o 2020s: Emergence of large-scale pre-trained models.

• Examples: GPT-4, CLIP, and DALL-E.

o Significance: General-purpose AI systems for diverse applications.

Future Directions in Deep Learning:

1. Self-supervised Learning:

o Learning representations with minimal labeled data.

o Examples: SimCLR, BYOL.

2. Efficient Models:

o Lightweight models for deployment on edge devices (e.g., MobileNet).

3. Multimodal AI:

o Combining modalities like vision, language, and audio.

o Example: OpenAI’s GPT-4 Vision.

4. Ethics and Explainability:

o Improving transparency and addressing biases in deep learning systems.

Summary

Deep learning has evolved from theoretical foundations to practical, state-of-the-art systems in fields like computer vision, NLP, and reinforcement learning. Innovations like CNNs, Transformers, and generative models have revolutionized AI, enabling breakthroughs in areas ranging from healthcare to entertainment.

DEEP FEED-FORWARD NETWORKS (FFNN)


A Deep Feed-Forward Neural Network (also called a multi-layer perceptron or MLP) is
one of the simplest types of artificial neural networks. It is composed of layers of
interconnected neurons where data flows in one direction—from the input layer to the
output layer—without loops or feedback.
Key Characteristics:

1. Layered Structure:

o Input Layer: Receives raw data (features).

o Hidden Layers: Perform computations to learn intermediate representations.

o Output Layer: Produces predictions or classifications.

2. Feed-Forward:

o Information flows forward through the network (no cycles).

3. Fully Connected:

o Each neuron in one layer is connected to every neuron in the next layer.

4. Activation Functions:

o Introduce non-linearity (e.g., ReLU, Sigmoid, Tanh) to help the network learn
complex patterns.

Mathematical Representation:
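In standard notation, a feed-forward network computes, layer by layer, a(l) = f( W(l) · a(l−1) + b(l) ), with a(0) = x the input vector, where W(l) and b(l) are the weights and biases of layer l and f is its activation function; the final layer's activation (e.g., Softmax) turns the last a(L) into predictions.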

Example: Classifying Handwritten Digits (MNIST Dataset)

1. Input:
o 28×28 grayscale images of digits, flattened into a vector of 784 features.

2. Architecture:

o Input Layer: 784 neurons (one for each pixel).

o Hidden Layer 1: 128 neurons with ReLU activation.

o Hidden Layer 2: 64 neurons with ReLU activation.

o Output Layer: 10 neurons (one for each digit, 0–9), with Softmax activation
for probabilities.

3. Flow:

o Data moves from the input through the hidden layers, undergoing weighted transformations and activations before reaching the output layer (a code sketch of this architecture follows below).
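A minimal PyTorch sketch of this 784–128–64–10 architecture (an illustration, not from the original notes; the training loop is omitted):

import torch.nn as nn

# 784 -> 128 -> 64 -> 10 feed-forward network for MNIST digits
mlp = nn.Sequential(
    nn.Flatten(),         # 28x28 image -> 784-dimensional vector
    nn.Linear(784, 128),  # Hidden Layer 1
    nn.ReLU(),
    nn.Linear(128, 64),   # Hidden Layer 2
    nn.ReLU(),
    nn.Linear(64, 10),    # Output Layer: one score (logit) per digit 0-9
)
# Note: Softmax is usually folded into the loss function; nn.CrossEntropyLoss
# applies log-softmax internally, so the model outputs raw logits here.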

Diagram:

Here’s a simple illustration of a deep feed-forward network with 1 input layer, 2 hidden
layers, and 1 output layer:

Advantages:

• Simplicity: Easy to implement and interpret.

• Universal Approximation: Can approximate any continuous function to arbitrary accuracy given enough neurons and layers (universal approximation theorem).

Limitations:

• Overfitting: Prone to memorizing training data if regularization is not applied.

• Vanishing Gradients: Training deep networks can be challenging due to diminishing gradients (mitigated by ReLU activations and batch normalization).

Deep feed-forward networks are foundational in deep learning, serving as a basis for more
advanced architectures like Convolutional Neural Networks (CNNs) and Recurrent Neural
Networks (RNNs).
GRADIENT-BASED LEARNING

Gradient-based learning is a core concept in deep learning, where a model learns by
optimizing its parameters (weights and biases) through an iterative process that minimizes
a loss function. The gradient represents the direction and rate of change of the loss with
respect to the model's parameters, guiding the optimization process.

Key Steps in Gradient-Based Learning:

1. Forward Pass:

o Input data is passed through the network to compute the output (predictions).

o A loss function evaluates the difference between the predicted output and
the actual target.

2. Backward Pass (Backpropagation):

o The gradient of the loss function with respect to each parameter is computed
using the chain rule of calculus.

3. Parameter Update:

o Parameters (weights and biases) are updated using an optimization algorithm (e.g., Gradient Descent): each parameter is nudged in the direction opposite to its gradient, scaled by a learning rate.

Diagram: Gradient-Based Learning Workflow

Here’s a simplified flowchart for gradient-based learning: input → forward pass → compute loss → backward pass (gradients) → update parameters → repeat until convergence. A code sketch of this loop follows.
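A minimal NumPy sketch of this loop (an illustration, not part of the original notes), fitting a one-variable linear model y ≈ w·x + b with mean squared error:

import numpy as np

# Toy data generated from y = 3x + 1 with a little noise (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3 * x + 1 + 0.1 * rng.normal(size=100)

w, b = 0.0, 0.0          # parameters to learn
lr = 0.1                 # learning rate

for step in range(200):
    y_pred = w * x + b                   # forward pass
    loss = np.mean((y_pred - y) ** 2)    # mean squared error loss
    # backward pass: gradients of the loss w.r.t. w and b (chain rule)
    grad_w = np.mean(2 * (y_pred - y) * x)
    grad_b = np.mean(2 * (y_pred - y))
    # parameter update: step against the gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")   # should approach w ≈ 3, b ≈ 1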

Common Optimization Algorithms:

1. Gradient Descent:

o Vanilla method that updates parameters using the gradient.

2. Stochastic Gradient Descent (SGD):

o Updates parameters using a single sample or mini-batch.

3. Advanced Methods:
o Adam: Combines momentum and adaptive learning rates.

o RMSProp: Scales gradients using recent magnitudes.
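In a framework such as PyTorch, switching among these optimizers is usually a one-line change. A minimal sketch, using a simple stand-in model (torch.nn.Linear here) rather than any particular network from these notes:

import torch

model = torch.nn.Linear(10, 1)   # stand-in model; any nn.Module works the same way

# Pick one optimizer; the rest of the training step is unchanged.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)        # vanilla / stochastic gradient descent
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)    # momentum + adaptive learning rates
# optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001) # scales steps by recent gradient magnitudes

# One training step: forward pass, loss, backward pass, parameter update
x, y = torch.randn(4, 10), torch.randn(4, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()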

Advantages:

• Works well for large-scale problems.

• Efficient for learning complex models like deep neural networks.

Limitations:

• Vanishing/Exploding Gradients: In very deep networks, gradients can become too small or too large.

• Local Minima: Gradient descent may converge to a local minimum instead of the global minimum.

Gradient-based learning is the backbone of most modern machine learning models and
has enabled significant advancements in AI applications like image recognition, natural
language processing, and more.

HIDDEN UNITS IN NEURAL NETWORKS


Hidden units are the neurons in the hidden layers of a neural network. They are called
"hidden" because they are not directly exposed to the input or output; instead, they act as
intermediaries that process data to learn patterns and representations.

Each hidden unit performs a computation based on the weighted sum of its inputs,
followed by an activation function. These units are critical for capturing non-linear
relationships in the data.

Role of Hidden Units:

1. Feature Extraction:

o Hidden units extract intermediate features from the input data, transforming
raw information into useful representations.

2. Non-linearity:

o By applying activation functions (e.g., ReLU, Sigmoid), hidden units allow the
network to learn non-linear relationships in the data.

3. Learning Complexity:

o The number of hidden units and layers determines the model's capacity to
learn complex patterns.

Example: Hidden Units in a Simple Neural Network


Suppose we have a network to classify images as "cat" or "dog":

1. Input Layer:

o Takes features of the image (e.g., pixel values).

2. Hidden Layer:

o Contains hidden units that combine input features using weights and biases,
applying an activation function to capture patterns like "edges" or "textures."

3. Output Layer:

o Produces probabilities for "cat" or "dog" based on the representations learned by the hidden units.

Computation in Hidden Units:

Each hidden unit computes a weighted sum of its inputs plus a bias and passes the result through an activation function: h = f(w · x + b).
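A minimal NumPy sketch of one hidden unit with a ReLU activation (the weights and inputs are illustrative values):

import numpy as np

def relu(z):
    # ReLU activation: max(0, z)
    return np.maximum(0.0, z)

x = np.array([0.5, -1.2, 2.0])   # inputs from the previous layer
w = np.array([0.4, 0.1, -0.3])   # this hidden unit's weights
b = 0.05                         # this hidden unit's bias

h = relu(np.dot(w, x) + b)       # weighted sum + bias, then activation
print(h)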

Diagram: Neural Network with Hidden Units

In this diagram:

• Hidden Layer: Contains the "hidden units" (neurons in the middle).

Choosing the Number of Hidden Units:

1. Too Few Hidden Units:


o May underfit the data (not enough capacity to capture patterns).

2. Too Many Hidden Units:

o May overfit the data (memorizing instead of generalizing).

Real-World Example:

• Image Recognition:

o In a deep network like a Convolutional Neural Network (CNN), hidden units in the initial layers learn simple features (e.g., edges), while later layers combine them to recognize complex objects.

Hidden units are the building blocks of a neural network, enabling it to learn complex,
non-linear relationships in data.

Types of Hidden Units:

1. Fully Connected Units:

o Each hidden unit is connected to all units in the previous layer.

2. Convolutional Units:

o Used in CNNs for spatial data like images.

3. Recurrent Units:

o Used in RNNs for sequential data.

Key Points:

• Number of Hidden Units: Affects model complexity and performance. Too few units may underfit, while too many may overfit.

• Activation Functions: Common choices include ReLU, Sigmoid, and Tanh.

• Deep Learning: Multiple layers of hidden units enable deep learning models to learn hierarchical representations.

Hidden units are the heart of neural networks, allowing them to process and transform
data into meaningful insights for solving complex problems.

ARCHITECTURE DESIGN OF DEEP LEARNING


The architecture design of a deep learning model refers to the structure and organization
of layers, neurons, and connections in a neural network. Different tasks require different
architectures, such as feedforward networks, convolutional networks (CNNs), recurrent
networks (RNNs), and transformers.

Key Components of Deep Learning Architectures:


1. Input Layer:

o Takes the raw data (e.g., images, text, or tabular data).

o Number of neurons equals the number of features in the input.

2. Hidden Layers:

o Consist of multiple layers of neurons that learn patterns and features from
data.

o Each layer can have varying numbers of neurons and activation functions.

3. Output Layer:

o Produces the final result (e.g., class probabilities or regression values).

o Number of neurons depends on the task (e.g., 10 for digit classification, 1 for
regression).

4. Activation Functions:

o Introduce non-linearity (e.g., ReLU, Sigmoid, Tanh) for better feature representation.

5. Loss Function:

o Measures the difference between predictions and true labels (e.g., cross-entropy for classification, MSE for regression); a short sketch of both follows after this list.

6. Optimization Algorithm:

o Updates weights during training (e.g., Gradient Descent, Adam).
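To make the two loss functions named above concrete, a minimal NumPy sketch (the numbers are made up for illustration):

import numpy as np

# Mean squared error for regression
y_true = np.array([2.0, 0.5, 1.5])
y_pred = np.array([1.8, 0.7, 1.2])
mse = np.mean((y_pred - y_true) ** 2)

# Cross-entropy for a single 3-class classification example
probs = np.array([0.7, 0.2, 0.1])   # predicted class probabilities (e.g., after softmax)
label = 0                           # index of the true class
cross_entropy = -np.log(probs[label])

print(mse, cross_entropy)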

Example: Architecture for MNIST Digit Classification

Problem:

Classify handwritten digits (0–9) from the MNIST dataset.

Architecture:

1. Input Layer:

o 28×28 grayscale images flattened into a vector of 784 features.

2. Hidden Layers:

o Hidden Layer 1: 128 neurons with ReLU activation.

o Hidden Layer 2: 64 neurons with ReLU activation.

3. Output Layer:

o 10 neurons with Softmax activation to output probabilities for each digit.

Workflow:

• Input → Hidden Layer 1 → Hidden Layer 2 → Output


Diagram: Example of Deep Learning Architecture

Advanced Architectures:

1. Convolutional Neural Networks (CNNs):

• Use Case: Image classification, object detection.

• Design (a code sketch follows after this list):

o Convolutional layers for feature extraction.

o Pooling layers for dimensionality reduction.

o Fully connected layers for classification.
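A minimal PyTorch sketch of this pattern for 28×28 grayscale inputs (the layer sizes are assumptions chosen for illustration):

import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer: feature extraction
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer: 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),                 # fully connected layer: classification
)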

2. Recurrent Neural Networks (RNNs):

• Use Case: Sequential data (e.g., time series, text).

• Design:

o Recurrent connections to handle temporal dependencies.

o Variants: LSTM, GRU for long-term dependencies.

3. Transformers:

• Use Case: NLP and multimodal tasks.

• Design (a sketch of the attention computation follows after this list):

o Attention mechanisms to process entire input sequences simultaneously.

o Example: GPT, BERT.
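The attention mechanism mentioned above can be illustrated with scaled dot-product attention; a minimal single-head NumPy sketch (shapes and values are illustrative, no masking or multi-head machinery):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the keys
    return weights @ V                                 # weighted sum of the values

# 4 tokens, 8-dimensional queries/keys/values
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)   # (4, 8): one context-mixed vector per token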

Architecture Design Principles:

1. Task-Specific Design:

o Choose the architecture based on the problem type (e.g., CNNs for images,
RNNs for sequences).
2. Depth and Complexity:

o Deeper networks can learn complex features but require more data and
computation.

3. Regularization:

o Techniques like dropout and batch normalization prevent overfitting.

4. Hyperparameter Tuning:

o Optimize the number of layers, neurons, learning rate, etc., for better
performance.

Summary

The architecture design of a deep learning model is critical to its success. By tailoring the
number of layers, activation functions, and connections, deep learning models can handle
tasks like image classification, speech recognition, and natural language processing with
high accuracy. For example, CNNs excel in image tasks, while transformers dominate NLP.

BACKPROPAGATION AND OTHER DIFFERENTIATION ALGORITHMS


Backpropagation is the cornerstone of training deep learning models. It enables the
efficient computation of gradients of the loss function with respect to the model's
parameters, which are then updated using optimization algorithms. In addition to
backpropagation, other differentiation algorithms, like symbolic and automatic
differentiation, are used in specific scenarios.

1. Backpropagation

What is Backpropagation?

Backpropagation is an algorithm for computing the gradient of the loss function with respect to the weights of a neural network; the resulting gradients drive gradient-based optimization. It uses the chain rule of calculus to propagate errors backward from the output layer to the input layer.

Steps of Backpropagation:

1. Forward Pass:

o Input data flows through the network to compute the predictions.

o The loss function calculates the error between the predictions and actual
outputs.

2. Backward Pass (Gradient Calculation):

o Compute the gradient of the loss with respect to each parameter in the
network, layer by layer, starting from the output layer and moving backward.

3. Parameter Update:

o Each weight is adjusted in the direction opposite to its gradient, e.g., w ← w − η · ∂L/∂w, where η is the learning rate (as covered under gradient-based learning above).
Key Advantages:

• Efficiency: Computes gradients for all parameters in a single backward pass using the chain rule.

• Scalability: Works well for networks with many layers.

Challenges:

• Vanishing/Exploding Gradients: Gradients can become too small or too large in very deep networks.

• Requires Differentiability: Activation functions and loss functions must be differentiable.
2. Other Differentiation Algorithms

a. Symbolic Differentiation

• Computes exact derivatives symbolically using algebraic rules.

• Example: symbolic computation libraries like SymPy.

• Advantages:

o Provides exact gradients.

• Disadvantages:

o Computationally expensive for large models (expression blow-up).

b. Numerical Differentiation

• Approximates derivatives using finite differences, e.g., f'(x) ≈ (f(x + h) − f(x − h)) / (2h) for a small step h.

• Advantages:

o Simple to implement; useful for checking gradients.

• Disadvantages:

o Slow for models with many parameters and sensitive to the choice of h.

c. Automatic Differentiation (AutoDiff)

• Combines the exactness of symbolic differentiation with the efficiency of evaluating derivatives numerically.

• Breaks complex functions into a sequence of elementary operations and applies the chain rule automatically.

• Used in deep learning frameworks like TensorFlow and PyTorch (a small example follows).
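A small example of automatic differentiation using PyTorch's autograd (illustrative):

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x    # built from elementary operations that PyTorch records
y.backward()          # applies the chain rule backward through those operations
print(x.grad)         # dy/dx = 2x + 3 = 7 at x = 2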

Comparison of Differentiation Algorithms

Algorithm                 | Accuracy    | Efficiency | Use Case
Backpropagation           | High        | High       | Training deep neural networks.
Symbolic Differentiation  | Exact       | Low        | Analytical gradients for small problems.
Numerical Differentiation | Approximate | Low        | Debugging or verifying gradients.
Automatic Differentiation | Exact       | High       | Deep learning frameworks and large models.

Example: Backpropagation in a Simple Neural Network

Network Structure: a small fully connected network; a concrete worked sketch follows below.

Diagram: Backpropagation Process
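A minimal NumPy sketch of backpropagation for an assumed 2-input, 2-hidden-unit, 1-output network with sigmoid activations and squared-error loss (the structure and numbers are illustrative, not taken from the original diagram):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny network: 2 inputs -> 2 hidden units (sigmoid) -> 1 output (sigmoid)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # hidden layer parameters
W2, b2 = rng.normal(size=(1, 2)), np.zeros(1)   # output layer parameters

x = np.array([0.5, -0.3])   # one training input
t = np.array([1.0])         # its target output
lr = 0.5                    # learning rate

for _ in range(1000):
    # Forward pass
    h = sigmoid(W1 @ x + b1)           # hidden activations
    y = sigmoid(W2 @ h + b2)           # prediction
    loss = 0.5 * np.sum((y - t) ** 2)  # squared error

    # Backward pass (chain rule, layer by layer)
    delta_out = (y - t) * y * (1 - y)             # error at the output's pre-activation
    grad_W2 = np.outer(delta_out, h)
    grad_b2 = delta_out
    delta_hid = (W2.T @ delta_out) * h * (1 - h)  # error propagated to the hidden layer
    grad_W1 = np.outer(delta_hid, x)
    grad_b1 = delta_hid

    # Parameter update (gradient descent)
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1

print(f"final loss: {loss:.6f}")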

Summary

• Backpropagation is the backbone of training neural networks, efficiently computing gradients for optimization.

• Other differentiation algorithms like numerical differentiation and automatic differentiation complement backpropagation in debugging and implementation.

• Frameworks like TensorFlow and PyTorch heavily rely on automatic differentiation to streamline the process of computing gradients.
