NN DL Unit - III
1. Multi-layered Networks:
o Deep learning models consist of multiple layers of neurons that process data
hierarchically, extracting features from simple to complex levels.
2. Representation Learning:
o Instead of relying on hand-crafted features, deep models learn useful representations directly from raw data.
3. Large-scale Data:
o Deep learning thrives on massive datasets and can identify patterns that are
difficult for humans or traditional algorithms to spot.
o It leverages modern hardware like GPUs and TPUs to process data efficiently.
2. Hidden Layers: These intermediate layers learn to extract features from data.
1. Computer Vision:
3. Speech Recognition:
o Voice assistants like Alexa, Siri.
4. Healthcare:
Popular Architectures:
3. Transformers:
Key Benefits:
Deep learning has revolutionized AI, enabling machines to outperform humans in certain
tasks like image recognition and language understanding.
1958: Perceptron
o Development: Frank Rosenblatt developed the Perceptron, the first
algorithm to learn weights using input-output examples.
1980: Neocognitron
o Example: LeNet-5, an early CNN inspired by the Neocognitron, was later developed for digit recognition (e.g., reading zip codes on mail).
Deep learning gained traction with breakthroughs in computation, algorithms, and data
availability.
2012: AlexNet
o Significance: Showed the power of GPUs for large-scale neural networks and
sparked widespread interest in deep learning.
Deep learning became the dominant paradigm for AI research, thanks to abundant data,
scalable architectures, and hardware.
o Impact: CNNs dominated computer vision tasks like object detection (e.g.,
YOLO, Mask R-CNN).
o Impact: Models like BERT and GPT dominated NLP tasks like text generation,
summarization, and translation.
4. Reinforcement Learning:
5. Generative Models:
o 2014: Generative Adversarial Networks (GANs):
o 2021: DALL-E:
1. Self-supervised Learning:
2. Efficient Models:
3. Multimodal AI:
Summary
1. Layered Structure:
2. Feed-Forward:
3. Fully Connected:
o Each neuron in one layer is connected to every neuron in the next layer.
4. Activation Functions:
o Introduce non-linearity (e.g., ReLU, Sigmoid, Tanh) to help the network learn
complex patterns.
Mathematical Representation:
Each layer computes h = f(Wx + b), where x is the layer's input, W the weight matrix, b the bias vector, and f the activation function; stacking such layers produces the network's output.
1. Input:
o 28×28 grayscale images of digits, flattened into a vector of 784 features.
2. Architecture:
o Output Layer: 10 neurons (one for each digit, 0–9), with Softmax activation
for probabilities.
3. Flow:
Diagram:
Here's a simple illustration of a deep feed-forward network with 1 input layer, 2 hidden layers, and 1 output layer; the same structure is sketched in code below:
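Since the diagram itself is not reproduced here, the following minimal Python/NumPy sketch shows that structure for the MNIST example above: a 784-feature input, two hidden layers, and a 10-way Softmax output. The hidden-layer sizes (128 and 64) and the random weights are illustrative assumptions, not values from these notes.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # element-wise max(0, z)

def softmax(z):
    z = z - z.max()                    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()                 # probabilities that sum to 1

# Randomly initialized weights and biases (illustrative sizes: 784 -> 128 -> 64 -> 10)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.01, (128, 784)), np.zeros(128)
W2, b2 = rng.normal(0, 0.01, (64, 128)), np.zeros(64)
W3, b3 = rng.normal(0, 0.01, (10, 64)), np.zeros(10)

x = rng.random(784)                    # a flattened 28x28 image

h1 = relu(W1 @ x + b1)                 # hidden layer 1
h2 = relu(W2 @ h1 + b2)                # hidden layer 2
y = softmax(W3 @ h2 + b3)              # output: probability for each digit 0-9

print(y.shape, y.sum())                # 10 class probabilities summing to (approximately) 1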
Advantages:
Limitations:
Deep feed-forward networks are foundational in deep learning, serving as a basis for more
advanced architectures like Convolutional Neural Networks (CNNs) and Recurrent Neural
Networks (RNNs).
GRADIENT-BASED LEARNING
Gradient-based learning is a core concept in deep learning, where a model learns by
optimizing its parameters (weights and biases) through an iterative process that minimizes
a loss function. The gradient represents the direction and rate of change of the loss with
respect to the model's parameters, guiding the optimization process.
1. Forward Pass:
o Input data is passed through the network to compute the output (predictions).
o A loss function evaluates the difference between the predicted output and
the actual target.
2. Backward Pass:
o The gradient of the loss function with respect to each parameter is computed using the chain rule of calculus.
3. Parameter Update:
1. Gradient Descent:
3. Advanced Methods:
o Adam: Combines momentum and adaptive learning rates.
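A minimal Python sketch of this cycle (forward pass, loss, gradient, parameter update) on a one-parameter least-squares problem; the data, learning rate, and number of steps below are illustrative:

import numpy as np

# Toy data: y is roughly 3x; we fit y_hat = w * x by minimizing the MSE loss.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 5.9, 9.2, 11.8])

w = 0.0                 # parameter to learn
lr = 0.01               # learning rate

for step in range(200):
    y_hat = w * x                        # forward pass: predictions
    loss = np.mean((y_hat - y) ** 2)     # loss: mean squared error
    grad = np.mean(2 * (y_hat - y) * x)  # gradient of the loss w.r.t. w (chain rule)
    w -= lr * grad                       # parameter update: step against the gradient

print(round(w, 3))                       # close to 3.0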
Advantages:
Limitations:
Local Minima: Gradient descent may converge to a local minimum instead of the
global minimum.
Gradient-based learning is the backbone of most modern machine learning models and
has enabled significant advancements in AI applications like image recognition, natural
language processing, and more.
Each hidden unit performs a computation based on the weighted sum of its inputs,
followed by an activation function. These units are critical for capturing non-linear
relationships in the data.
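A single hidden unit can be written in a few lines of Python; the inputs, weights, bias, and the choice of ReLU below are illustrative:

import numpy as np

x = np.array([0.5, -1.2, 3.0])     # inputs to the unit
w = np.array([0.8, 0.1, 0.4])      # learned weights
b = 0.2                            # learned bias

z = np.dot(w, x) + b               # weighted sum of inputs plus bias
h = max(0.0, z)                    # ReLU activation introduces non-linearity

print(z, h)                        # both are about 1.68 (z is positive, so ReLU passes it through)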
1. Feature Extraction:
o Hidden units extract intermediate features from the input data, transforming
raw information into useful representations.
2. Non-linearity:
o By applying activation functions (e.g., ReLU, Sigmoid), hidden units allow the
network to learn non-linear relationships in the data.
3. Learning Complexity:
o The number of hidden units and layers determines the model's capacity to
learn complex patterns.
1. Input Layer:
2. Hidden Layer:
o Contains hidden units that combine input features using weights and biases,
applying an activation function to capture patterns like "edges" or "textures."
3. Output Layer:
In this diagram:
Real-World Example:
Image Recognition:
Hidden units are the building blocks of a neural network, enabling it to learn complex,
non-linear relationships in data.
2. Convolutional Units:
3. Recurrent Units:
Key Points:
Number of Hidden Units: Affects model complexity and performance. Too few units
may underfit, while too many may overfit.
Deep Learning: Multiple layers of hidden units enable deep learning models to learn
hierarchical representations.
Hidden units are the heart of neural networks, allowing them to process and transform
data into meaningful insights for solving complex problems.
2. Hidden Layers:
o Consist of multiple layers of neurons that learn patterns and features from
data.
o Each layer can have varying numbers of neurons and activation functions.
3. Output Layer:
o Number of neurons depends on the task (e.g., 10 for digit classification, 1 for
regression).
4. Activation Functions:
5. Loss Function:
o Measures the difference between predictions and true labels (e.g., cross-entropy for classification, MSE for regression); a short worked example follows this list.
6. Optimization Algorithm:
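To make item 5 above concrete, here is a small Python sketch of the two loss functions mentioned (all values are illustrative):

import numpy as np

# Cross-entropy for classification: compare predicted class probabilities
# with the true class (here, class index 2 out of 3 classes).
p = np.array([0.1, 0.2, 0.7])                 # Softmax output of the network
true_class = 2
cross_entropy = -np.log(p[true_class])        # about 0.357; lower is better

# Mean squared error for regression: compare predicted and true values.
y_hat = np.array([2.5, 0.0, 2.1])
y = np.array([3.0, -0.5, 2.0])
mse = np.mean((y_hat - y) ** 2)               # about 0.17

print(cross_entropy, mse)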
Problem:
Architecture:
1. Input Layer:
2. Hidden Layers:
3. Output Layer:
Workflow:
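One possible realization of such an architecture for the MNIST digit problem, sketched with the Keras API (tf.keras); the hidden-layer sizes and training settings are illustrative assumptions, not the notes' prescribed design:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),  # hidden layer 1 (size is illustrative)
    tf.keras.layers.Dense(64, activation="relu"),                       # hidden layer 2 (size is illustrative)
    tf.keras.layers.Dense(10, activation="softmax"),                    # one output per digit class 0-9
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(x_train, y_train, epochs=5)  # x_train: (N, 784) pixel values, y_train: (N,) digit labels

The Softmax output and sparse categorical cross-entropy loss match the 10-class digit setup described earlier in this unit.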
Advanced Architectures:
1. Convolutional Neural Networks (CNNs):
Design:
2. Recurrent Neural Networks (RNNs):
Design:
3. Transformers:
Design:
1. Task-Specific Design:
o Choose the architecture based on the problem type (e.g., CNNs for images,
RNNs for sequences).
2. Depth and Complexity:
o Deeper networks can learn complex features but require more data and
computation.
3. Regularization:
4. Hyperparameter Tuning:
o Optimize the number of layers, neurons, learning rate, etc., for better
performance.
Summary
The architecture design of a deep learning model is critical to its success. By tailoring the
number of layers, activation functions, and connections, deep learning models can handle
tasks like image classification, speech recognition, and natural language processing with
high accuracy. For example, CNNs excel in image tasks, while transformers dominate NLP.
1. Backpropagation
What is Backpropagation?
Steps of Backpropagation:
1. Forward Pass:
o The loss function calculates the error between the predictions and actual
outputs.
2. Backward Pass:
o Compute the gradient of the loss with respect to each parameter in the network, layer by layer, starting from the output layer and moving backward.
3. Parameter Update:
Key Advantages:
Efficiency: Computes gradients for all parameters in a single backward pass, reusing intermediate results via the chain rule.
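The steps above can be traced end to end for a tiny one-hidden-layer network in Python; the network size, data, and squared-error loss are illustrative:

import numpy as np

rng = np.random.default_rng(0)
x = rng.random(3)                    # input vector
t = np.array([1.0])                  # target output

W1, b1 = rng.normal(0, 0.5, (4, 3)), np.zeros(4)   # hidden layer (4 ReLU units)
W2, b2 = rng.normal(0, 0.5, (1, 4)), np.zeros(1)   # output layer (1 linear unit)

lr = 0.1
for step in range(100):
    # Forward pass
    z1 = W1 @ x + b1
    h1 = np.maximum(0.0, z1)         # ReLU
    y = W2 @ h1 + b2                 # linear output
    loss = 0.5 * np.sum((y - t) ** 2)

    # Backward pass: apply the chain rule layer by layer, from output to input
    dy = y - t                       # dLoss/dy
    dW2 = np.outer(dy, h1)           # dLoss/dW2
    db2 = dy
    dh1 = W2.T @ dy                  # gradient flowing back into the hidden layer
    dz1 = dh1 * (z1 > 0)             # ReLU derivative
    dW1 = np.outer(dz1, x)
    db1 = dz1

    # Parameter update (gradient descent)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(loss)                          # close to 0 after training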
Challenges:
a. Symbolic Differentiation
Advantages:
Disadvantages:
b. Numerical Differentiation
c. Automatic Differentiation
Breaks complex functions into a sequence of elementary operations and applies the chain rule automatically.
Algorithm                   Accuracy      Efficiency   Use Case
Numerical Differentiation   Approximate   Low          Debugging or verifying gradients
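As the use case in the table suggests, numerical differentiation is typically used to verify an analytic gradient. A minimal Python sketch of such a gradient check (the function and step size are illustrative):

import numpy as np

def f(w):
    return np.sum(w ** 2)            # simple loss: f(w) = sum of squares

def analytic_grad(w):
    return 2 * w                     # gradient derived by hand: df/dw = 2w

def numerical_grad(w, eps=1e-6):
    g = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        g[i] = (f(w_plus) - f(w_minus)) / (2 * eps)   # central difference
    return g

w = np.array([0.3, -1.5, 2.0])
print(np.allclose(analytic_grad(w), numerical_grad(w), atol=1e-4))   # True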
Network Structure:
Summary