Unit 3
A deep neural network is a neural network with at least two hidden layers. Deep neural networks use sophisticated mathematical modeling to process data in different ways. Whereas traditional machine learning algorithms are largely linear, deep learning algorithms are stacked in a hierarchy of increasing complexity and abstraction.
Deep learning creates many layers of neurons, attempting to learn structured representations of the data layer by layer.
A feedforward network defines a mapping y = f(x; θ) and learns the value of the parameters θ that result in the best function approximation.
These models are called feedforward because information flows through the function being evaluated from
x, through the intermediate computations
used to define f, and finally to the output y. There are no feedback connections in which outputs of the
model are fed back into itself.
When feedforward neural networks are extended to include feedback connections, they are called recurrent
neural networks.
Feedforward networks are of extreme importance to machine learning practitioners. They form the basis of many important commercial applications. For example, the convolutional networks used for object recognition from photos are a specialized kind of feedforward network.
Feedforward neural networks are called networks because they are typically
represented by composing together many different functions. The model is associated with a directed
acyclic graph describing how the functions are composed together.
For example:
we might have three functions f(1), f(2), and f(3) connected in a chain to form f(x) = f(3)(f(2)(f(1)(x))).
This chain structure is the most commonly used structure for neural networks. In this case, f(1) is called the first layer of the network (the input layer, which receives the input); f(2) is called the second layer (a hidden layer, where the learned transformations take place); and so on. The final layer of a feedforward network is called the output layer, which produces the network's output. The overall length of the chain gives the depth of the model, and the width of the model is given by the number of neurons (units) in its layers. It is from this terminology that the name “deep learning” arises.
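As a purely illustrative sketch of this chain structure, the Python snippet below composes three layer functions; the layer sizes, ReLU activation, and random weights are assumptions chosen for the example, not anything specified in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(W, b):
    """Return a function computing one layer: an affine map followed by ReLU."""
    return lambda x: np.maximum(0.0, W @ x + b)

# Three layers f1, f2, f3 with arbitrary (assumed) sizes 4 -> 5 -> 3 -> 2
f1 = layer(rng.normal(size=(5, 4)), np.zeros(5))
f2 = layer(rng.normal(size=(3, 5)), np.zeros(3))
f3 = layer(rng.normal(size=(2, 3)), np.zeros(2))

x = rng.normal(size=4)          # input vector
y = f3(f2(f1(x)))               # f(x) = f3(f2(f1(x))) -- a chain of depth 3
print(y)
```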
A deep learning neural network learns to map a set of inputs to a set of outputs from training data. We cannot calculate the perfect weights for a neural network analytically; instead, they must be learned through an iterative optimization procedure such as gradient descent.
Gradient descent is an iterative optimization algorithm for finding the minimum of a function.
To find the minimum of a function using gradient descent, one takes steps proportional to the negative of
the gradient of the function at the current point.
The “gradient” in gradient descent refers to an error gradient. The model with a given set of weights is used
to make predictions and the error for those predictions is calculated.
The gradient is given by the slope of the tangent to the cost curve at the current weight value (say w0 = 0.2), and the magnitude of the step is controlled by a parameter called the learning rate. The larger the learning rate, the bigger the step we take; the smaller the learning rate, the smaller the step. We then take the step and move to the next weight value, w1.
When choosing the learning rate we have to be careful, as a large learning rate can lead to big steps that overshoot and miss the minimum.
On the other hand, a small learning rate results in very small steps, causing the algorithm to take a long time to find the minimum point.
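A minimal sketch of this procedure, assuming a simple one-dimensional quadratic loss, the starting weight w0 = 0.2 used above, and learning rates chosen purely for demonstration:

```python
def loss(w):
    return (w - 3.0) ** 2          # simple quadratic loss with its minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)         # derivative of the loss (the "error gradient")

def gradient_descent(w0, learning_rate, steps=50):
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)   # step in the negative gradient direction
    return w

print(gradient_descent(w0=0.2, learning_rate=0.1))    # converges close to 3
print(gradient_descent(w0=0.2, learning_rate=1.1))    # too large: overshoots and diverges
print(gradient_descent(w0=0.2, learning_rate=0.001))  # too small: barely moves in 50 steps
```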
In deep learning, hidden units (or hidden neurons) are the computational nodes in the hidden layers of a
neural network that process and transform input data into more abstract representations before it reaches the
output layer. The number and arrangement of these hidden units significantly impact the network's learning
capacity, efficiency, and performance.
Hidden units detect patterns and features in the input data that are not directly visible at the output.
Each unit applies a weight to the inputs it receives, sums them, and then applies an activation
function to introduce non-linearity.
By stacking multiple hidden layers with many hidden units, a network can learn increasingly
abstract representations. For instance, in image classification, earlier layers may detect edges, while
deeper layers can detect objects or entire scenes.
Purpose of Activation Functions: Activation functions (e.g., ReLU, sigmoid, tanh) are applied to
each hidden unit’s output, allowing the network to model complex, non-linear relationships.
Common Activation Functions:
ReLU (Rectified Linear Unit): Sets negative values to zero and is commonly used in
hidden layers because it reduces issues like vanishing gradients and speeds up training.
Sigmoid and Tanh: Earlier-used functions that squash the outputs between fixed ranges (0
to 1 for sigmoid, -1 to 1 for tanh). However, they can suffer from vanishing gradients in
deeper networks.
Leaky ReLU, Swish, GELU: Variants that avoid issues like the “dying ReLU” problem
and improve gradient flow through layers.
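The common activation functions above are easy to express directly. The NumPy sketch below is an illustrative implementation (the leaky-ReLU slope of 0.01 is a typical but assumed default), applied to a single hidden unit that weights its inputs, sums them, and then passes the result through a non-linearity:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)                  # sets negative values to zero

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)       # small slope instead of zero

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))            # squashes outputs to (0, 1)

def tanh(z):
    return np.tanh(z)                          # squashes outputs to (-1, 1)

# A single hidden unit: weighted sum of inputs plus bias, then an activation
x = np.array([0.5, -1.2, 3.0])                 # inputs (assumed values)
w = np.array([0.4, 0.1, -0.7])                 # weights (assumed values)
b = 0.2
z = w @ x + b                                  # pre-activation
print(relu(z), sigmoid(z), tanh(z), leaky_relu(z))
```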
More Units Increases Capacity: Increasing hidden units allows the network to capture more
complex patterns but also increases the risk of overfitting, where the network learns specific noise
in the training data rather than general patterns.
Too Few Units Limits Learning: With insufficient hidden units, the network may lack the capacity
to capture intricate data structures, leading to underfitting and lower accuracy.
Rule of Thumb for Selection: Selecting the optimal number of hidden units often involves
experimentation, typically starting with a manageable number and tuning based on performance.
Cross-validation and regularization techniques can help guide the selection.
The term hidden layers refers to the layers themselves, while hidden units refer to the neurons
within each layer.
Deeper Networks (more hidden layers) can learn hierarchical representations and are ideal for
tasks where different levels of abstraction are needed, like image and language processing.
Wider Networks (more hidden units per layer) can be beneficial for tasks requiring complex
pattern recognition within a single level of abstraction.
5. Distributed Representations
In networks with many hidden units, the hidden layers often develop distributed representations,
where the features learned are represented as combinations of the outputs of multiple units.
These representations are more flexible than single-feature representations, allowing networks to
generalize better across varied inputs.
6. Dropout in Hidden Units
Purpose: Dropout is a regularization technique where random hidden units are "dropped out" (set
to zero) during training, forcing the network to learn robust patterns that don’t rely on specific units.
Application: By randomly deactivating hidden units, dropout helps prevent overfitting, particularly
in deep networks.
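A minimal sketch of dropout using PyTorch (the layer sizes and the drop probability of 0.5 are assumptions): during training, random hidden units are zeroed out, while at evaluation time dropout is disabled and all units are used.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of hidden activations during training
    nn.Linear(64, 10),
)

x = torch.randn(8, 20)

model.train()            # dropout active: different hidden units are dropped each pass
train_out = model(x)

model.eval()             # dropout disabled: all hidden units contribute
eval_out = model(x)
```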
Shallow Networks: Few hidden layers and hidden units, effective for simple tasks (e.g., basic
classification problems) but limited in handling complex data.
Deep Networks with Many Hidden Units: Often used in tasks like image recognition (e.g.,
ResNet, Inception), speech processing, and NLP (e.g., Transformers), where the data requires both
depth (for abstraction) and width (for complexity).
CNNs: Hidden units are organized into convolutional and pooling layers, where each unit in the
convolutional layer acts on a localized region of the input. These units focus on spatial hierarchies,
detecting features like edges, shapes, and patterns.
RNNs: Hidden units in RNNs are recurrent, meaning they process sequences and maintain memory
of previous inputs. Each unit’s output is influenced by both the current input and the previous unit
state, allowing them to capture temporal dependencies.
Transformers: Hidden units are part of self-attention mechanisms, where each unit attends to other
tokens to capture long-range dependencies. Transformers tend to have wider layers and more
hidden units due to the computational efficiency of parallel processing.
Computational Cost: Increasing hidden units raises the number of parameters and computational
requirements. In very deep networks, this can lead to longer training times and higher memory
usage.
Overfitting and Generalization: More hidden units increase the risk of overfitting. Techniques
like dropout, L2 regularization, and early stopping are essential in balancing model complexity and
generalization.
Vanishing/Exploding Gradients: In very deep networks with many hidden units, gradients may
become very small or large as they pass through layers, slowing down or halting training. Solutions
include better initialization, normalization techniques (e.g., batch normalization), and alternative
architectures like residual connections (e.g., ResNet).
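As a hedged illustration of residual connections combined with batch normalization (layer sizes are assumed), the PyTorch sketch below adds the block's input back to its output, giving gradients a direct path through the network:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A simple fully connected residual block: output = x + F(x)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim),
            nn.BatchNorm1d(dim),   # normalizes activations to stabilize training
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        return x + self.net(x)     # skip connection preserves gradient flow

block = ResidualBlock(32)
out = block(torch.randn(16, 32))   # batch of 16 samples, feature size 32
```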
Hidden units are a critical component of deep learning architectures. Their configuration should be carefully
chosen to balance the network’s capacity to learn complex patterns with the need for generalization and
computational efficiency.
The architecture design of deep learning models involves creating and organizing the structure of layers,
neurons, and connections to best solve a specific task. This design is critical in determining the network’s
effectiveness, efficiency, and ability to generalize across data. Here's an overview of key architectural types
and design principles:
1. Basic Feedforward Neural Networks (FNN)
Description: FNNs are the simplest type of neural network, with information flowing in one
direction, from the input to the output through a series of hidden layers.
Structure: An input layer, one or more hidden layers, and an output layer, with each layer fully connected to the next.
Use Cases: Suitable for simple classification and regression tasks on structured data.
2. Convolutional Neural Networks (CNNs)
Key Components:
1. Convolutional Layers: Use filters to detect local patterns. Each filter slides over the input, extracting spatial features, which allows for translation invariance.
2. Pooling Layers: Reduce spatial dimensions by selecting the maximum (max pooling) or average (average pooling) values, preserving important features while lowering computational costs.
3. Fully Connected Layers: Often added at the end to integrate the features learned by convolutional layers (a small code sketch follows this subsection).
Popular Architectures:
1. AlexNet: Popularized deep CNNs for image classification with multiple convolutional layers.
3. ResNet: Introduced residual connections to allow for very deep networks by mitigating the vanishing gradient problem.
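A small, illustrative CNN in PyTorch showing the three kinds of layers described above; the channel counts, kernel sizes, and 28x28 grayscale input are assumptions, not taken from any particular architecture.

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer: local filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer: 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # fully connected layer for classification
)

logits = cnn(torch.randn(4, 1, 28, 28))          # batch of 4 grayscale 28x28 images
print(logits.shape)                              # torch.Size([4, 10])
```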
3. Recurrent Neural Networks (RNNs)
Purpose: Suited for sequential data, such as time series, audio, and text, RNNs can learn dependencies in data over time by retaining information from previous steps.
Key Components:
Recurrent Layers: Enable each neuron to connect back to itself, allowing information to
persist across steps.
Memory Cells (LSTM and GRU): LSTMs (Long Short-Term Memory) and GRUs (Gated
Recurrent Units) improve traditional RNNs by adding gates that control the flow of
information, enabling the network to retain or forget information selectively.
Limitations: RNNs can struggle with very long sequences due to gradient issues.
Applications: Text generation, language translation, time series forecasting, speech recognition.
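A brief, illustrative PyTorch sketch of a recurrent layer built from LSTM memory cells; the input size, hidden size, and sequence length are assumed values.

```python
import torch
import torch.nn as nn

# LSTM: 8 input features per time step, 32 hidden units, batch-first tensors
lstm = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)

x = torch.randn(4, 20, 8)              # batch of 4 sequences, 20 time steps each
outputs, (h_n, c_n) = lstm(x)          # hidden state h_n and cell state c_n carry memory

print(outputs.shape)                   # torch.Size([4, 20, 32]) - one hidden state per step
print(h_n.shape)                       # torch.Size([1, 4, 32])  - final hidden state
```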
4. Transformers
Purpose: Originally developed for natural language processing, Transformers are now used across
tasks, including image and multi-modal tasks, due to their efficiency and scalability.
Key Components:
Self-Attention Mechanism: Each token in the sequence can focus on relevant parts of the
entire sequence, capturing dependencies over long distances without recurrence.
Multi-Head Attention: Provides multiple attention layers to capture different types of
relationships within the data.
Positional Encoding: Adds information about the order of tokens in the sequence, as
Transformers lack inherent sequence-processing structures.
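To make the self-attention idea above concrete, here is a hedged NumPy sketch of scaled dot-product attention for a single head; the sequence length and embedding size are assumptions, and real Transformers add learned projections, multiple heads, and positional encodings on top of this.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each token's query attends to every token's key; values are mixed accordingly."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # pairwise similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                         # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16                                       # assumed sizes
X = rng.normal(size=(seq_len, d_model))                        # token embeddings
out = scaled_dot_product_attention(X, X, X)                    # self-attention: Q = K = V = X
print(out.shape)                                               # (5, 16)
```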
Popular Models:
5. Autoencoders
Purpose: Used for unsupervised learning, Autoencoders learn compressed representations of data,
useful for tasks like dimensionality reduction and anomaly detection.
Key Components: An encoder that compresses the input into a low-dimensional latent representation (the bottleneck), and a decoder that reconstructs the input from that representation.
Variants:
6. Generative Adversarial Networks (GANs)
Purpose: GANs are used to generate realistic data samples by learning the underlying data distribution through adversarial training.
Key Components: A generator network that produces synthetic samples from random noise, and a discriminator network that tries to distinguish real samples from generated ones; the two are trained in competition.
7. Graph Neural Networks (GNNs)
Purpose: Designed for data represented as graphs, like social networks, molecular structures, or knowledge graphs.
Key Components:
Graph Convolutional Layers: Extend convolutional operations to graph structures, enabling the
network to learn based on relationships among nodes.
Message Passing Mechanism: Allows each node to aggregate information from its neighbors,
capturing relationships and dependencies.
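A minimal sketch of one graph-convolution / message-passing step in NumPy, assuming a small made-up undirected graph: each node averages its neighbors' features together with its own before a linear transform.

```python
import numpy as np

# Adjacency matrix of a small assumed graph (4 nodes) with self-loops included
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
deg = A.sum(axis=1, keepdims=True)
A_norm = A / deg                        # row-normalize: mean over each node's neighborhood

H = np.random.default_rng(0).normal(size=(4, 8))   # node features (4 nodes, 8 features)
W = np.random.default_rng(1).normal(size=(8, 16))  # learnable weights (assumed shape)

H_next = np.maximum(0.0, A_norm @ H @ W)           # aggregate neighbors, transform, ReLU
print(H_next.shape)                                # (4, 16)
```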
8. Hybrid Architectures
CNN-RNN Hybrids: Useful for video analysis and image captioning, where a CNN extracts spatial features, and an RNN handles sequential data.
Multi-Modal Transformers: Recently popular for models that work with multiple data
types (e.g., text and images), like OpenAI’s CLIP, which learns joint text-image
embeddings.
Applications: Tasks that require both spatial and sequential information, such as video processing,
robotics, and multi-modal learning.
9. Key Design Principles
Layer Depth: Deeper networks can capture more complex patterns, but they also risk vanishing or exploding gradients. Techniques like residual connections (ResNet) or batch normalization mitigate this risk.
Layer Width: Wider networks (more neurons per layer) can capture a greater diversity of features,
but they come with higher computational costs and increased risk of overfitting.
Regularization: Techniques like dropout, L2 regularization, and batch normalization help prevent
overfitting, making the model more robust.
Residual and Skip Connections: By bypassing one or more layers, residual connections help
preserve gradient flow, allowing for much deeper networks.
Attention Mechanisms: Attention is now a key component for many architectures, allowing
models to focus on relevant parts of the input, especially beneficial for sequence-based tasks.
Normalization Techniques: Batch normalization and layer normalization help stabilize and speed
up training by normalizing inputs to layers.
10. Hyperparameter Tuning
Learning Rate: Controls the step size in gradient descent. Finding the right learning rate is critical,
as too high a value can cause the model to overshoot minima, and too low a value can slow down
training.
Batch Size: Larger batches make training more stable but require more memory, while smaller
batches can lead to faster convergence but noisier updates.
Early Stopping: Stops training when the model’s performance on a validation set stops improving,
reducing the risk of overfitting.
The architecture design in deep learning is highly task-specific and requires careful balancing of layer depth,
width, and regularization to optimize model performance while avoiding overfitting and maintaining
computational efficiency.
Backpropagation and other differential algorithms are central to how deep learning models learn by
adjusting their weights to minimize error. These methods involve calculating gradients to update parameters
efficiently during training. Here’s an overview of backpropagation, followed by a look at related
optimization techniques and advancements in gradient-based learning algorithms.
1. Backpropagation
Purpose: Backpropagation (short for "backward propagation of errors") is a method to compute the
gradient of the loss function with respect to each weight in the network. It enables efficient training
of deep neural networks by propagating error backward through the layers.
Process:
1. Forward Pass: Input data passes through the network to compute the output and loss (error).
2. Backward Pass: The network calculates gradients of the loss with respect to each parameter (weight) using the chain rule of calculus. These gradients show how each weight affects the loss.
3. Weight Update: Using the computed gradients, each weight is updated in the opposite direction of the gradient (usually scaled by a learning rate) to minimize the loss.
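These three steps can be traced on a tiny assumed example: the sketch below uses a one-hidden-layer network with a sigmoid activation and a squared-error loss, computes the gradients by hand via the chain rule, and applies one weight update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0                # one training example (assumed sizes)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4) # hidden layer: 3 inputs -> 4 units
W2, b2 = rng.normal(size=4), 0.0              # output layer: 4 units -> 1 output
lr = 0.1

# 1. Forward pass: compute the prediction and the loss
h = sigmoid(W1 @ x + b1)
y_hat = W2 @ h + b2
loss = 0.5 * (y_hat - y) ** 2

# 2. Backward pass: chain rule from the loss back to each weight
d_yhat = y_hat - y                            # dL/dy_hat
dW2, db2 = d_yhat * h, d_yhat                 # gradients for the output layer
d_h = d_yhat * W2                             # dL/dh
d_z1 = d_h * h * (1 - h)                      # back through the sigmoid
dW1, db1 = np.outer(d_z1, x), d_z1            # gradients for the hidden layer

# 3. Weight update: step opposite to the gradient, scaled by the learning rate
W1, b1 = W1 - lr * dW1, b1 - lr * db1
W2, b2 = W2 - lr * dW2, b2 - lr * db2
```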
Challenges in Backpropagation:
1. Vanishing/Exploding Gradients: As errors are propagated backward through many layers, gradients can become extremely small or extremely large, making training slow or unstable.
2. Local Minima and Saddle Points: Backpropagation may get stuck in local minima or saddle points, slowing down convergence.
Solutions:
1. Activation Functions: Functions like ReLU (Rectified Linear Unit) help mitigate vanishing gradients by maintaining larger gradients for positive inputs.
2. Batch Normalization: Normalizes inputs to each layer to stabilize and speed up training.
3. Residual Connections: Enable gradients to flow through the network more directly, making very deep networks like ResNet possible.
2. Gradient Descent Variants
Stochastic Gradient Descent (SGD): Updates weights based on a single sample per iteration. It introduces randomness, which can help avoid local minima but may be noisy.
Mini-Batch Gradient Descent: Updates weights using a subset (mini-batch) of the data. It strikes a
balance between the efficiency of batch gradient descent and the noise-reducing benefit of
averaging multiple samples.
Batch Gradient Descent: Uses the entire dataset to compute gradients before each update, which is
stable but computationally intensive and slow for large datasets.
Momentum:
Description: Accelerates gradient descent by adding a fraction of the previous update to the
current update, helping to smooth the trajectory and speed up convergence, especially in
areas with small gradients.
Formula: v_t = γ v_{t−1} + η ∇L(w), where v_t is the update direction, γ is the momentum coefficient, and η is the learning rate.
Nesterov Accelerated Gradient (NAG):
Description: Improves upon momentum by looking ahead, calculating gradients not at the current position but at a position moved in the direction of the previous momentum.
Advantage: Helps avoid overshooting by incorporating a form of anticipation into the
updates.
Adagrad:
Description: Adapts the learning rate based on the magnitude of gradients in each dimension, giving each parameter its own learning rate that decreases as it accumulates more updates.
Use Case: Works well for sparse data and cases where features have different frequencies.
RMSProp:
Description: An improvement on Adagrad, RMSProp also scales learning rates for each
parameter but with a moving average of past gradients. This approach prevents learning
rates from decaying too fast, as in Adagrad.
Application: Often used in RNNs and other networks with large amounts of sequential
data.
Adam (Adaptive Moment Estimation):
Description: Combines momentum and RMSProp, adjusting learning rates based on both first (momentum) and second (variance) moments of the gradient.
Popular Parameters: Typically, β1 = 0.9 (for momentum) and β2 = 0.999 (for variance).
Advantage: Adapts well to both sparse and dense data, making it widely popular for a
variety of deep learning applications.
AdamW:
Description: A variant of Adam that decouples weight decay from the gradient update,
resulting in better generalization by more precisely regulating regularization during weight
updates.
AdaMax: An extension of Adam using infinity-norm, making it more robust in some cases.
Nadam: Combines Adam with Nesterov momentum, often improving convergence rates.
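A hedged usage sketch: in PyTorch, these optimizers are available in torch.optim and differ only in how they turn gradients into weight updates. The model and most hyperparameter values below are assumptions chosen for illustration (the betas match the typical values mentioned above).

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)    # any model's parameters can be handed to an optimizer

sgd      = torch.optim.SGD(model.parameters(), lr=0.01)                    # plain SGD
momentum = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)      # with momentum
nesterov = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)
adagrad  = torch.optim.Adagrad(model.parameters(), lr=0.01)
rmsprop  = torch.optim.RMSprop(model.parameters(), lr=0.001)
adam     = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
adamw    = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)

# One generic training step (the same pattern works for every optimizer above)
x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
adam.zero_grad()
loss.backward()
adam.step()
```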
3. Gradient-Free Optimization Techniques
While gradient-based methods are most common, gradient-free techniques are useful when calculating gradients is computationally infeasible or when the optimization surface is non-differentiable.
Evolutionary Algorithms:
Bayesian Optimization:
Simulated Annealing:
Purpose: Increases learning rates for large batch sizes, adjusting rates based on each layer’s
magnitude, often used in large-scale training.
Fixed Schedules: Decrease learning rate at fixed intervals (e.g., every few epochs).
Adaptive Schedules: Reduce learning rate when performance plateaus, such as the
ReduceLROnPlateau in Keras.
Cosine Annealing: Adjusts the learning rate in a periodic, cosine-shaped curve, allowing
for more aggressive exploration in early stages.
Gradient Clipping: Caps the magnitude of gradients during backpropagation (by value or by norm) to prevent exploding gradients, which is particularly useful in recurrent networks.
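An illustrative PyTorch sketch of learning rate scheduling and gradient clipping; the model, optimizer settings, epoch count, and the max-norm threshold of 1.0 are assumptions.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Cosine annealing: the learning rate follows a cosine-shaped curve over 50 epochs.
# (ReduceLROnPlateau is the adaptive alternative: it lowers the rate when a
#  validation metric stops improving and is stepped as scheduler.step(val_loss).)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

for epoch in range(50):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping: rescale gradients whose overall norm exceeds 1.0
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
```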
Second-Order Methods:
Concept: Use second-order derivatives (the Hessian) to better approximate the curvature of the loss surface, allowing for more accurate steps towards the minima.
Limitation: Calculating the Hessian is computationally intensive, making these methods impractical for large networks.
Limited-memory BFGS (L-BFGS): A quasi-Newton method that approximates second-order curvature information from a limited history of past gradients, reducing memory requirements.
Automatic Differentiation:
Description: A key feature of modern deep learning frameworks like TensorFlow and
PyTorch. It automatically computes the gradients needed for backpropagation by creating a
computational graph and applying the chain rule efficiently.
Reverse Mode (Backpropagation): Used in deep learning to efficiently compute gradients
for large models by moving from the output back to the input.
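A short, hedged example of reverse-mode automatic differentiation in PyTorch: the framework records the computational graph during the forward pass and applies the chain rule when backward() is called. The function being differentiated here is an arbitrary assumed example.

```python
import torch

w = torch.tensor([1.5, -0.3], requires_grad=True)   # parameters to differentiate
x = torch.tensor([2.0, 4.0])

loss = ((w * x).sum() - 1.0) ** 2    # forward pass builds the computational graph
loss.backward()                      # reverse mode: chain rule from the output back to w

print(w.grad)                        # dloss/dw, computed automatically
```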
Backpropagation: The fundamental gradient-based method used in almost all deep learning
models.
Gradient Descent Variants (SGD, Momentum, Adam, RMSProp, etc.): Provide efficient ways
to update parameters by adjusting learning rates and applying momentum.
Gradient-Free Techniques: Used in cases where differentiability isn’t guaranteed or is too
computationally costly, including evolutionary algorithms and Bayesian optimization.
Advanced Techniques: Learning rate scheduling, gradient clipping, and second-order methods
refine the optimization process, improving convergence and stability for specific tasks and network
architectures.
In summary, back-propagation is the backbone of gradient-based learning in deep neural networks, while a
variety of optimization techniques further refine and adapt weight updates, allowing deep learning models
to converge faster and generalize better across different applications.