Unit 4 Notes

Attention Mechanisms

Attention mechanisms enhance deep learning models by selectively focusing on important input
elements, improving prediction accuracy and computational efficiency. They prioritize and emphasize
relevant information, acting as a spotlight to enhance overall model performance.

Fundamentally, the attention mechanism is akin to our brain's neurological system, which emphasizes
relevant sounds while filtering out background distractions. In the realm of deep learning, it allows
neural networks to attribute varying levels of importance to different input segments, significantly
boosting their capability to capture essential information. This process is crucial in tasks such as natural
language processing (NLP), where attention aids in aligning relevant parts of a source sentence during
translation or question-answering activities.

Working:

1. Breaking Down the Input: Let’s say you have a bunch of words (or any kind of data) that you
want the computer to understand. First, it breaks down this input into smaller pieces, like
individual words.

2. Picking Out Important Bits: Then, it looks at these pieces and decides which ones are the most
important. It does this by comparing each piece to a question or ‘query’ it has in mind.

3. Assigning Importance: Each piece gets a score based on how well it matches the question. The
higher the score, the more important that piece is.

4. Focusing Attention: After scoring each piece, it figures out how much attention to give to each
one. Pieces with higher scores get more attention, while less important ones get less attention.

5. Putting It All Together: Finally, it adds up all the pieces, but gives more weight to the important
ones. This way, the computer gets a clearer picture of what’s most important in the input.
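
These five steps can be sketched in a few lines of Python. The following is a minimal illustration using NumPy, with made-up vectors standing in for the input pieces and the query:

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))    # subtract the max for numerical stability
    return e / e.sum()

# Step 1: break the input into pieces, each represented as a vector
pieces = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])

# Step 2: compare each piece to a "query" the model has in mind
query = np.array([1.0, 0.5])

# Step 3: assign each piece an importance score (dot product with the query)
scores = pieces @ query

# Step 4: higher-scoring pieces get more attention (weights sum to 1)
weights = softmax(scores)

# Step 5: weighted sum of the pieces, emphasizing the important ones
output = weights @ pieces
print(weights, output)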

Attention is an interface connecting the encoder and decoder that provides the decoder with
information from every encoder hidden state. With this framework, the model is able to selectively
focus on valuable parts of the input sequence and hence, learn the association between them. This
helps the model to cope efficiently with long input sentences.

Attention mechanisms in deep learning are used to help the model focus on the most relevant parts of
the input when making a prediction. In many problems, the input data may be very large and complex,
and it can be difficult for the model to process all of it. Attention mechanisms allow the model to
selectively focus on the parts of the input that are most important for making a prediction, and to ignore
the less relevant parts. This can help the model to make more accurate predictions and to run more
efficiently.

Intuition

The figure below demonstrates an Encoder-Decoder architecture with an attention layer.


The idea is to keep the decoder as it is and to replace the sequential RNN/LSTM in the encoder with a
bidirectional RNN/LSTM.

Here, we give attention to some words by considering a window of size Tx (say four words x1, x2, x3, and
x4). Using these four words, we create a context vector c1, which is given as input to the decoder.
Similarly, we create a context vector c2 from the same four words but with different weights. The weights
α1, α2, α3, and α4 determine how much attention each word receives, and the sum of all weights within one window is equal to 1.

Similarly, we create context vectors from different sets of words with different α values.

The attention model computes a set of attention weights denoted by α(i,1), α(i,2), …, α(i,Tx), because not
all the inputs are used equally in generating each output. The context vector ci for the
output word yi is generated as the weighted sum of the annotations:

[ c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j ]

The attention weights are calculated by normalizing (with a softmax) the output score of a feed-forward neural network a
that captures the alignment between the input at position j and the output at position i:

[ e_{ij} = a(s_{i-1}, h_j), \quad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})} ]

Implementation

Let's take an example where a translator reads the English(input language) sentence while writing down
the keywords from the start till the end, after which it starts translating to Portuguese (the output
language). While translating each English word, it makes use of the keywords it has understood.

Attention places different focus on different words by assigning each word a score. Then, using the
softmax of these scores, we aggregate the encoder hidden states with a weighted sum to get the context
vector.

The implementation of an attention layer can be broken down into six steps (Step 0 to Step 5).

Step 0: Prepare hidden states.

First, prepare all the available encoder hidden states (green) and the first decoder hidden state (red). In
our example, we have 4 encoder hidden states and the current decoder hidden state. (Note: the last
consolidated encoder hidden state is fed as input to the first time step of the decoder. The output of this
first time step of the decoder is called the first decoder hidden state.)

Step 1: Obtain a score for every encoder hidden state.

A score (scalar) is obtained by a score function (also known as alignment score function or alignment
model). In this example, the score function is a dot product between the decoder and encoder hidden
states.

Step 2: Run all the scores through a softmax layer.

We pass the scores through a softmax layer so that the softmax scores (scalars) add up to 1. These softmax
scores represent the attention distribution.

Step 3: Multiply each encoder hidden state by its softmax score.

By multiplying each encoder hidden state with its softmax score (scalar), we obtain the alignment vector
or the annotation vector. This is exactly the mechanism where alignment takes place.
Step 4: Sum the alignment vectors.

The alignment vectors are summed up to produce the context vector. The context vector aggregates the
information from the alignment vectors of the previous step.

Step 5: Feed the context vector into the decoder.

The context vector is passed to the decoder, where it is typically combined with the previous decoder output or hidden state to generate the next output word; exactly how it is fed in depends on the architecture.
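
Steps 0 to 5 can be traced end to end with a toy NumPy example. The four encoder hidden states, the decoder hidden state, and the dot-product score function below are illustrative stand-ins, not values from a real model:

import numpy as np

# Step 0: prepare hidden states (4 encoder states and the current decoder state)
encoder_states = np.array([[0.0, 1.0, 1.0],
                           [5.0, 0.0, 1.0],
                           [1.0, 1.0, 0.0],
                           [0.0, 5.0, 1.0]])
decoder_state = np.array([10.0, 5.0, 10.0])

# Step 1: obtain a score for every encoder hidden state (dot product)
scores = encoder_states @ decoder_state

# Step 2: run all the scores through a softmax layer
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Step 3: multiply each encoder hidden state by its softmax score
alignment = weights[:, None] * encoder_states

# Step 4: sum the alignment vectors to produce the context vector
context = alignment.sum(axis=0)

# Step 5: the context vector is then fed into the decoder
print(context)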

Recurrent Models of Visual Attention

Recurrent models of visual attention combine the principles of attention mechanisms and recurrent
neural networks (RNNs) to effectively process and analyze visual data. This approach is particularly
useful in tasks such as image captioning, visual question answering, and video analysis, where the ability
to focus on specific regions of an image or sequence is crucial.

Key Features:

1. Visual Attention:

o Visual attention mimics human visual perception by allowing models to focus on specific
areas of an image while ignoring irrelevant parts. This selective focus helps in extracting
meaningful features and improving performance in various tasks.

2. Recurrent Neural Networks (RNNs):

o RNNs are designed for sequential data, maintaining a hidden state that captures
information from previous time steps. They are particularly effective for tasks where the
input is a sequence, such as text or time series data.

3. Combining Attention and RNNs:

o Integrating attention mechanisms into RNNs allows the model to dynamically adjust its
focus on different parts of the visual input as it generates output sequences, such as
captions or answers to questions.

Architecture of Recurrent Models of Visual Attention

1. Input Representation:

o Images are typically processed using Convolutional Neural Networks (CNNs) to extract
feature maps. These feature maps represent different regions of the image and their
corresponding visual features.

2. Attention Mechanism:

o The attention mechanism computes a set of attention weights that determine the
importance of different regions of the feature map.
o Attention scores can be calculated based on the current hidden state of the RNN and
the feature representations.

3. Context Vector:

o The attended feature representation (context vector) is computed as a weighted sum of
the feature map, where the weights are determined by the attention mechanism.

4. Recurrent Processing:

o The RNN (often an LSTM or GRU) takes the current hidden state and the context vector
as inputs. The hidden state is updated based on the attended features, allowing the
model to maintain context over time.

5. Output Generation:

o The output of the RNN can be used for various tasks, such as generating a sequence of
words (in image captioning) or predicting a class label (in visual question answering).

Attention Mechanism Details

1. Calculating Attention Weights:

o Given the feature map (F) and the previous hidden state (h_{t-1}), the attention weights
can be computed using a scoring function (f): [ e_{tj} = f(h_{t-1}, F_j) ] where (e_{tj}) is
the attention score for the (j)-th region of the feature map at time (t).

2. Softmax Normalization:

o The attention scores are normalized using the softmax function: [ \alpha_{tj} = \frac{\exp(e_{tj})}{\sum_{k} \exp(e_{tk})} ] where (\alpha_{tj}) represents the attention weight
for the (j)-th region.

3. Weighted Feature Representation:

o The context vector is computed as a weighted sum of the feature map: [ \mathbf{c}_t = \sum_{j} \alpha_{tj} F_j ] where (\mathbf{c}_t) is the context vector for the current time
step.
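
The three equations above can be sketched in PyTorch as follows. This is a minimal illustration that assumes an additive (MLP) scoring function f, a flattened 7x7 CNN feature map with 512-dimensional region features, and a 256-dimensional RNN hidden state; all of these sizes, and the scoring function itself, are assumptions for the example:

import torch
import torch.nn as nn

class VisualAttention(nn.Module):
    # scoring function f over feature-map regions, conditioned on h_{t-1}
    def __init__(self, feat_dim=512, hid_dim=256, attn_dim=128):
        super().__init__()
        self.W_h = nn.Linear(hid_dim, attn_dim)   # projects h_{t-1}
        self.W_f = nn.Linear(feat_dim, attn_dim)  # projects each region F_j
        self.v = nn.Linear(attn_dim, 1)           # maps to a scalar score e_{tj}

    def forward(self, feats, h_prev):
        # feats: (batch, regions, feat_dim), h_prev: (batch, hid_dim)
        e = self.v(torch.tanh(self.W_f(feats) + self.W_h(h_prev).unsqueeze(1)))
        alpha = torch.softmax(e.squeeze(-1), dim=1)    # attention weights alpha_{tj}
        c = (alpha.unsqueeze(-1) * feats).sum(dim=1)   # context vector c_t
        return c, alpha

# toy usage: a 7x7 feature map flattened into 49 regions
feats = torch.randn(2, 49, 512)
h_prev = torch.randn(2, 256)
c, alpha = VisualAttention()(feats, h_prev)
print(c.shape, alpha.shape)   # torch.Size([2, 512]) torch.Size([2, 49])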

Applications of Recurrent Models of Visual Attention

1. Image Captioning:

o Models can generate descriptive captions for images by focusing on different parts of
the image at each time step of the caption generation process.

2. Visual Question Answering (VQA):


o By attending to relevant regions of an image while answering questions about it, models
can provide more accurate and context-aware answers.

3. Object Detection and Recognition:

o Attention mechanisms can help models focus on specific objects within an image,
improving their ability to detect and classify items.

4. Video Analysis:

o In video processing, recurrent models of visual attention can dynamically focus on
different frames or regions of interest, enhancing understanding of temporal dynamics.

Attention Mechanisms for Machine Translation in Deep Learning

Attention mechanisms have revolutionized machine translation by allowing models to focus on
specific parts of the input sequence when generating translations. This approach enhances the
performance of neural machine translation (NMT) systems, particularly in handling long
sentences and complex structures.

The attention mechanism calculates a weighted sum of the input sequence at each step of the
output sequence. The weights are based on how relevant each part of the input sequence is to the
output word currently being generated.

Neural Machine Translation (NMT)

NMT is a large neural network that is trained in an end-to-end fashion to translate one
language into another. The figure below is an illustration of NMT with an RNN-based encoder-decoder architecture.
Figure 1: Neural machine translation as a stacking recurrent architecture for translating a source
sequence A B C D into a target sequence X Y Z. Here <eos> marks the end of a sentence.
NMT directly models the conditional probability p(y|x) of translating a source sentence
(x1, x2, …, xn) into a target sentence (y1, y2, …, ym).
NMT consists of two components:
1. An encoder which computes a representation s for each source sentence
2. A decoder which generates the translation one word at a time and hence decomposes the
conditional probability as:

[ \log p(y \mid x) = \sum_{j=1}^{m} \log p(y_j \mid y_{<j}, s) ]

(the probability of a translation y given the source sentence x)


One could parametrize the probability of decoding each word y_j as

[ p(y_j \mid y_{<j}, s) = \text{softmax}(g(h_j)) ]

where the RNN hidden unit h_j could be modeled as

[ h_j = f(h_{j-1}, s) ]

where
g: a transformation function that outputs a vocabulary-sized vector
h: the RNN hidden unit
f: a function that computes the current hidden state given the previous hidden state.
The training objective for the translation process could be framed as the negative log-likelihood over the training corpus D (the loss function):

[ J = \sum_{(x, y) \in D} -\log p(y \mid x) ]
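
In practice this objective is the per-word negative log-likelihood (cross-entropy) summed over the training corpus. A brief PyTorch sketch, where the vocabulary size, the batch and sentence shapes, and the random stand-in logits g(h_j) are all assumptions for illustration:

import torch
import torch.nn.functional as F

vocab_size = 10000
# logits g(h_j) for each target position: (batch, target_len, vocab_size)
logits = torch.randn(4, 7, vocab_size)
targets = torch.randint(0, vocab_size, (4, 7))   # gold target words y_j

# J = sum over the corpus of -log p(y | x), decomposed per target word
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())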

Generative Adversarial Networks

Training a Generative Adversarial Network

Using GANs for Generating Image Data

GAN is an algorithm that uses two neural networks: a Generator G and a Discriminator D. The two
networks compete against one another (hence the term ‘adversarial’).

The Generator creates synthetic data, while the Discriminator tries to distinguish between the
generated data and real data. This competition leads to highly realistic data that can often pass for
real.

GANs are usually trained to generate images from random noise. A GAN typically has two parts:
a Generator that generates new samples of images, and a Discriminator that classifies images as
real or fake. For example, we can train a GAN model to generate digit images that look like the
hand-written digit images of the MNIST dataset. Beyond this, GANs are widely used for voice,
image, and video generation.

o Generator: a model that generates new, plausible data examples for the problem
domain.

o Discriminator: a model that classifies the given examples as real (from the
domain) or fake (generated).
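
The competition between these two parts is formalized by the minimax objective of the original GAN paper (Goodfellow et al., 2014):

[ \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] ]

Here D(x) is the probability the Discriminator assigns to a sample x being real, and G(z) is the sample the Generator produces from random noise z.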
Creating novel images given an image dataset is one of the strengths of a specific branch of models
called Generative Adversarial Networks (GAN). These networks specialize in unsupervised/semi-
supervised image generation given any image data.

Building the Model

The GAN we want to create comprises two major parts:

 Generator

 Discriminator.

The Generator is responsible for creating novel images, while the Discriminator is responsible for
understanding how good the generated image is.

The entire architecture we want to build for the GAN's image generation is shown in the following
diagram.

Example:
 MNIST (Modified National Institute of Standards and Technology) dataset. This dataset contains
handwritten digit images of size 28x28.
 MNIST is an easy dataset for a GAN such as the one we are building, as it has small, single-channel images.
 The shape of each image is defined as 28x28x1. The last dimension
corresponds to the number of channels in an image. Since we are using the MNIST dataset in
black and white, we only have a single channel.
 z_size is the dimensionality of the latent space we sample from. In this case, we set it to 100. This
number could be modified if required.

Defining the Generator

The job of the Generator (G) is to create realistic images that the Discriminator fails to recognize as
fake. Thus, the Generator is an essential component that enables a GAN's image generation ability. The
architecture we consider in this article comprises fully connected (FC) layers and Leaky ReLU activations.
The final layer of the Generator has a TanH activation rather than a LeakyReLU. This choice was
made because we want to bring the generated image into the same range as the normalized MNIST
data (-1, 1).
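
A minimal PyTorch sketch of such a generator follows. The notes specify only FC layers, LeakyReLU activations, and a final TanH; the particular layer widths are illustrative assumptions:

import torch.nn as nn

z_size = 100          # latent dimension, as set above
img_size = 28 * 28    # flattened single-channel MNIST image

# FC + LeakyReLU blocks, ending with Tanh so outputs lie in (-1, 1)
generator = nn.Sequential(
    nn.Linear(z_size, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 512),
    nn.LeakyReLU(0.2),
    nn.Linear(512, img_size),
    nn.Tanh(),
)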

Defining the Discriminator

The GAN uses the Discriminator (D) to identify how real the Generator's outputs look by returning a
probability of real vs. fake. This part of the network can be treated as a binary classification problem. To
solve this binary classification problem, we need a rather simple network composed of blocks of fully
connected (FC) layers, Leaky ReLU activations, and Dropout layers. Note that the final block has an FC
layer followed by a Sigmoid. The final Sigmoid activation returns the classification probability that we
require.
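
A matching PyTorch sketch of the discriminator, again with assumed layer widths and dropout rates:

import torch.nn as nn

# blocks of FC layers, Leaky ReLU activations, and Dropout,
# ending with an FC layer and a Sigmoid
discriminator = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.LeakyReLU(0.2),
    nn.Dropout(0.3),
    nn.Linear(512, 256),
    nn.LeakyReLU(0.2),
    nn.Dropout(0.3),
    nn.Linear(256, 1),
    nn.Sigmoid(),     # classification probability: real vs. fake
)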
Having defined all the required functions, we can train the network to optimize the losses.

Steps for the GAN's image generation are as follows:


 Load an image, and generate random noise the same size as the loaded image.
 Send these images to the Discriminator and calculate the real vs. fake probability.
 Generate another noise of the same size. Send this noise to the Generator.
 Run training for the Generator for a few epochs.
 Repeat all the steps until a satisfactory image is generated.
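
These steps can be condensed into a training loop. The sketch below assumes the generator and discriminator defined earlier, binary cross-entropy losses, and an Adam optimizer; the optimizer choice and learning rate are assumptions, not given in the notes:

import torch
import torch.nn.functional as F

# assumes `generator`, `discriminator`, and `z_size` from the sketches above
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images):   # real_images: (batch, 784), scaled to (-1, 1)
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # train the Discriminator: real images vs. generated (fake) images
    noise = torch.randn(batch, z_size)
    fake_images = generator(noise).detach()   # detach: do not update G here
    d_loss = (F.binary_cross_entropy(discriminator(real_images), real_labels)
              + F.binary_cross_entropy(discriminator(fake_images), fake_labels))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # train the Generator: try to make the Discriminator output "real"
    noise = torch.randn(batch, z_size)
    g_loss = F.binary_cross_entropy(discriminator(generator(noise)), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

Calling train_step on successive batches and repeating for several epochs corresponds to the loop above; generated samples can be inspected periodically until they look satisfactory.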

Conditional Generative Adversarial Networks

Generative Adversarial Networks (GANs) are a deep learning framework used to generate
random, plausible examples based on our needs. A GAN contains two essential parts that are always
competing against each other in a repetitive process (as adversaries). These two essential parts
are:
 Generator Network: the neural network responsible for creating (or generating)
new data. The data can be in the form of images, text, video, sound, etc., depending on the data
the network is trained on.
 Discriminator Network: its job is to distinguish between real data from the
dataset and fake data generated by the generator.

The Conditional Generative Adversarial Network (cGAN) is a model used in deep learning, a
branch of machine learning. It enables more precise generation and discrimination of images,
helping machines to be trained and to learn on their own.

Imagine the need to generate images of only Mercedes cars when you have trained your
model on a collection of cars. To do that, you need to provide the GAN model with a specific
“condition,” which can be done by providing the car’s name (or label). Conditional generative
adversarial networks work in the same way as GANs, except that the generation of data in a cGAN is
conditioned on specific input information, which could be labels, class information, or any other
relevant features. This conditioning enables more precise and targeted data generation.

To understand what a cGAN is, you first need to become familiar with deep learning, a
process that involves feeding a computer program thousands of data points so that it can learn to
recognize them. The Generative Adversarial Network (GAN) represents the initial training
approach. It sets up a dialogue between two networks: the generator and the discriminator.
On one side, the generator creates fake images that are supposed to be as realistic as possible,
with the aim of deceiving the opposing network: the discriminator.
On the other side, the discriminator observes images coming from both the generator and a
database. It must determine which images come from the database (and label them as real) and
which images are generated by the generator (and are therefore fake).
When the discriminator correctly classifies fakes as fakes and real images as real, it receives positive
feedback; if it fails in its task, it receives negative feedback. Gradually, thanks to the gradient
descent algorithm, it determines the range of data that allows it to recognize a real image,
learns from its mistakes, and improves. The generator, in turn, learns from the discriminator's
verdicts and progressively enhances its ability to create more realistic images.
The cGAN or how to maximize the performance of the generator and the discriminator
With a conditional GAN, it’s possible to send more precise information, called class labels, to
both the generator and the discriminator to guide data generation. These pieces of
information help specify the data produced by the generator and evaluated by the discriminator,
allowing them to arrive at the desired results more quickly.

The labels guide the generator’s production to generate more specific information. For example,
instead of producing images of clothing in general, it will produce images of pants, jackets, or
socks based on the provided label.
On the discriminator’s side, the labels help the network better distinguish between real images
and the fake images provided by the generator, making it more efficient.
Architecture and Working of cGANs

Conditioning in GANs:

 GANs can be extended to a conditional model by providing additional information (denoted as y)
to both the generator and discriminator.

 This additional information (y) can be any kind of auxiliary information, such as class labels or
data from other modalities.

 In the generator, the prior input noise (z) and y are combined in a joint hidden representation.

Generator Architecture:

 The generator takes both the prior input noise (z) and the additional information (y) as inputs.

 These inputs are combined in a joint hidden representation, and the generator produces
synthetic samples.

 The adversarial training framework allows flexibility in how this hidden representation is
composed.

Discriminator Architecture:
 The discriminator takes both real data (x) and the additional information (y) as inputs.

 The discriminator’s task is to distinguish between real data and synthetic data generated by the
generator conditioned on y.
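
A condensed PyTorch sketch of this conditioning follows. It assumes 10 classes (e.g. MNIST digits), a 100-dimensional noise vector, and flattened 28x28 images; the label y is embedded and concatenated with the noise (in the generator) or the image (in the discriminator) to form the joint hidden representation:

import torch
import torch.nn as nn

n_classes, z_size, img_size = 10, 100, 28 * 28

class CGANGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(n_classes, n_classes)  # embed the label y
        self.net = nn.Sequential(
            nn.Linear(z_size + n_classes, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, img_size), nn.Tanh(),
        )

    def forward(self, z, y):
        # joint hidden representation: concatenate noise z with label info y
        return self.net(torch.cat([z, self.label_emb(y)], dim=1))

class CGANDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(n_classes, n_classes)
        self.net = nn.Sequential(
            nn.Linear(img_size + n_classes, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x, y):
        # condition the real-vs-fake decision on the same label y
        return self.net(torch.cat([x, self.label_emb(y)], dim=1))

# toy usage: ask for images of the digit 3 (hypothetical label choice)
z = torch.randn(4, z_size)
y = torch.full((4,), 3, dtype=torch.long)
fake = CGANGenerator()(z, y)
prob_real = CGANDiscriminator()(fake, y)

For instance, passing the label 3 with every noise vector asks the generator for images of the digit 3, and the discriminator judges real vs. fake with that same label in hand.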
