
Additive Attention Example

Additive attention, introduced by Bahdanau et al. in 2015, is a mechanism used in sequence-to-sequence models to improve the alignment between input and output sequences. It computes attention scores by first applying a feedforward neural network to combine the decoder's previous hidden state with each encoder hidden state. The combined vector is passed through a non-linear activation function (typically tanh), followed by a linear layer that produces a scalar score for each encoder hidden state.
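As a rough sketch (not the original paper's code), the scoring step described above can be written in NumPy. The function name and the choice of identity W and all-ones v are illustrative simplifications matching the worked example below:

```python
import numpy as np

def additive_score(s_prev, h_enc, W, v):
    """Additive (Bahdanau-style) attention score for one encoder state.

    s_prev: previous decoder hidden state, shape (d_dec,)
    h_enc:  one encoder hidden state, shape (d_enc,)
    W:      weight matrix, shape (d_att, d_dec + d_enc)
    v:      scoring vector, shape (d_att,)
    """
    combined = np.concatenate([s_prev, h_enc])  # [s_prev; h_enc]
    return float(v @ np.tanh(W @ combined))     # scalar alignment score

# Toy check with the simplifications used in the worked example below:
# W = 8x8 identity, v = vector of ones.
s_prev = np.array([0.0, 0.4, 1.0, 0.3])
h_enc = np.array([0.1, 0.2, 0.3, 0.4])
score = additive_score(s_prev, h_enc, np.eye(8), np.ones(8))  # ≈ 2.4011
```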


Additive Attention Mechanism Calculation

Encoder Embedding Vectors

h_turn = [0.1, 0.2, 0.3, 0.4]
h_off = [0.5, 0.6, 0.7, 0.8]
h_the = [0.9, 1.0, 1.1, 1.2]
h_light = [1.3, 1.4, 1.5, 1.6]

Decoder Hidden State

h_decoder1 = [0.0, 0.4, 1.0, 0.3]

Step 1: Concatenate Decoder Hidden State with Each Encoder Hidden State

For h_turn: concat(h_decoder1, h_turn) = [0.0, 0.4, 1.0, 0.3, 0.1, 0.2, 0.3, 0.4]
For h_off: concat(h_decoder1, h_off) = [0.0, 0.4, 1.0, 0.3, 0.5, 0.6, 0.7, 0.8]
For h_the: concat(h_decoder1, h_the) = [0.0, 0.4, 1.0, 0.3, 0.9, 1.0, 1.1, 1.2]
For h_light: concat(h_decoder1, h_light) = [0.0, 0.4, 1.0, 0.3, 1.3, 1.4, 1.5, 1.6]
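In NumPy, the concatenation in Step 1 is a single call; the values below are the decoder state and first encoder state from the example:

```python
import numpy as np

h_decoder1 = np.array([0.0, 0.4, 1.0, 0.3])
h_turn = np.array([0.1, 0.2, 0.3, 0.4])

# Concatenating the 4-dim decoder state with a 4-dim encoder state
# gives one 8-dim input vector for the scoring network.
concat_turn = np.concatenate([h_decoder1, h_turn])
```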

Step 2: Apply Weight Matrix W

Assume W is a weight matrix of appropriate dimensions. For simplicity, let W be an identity matrix of size 8 × 8 for demonstration purposes, so multiplying by W leaves each concatenated vector unchanged.

For h_turn: W · concat(h_decoder1, h_turn) = [0.0, 0.4, 1.0, 0.3, 0.1, 0.2, 0.3, 0.4]
For h_off: W · concat(h_decoder1, h_off) = [0.0, 0.4, 1.0, 0.3, 0.5, 0.6, 0.7, 0.8]
For h_the: W · concat(h_decoder1, h_the) = [0.0, 0.4, 1.0, 0.3, 0.9, 1.0, 1.1, 1.2]
For h_light: W · concat(h_decoder1, h_light) = [0.0, 0.4, 1.0, 0.3, 1.3, 1.4, 1.5, 1.6]
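A quick check of the identity-matrix simplification in Step 2; in a trained model W would be a learned matrix that mixes the decoder and encoder components rather than passing them through:

```python
import numpy as np

W = np.eye(8)  # the example's simplifying assumption: 8x8 identity
x = np.array([0.0, 0.4, 1.0, 0.3, 0.1, 0.2, 0.3, 0.4])

# With W = I, the matrix-vector product returns x unchanged.
Wx = W @ x
```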

Step 3: Apply v Vector and tanh Activation

Assume v is a vector of size 8. For simplicity, let v be a vector of ones: v = [1, 1, 1, 1, 1, 1, 1, 1]. With this choice, each score is simply the sum of the elementwise tanh values.

For h_turn:

score(h_decoder1, h_turn) = v · tanh(W · concat(h_decoder1, h_turn))
= tanh(0.0) + tanh(0.4) + tanh(1.0) + tanh(0.3) + tanh(0.1) + tanh(0.2) + tanh(0.3) + tanh(0.4)
= 0.0 + 0.3799 + 0.7616 + 0.2913 + 0.0997 + 0.1974 + 0.2913 + 0.3799 = 2.4011

For h_off:

score(h_decoder1, h_off) = v · tanh(W · concat(h_decoder1, h_off))
= tanh(0.0) + tanh(0.4) + tanh(1.0) + tanh(0.3) + tanh(0.5) + tanh(0.6) + tanh(0.7) + tanh(0.8)
= 0.0 + 0.3799 + 0.7616 + 0.2913 + 0.4621 + 0.5370 + 0.6044 + 0.6640 = 3.7003

For h_the:

score(h_decoder1, h_the) = v · tanh(W · concat(h_decoder1, h_the))
= tanh(0.0) + tanh(0.4) + tanh(1.0) + tanh(0.3) + tanh(0.9) + tanh(1.0) + tanh(1.1) + tanh(1.2)
= 0.0 + 0.3799 + 0.7616 + 0.2913 + 0.7163 + 0.7616 + 0.8005 + 0.8337 = 4.5449

For h_light:

score(h_decoder1, h_light) = v · tanh(W · concat(h_decoder1, h_light))
= tanh(0.0) + tanh(0.4) + tanh(1.0) + tanh(0.3) + tanh(1.3) + tanh(1.4) + tanh(1.5) + tanh(1.6)
= 0.0 + 0.3799 + 0.7616 + 0.2913 + 0.8617 + 0.8854 + 0.9051 + 0.9217 = 5.0067
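Step 3 for all four encoder positions can be reproduced in a few lines; with W = I and v a vector of ones, each score reduces to the sum of elementwise tanh values:

```python
import numpy as np

h_decoder1 = np.array([0.0, 0.4, 1.0, 0.3])
encoder_states = {
    "turn":  np.array([0.1, 0.2, 0.3, 0.4]),
    "off":   np.array([0.5, 0.6, 0.7, 0.8]),
    "the":   np.array([0.9, 1.0, 1.1, 1.2]),
    "light": np.array([1.3, 1.4, 1.5, 1.6]),
}
W, v = np.eye(8), np.ones(8)

# score = v · tanh(W · concat(decoder_state, encoder_state))
scores = {word: float(v @ np.tanh(W @ np.concatenate([h_decoder1, h])))
          for word, h in encoder_states.items()}
# scores ≈ {'turn': 2.4011, 'off': 3.7003, 'the': 4.5449, 'light': 5.0067}
```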

Step 4: Apply Softmax to Scores

softmax(score_i) = exp(score_i) / Σ_j exp(score_j)

Calculate the exponential values:

exp(2.4011) ≈ 11.0353, exp(3.7003) ≈ 40.4594, exp(4.5449) ≈ 94.1511, exp(5.0067) ≈ 149.4110

Sum of exponentials:

11.0353 + 40.4594 + 94.1511 + 149.4110 = 295.0568

Calculate the softmax values:

α_turn = 11.0353 / 295.0568 ≈ 0.0374
α_off = 40.4594 / 295.0568 ≈ 0.1371
α_the = 94.1511 / 295.0568 ≈ 0.3191
α_light = 149.4110 / 295.0568 ≈ 0.5064
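Step 4 is a standard softmax over the four scores. The version below subtracts the maximum score before exponentiating, a common numerical-stability trick that is not part of the hand calculation but does not change the result:

```python
import numpy as np

scores = np.array([2.4011, 3.7003, 4.5449, 5.0067])  # from Step 3

# Numerically stable softmax: shift by the max before exponentiating.
exp_scores = np.exp(scores - scores.max())
alphas = exp_scores / exp_scores.sum()
# alphas ≈ [0.0374, 0.1371, 0.3191, 0.5064]
```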

Step 5: Calculate Context Vector c_t

c_t = α_turn · h_turn + α_off · h_off + α_the · h_the + α_light · h_light

c_t = 0.0374 · [0.1, 0.2, 0.3, 0.4] + 0.1371 · [0.5, 0.6, 0.7, 0.8] + 0.3191 · [0.9, 1.0, 1.1, 1.2] + 0.5064 · [1.3, 1.4, 1.5, 1.6]

= [0.0037, 0.0075, 0.0112, 0.0150] + [0.0686, 0.0823, 0.0960, 0.1097] + [0.2872, 0.3191, 0.3510, 0.3829] + [0.6583, 0.7090, 0.7596, 0.8102]

≈ [1.0178, 1.1179, 1.2178, 1.3178]
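The whole calculation, end to end, fits in a short script; last-digit differences from the hand-rounded numbers above are expected:

```python
import numpy as np

# Encoder states (rows) and decoder state from the example.
H = np.array([[0.1, 0.2, 0.3, 0.4],   # h_turn
              [0.5, 0.6, 0.7, 0.8],   # h_off
              [0.9, 1.0, 1.1, 1.2],   # h_the
              [1.3, 1.4, 1.5, 1.6]])  # h_light
s = np.array([0.0, 0.4, 1.0, 0.3])    # h_decoder1
W, v = np.eye(8), np.ones(8)          # the example's simplifications

# Steps 1-3: additive attention score per encoder position.
scores = np.array([v @ np.tanh(W @ np.concatenate([s, h])) for h in H])

# Step 4: softmax over the scores.
alphas = np.exp(scores) / np.exp(scores).sum()

# Step 5: context vector = attention-weighted sum of encoder states.
c_t = alphas @ H
# c_t ≈ [1.0178, 1.1178, 1.2178, 1.3178]
```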
