
Additive Attention Example

Additive attention, introduced by Bahdanau et al. in 2015, is a mechanism used in sequence-to-sequence models to improve the alignment between input and output sequences. It computes attention scores by first applying a feedforward neural network to combine the decoder's previous hidden state with each encoder hidden state. The combined vector is passed through a non-linear activation function (typically tanh), followed by a linear layer that produces a scalar score for each encoder hidden state.
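As a rough sketch (not the original paper's code), the scoring step described above can be written in NumPy. The function name and the choice of identity W and all-ones v are illustrative simplifications matching the worked example below:

```python
import numpy as np

def additive_score(s_prev, h_enc, W, v):
    """Additive (Bahdanau-style) attention score for one encoder state.

    s_prev: previous decoder hidden state, shape (d_dec,)
    h_enc:  one encoder hidden state, shape (d_enc,)
    W:      weight matrix, shape (d_att, d_dec + d_enc)
    v:      scoring vector, shape (d_att,)
    """
    combined = np.concatenate([s_prev, h_enc])  # [s_prev; h_enc]
    return float(v @ np.tanh(W @ combined))     # scalar alignment score

# Toy check with the simplifications used in the worked example below:
# W = 8x8 identity, v = vector of ones.
s_prev = np.array([0.0, 0.4, 1.0, 0.3])
h_enc = np.array([0.1, 0.2, 0.3, 0.4])
score = additive_score(s_prev, h_enc, np.eye(8), np.ones(8))  # ≈ 2.4011
```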


Additive Attention Mechanism Calculation

Encoder Embedding Vectors

h_turn = [0.1, 0.2, 0.3, 0.4]
h_off = [0.5, 0.6, 0.7, 0.8]
h_the = [0.9, 1.0, 1.1, 1.2]
h_light = [1.3, 1.4, 1.5, 1.6]

Decoder Hidden State

h_decoder1 = [0.0, 0.4, 1.0, 0.3]

Step 1: Concatenate Decoder Hidden State with Each Encoder Hidden State

For h_turn: concat(h_decoder1, h_turn) = [0.0, 0.4, 1.0, 0.3, 0.1, 0.2, 0.3, 0.4]
For h_off: concat(h_decoder1, h_off) = [0.0, 0.4, 1.0, 0.3, 0.5, 0.6, 0.7, 0.8]
For h_the: concat(h_decoder1, h_the) = [0.0, 0.4, 1.0, 0.3, 0.9, 1.0, 1.1, 1.2]
For h_light: concat(h_decoder1, h_light) = [0.0, 0.4, 1.0, 0.3, 1.3, 1.4, 1.5, 1.6]
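In NumPy, the concatenation in Step 1 is a single call; the values below are the decoder state and first encoder state from the example:

```python
import numpy as np

h_decoder1 = np.array([0.0, 0.4, 1.0, 0.3])
h_turn = np.array([0.1, 0.2, 0.3, 0.4])

# Concatenating the 4-dim decoder state with a 4-dim encoder state
# gives one 8-dim input vector for the scoring network.
concat_turn = np.concatenate([h_decoder1, h_turn])
```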

Step 2: Apply Weight Matrix W

Assume W is a weight matrix of appropriate dimensions. For simplicity, let W be an identity matrix of size 8 × 8 for demonstration purposes, so multiplying by W leaves each concatenated vector unchanged.

For h_turn: W · concat(h_decoder1, h_turn) = [0.0, 0.4, 1.0, 0.3, 0.1, 0.2, 0.3, 0.4]
For h_off: W · concat(h_decoder1, h_off) = [0.0, 0.4, 1.0, 0.3, 0.5, 0.6, 0.7, 0.8]
For h_the: W · concat(h_decoder1, h_the) = [0.0, 0.4, 1.0, 0.3, 0.9, 1.0, 1.1, 1.2]
For h_light: W · concat(h_decoder1, h_light) = [0.0, 0.4, 1.0, 0.3, 1.3, 1.4, 1.5, 1.6]
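A quick check of the identity-matrix simplification in Step 2; in a trained model W would be a learned matrix that mixes the decoder and encoder components rather than passing them through:

```python
import numpy as np

W = np.eye(8)  # the example's simplifying assumption: 8x8 identity
x = np.array([0.0, 0.4, 1.0, 0.3, 0.1, 0.2, 0.3, 0.4])

# With W = I, the matrix-vector product returns x unchanged.
Wx = W @ x
```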

Step 3: Apply v Vector and tanh Activation

Assume v is a vector of size 8. For simplicity, let v be a vector of ones: v = [1, 1, 1, 1, 1, 1, 1, 1]. With this choice, each score is simply the sum of the elementwise tanh values.

For h_turn:

score(h_decoder1, h_turn) = v · tanh(W · concat(h_decoder1, h_turn))
= tanh(0.0) + tanh(0.4) + tanh(1.0) + tanh(0.3) + tanh(0.1) + tanh(0.2) + tanh(0.3) + tanh(0.4)
= 0.0 + 0.3799 + 0.7616 + 0.2913 + 0.0997 + 0.1974 + 0.2913 + 0.3799 = 2.4011

For h_off:

score(h_decoder1, h_off) = v · tanh(W · concat(h_decoder1, h_off))
= tanh(0.0) + tanh(0.4) + tanh(1.0) + tanh(0.3) + tanh(0.5) + tanh(0.6) + tanh(0.7) + tanh(0.8)
= 0.0 + 0.3799 + 0.7616 + 0.2913 + 0.4621 + 0.5370 + 0.6044 + 0.6640 = 3.7003

For h_the:

score(h_decoder1, h_the) = v · tanh(W · concat(h_decoder1, h_the))
= tanh(0.0) + tanh(0.4) + tanh(1.0) + tanh(0.3) + tanh(0.9) + tanh(1.0) + tanh(1.1) + tanh(1.2)
= 0.0 + 0.3799 + 0.7616 + 0.2913 + 0.7163 + 0.7616 + 0.8005 + 0.8337 = 4.5449

For h_light:

score(h_decoder1, h_light) = v · tanh(W · concat(h_decoder1, h_light))
= tanh(0.0) + tanh(0.4) + tanh(1.0) + tanh(0.3) + tanh(1.3) + tanh(1.4) + tanh(1.5) + tanh(1.6)
= 0.0 + 0.3799 + 0.7616 + 0.2913 + 0.8617 + 0.8854 + 0.9051 + 0.9217 = 5.0067
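Step 3 for all four encoder positions can be reproduced in a few lines; with W = I and v a vector of ones, each score reduces to the sum of elementwise tanh values:

```python
import numpy as np

h_decoder1 = np.array([0.0, 0.4, 1.0, 0.3])
encoder_states = {
    "turn":  np.array([0.1, 0.2, 0.3, 0.4]),
    "off":   np.array([0.5, 0.6, 0.7, 0.8]),
    "the":   np.array([0.9, 1.0, 1.1, 1.2]),
    "light": np.array([1.3, 1.4, 1.5, 1.6]),
}
W, v = np.eye(8), np.ones(8)

# score = v · tanh(W · concat(decoder_state, encoder_state))
scores = {word: float(v @ np.tanh(W @ np.concatenate([h_decoder1, h])))
          for word, h in encoder_states.items()}
# scores ≈ {'turn': 2.4011, 'off': 3.7003, 'the': 4.5449, 'light': 5.0067}
```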

Step 4: Apply Softmax to Scores

softmax(score_i) = exp(score_i) / Σ_j exp(score_j)

Calculate the exponential values:

exp(2.4011) ≈ 11.0353, exp(3.7003) ≈ 40.4594, exp(4.5449) ≈ 94.1511, exp(5.0067) ≈ 149.4110

Sum of exponentials:

11.0353 + 40.4594 + 94.1511 + 149.4110 = 295.0568

Calculate the softmax values:

α_turn = 11.0353 / 295.0568 ≈ 0.0374
α_off = 40.4594 / 295.0568 ≈ 0.1371
α_the = 94.1511 / 295.0568 ≈ 0.3191
α_light = 149.4110 / 295.0568 ≈ 0.5064
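Step 4 is a standard softmax over the four scores. The version below subtracts the maximum score before exponentiating, a common numerical-stability trick that is not part of the hand calculation but does not change the result:

```python
import numpy as np

scores = np.array([2.4011, 3.7003, 4.5449, 5.0067])  # from Step 3

# Numerically stable softmax: shift by the max before exponentiating.
exp_scores = np.exp(scores - scores.max())
alphas = exp_scores / exp_scores.sum()
# alphas ≈ [0.0374, 0.1371, 0.3191, 0.5064]
```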

Step 5: Calculate Context Vector c_t

c_t = α_turn · h_turn + α_off · h_off + α_the · h_the + α_light · h_light

c_t = 0.0374 · [0.1, 0.2, 0.3, 0.4] + 0.1371 · [0.5, 0.6, 0.7, 0.8] + 0.3191 · [0.9, 1.0, 1.1, 1.2] + 0.5064 · [1.3, 1.4, 1.5, 1.6]

= [0.0037, 0.0075, 0.0112, 0.0150] + [0.0686, 0.0823, 0.0960, 0.1097] + [0.2872, 0.3191, 0.3510, 0.3829] + [0.6583, 0.7090, 0.7596, 0.8102]

≈ [1.0178, 1.1179, 1.2178, 1.3178]
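The whole calculation, end to end, fits in a short script; last-digit differences from the hand-rounded numbers above are expected:

```python
import numpy as np

# Encoder states (rows) and decoder state from the example.
H = np.array([[0.1, 0.2, 0.3, 0.4],   # h_turn
              [0.5, 0.6, 0.7, 0.8],   # h_off
              [0.9, 1.0, 1.1, 1.2],   # h_the
              [1.3, 1.4, 1.5, 1.6]])  # h_light
s = np.array([0.0, 0.4, 1.0, 0.3])    # h_decoder1
W, v = np.eye(8), np.ones(8)          # the example's simplifications

# Steps 1-3: additive attention score per encoder position.
scores = np.array([v @ np.tanh(W @ np.concatenate([s, h])) for h in H])

# Step 4: softmax over the scores.
alphas = np.exp(scores) / np.exp(scores).sum()

# Step 5: context vector = attention-weighted sum of encoder states.
c_t = alphas @ H
# c_t ≈ [1.0178, 1.1178, 1.2178, 1.3178]
```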
