Answer Analysis
Official account: NLP从入门到放弃
Author: DASOU
1. Why does the Transformer use multi-head attention?
2/3. Walk through the Transformer Encoder module.
4. Relative position representations (RPR) as an alternative positional encoding.
5. BN -- Batch Normalization: what it does, its advantages and its drawbacks.
6. Why NLP uses layer-norm instead of BatchNorm.
- Why BN is a poor fit for NLP
- What layer-norm does instead
- Why BN works for CNNs but not for NLP
7. The Decoder module.
8. Transformer
9. Transformer
10. Why do Q and K use different weight matrices in the Transformer?
11. Why does the Transformer compute attention with a dot product rather than with addition?
13. Briefly explain BatchNorm and its pros and cons.
14. Describe the feed-forward network in the Transformer (activation function, trade-offs).
15. How do the Encoder and Decoder interact? (related: attention in seq2seq)
16. How does the Decoder's multi-head self-attention differ from the Encoder's? (why it needs a sequence mask)
17. Where does the Transformer's parallelism show up? Can the Decoder be parallelized?
18. Briefly describe the wordpiece model and byte pair encoding.
19. How are the learning rate and Dropout set when training a Transformer, and where is Dropout applied?
2/3. The Transformer Encoder module

The Encoder is a stack of N = 6 identical layers. Unlike an RNN, which consumes the sequence token by token, self-attention lets every position attend to the whole sequence at once. Each layer contains:
- Multi-head self-attention: the input is projected into n_heads heads, each head working in hidden_size/n_heads dimensions; attention is computed inside every head and the heads are then concatenated back to hidden_size.
- Add & Norm: each sub-layer is wrapped in a residual connection followed by layer normalization (the normalization used in NLP, see the layer-norm question below).
- A position-wise feed-forward network: a Linear layer, a ReLU, and another Linear layer, applied to every position independently.
The output of one encoder layer is the input of the next, and the output of the last encoder layer is what the decoder later attends to.
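A rough numpy sketch of the head split described above, with hypothetical sizes (hidden_size = 512, n_heads = 8, so each head works in 512/8 = 64 dimensions); it only shows the shapes, not a full implementation:

import numpy as np

# Hypothetical sizes: hidden_size = 512, n_heads = 8, head_dim = 512 / 8 = 64.
seq_len, hidden_size, n_heads = 10, 512, 8
head_dim = hidden_size // n_heads
rng = np.random.default_rng(0)

x = rng.normal(size=(seq_len, hidden_size))
Wq = rng.normal(size=(hidden_size, hidden_size))
Wk = rng.normal(size=(hidden_size, hidden_size))
Wv = rng.normal(size=(hidden_size, hidden_size))

def split_heads(t):
    # (seq_len, hidden_size) -> (n_heads, seq_len, head_dim)
    return t.reshape(seq_len, n_heads, head_dim).transpose(1, 0, 2)

Q, K, V = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)

# Scaled dot-product attention inside each head, then concatenate the heads back.
scores = Q @ K.transpose(0, 2, 1) / np.sqrt(head_dim)        # (n_heads, seq_len, seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)
heads = weights @ V                                           # (n_heads, seq_len, head_dim)
out = heads.transpose(1, 0, 2).reshape(seq_len, hidden_size)  # back to (seq_len, hidden_size)
print(out.shape)  # (10, 512)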
4. Relative position representations (RPR)

Besides the sinusoidal absolute positional encoding, attention can use relative position representations (RPR): instead of telling the model where each token sits in the sentence, RPR tells it, inside the attention computation, how far apart the query token and the key token are.

Take a 4-token sentence such as "I / love / eating / apples". Relative to "eating", the word "love" sits at relative position -1 and "apples" at relative position +1. The relative distance is clipped to a maximum k; with k = 4 the possible values run from -4 to 4, i.e. 9 distinct relative positions in total, and each of them gets its own learned embedding.

RPR enters the attention computation in two places:
- RPR on K: when computing the Q/K attention scores, the relative-position embedding of each key is added to K.
- RPR on V: when computing the weighted sum, a separate relative-position embedding is added to V.
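A minimal numpy sketch of this idea, under assumed sizes (4 tokens, 8 hidden dimensions, clip distance k = 4, so 2k + 1 = 9 relative-position embeddings); the variable names are made up for illustration:

import numpy as np

seq_len, d, k = 4, 8, 4
rng = np.random.default_rng(0)

Q = rng.normal(size=(seq_len, d))
K = rng.normal(size=(seq_len, d))
V = rng.normal(size=(seq_len, d))

# One learned embedding per clipped relative distance in [-k, k]: 2k + 1 = 9 of them.
rel_emb_K = rng.normal(size=(2 * k + 1, d))
rel_emb_V = rng.normal(size=(2 * k + 1, d))

# Relative distance j - i for every (query i, key j) pair, clipped to [-k, k].
idx = np.arange(seq_len)
rel = np.clip(idx[None, :] - idx[:, None], -k, k) + k          # shifted to [0, 2k]

# Scores: Q·K plus Q·(relative-position embedding attached to the key).
scores = (Q @ K.T + np.einsum('id,ijd->ij', Q, rel_emb_K[rel])) / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)

# Output: weighted sum of V plus the weighted value-side relative embeddings.
out = weights @ V + np.einsum('ij,ijd->id', weights, rel_emb_V[rel])
print(out.shape)  # (4, 8)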
5. BN -- Batch Normalization

What BN does: take an MLP and a batch of 10 samples, each with 5 features. BN normalizes each of the 5 features separately, using the mean and variance of that feature computed across the 10 samples of the batch, followed by a learned scale and shift.

Advantages of BN:
- It keeps the inputs of every layer in a stable range, which speeds up convergence and allows larger learning rates.
- It keeps activations away from the saturated regions of functions such as sigmoid, which eases vanishing gradients.
- The batch statistics add a little noise, which acts as a mild regularizer.

Drawbacks of BN:
- It depends on the batch: with a small batch_size the estimated mean and variance are noisy, and BN can hurt instead of help.
- It fits RNNs and variable-length text poorly. Take a batch of batch_size = 10 sentences in which 9 have length 5 and 1 has length 20: for positions 6 to 20 the "batch" statistics are computed from a single sentence (effectively a batch of 1), which is meaningless.
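A small numeric sketch of what BN computes in the MLP example above (10 samples, 5 features); the numbers are made up:

import numpy as np

# BN in an MLP: a batch of 10 samples with 5 features each.
x = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(10, 5))

# Statistics are taken per feature, ACROSS the 10 samples of the batch.
mean = x.mean(axis=0)                 # shape (5,)
var = x.var(axis=0)                   # shape (5,)
x_bn = (x - mean) / np.sqrt(var + 1e-5)
print(x_bn.mean(axis=0).round(3), x_bn.std(axis=0).round(3))  # ~0 and ~1 per feature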
6. Why does NLP use layer-norm instead of BatchNorm?

Why BN is a poor fit for NLP: with RNNs and text, the "feature" that BN normalizes becomes the i-th position of every sentence in the batch. In an MLP the i-th feature of different samples really is the same quantity, so averaging it over batch_size samples makes sense. In a batch of sentences it does not: for two sentences such as "I / love / eating / apples" and "he / likes / to / eat / pears", BN would normalize "I" together with "he", "love" together with "likes", and so on, i.e. it lumps together words from different sentences that merely happen to share a position. Such batch statistics carry little meaning, which is why BN performs poorly on NLP tasks.
What layer-norm does instead: it normalizes within a single sample rather than across the batch. For an image tensor with dimensions N (batch) and C/H/W, BN computes its statistics over N for each channel, while layer-norm computes them over C/H/W for each sample. For text, layer-norm normalizes over the hidden dimensions of each token on its own: in a sentence like "I / love / eating / apples", every word is normalized using only its own vector. The computation never touches N, the batch size, so it does not care how many sentences are in the batch or how long the other sentences are. One way to state the insight: BN normalizes the same feature across different samples, while layer-norm normalizes the different features of one sample.
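For contrast with the BN sketch above, a layer-norm sketch on an NLP-shaped batch (sizes are made up): the statistics live entirely inside each token's vector, so the batch dimension never enters the computation:

import numpy as np

# Layer-norm on a (batch, seq_len, hidden) tensor: stats per token, over hidden dims.
x = np.random.default_rng(0).normal(size=(2, 4, 8))

mean = x.mean(axis=-1, keepdims=True)   # one mean per token
var = x.var(axis=-1, keepdims=True)     # one variance per token
x_ln = (x - mean) / np.sqrt(var + 1e-5)
print(x_ln.shape)                       # (2, 4, 8); each token normalized on its own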
- Why BN works for CNNs but not for NLP

In a CNN, every position of a given channel of the feature map is produced by the same convolution kernel, so all of those positions, across the whole batch, can be treated as observations of one and the same feature; normalizing them together is reasonable, and BN works well. In NLP there is no such shared feature: the words sitting at the same position of different sentences are unrelated, so batch statistics over them are not meaningful and BN gives poor results.
7. The Decoder module

How do the Encoder and Decoder interact? The idea is the same as attention in an RNN-based seq2seq model: at every decoding step the decoder attends over the encoder states and summarizes them into a context vector. In the Transformer this happens in the encoder-decoder attention sub-layer of each decoder block: Q comes from the decoder side (the output of the masked self-attention, after its Add&Norm), while K/V come from the Encoder output. Note that although the Encoder and the Decoder each stack N = 6 layers, every decoder layer takes its K/V from the output of the last encoder layer, not from the encoder layer with the same index.

Why does the decoder self-attention need a sequence mask? During training the decoder is fed the ground truth target shifted right (teacher forcing), e.g. the 4-token target "I / love / eating / apples" is processed at all positions in parallel. Without a mask, the position that should predict "eating" could already attend to "apples", i.e. it would see the future ground truth (label leakage). At inference time those future tokens do not exist yet, so an unmasked model would face a gap between how it is trained and how it predicts. The sequence mask therefore blocks every position from attending to the positions after it, by pushing their attention scores to a very large negative value before the softmax.
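A compact sketch of the two attention patterns in the decoder (hypothetical sizes; the helper is simplified to a single head with no projections):

import numpy as np

d, src_len, tgt_len = 8, 6, 4
rng = np.random.default_rng(0)

def attention(Q, K, V, mask=None):
    # Scaled dot-product attention; masked positions get a very large negative score.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V

enc_out = rng.normal(size=(src_len, d))   # output of the last encoder layer
dec_in = rng.normal(size=(tgt_len, d))    # decoder-side representations

# 1) Masked (sequence-masked) self-attention: position i only sees positions <= i.
seq_mask = np.tril(np.ones((tgt_len, tgt_len), dtype=bool))
self_out = attention(dec_in, dec_in, dec_in, mask=seq_mask)

# 2) Encoder-decoder attention: Q from the decoder, K/V from the encoder output.
cross_out = attention(self_out, enc_out, enc_out)
print(self_out.shape, cross_out.shape)    # (4, 8) (4, 8)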
8. Where is the Transformer parallel?

The parallelism comes from self-attention: inside an Encoder layer, attention over the whole sequence is a single matrix computation, so all positions are processed at once instead of step by step as in an RNN. The 6 encoder layers are still applied one after another (encoder to encoder is sequential), but within each layer there is no recurrence along the sequence. The Decoder can be parallelized during training thanks to teacher forcing and the sequence mask, but at inference time it still generates tokens one at a time, much like an RNN.
9. Transformer
10. Q and K in the Transformer, and why the attention scores are scaled

Why do Q and K use different weight matrices instead of one shared projection? If the transformer generated K and Q from the same matrix, each token's query and key would be identical vectors, the dot product of a token with itself would dominate the scores, and after the softmax the attention would mostly collapse onto the token's own position. Separate projections for Q/K/V put queries and keys in different spaces and make the attention far more expressive. (A related discussion: https://www.zhihu.com/question/319339652)

Why divide the attention scores by sqrt(dk)? Treat the components of q and k as independent random variables with mean 0 and variance 1. Their dot product then has mean 0 and variance dk (the key dimension), so for large dk the scores become large in magnitude and push the softmax into its saturated region, where gradients almost vanish. Dividing by sqrt(dk) brings the variance back to 1. A quick numerical check with dk = 3:
import numpy as np

# 1000 pairs of 3-dimensional vectors with zero mean and unit variance.
arr1 = np.random.normal(size=(3, 1000))
arr2 = np.random.normal(size=(3, 1000))
result = np.dot(arr1.T, arr2)   # every entry is a dot product of two 3-dim vectors
arr_var = np.var(result)
print(arr_var)                  # result: about 2.9, i.e. close to dk = 3
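Continuing the same script, dividing the scores by sqrt(dk) brings the variance back to roughly 1:

scaled_var = np.var(result / np.sqrt(3))
print(scaled_var)               # about 1.0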
13. How is padding masked when computing attention scores?

Padding positions are masked out by adding a very large negative number (e.g. -1000) to their attention scores before the softmax, so that after the softmax their weights are effectively 0. The mask is built from the real length of every sequence in the batch (its shape involves batch_size and the padded sequence length) and is applied to the key positions of every attention head.
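A small sketch of such a padding mask, using the -1000 trick from the text and hypothetical sizes (a batch of 2 sequences padded to length 5, with real lengths 5 and 3):

import numpy as np

seq_len = 5
lengths = np.array([5, 3])                 # real length of each sequence in the batch
rng = np.random.default_rng(0)

scores = rng.normal(size=(2, seq_len, seq_len))                  # raw Q·K^T scores
key_is_pad = np.arange(seq_len)[None, :] >= lengths[:, None]     # (batch, seq_len)

# Add a large negative number to every padded key position before the softmax,
# so its attention weight ends up effectively zero.
scores = scores + np.where(key_is_pad[:, None, :], -1000.0, 0.0)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)
print(weights[1, 0])   # the last two entries (padded keys) are ~0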
Transformer-based models such as GPT rely on subword tokenization (the wordpiece model or byte pair encoding) for their inputs.