
Transformer Interview Questions: Answers and Analysis

Field: NLP
Author: DASOU
Contents

1. Transformer interview questions
2. Walk through the Transformer Encoder module
3. Positional encoding in the Transformer
4. Other positional-encoding techniques (RPR, etc.)
5. BN -- Batch Normalization: what it does, and its pros and cons
6. Why NLP uses layer-norm instead of BatchNorm
7. The Decoder, and how it interacts with the Encoder
8. Parallelism in the Transformer
9. Why the Transformer uses multi-head attention
10. Why Q and K use different weight matrices in the Transformer
11. Why Transformer attention uses the dot-product rather than addition
12. Why attention is scaled by sqrt(d_k) before the softmax
13. How padding is masked when computing attention scores
1. Transformer interview questions

1. Why does the Transformer use a multi-head attention mechanism rather than a single head?
2. Why do Q and K use different weight matrices in the Transformer? Why not use one matrix and let each word take the dot-product with itself?
3. Why does the Transformer compute attention with a dot-product instead of addition? What is the difference in complexity and effect?
4. Why is attention scaled (divided by sqrt(d_k)) before the softmax?
5. How is padding masked when computing attention scores?
6. Why is each head's dimensionality reduced in multi-head attention?
7. Briefly describe the Transformer's Encoder module.
8. Why is the word embedding multiplied by the square root of the embedding size before being fed in?
9. Briefly describe the Transformer's positional encoding. What is it for, and what are its pros and cons?
10. What other positional-encoding techniques do you know, and what are their pros and cons?
11. Briefly explain the residual connections in the Transformer and why they matter.
12. Why does the transformer block use LayerNorm instead of BatchNorm? Where does LayerNorm sit inside the Transformer block?
13. Briefly describe BatchNorm and its pros and cons.
14. Briefly describe the feed-forward network in the Transformer. Which activation does it use, and what are the trade-offs?
15. How do the Encoder and Decoder interact? (This often leads into questions about attention in seq2seq models.)
16. How does the Decoder's multi-head self-attention differ from the encoder's? (Why does the decoder's self-attention need a
sequence mask?)
17. Where does the Transformer's parallelism show up? Can the Decoder be parallelized?
18. Briefly describe the wordpiece model and byte pair encoding. Have you used them in practice?
19. How are the learning rate and Dropout set when training a Transformer? Where is Dropout applied, and is there anything to
watch out for at test time?
20. A related bert question: why doesn't bert's mask reuse the transformer trick of masking the attention scores directly?




2. Walk through the Transformer Encoder module

The Transformer is an encoder-decoder architecture; the encoder is a stack of N identical blocks (N=6 in the original paper).

Each block is: multi-head self-attention -> Add&Norm -> feed-forward (Linear) -> Add&Norm.

In self-attention the input is projected into Q/K/V; the dot-products of Q and K are scaled and passed through a softmax, and the resulting weights are used to sum the V vectors. With n_heads heads, the projections are split into n_heads parts, so each head works in a subspace of size hidden_size/n_heads; the heads run in parallel and are concatenated at the end. A numpy sketch of this computation follows at the end of this section.

The "Add" is a residual connection. Residual connections let gradients flow through deep stacks, which is one reason the Transformer can go much deeper than the RNN models that dominated NLP before it: deep RNN stacks are hard to train -- ELMo only stacks a couple of BiLSTM layers, and GNMT needs many tricks (including residual connections) to go deeper.

The "Norm" is Layer Normalization. Layer Normalization is used instead of BN; the reasons are covered in the BN and layer-norm sections below.

The feed-forward part is two Linear layers with a ReLU between them, applied to each position independently.

That is one encoder block. The encoder's input is the sum of the word embeddings and the position encodings:

- the word embeddings can be trained from scratch or initialized from pre-trained vectors such as word2vec;
- the position encoding is needed because the transformer itself has no notion of word order; the details are in the next section.

Finally, the encoder output: the output of the last encoder block is handed to the decoder, where it supplies the K/V for the encoder-decoder attention. In that attention, Q comes from the decoder and K/V come from the encoder output.
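To make the per-block computation concrete, here is a minimal numpy sketch of multi-head self-attention with the head split described above; the function and variable names are mine, not reference code from the paper:

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """x: (seq_len, hidden_size); each w_* is (hidden_size, hidden_size)."""
    seq_len, hidden = x.shape
    d_head = hidden // n_heads                          # each head works in hidden_size / n_heads dims

    def split(t):                                       # (seq, hidden) -> (heads, seq, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head) # (heads, seq, seq), scaled dot-products
    out = softmax(scores) @ v                           # weighted sum of V per head
    out = out.transpose(1, 0, 2).reshape(seq_len, hidden)  # concatenate the heads
    return out @ w_o

# toy usage: 4 tokens, hidden_size = 8, 2 heads
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w = [rng.normal(size=(8, 8)) for _ in range(4)]
y = multi_head_self_attention(x, *w, n_heads=2)
print(y.shape)   # (4, 8)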

3. Positional encoding in the Transformer

Self-attention by itself is order-agnostic: permuting the input words permutes the outputs but does not otherwise change them. The Transformer therefore adds a positional encoding to the word embeddings. The original paper uses fixed sinusoidal encodings:

PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

Two properties make this choice attractive: (1) for any fixed offset k, PE(pos + k) can be written as a linear function of PE(pos), so relative positions are in principle easy for the model to pick up; (2) the encoding is deterministic and extends to sequence lengths longer than anything seen in training.

The main drawback is that the position signal is only added at the input: after the Q/K projections and dot-products the absolute-position information gets mixed with the content information and becomes relatively weak, which motivates the relative-position techniques in the next section.
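A small numpy sketch of the sinusoidal encoding (the formula is the one from the original paper; the function name and shapes are illustrative):

import numpy as np

def sinusoidal_position_encoding(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(max_len)[:, None]                   # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]                # (1, d_model/2)
    angles = pos / np.power(10000, 2 * i / d_model)     # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                        # even dimensions
    pe[:, 1::2] = np.cos(angles)                        # odd dimensions
    return pe

pe = sinusoidal_position_encoding(max_len=50, d_model=8)
print(pe.shape)   # (50, 8); this matrix is added to the word embeddings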

4. Other positional-encoding techniques

Besides the sinusoidal encoding, other options include relative position representations (RPR), the relative encoding used in Transformer-XL, and complex embeddings. This section focuses on RPR.

The idea of RPR is to inject position information into the attention computation itself instead of adding it to the input embeddings: every (query, key) pair gets an embedding for their relative distance, rather than every position getting an embedding for its absolute index.

Take a four-word sentence, written as "w1 / w2 / w3 / w4". Seen from "w3", the word one step to the left ("w2") has relative position -1 and the word one step to the right ("w4") has relative position +1, and so on.

With four words, attention produces a 4x4 matrix of scores, and every entry has a relative distance attached to it. Distances are clipped to a maximum of k; with k = 4 this gives 2*4 + 1 = 9 distinct relative-position embeddings, so long sentences do not need an unbounded table, and the same embedding is reused for "the word two to my left" wherever that pattern occurs.

Concretely, RPR adds two sets of learned relative-position embeddings into attention:

- one is added on the key side: after the Q/K/V projections, the attention logits use Q · (K + a^K);
- one is added on the value side: the attention weights are applied to (V + a^V).
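A tiny sketch, under my own naming, of the relative-distance indexing with clipping that RPR relies on; the actual a^K / a^V embeddings would be looked up with these indices:

import numpy as np

def relative_position_index(seq_len, k):
    """Pairwise relative distances j - i, clipped to [-k, k] and shifted to [0, 2k]."""
    pos = np.arange(seq_len)
    rel = pos[None, :] - pos[:, None]   # (seq_len, seq_len), entry [i, j] = j - i
    rel = np.clip(rel, -k, k)           # clip long-range distances
    return rel + k                      # shift so it can index a table of 2k + 1 embeddings

idx = relative_position_index(seq_len=4, k=4)
print(idx)   # with k = 4 there are 2*4 + 1 = 9 possible relative-position embeddings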

5. BN -- Batch Normalization

Before explaining why the Transformer uses layer-norm, it helps to recap what BN does and where it struggles.

BN normalizes each feature across the Batch dimension.

For an MLP: suppose a batch has 10 samples and each sample has 5 features. For each of the 5 features, BN computes a mean and variance over the 10 samples and normalizes that feature with those statistics.

For a CNN the input is N·C·H·W (N = batch_size, C = channels, H/W = height and width). BN keeps one mean/variance per channel C, computed over (N, H, W).

Advantages of BN:
- it stabilizes the distribution of layer inputs, which allows larger learning rates and faster, more stable convergence;
- it mitigates gradient saturation for squashing activations such as sigmoid, and has a mild regularizing effect.

Drawbacks of BN:
- When batch_size is small, the batch statistics are a noisy estimate of the true mean and variance, and BN can hurt more than it helps.
- BN fits badly with RNNs and variable-length sequences.

For the second point, consider a batch of size 10 where 9 sequences have length 5 and one has length 20. For time steps 6 through 20, the batch statistics come from a single sample, which is essentially meaningless. Worse, if a test sequence is longer than anything seen in training (say the longest training sequence has 600 tokens and a 1000-token sequence arrives at test time), the last 400 positions have no statistics at all.

Summary: BN works well for MLPs and CNNs, where a given feature dimension means the same thing across samples, and poorly for RNNs and variable-length text. That is one reason NLP models use Layer norm instead, which is the next section.
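As a sketch of what BN computes for an N·C·H·W input (training-time statistics only; gamma/beta and the running averages are left out, and the epsilon value is illustrative):

import numpy as np

def batch_norm_nchw(x, eps=1e-5):
    """x: (N, C, H, W). One mean/variance per channel C, computed over N, H and W."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)   # shape (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.normal(loc=3.0, scale=2.0, size=(10, 4, 8, 8))   # batch of 10, 4 channels
y = batch_norm_nchw(x)
print(y.mean(axis=(0, 2, 3)))   # ~0 for every channel
print(y.var(axis=(0, 2, 3)))    # ~1 for every channel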

6. Why NLP uses layer-norm rather than BatchNorm

In NLP, the Transformer (like most other models) uses LayerNorm, not BatchNorm.

First, why BN is a poor fit for NLP. The previous section already covered the practical problems: BN interacts badly with RNNs, variable-length sequences and small batch_size. But there is also a more basic mismatch.

In an MLP or CNN, the i-th feature dimension means the same thing for every sample, so normalizing it across the batch is sensible. In NLP, the "same position" in different sentences holds completely unrelated words. Take two sentences, "w1 / w2 / w3 / w4" and "u1 / u2 / u3 / u4": BN would normalize w1 together with u1, w2 together with u2, and so on, pooling statistics over words that have nothing to do with each other. Normalizing across the batch at a fixed token position does not correspond to any meaningful quantity.

Now layer-norm. Layer-norm normalizes within a single sample: for each token (or each sample) it computes the mean and variance over that token's own feature dimensions (the C/H/W axes rather than N), so it never touches the batch dimension and does not depend on the batch size N at all.

For the sentence "w1 / w2 / w3 / w4", layer-norm normalizes the feature vector of each word by itself. The underlying insight is the mirror image of BN's: BN assumes the same feature dimension is comparable across samples, layer-norm assumes the different dimensions within one sample are comparable to each other -- which is a much better match for token representations. A sketch contrasting the two follows after the next point.

- Why does BN work for CNNs but not for NLP?

In a CNN, each channel of a feature map is the response of one filter; that response measures the same thing in every image, so pooling its statistics over the batch (and over H, W) makes sense. In NLP there is no such correspondence: the batch axis mixes unrelated words, so the batch statistics BN relies on are not meaningful.
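A small numpy sketch contrasting the two normalization axes on a typical NLP tensor of shape (batch, seq_len, hidden); the shapes and names here are mine:

import numpy as np

x = np.random.normal(size=(10, 6, 16))   # (batch, seq_len, hidden)
eps = 1e-5

# BatchNorm-style: one statistic per hidden dimension, pooled over the batch and positions
bn = (x - x.mean(axis=(0, 1), keepdims=True)) / np.sqrt(x.var(axis=(0, 1), keepdims=True) + eps)

# LayerNorm (what the Transformer uses): one statistic per token, over its own hidden dims
ln = (x - x.mean(axis=-1, keepdims=True)) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

print(bn.shape, ln.shape)   # same shape, very different meaning:
# bn pools words from different sentences at the same feature index,
# ln only looks inside a single token's feature vector.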

7. The Decoder, and how it interacts with the Encoder

Two things matter here: the encoder-decoder attention, and the sequence mask in the decoder's self-attention.

Encoder-decoder interaction. In the decoder's cross-attention, Q comes from the decoder and K/V come from the Encoder output; every decoder block attends to the K/V produced by the final encoder block.

This is the same idea as attention in a seq2seq model: there, the decoder attends over the encoder's RNN hidden states to build a context vector at every step. In the Transformer the K/V are the encoder outputs rather than RNN hidden states, and the same encoder K/V are reused by every decoder layer.

Like the Encoder, the Decoder stacks N identical blocks (N = 6 in the paper). Each block contains masked multi-head self-attention, encoder-decoder attention, and a feed-forward network, each followed by Add&Norm.

The sequence mask. The decoder's self-attention differs from the Encoder's in one way: it is masked so that a position can only attend to itself and earlier positions (in practice, the scores for later positions are set to a large negative number before the softmax).

Why is the mask needed? Without it there would be a GAP between training and inference.

Take a target sentence "w1 / w2 / w3 / w4". Training uses teacher forcing: the whole ground-truth target sequence is fed into the decoder at once. When the decoder is predicting "w3", unmasked self-attention would let it look at "w4" -- a word that, at inference time, has not been generated yet. The sequence mask hides "w4" and everything after the current position, so what the model sees during training matches what it will see when decoding step by step; without the mask, training would rely on information that is never available at inference.

To summarize the interaction: the decoder's cross-attention takes Q from the decoder side and K/V from the encoder side, and that is the only place where information flows from the transformer's encoder into its decoder.
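A minimal numpy sketch of the sequence (look-ahead) mask described above; the -1e9 constant and the shapes are illustrative, and the padding mask in question 13 works the same way:

import numpy as np

seq_len = 4
scores = np.random.normal(size=(seq_len, seq_len))         # raw attention scores

# upper-triangular mask: position i may not attend to positions j > i
look_ahead = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
masked = np.where(look_ahead, -1e9, scores)

weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))   # lower-triangular: future positions get ~0 weight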

8. Parallelism in the Transformer

Where is the Transformer parallel, and can the Decoder be parallelized?

The Encoder is parallel across positions: self-attention and the feed-forward layers are matrix operations over the whole sequence at once, unlike an RNN, which has to process the sequence one time step after another.

The Decoder is parallel across positions at training time, because teacher forcing plus the sequence mask lets all target positions be computed in one pass. At inference time the Decoder behaves like an RNN in this respect: tokens are generated one by one, so generation cannot be parallelized over positions.

The 6 encoder blocks themselves still run one after another; the parallelism is inside a block, across the positions of the sequence.
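A toy illustration of the contrast above, with made-up shapes: the RNN loop is inherently sequential, while the self-attention scores come out of a single matrix product:

import numpy as np

seq_len, hidden = 100, 64
x = np.random.normal(size=(seq_len, hidden))
w_h = np.random.normal(size=(hidden, hidden)) * 0.01

# RNN: step t depends on step t-1, so this loop cannot be parallelized over positions
h = np.zeros(hidden)
for t in range(seq_len):
    h = np.tanh(x[t] + h @ w_h)

# self-attention scores: every pair of positions at once, in one matmul
scores = x @ x.T / np.sqrt(hidden)   # (seq_len, seq_len)
print(scores.shape)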




9. Why does the Transformer use multi-head attention?

Multi-head attention lets the model attend to information from different representation subspaces: each head has its own Q/K/V projections and can pick up a different kind of relation between words, much like using several convolution kernels instead of one. A good discussion is here:

https://www.zhihu.com/question/341222779

The ablations in the transformer paper also show that a single head of the same total dimension performs worse than multiple heads.



10. Why do Q and K use different weight matrices in the Transformer?

Using separate W_Q and W_K projects the same input into two different spaces, which makes the attention more expressive. If the query and key shared one projection, the score matrix Q·K^T would be symmetric, and each token's dot-product with itself would tend to be the largest entry in its row, so after the softmax every word would mostly attend to itself and the attention would generalize poorly. Related discussion on the Q/K/V projections:

https://www.zhihu.com/question/319339652
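A quick numerical check of this argument (my own toy example, not from the original write-up): with a shared projection the score matrix is symmetric and the diagonal usually dominates:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))          # 5 tokens, hidden size 8
w_shared = rng.normal(size=(8, 8))

q = k = x @ w_shared                 # W_Q == W_K
scores = q @ k.T

print(np.allclose(scores, scores.T)) # True: the score matrix is symmetric
print(scores.argmax(axis=1))         # usually [0 1 2 3 4]: each token attends most to itself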

11. Why does Transformer attention use the dot-product rather than addition?

Mostly for speed and memory: dot-product attention is just matrix multiplication, which is heavily optimized on modern hardware, whereas additive attention needs an extra hidden layer and a non-linearity. In terms of quality the two are comparable; the caveat is that for large d_k the un-scaled dot-product grows large and pushes the softmax into its saturated region, which is exactly why the dot-product is divided by sqrt(d_k) (see the next question).

12. Why is attention scaled (divided by sqrt(d_k)) before the softmax?

If the components of q and k are independent with zero mean and unit variance, their dot-product q·k has variance d_k. For large d_k the logits become large, the softmax saturates (one weight close to 1, the rest close to 0), and the gradients through it become tiny. Dividing by sqrt(d_k) brings the variance of the logits back to about 1. A good reference: "Why is attention in the transformer scaled?" -- LinT's answer on Zhihu:
https://www.zhihu.com/question/339723385/answer/782509914

The small experiment below illustrates the variance argument:

import numpy as np

# entries of q and k ~ N(0, 1); each column is a d_k = 3 dimensional vector
arr1 = np.random.normal(size=(3, 1000))
arr2 = np.random.normal(size=(3, 1000))

# (1000, 1000) matrix of dot products between 3-dimensional vectors
result = np.dot(arr1.T, arr2)

arr_var = np.var(result)
print(arr_var)  # ~= 3, i.e. the variance of q.k grows linearly with d_k
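Continuing the same experiment, dividing by sqrt(d_k) brings the variance back to about 1 (a self-contained re-run with my own variable names):

import numpy as np

d_k = 3
q = np.random.normal(size=(d_k, 1000))
k = np.random.normal(size=(d_k, 1000))

scores = np.dot(q.T, k)
print(np.var(scores))                  # ~= d_k = 3 without scaling
print(np.var(scores / np.sqrt(d_k)))   # ~= 1 after dividing by sqrt(d_k)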
13. How is padding masked when computing attention scores?

Sequences in a batch are padded to the same length so they can share one batch_size, but the padded positions carry no information. Before the softmax, their attention scores are set to a very large negative number (for example -1000 or -1e9), so after the softmax those positions receive a weight of effectively zero and contribute nothing to the output.
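A minimal numpy sketch of the padding mask, assuming a single (seq_len x seq_len) score matrix and one sequence whose last two positions are padding; the names and the -1e9 constant are illustrative:

import numpy as np

seq_len = 5
length = 3                                     # positions 3 and 4 are padding
scores = np.random.normal(size=(seq_len, seq_len))

pad = np.arange(seq_len) >= length             # True where the key position is padding
masked = np.where(pad[None, :], -1e9, scores)  # broadcast the mask over every query row

weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))                    # columns 3 and 4 are ~0 for every query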


The remaining Transformer questions from the list at the top -- for example the one on the wordpiece model and the GPT- and BERT-related ones -- are not expanded in this document.

