Transformers For NLP

The document discusses the Transformer architecture for natural language processing. It explains key components of the Transformer like the encoder, embeddings, attention heads, layer normalization, and masking. It also provides a link to an interactive demo of Transformers and concludes with the hope that the reader found it informative.


Transformers for NLP

Dr. Kisor K. Sahu


IITBBS
Transformer architecture — Encoder: a stack of N = 6 identical layers.

Transformer architecture — Embeddings: the original paper uses embedding dimension d = 512; the slide compares the embedding weights at the start of training and at the end of training.
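The embedding step above is just a learned lookup table mapping token ids to d = 512 vectors. A minimal NumPy sketch (the vocabulary size and token ids here are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, d_model = 10_000, 512            # d = 512 as in the original paper
embedding = rng.normal(size=(vocab_size, d_model))  # weights learned during training

token_ids = np.array([17, 42, 7])            # illustrative ids for a 3-token input
x = embedding[token_ids]                     # each token becomes a 512-dim vector

print(x.shape)  # (3, 512)
```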

Embedding at a glance
Transformer architecture — Attention: the query is produced by a linear projection, without an activation function. In the video-search analogy, the search phrase is the query and the video content is the value.
Transformer architecture — After training: the final self-attention filter. The original paper uses 8 attention heads.
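The self-attention filter above can be sketched as scaled dot-product attention. This minimal NumPy version shows a single head of size d_k = 64 (i.e., 512 / 8, since the original paper splits d = 512 across 8 heads); the sequence length is illustrative:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
n, d_k = 4, 64                               # d_k = 512 / 8 heads
Q, K, V = (rng.normal(size=(n, d_k)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (4, 64)
```

In multi-head attention, this computation is repeated 8 times with separate learned projections and the results are concatenated.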
Post-normalization

Layer normalization standardizes the neuron activations along the feature axis. A small value ε is added to the denominator to avoid dividing by zero.
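The layer normalization described above, with the small ε in the denominator, can be sketched as:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # standardize each example along the feature axis (last axis)
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)   # eps avoids dividing by zero

x = np.array([[1.0, 2.0, 3.0, 4.0]])
y = layer_norm(x)
print(y.mean(), y.std())                     # approximately 0 and 1
```

In the full Transformer layer, the normalized output is additionally scaled and shifted by learned per-feature parameters (gain and bias), omitted here for brevity.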
Transformer architecture — How to implement masking
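A common way to implement the masking above is to set the attention scores of disallowed (future) positions to a large negative number before the softmax, so their weights become effectively zero. A minimal causal-mask sketch in NumPy:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

n = 4
scores = np.zeros((n, n))                          # placeholder attention scores
mask = np.triu(np.ones((n, n), dtype=bool), k=1)   # True above the diagonal (future positions)
scores[mask] = -1e9                                # masked scores -> ~0 after softmax
weights = softmax(scores)
print(np.round(weights, 2))
```

Each row i attends only to positions 0..i: row 0 puts all weight on position 0, row 1 splits it 0.5/0.5, and so on.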
FUN WITH TRANSFORMERS
Link: https://transformer.huggingface.co/doc/distil-gpt2

The end
Hope you had a good time with Transformers.
Thank you!
