
CS485/585 Deep Generative Networks
Chapter 5: Transformers

Bilkent University
CNNs vs. Transformers
• CNNs exhibit a strong bias towards feature locality,
as well as spatial invariance due to sharing filter
weights across all locations.
• Transformers have strong representation capability
and are free of human-defined inductive bias.
Transformers
• To understand transformers, we first need to understand attention.
• Connections between positions are computed on the fly via attention.
• Transformers require more data.
Attention
• Which part of the input should I focus on?
Sequence Modeling
[Figure: an encoder-decoder model translates "The book is on the table" into "Kitap masanın üstünde" (Turkish).]

Sequence-to-sequence modeling


Sequence Modeling - RNN

[Figure: an RNN encoder-decoder translates "The book is on the table" into "Kitap masanın üstünde", one token at a time.]

Sequence-to-sequence modeling
Attention is all you need

Attention Is All You Need, Neurips 2017


Sequence Modeling
• Challenges with RNNs:
– Long-range dependencies are hard to capture
– Gradient vanishing
– Serial operations
• Transformer networks:
– Long-range dependencies enabled
– No gradient vanishing
– Parallel computing

Concept of Database
Retrieval from a database: a query is compared against the stored keys (Key 1 … Key 4), and the value paired with the matching key (Value 1 … Value 4) is returned.
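As a rough analogy (a sketch, not from the slides), hard retrieval returns the single value whose key matches the query exactly; the attention mechanism on the next slides relaxes this to a weighted sum over all values.

# Hard key-value retrieval: the query must match a stored key exactly.
database = {"key1": "value1", "key2": "value2", "key3": "value3", "key4": "value4"}

def retrieve(query):
    return database[query]   # KeyError unless the query equals one of the keys

print(retrieve("key2"))      # -> value2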
Attention Mechanism
• Mimics retrieval from a database.
• Measure the similarity between the query and each key, and produce an output based on these similarities:

• $\mathrm{attention}(q, K, V) = \sum_i \mathrm{similarity}(q, k_i)\, v_i$


Attention Mechanism
[Figure: similarity scores $s_1, s_2, s_3, s_4$ are computed between the query and the keys $k_1, k_2, k_3, k_4$.]
Attention Mechanism
• Similarity measures:
– Dot product: $q^\top k_i$
– Scaled dot product: $\dfrac{q^\top k_i}{\sqrt{d}}$, where $d$ is the dimensionality of each key
– General dot product: $q^\top W k_i$
Attention Mechanism
[Figure: the query $q_1$ and the keys $k_1, k_2, k_3$ drawn as vectors; the largest scaled dot product comes from the key most aligned with the query.]
Attention Mechanism
• Similarity: dot product $s_i = q^\top k_i$

• Similarities compete through a softmax:

$a_i = \dfrac{\exp(s_i)}{\sum_j \exp(s_j)}$
Attention Mechanism
[Figure: the same query and key vectors; a softmax is applied to the dot-product scores to obtain the attention weights.]
Attention Mechanism

• $\mathrm{attention}(q, K, V) = \sum_i a_i \, v_i$

• This provides a soft attention over the values.
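A minimal sketch of this soft attention in PyTorch (scaled dot-product similarity followed by a softmax); the tensor shapes are illustrative assumptions.

import torch
import torch.nn.functional as F

def soft_attention(q, K, V):
    # q: (d,) query, K: (n, d) keys, V: (n, d_v) values
    scores = K @ q / K.shape[-1] ** 0.5   # scaled dot-product similarities s_i
    a = F.softmax(scores, dim=-1)         # attention weights a_i (sum to 1)
    return a @ V                          # soft retrieval: sum_i a_i * v_i

q = torch.randn(64)
K, V = torch.randn(4, 64), torch.randn(4, 64)
out = soft_attention(q, K, V)             # a (64,)-dimensional blend of the values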


Attention
• Which part of the input should I focus on?
Attention
• Self-attention
• Query, Key, and Value all depend on the input features x (see the sketch below):
• Q = conv1(x)
• K = conv2(x)
• V = conv3(x)
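A sketch of such a self-attention block over a 2-D feature map, loosely in the style of non-local networks; the 1×1 convolutions and the channel reduction factor are assumptions, not prescribed by the slide.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """Self-attention over the spatial positions of a feature map (sketch)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.conv_q = nn.Conv2d(channels, channels // reduction, 1)  # Q = conv1(x)
        self.conv_k = nn.Conv2d(channels, channels // reduction, 1)  # K = conv2(x)
        self.conv_v = nn.Conv2d(channels, channels, 1)               # V = conv3(x)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.conv_q(x).flatten(2).transpose(1, 2)            # (b, hw, c//r)
        k = self.conv_k(x).flatten(2)                             # (b, c//r, hw)
        v = self.conv_v(x).flatten(2).transpose(1, 2)             # (b, hw, c)
        attn = F.softmax(q @ k / q.shape[-1] ** 0.5, dim=-1)      # (b, hw, hw)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)      # back to a feature map
        return out

x = torch.randn(2, 64, 16, 16)
y = SelfAttention2d(64)(x)   # same shape as x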
Attention

Non-local Neural Networks, CVPR 2018


Attention is all you need
Multihead Attention
Multi-head attention concatenates multiple attention outputs per query, each computed with different projection weights (see the sketch below):
• Q1 = conv1_1(x)
• K1 = conv1_2(x)
• V1 = conv1_3(x)
• Q2 = conv2_1(x)
• K2 = conv2_2(x)
• V2 = conv2_3(x)
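A compact two-head sketch of this idea using linear projections instead of convolutions; the head count, dimensions, and weight shapes are assumptions.

import torch
import torch.nn.functional as F

def head(x, w_q, w_k, w_v):
    # x: (n, d) token features; one attention head with its own projection weights
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    a = F.softmax(q @ k.transpose(0, 1) / k.shape[-1] ** 0.5, dim=-1)
    return a @ v

d, d_head, n = 64, 32, 10
x = torch.randn(n, d)
weights = [[torch.randn(d, d_head) for _ in range(3)] for _ in range(2)]   # two heads
out = torch.cat([head(x, *w) for w in weights], dim=-1)                    # (n, 2 * d_head)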
Self-attention GAN

Self-Attention Generative Adversarial Networks, ICML 2019


Self-attention GAN

Self-Attention Generative Adversarial Networks, ICML 2019


Self-attention GAN

Visualization of attention maps. These images were generated by SAGAN. We visualize the attention maps of the last generator layer that uses attention, since this layer is the closest to the output pixels and is the most straightforward to project into pixel space and interpret.

Self-Attention Generative Adversarial Networks, ICML 2019


Self-attention GAN

Self-Attention Generative Adversarial Networks, ICML 2019


Self-attention
• Pairwise self-attention
– generalizes standard dot-product attention
• Patchwise self-attention
– strictly more powerful than convolution

Exploring Self-attention for Image Recognition, CVPR 2020


Self-attention

Exploring Self-attention for Image Recognition, CVPR 2020


Self-attention

Exploring Self-attention for Image Recognition, CVPR 2020


Attention vs. Convolution
• Convolution:
– has a local receptive field, so long-range dependencies can only be processed after passing through several convolutional layers
– is computationally efficient
• Attention:
– enables long-range dependencies directly
– is computationally expensive
– can be combined with convolution as a complementary layer
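A back-of-the-envelope illustration of the cost difference; the resolution, channel count, and operation counts below are rough assumptions, not figures from the slides.

# A 3x3 convolution scales linearly with the number of positions, while
# self-attention over all positions scales quadratically.
h = w = 64          # feature map resolution
c = 256             # channels / embedding dimension
n = h * w           # number of spatial positions (tokens)

conv3x3_ops = n * 3 * 3 * c * c      # one 3x3 convolution layer
attention_ops = 2 * n * n * c        # QK^T scores + weighted sum over values

print(f"conv ~{conv3x3_ops / 1e9:.1f} GFLOPs, attention ~{attention_ops / 1e9:.1f} GFLOPs")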
Vision Transformer
• Completely abandon convolution?
• Extend transformer to an image, each pixel is a
word?
• Each patch is a word?

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021
Vision Transformer
• Completely abandon convolution?
– Limited receptive field
– Translation invariance

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021
Vision Transformer
• Instead of local attention over pixels, do global attention over patches (see the patch-embedding sketch below).

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021
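A minimal sketch of turning an image into a sequence of patch tokens; the 16×16 patch size matches the paper title, while the embedding dimension and the Conv2d-based implementation are common choices assumed here.

import torch
import torch.nn as nn

# Linear patch embedding: a strided convolution cuts the image into non-overlapping
# patches and projects each one to an embedding vector ("each patch is a word").
patch, dim = 16, 768
to_patches = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

img = torch.randn(1, 3, 224, 224)
tokens = to_patches(img).flatten(2).transpose(1, 2)   # (1, 196, 768): 14x14 patch tokens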
Positional Encoding
• Ways of encoding spatial information using positional embeddings (see the sketch below):
– 1-dimensional positional embedding: treat the inputs as a sequence of patches in raster order
– 2-dimensional positional embedding: treat the inputs as a grid of patches in two dimensions
– Learned positional embedding
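A sketch of the learned 1-D variant: one embedding per patch position in raster order, added to the patch tokens. Sizes carry over from the previous sketch and are assumptions.

import torch
import torch.nn as nn

# Learned 1-D positional embedding, one vector per raster-order patch position.
# (A 2-D variant would instead combine a row embedding and a column embedding
# for each grid position.)
num_patches, dim = 196, 768
pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))

tokens = torch.randn(1, num_patches, dim)   # patch embeddings from the previous step
tokens = tokens + pos_embed                 # position information injected here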
TransGAN

TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up, arXiv 2021
TransGAN

TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up, arXiv 2021
Grid Attention

Grid self-attention across different transformer stages. Standard self-attention is replaced with grid self-attention when the resolution is higher than 32 × 32; the grid size is set to 16 × 16 by default.

TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up, arXiv 2021
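A rough sketch of the idea: attention is restricted to non-overlapping 16 × 16 windows of the token grid. The projection-free formulation and shapes are simplifying assumptions, not TransGAN's exact implementation.

import torch
import torch.nn.functional as F

def grid_self_attention(x, grid=16):
    # x: (b, h, w, c) token grid; attention is computed only within each
    # non-overlapping grid x grid window.
    b, h, w, c = x.shape
    x = x.reshape(b, h // grid, grid, w // grid, grid, c)
    x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, grid * grid, c)    # windows as sequences
    attn = F.softmax(x @ x.transpose(1, 2) / c ** 0.5, dim=-1)     # attention within a window
    x = attn @ x
    x = x.reshape(b, h // grid, w // grid, grid, grid, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, c)

y = grid_self_attention(torch.randn(1, 64, 64, 96))   # used when resolution > 32x32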
Swin Transformer

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, arXiv 2021
Transformers are data hungry
• They need large-scale datasets for pretraining
• Data augmentation is crucial for TransGAN

TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up, arXiv 2021
Augmentation for GAN
• 10⁵–10⁶ images are required to train a modern high-quality, high-resolution GAN
• The key problem with small datasets is that the
discriminator overfits to the training examples; its
feedback to the generator becomes meaningless
and training starts to diverge.
• In almost all areas of deep learning, dataset
augmentation is the standard solution against
overfitting.

Training Generative Adversarial Networks with Limited Data, Neurips 2020


Augmentation for GAN

Training Generative Adversarial Networks with Limited Data, Neurips 2020


Augmentation for GAN
• In contrast, a GAN trained under similar dataset augmentations learns to generate the augmented distribution.
• Such "leaking" of augmentations into the generated samples is highly undesirable.

Training Generative Adversarial Networks with Limited Data, Neurips 2020


Designing augmentations that do not leak
• Discriminator augmentation corresponds to putting
distorting goggles on the discriminator, and asking
the generator to produce samples that cannot be
distinguished from the training set when viewed
through the goggles.

Training Generative Adversarial Networks with Limited Data, Neurips 2020


Augmentation for GAN
• Evaluate the discriminator only using augmented images, and do this also when training the generator.
• The augmentations need to be differentiable. This is achieved by implementing them using standard differentiable primitives offered by the deep learning framework (see the sketch below).

Training Generative Adversarial Networks with Limited Data, Neurips 2020
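A minimal sketch of the general idea: the discriminator only ever sees augmented images, and the augmentation is built from plain differentiable tensor ops so gradients flow back to the generator. The two augmentations shown here are illustrative assumptions, not the paper's actual pipeline.

import torch

def augment(imgs):
    # Differentiable augmentations: random horizontal flip + random brightness shift.
    # Both are ordinary tensor ops, so gradients pass through them to the generator.
    if torch.rand(()) < 0.5:
        imgs = torch.flip(imgs, dims=[3])
    return imgs + 0.2 * torch.rand(imgs.shape[0], 1, 1, 1, device=imgs.device)

# The discriminator is evaluated only on augmented images, for both real and
# generated batches, and also when computing the generator's loss:
#   d_real = D(augment(real_images))
#   d_fake = D(augment(G(z)))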


Augmentation for GAN

Training Generative Adversarial Networks with Limited Data, Neurips 2020


Results

Training Generative Adversarial Networks with Limited Data, Neurips 2020


Results

Training Generative Adversarial Networks with Limited Data, Neurips 2020
