Lesson 14 - Transformer
NLP II 2025
Jakapun Tachaiya (Ph.D.)
Outline
- Transformer
- Transfer Learning
- Pretrained Model
- BERT
- GPT
2
Transformer
3
Evolution of Large Language Models
https://arxiv.org/html/2402.06853v1 4
No RNNs, no CNNs
5
Example Tasks to Train a Transformer
● Translation
● Dialogue completion
6
Way Smarter than RNNs for Language Modeling!
7
Transformer Key Ideas
● Core Idea: Processes input sequences in
parallel using attention mechanisms, bypassing
the sequential limitations of RNNs.
○ Recurrence: not parallelizable, long “path
lengths”
○ Attention: Parallelizable, short path
lengths.
● Core Architecture:
○ Positional encoding
○ Multi-head attention and self-attention
○ Decoder’s masked attention
8
Transformer
Core Architecture:
● Positional encoding
● Multi-head attention and self-attention
● Decoder’s masked attention
https://jalammar.github.io/illustrated-transformer/ 9
How does the transformer work?
Encoder: What is English? What is context?
Decoder: How do we map an English word to French?
10
Transformer
Core Architecture:
● Positional encoding
● Multi-head attention and self-attention
● Decoder’s masked attention
https://jalammar.github.io/illustrated-transformer/ 11
Positional Encoding
LSTM - Reads word by word, so it knows the position of each word.
12
https://www.youtube.com/watch?v=dichIcUZfOw
Positional Encoding
Transformer - Reads all word embeddings at once (512, 768, ... tokens)
● Loses information about the position of each word
14
Why does position matter?
15
Absolute Position Embedding
[Figure: absolute position embedding vectors, indexed by position of token starting at 0]
16
Intuition behind position formula
Just a sine function
17
Intuition behind position formula
18
Intuition behind position formula
Same value at i = 4 but different at i = 2
19
Input embedding with absolute position embedding
Encodes the position of each token in a sequence into fixed embeddings that are added to the input word embeddings.
20
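Below is a minimal NumPy sketch of the sinusoidal absolute positional encoding described above; the dimensions (d_model = 8, sequence length 10) are illustrative assumptions, not the values used in the original paper.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    positions = np.arange(max_len)[:, None]          # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]         # (1, d_model/2)
    angles = positions / np.power(10000, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                     # even dimensions use sin
    pe[:, 1::2] = np.cos(angles)                     # odd dimensions use cos
    return pe

# The encoding is simply added to the input word embeddings.
word_embeddings = np.random.randn(10, 8)             # (seq_len, d_model), toy values
inputs = word_embeddings + sinusoidal_positional_encoding(10, 8)
```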
Transformer
Core Architecture:
● Positional encoding
● Multi-head attention and self-attention
● Decoder’s masked attention
https://jalammar.github.io/illustrated-transformer/ 21
Simple/Cross Attention VS Self-attention
[Figure: in simple/cross attention the query comes from outside the input sentence; in self-attention the input sentence attends to itself]
22
Multi-head attention
23
Intuition behind K, Q, V (info retrieval)
24
Intuition behind K, Q, V (info retrieval)
25
Multi-head attention is built from scaled dot-product attention (multiplicative attention)
26
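A minimal NumPy sketch of scaled dot-product (multiplicative) attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V; the shapes below are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (num_queries, num_keys) score matrix
    weights = softmax(scores, axis=-1)         # each row sums to 1 (the "attention filter")
    return weights @ V, weights

Q = np.random.randn(5, 64)                     # 5 query tokens, d_k = 64
K = np.random.randn(7, 64)                     # 7 key tokens
V = np.random.randn(7, 64)                     # one value vector per key
output, attn = scaled_dot_product_attention(Q, K, V)
print(output.shape, attn.shape)                # (5, 64) (5, 7)
```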
Linear layer with NO activation function (such as ReLU)
1. Mapping inputs onto outputs
2. Changing vector dimensions
27
Multi-head attention (K, Q, V)
28
K, Q, V attention
29
K, Q, V attention
[Figure: the key matrix K and the query/key score matrix KQᵀ]
30
Attention filter
With initial random weights, the filter is less meaningful; after training, the weights capture self-attention.
31
Attention filter
32
Multi-head attention
33
Intuition on Multi-head attention
Then concatenate all heads and pass them through a linear layer to reduce the size
34
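A minimal sketch of multi-head self-attention using PyTorch's built-in nn.MultiheadAttention (the model width of 512 and 8 heads are illustrative assumptions): each head attends in its own subspace, and the concatenated heads are projected back down by a final linear layer inside the module.

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)

x = torch.randn(2, 10, 512)        # (batch, seq_len, d_model)
out, attn = mha(x, x, x)           # self-attention: query = key = value = x
print(out.shape)                   # torch.Size([2, 10, 512])
print(attn.shape)                  # torch.Size([2, 10, 10]), weights averaged over heads
```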
Intuition on Multi-head attention
35
Multi-head attention
36
Multi-Head Cross-Attention
Enables the decoder to selectively focus on specific parts
of the encoder's output.
● Query (Q): Derived from the decoder's current hidden
state.
● Key (K) and Value (V): Derived from the encoder's
output.
K V Q
37
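Cross-attention reuses the same module, but the query comes from the decoder while the keys and values come from the encoder output; a minimal PyTorch sketch with illustrative shapes:

```python
import torch
import torch.nn as nn

cross_attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)

encoder_output = torch.randn(2, 12, 512)   # K and V come from the encoded input sentence
decoder_state = torch.randn(2, 7, 512)     # Q comes from the decoder's current hidden states

out, weights = cross_attn(query=decoder_state, key=encoder_output, value=encoder_output)
print(out.shape)      # torch.Size([2, 7, 512]): one output per decoder position
print(weights.shape)  # torch.Size([2, 7, 12]): decoder positions attend over encoder positions
```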
Self Attention VS Cross Attention
Attention Terminology
● K, Q, V attention
● Multi-head
● Self-attention
○ Encoder
○ Decoder
● Cross-attention
40
Transformer
Core Architecture:
● Positional encoding
● Multi-head attention and self-attention
● Decoder’s masked attention
https://jalammar.github.io/illustrated-transformer/ 41
Masked Self-Attention
● As in language modeling, we mask so the model cannot see future output tokens before predicting them
42
Masked Self-Attention
Key idea: Masking Out the Future
43
44
Masked attention
45
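A minimal NumPy sketch of masked (causal) self-attention: scores for future positions are set to -inf before the softmax, so each token can only attend to itself and earlier tokens (sizes are illustrative).

```python
import numpy as np

def causal_mask(seq_len):
    # True strictly above the diagonal marks the "future" positions to hide.
    return np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

seq_len, d_k = 5, 64
Q = np.random.randn(seq_len, d_k)
K = np.random.randn(seq_len, d_k)

scores = Q @ K.T / np.sqrt(d_k)
scores[causal_mask(seq_len)] = -np.inf               # future tokens get zero attention weight
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))                          # the upper triangle is all zeros
```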
Residual connection
1. Knowledge preservation
2. Mitigates the vanishing gradient problem
46
ADD & NORM
47
ADD & NORM
Layer normalization
● Shift mean to 0 and var to 1
● Standardize along the feature axis
48
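A hedged PyTorch sketch of the Add & Norm step in its original post-norm form: the sublayer output is added back to its input (residual connection) and the sum is layer-normalized along the feature axis. Dimensions are illustrative.

```python
import torch
import torch.nn as nn

d_model = 512
self_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
norm = nn.LayerNorm(d_model)            # normalizes along the feature axis

x = torch.randn(2, 10, d_model)
attn_out, _ = self_attn(x, x, x)        # the sublayer (could also be the feed-forward block)
x = norm(x + attn_out)                  # Add (residual) & Norm
print(x.shape, float(x[0, 0].mean()))   # per-position mean is ~0 after LayerNorm
```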
Transformer Architecture Summary
Main building block: attention!
● Encoder: self-attention
● Decoder: masked self-attention
● Decoder-encoder: cross-attention
49
50
Why is the Transformer better than RNNs?
1. Self-Attention:
a. Capture dependencies between words in a sentence without being restricted
by their distance from each other.
2. Parallel Processing:
a. Unlike RNNs, which process data sequentially, transformers can process the
entire input sequence in parallel.
3. Handling Long-Range Dependencies:
a. RNNs struggle with long-range dependencies due to the vanishing gradient
problem.
b. Transformers can remember and maintain performance over longer sequences.
51
Scaling Laws: Are Transformers All We Need?
● With Transformers, language modeling performance improves as we increase model size, training
data, and compute resources.
● This power-law relationship has been observed over multiple orders of magnitude with no sign of
slowing!
52
Transformer Drawback!
Quadratic compute in self-attention: O(n²)
● Computing all pairs of
interactions/attentions means our
computation grows quadratically with
the sequence length!
● For recurrent models, it only grew
linearly!
● Prevents scaling to long sequences.
One big area of research: more efficient attention mechanisms, for example:
● Random attention
● Window attention
● Linear attention
● Flash attention
● Lightning attention
53
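A quick back-of-the-envelope sketch of the quadratic cost: the size of a single float32 n x n attention score matrix (one head, batch size 1; pure arithmetic, not a benchmark).

```python
# Memory for one n-by-n float32 attention score matrix.
for n in (512, 4_096, 32_768):
    mib = n * n * 4 / 2**20
    print(f"seq_len={n:>6}: {mib:>8.1f} MiB")
# 512 -> 1.0 MiB, 4096 -> 64.0 MiB, 32768 -> 4096.0 MiB: 8x longer input, 64x more memory.
```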
Transfer Learning Concept
Another classification task
● Can you guess whether it is a land animal or a water animal?
○ Have you ever seen this creature before?
■ You can transfer your knowledge from the past
55
https://slds-lmu.github.io/seminar_nlp_ss20/introduction-transfer-learning-for-nlp.html 56
Transfer Learning
Myth: you can’t do deep learning unless you have a million labelled examples for your problem.
Reality:
57
Transfer learning: idea
60
Model Alignment for Transfer Learning
● Source model is the single most important variable.
● Keep source model and target model well-aligned (close to each other) when possible.
● Source vocabulary should be aligned with target vocabulary (similar domain).
● Source task should be aligned with target task (similar task).
For example:
61
What is the most common
Transfer Learning Model in NLP?
62
Pre-trained
Language Model
● Learning to model the distribution of
natural language.
● Predicting the next word in a sequence
given context.
● A base model for specific downstream tasks.
● No need for labeled data (unsupervised / self-supervised learning on raw text)
63
The Pretraining / Fine-tuning Paradigm
Pretraining can improve NLP applications by serving as parameter initialization.
64
Pre-trained Models
Three architectures for large language models
66
1. Encoder Only (Autoencoder)
Model Type: Masked Language Models (MLMs) -
Trained by predicting words from surrounding
words on both sides.
Tasks:
1. Sequence classification
2. Token classification
67
2. Encoder - Decoder
Model Type: The original Transformer, built for seq2seq tasks.
Tasks:
1. Machine translation
2. Speech Recognition
68
3. Decoder Only (Auto-Regressive)
Model Type: Causal LLMs/Autoregressive
LLMs/Left-to-right LLMs - predict words left to
right.
Tasks:
1. Text Generation
2. Predicting next word
69
BERT
70
BERT - Bidirectional Encoder Representations from Transformers
71
BERT Ideas
1. Masked Language Model
○ fill-in-the-blank
2. Bidirectional encoder
○ Sees future tokens, giving more information to infer masked tokens
○ Can’t do (left-to-right) language modeling!
72
BERT
73
BERT VS GPT
Transformer
74
BERT VS Transformer
75
BERT - Two-Phase Training
76
Phase 1: Unsupervised Masked LM Training
15% of the tokens are randomly chosen to
be part of the masking
Three possibilities:
1. 80%: Token is replaced with special token [MASK]
● Lunch was delicious -> Lunch was [MASK]
77
Phase 1: Next Sentence Prediction
78
Input Representation
81
Training Details
● BooksCorpus (800M words) + Wikipedia (2.5B)
● Masking the input text: 15% of all tokens are chosen (see the sketch after this list).
○ 80% of the time: replaced by designated ‘[MASK]’ token
○ 10% of the time: replaced by random token
○ 10% of the time: unchanged
● Loss is cross-entropy of the prediction at the masked positions.
● Max seq length: 128 tokens for first 90%, 512 tokens for final 10%
● 1M training steps, batch size 256 = 4 days on 4 or 16 TPUs
82
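A minimal sketch of the 80/10/10 masking rule from the training details above, applied per selected token. The whitespace tokenization and tiny vocabulary here are simplifying assumptions; real BERT operates on WordPiece token ids.

```python
import random

VOCAB = ["the", "cat", "sat", "lunch", "was", "delicious"]   # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15):
    """Return (corrupted tokens, labels); labels are None except at selected positions."""
    corrupted, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:                  # 15% of tokens are selected
            labels.append(tok)                           # loss is computed only here
            r = random.random()
            if r < 0.8:
                corrupted.append("[MASK]")               # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(random.choice(VOCAB))   # 10%: replace with a random token
            else:
                corrupted.append(tok)                    # 10%: keep unchanged
        else:
            corrupted.append(tok)
            labels.append(None)
    return corrupted, labels

print(mask_tokens("lunch was delicious".split()))
```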
Fine-tuning BERT Use case
● Sentence/Sentence pair classification
○ E.g. spam detection, sentiment analysis, Natural Language Inference (see the sketch below)
83
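A hedged Hugging Face sketch of loading BERT with a classification head for a sentence-level task; the checkpoint name and num_labels=2 are illustrative, and a real fine-tuning run would add a labeled dataset, an optimizer, and a training loop (for example via the Trainer API).

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Single sentence (sentiment/spam style); pass a second sentence for pair tasks such as NLI.
batch = tokenizer("The movie was surprisingly good!", return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits          # shape (1, num_labels); the head is still untrained
print(logits.softmax(dim=-1))
```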
Fine-tuning BERT Use case
● Sequence Labeling
○ Tokenization, POS, NER (see the sketch below)
84
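A short sketch of the sequence-labeling use case via the token-classification pipeline; no model is specified, so the library falls back to its default NER checkpoint and the exact labels it returns should be treated as illustrative.

```python
from transformers import pipeline

ner = pipeline("token-classification", aggregation_strategy="simple")
for entity in ner("Jakapun teaches NLP at a university in Bangkok."):
    print(entity)   # expected: spans tagged roughly as PER / ORG / LOC with confidence scores
```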
Contextual Embeddings to represent words
85
BERT as a Contextual Representation
Word sense disambiguation - The task of selecting the correct sense for a word
86
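A minimal sketch of using BERT hidden states as contextual embeddings for word sense disambiguation: the same surface word gets different vectors in different contexts. The checkpoint and example sentences are illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_of(sentence, word):
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]       # (seq_len, 768)
    idx = enc.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return hidden[idx]

v1 = embedding_of("I deposited cash at the bank.", "bank")
v2 = embedding_of("We had a picnic on the river bank.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0))            # < 1.0: same word, different senses
```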
Model width
87
BERT is a stack of encoders
88
Pretrained BERT (Hugging Face)
89
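A minimal sketch of querying a pretrained BERT from Hugging Face with the fill-mask pipeline (the exact ranking of predictions is illustrative):

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Lunch was [MASK]."):
    print(f'{pred["token_str"]:>12}  {pred["score"]:.3f}')
```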
Vision transformer
● Can’t feed pixel values directly into the transformer because of O(n²) attention
○ Use patches of the image instead (see the sketch below)
90
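A NumPy sketch of the patching step: a 224 x 224 RGB image becomes 196 patch tokens of dimension 768 instead of 50,176 pixel tokens. The 16 x 16 patch size matches the usual ViT-Base setting; the random image is just a stand-in.

```python
import numpy as np

img = np.random.rand(224, 224, 3)          # (H, W, C)
P = 16                                     # patch size

patches = (img.reshape(224 // P, P, 224 // P, P, 3)
              .transpose(0, 2, 1, 3, 4)    # bring the two patch-grid axes together
              .reshape(-1, P * P * 3))
print(patches.shape)                       # (196, 768): 196 tokens, each a flattened patch
# Attention over 196 tokens is cheap; over 224 * 224 = 50,176 pixels it would not be.
```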
91
Sidenote on Input Token
Why does BERT split input tokens this way?
92
Subword is the way!
With word-level tokenization, we assume a fixed vocab of tens of thousands of words, built from the training set; all novel words seen at test time are mapped to a single UNK.
● Subwords combat misspellings and unknown-word issues
93
Level of Token
94
Tokens represent Words
95
96
Check whether low-frequency tokens still make sense
97
98
Tokenizer (subwords) for Transformers
99
WordPiece Tokenization
Similar to BPE: it uses frequency of occurrence to identify potential merges, but makes the final decision based on the likelihood of the merged token.
100
SentencePiece Tokenization
101
SentencePiece Tokenization
Simply treats the input text as a sequence of Unicode characters, including whitespace.
102
Byte-Pair Encoding (BPE) Tokenizer
103
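A short sketch comparing the subword tokenizers above on a misspelled, out-of-vocabulary word: BERT's WordPiece marks continuation pieces with '##', while GPT-2's byte-level BPE marks word boundaries with a leading space symbol. Exact splits depend on the checkpoint.

```python
from transformers import AutoTokenizer

bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # WordPiece
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")                # byte-level BPE

text = "Tokenization handles misspelled wrods gracefully"
print(bert_tok.tokenize(text))
print(gpt2_tok.tokenize(text))
# Unknown words are split into known subwords instead of collapsing to a single UNK.
```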
Pre-trained Encoder Decoder
104
Pretraining encoder-decoders
For encoder-decoders, we could do something like language
modeling, but where a prefix of every input is provided to the
encoder and is not predicted.
105
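A hedged sketch of running a pretrained encoder-decoder (T5 is used here purely as an example checkpoint): the encoder reads the whole input, and the decoder generates the output autoregressively.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to French: The lecture starts at noon.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```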
Pretraining encoder-decoders
1. Higher Computational Cost: Both an encoder and a
decoder are required, leading to increased memory and
computation requirements compared to simpler models like
decoder-only architectures.
2. Slower Inference: The encoder processes the entire input
sequence before the decoder starts generating the output,
resulting in a two-step process that slows down inference
compared to models that perform generation directly (e.g.,
decoder-only models).
3. Limited Suitability for Certain Tasks: These models are
better suited for sequence-to-sequence tasks (e.g.,
translation, summarization) but are less efficient for
general-purpose tasks like text generation, where
decoder-only models excel.
106
Decoder only Pretrained Model
107
Decoder Only Pretrained Model as LLM
● Generating text conditioned on previous text
108
GPT - Generative Pre-Training (OpenAI)
109
GPT
● Uses the Transformer decoder instead of the encoder
● “Self”-attention is masked so that each token can only attend to previous tokens.
● Predicts the next token in a sequence
○ Causal language modeling
https://jalammar.github.io/how-gpt3-works-visualizations-animations/ 111
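A minimal sketch of causal (left-to-right) generation with a small GPT-style checkpoint; GPT-2 is used as a stand-in and the sampling settings are illustrative.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The transformer architecture is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```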
GPT Training
112
OpenAI GPT 1 (Generative Pre-Training)
Multitask learning
113
GPT - Formatting Inputs for Fine-tuning Tasks
114
Data format for SFT
● Convert existing annotated NLP datasets to an instruction-following format to continue training the LLM.
○ Supervised fine-tuning (SFT), Instruction fine-tuning
115
Multi-column dataset
● Conventional classification dataset
● Merge multiple columns into one large prompt so that fine-tuning actually works (see the sketch below).
116
https://docs.unsloth.ai/basics/tutorial-how-to-finetune-llama-3-and-use-in-ollama
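A minimal sketch of merging the columns of a conventional classification dataset into a single instruction-style prompt for SFT, in the spirit of the Unsloth tutorial linked above; the field names and template are illustrative assumptions.

```python
def to_prompt(row):
    return (
        "### Instruction:\n"
        "Classify the sentiment of the review as positive or negative.\n\n"
        f"### Review:\n{row['text']}\n\n"
        f"### Answer:\n{row['label']}"
    )

example = {"text": "The battery lasts two full days.", "label": "positive"}
print(to_prompt(example))   # one self-contained prompt string per dataset row
```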
Multi-column dataset
● Now the LLM can perform classification (even from a causal model)!
117
Evolution of GPT
https://www.kdnuggets.com/2023/05/deep-dive-gpt-models.html 118
Scaling Laws
LLM performance depends on model size, dataset size, and compute.
We can improve a model by adding parameters (more layers, wider contexts), adding data, or training for more iterations. The performance of a large language model (its loss) scales as a power law with each of these three.
119
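One common way to write the observed power-law relationship, following Kaplan et al. (2020); the scale constants and exponents depend on the setup and are intentionally left symbolic.

```latex
% Loss as a power law in parameters N, dataset size D, and compute C
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```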
Scaling Laws
● Empirical observation: scaling up models leads to reliable gains in perplexity
120
GPT Scale
[Table: GPT model sizes, characterized by depth L, width d, and number of attention heads]
121
ChatGPT
https://openai.com/blog/chatgpt/ 122
How is ChatGPT different from the GPT model?
● ChatGPT is optimized for dialogue and conversation.