GENERATIVE AI
Generative AI refers to deep-learning models that can take raw data (say, all of Wikipedia or the collected works of Rembrandt) and “learn” to generate statistically probable outputs when prompted. At a high level, generative models encode a simplified representation of their training data and draw from it to create a new work that’s similar, but not identical, to the original data.
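As a toy illustration of "statistically probable outputs" (my addition, not part of the slides), here is a minimal character-level bigram model in Python. It encodes a very simplified representation of its training text (counts of which character tends to follow which) and then draws from those statistics to produce new text that resembles, but does not copy, the original.

```python
import random
from collections import defaultdict

def train(text):
    """Encode a (very) simplified representation of the data:
    for each character, count which characters tend to follow it."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, prompt, length=80):
    """Draw from the learned statistics to produce statistically probable text."""
    out = list(prompt)
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break
        chars, weights = zip(*followers.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

corpus = "the cat chased the mouse and the mouse chased the cat"
model = train(corpus)
print(generate(model, prompt="the "))
```

Real generative models use neural networks rather than character counts, but the principle is the same: learn the statistics of the training data, then sample from them.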
TO BETTER UNDERSTAND GEN AI THERE ARE NEW CONCEPTS WE SHOULD LEARN
NEURAL NETWORK
LOTSA MATH
The more layers you have, the deeper you go – this is what we call deep learning
INSIDE DEEP LEARNING WE HAVE FOUNDATION MODELS – WHICH IS THE BASE OF GENERATIVE AI
FOUNDATION MODELS
• Variational Auto-Encoders (VAEs)
• Multimodal models
ADD TO THAT A THING CALLED TRANSFORMER ARCHITECTURE
WHAT’S THAT?
• "The cat chased the mouse."
Both sentences use the same words, but the meaning changes because the order of
the words is different.
A Transformer is a type of computer model that is really good at figuring out
how words in a sentence are connected, even if they are far apart from each other.
In the Transformer, the text is converted into numerical representations called tokens.
A token is like a building block of language. When we break down a sentence for a computer to understand, we split it into smaller pieces, and each piece is a "token." Tokens can be:
• words
• parts of words
• punctuation marks
• sometimes even whole sentences
AND THEN
The Transformer takes in the tokens, looks at them all at once, and
figures out how they relate to each other. By doing this for tons of text,
the Transformer can understand patterns in language, like grammar or
meaning.
The power of Transformers comes from their ability to look at all the
elements in a sequence at the same time and figure out how they relate.
This is called the self-attention mechanism.
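Below is a minimal NumPy sketch of that self-attention computation (an illustration added here, with arbitrary made-up weight matrices, not the slides' own code): every token's vector is compared against every other token's vector at once, and each output is a weighted mix of all the token values.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q = X @ Wq  # queries
    K = X @ Wk  # keys
    V = X @ Wv  # values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V  # each output is a weighted mix of all token values

# Toy example: 4 tokens, each represented by an 8-dimensional vector
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```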
The most well-known public use of Foundation Models is this little LLM called the Generative Pre-trained Transformer: GPT
THE GENERATIVE PRE-TRAINED TRANSFORMER
GPT 3
• 175 billion parameters
• 3.4 epochs for Wikipedia but only 0.44 for the Common Crawl corpus
GPT 4 (current)
• Parameter count not disclosed; estimated at anywhere upward of 500 billion
• 2 epochs for text-based data and 4 epochs for code-based data
• Also allows for image processing
Another popular use of foundation models with transformer architecture is image-generating tools
IMAGE GENERATION TOOLS
• The data consists of not only words, but also images. The model learns to understand the connection between words and the visual features of an image.
• This is where the Transformer part really shines. The Transformer
can understand complex, multi-step relationships between parts of a
text prompt. For example, if the prompt says "A cat with a red hat
sitting on a skateboard," the model knows how to combine the
different elements (cat, red hat, skateboard) into one cohesive image.
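For a sense of what this looks like in code, here is a hedged sketch using the open-source diffusers library and a Stable Diffusion model; the library, model name, and hardware assumptions are my choices for illustration, not something the slides reference.

```python
# Assumes the Hugging Face diffusers library and a GPU are available:
#   pip install diffusers transformers torch
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained text-to-image foundation model (model name is an example choice)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# The model relates the parts of the prompt (cat, red hat, skateboard) to visual features
prompt = "A cat with a red hat sitting on a skateboard"
image = pipe(prompt).images[0]
image.save("cat_skateboard.png")
```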
IMAGE GENERATION TOOLS
• Music
• Film