Fundamentals of AI - Visual Map (AIF-C01)
Accuracy of Models

The #1 challenge of AI is building a model that generalizes well to new data.

- Underfitting (high bias): "error due to the assumptions in the model" - the model is too simple and misses patterns in the data. Caused by not training long enough or not a large enough dataset. Addressed by scaling the dataset and training longer.
- Overfitting (high variance): "the model's sensitivity to fluctuations or noise in the training data" - the model memorizes the training data instead of generalizing.
- Bias: disparities in the performance of a model across different groups, usually caused by a training dataset that is not diverse. Addressed by diversifying and balancing the training data.
Prevention of overfitting

- Pruning
- Regularization: penalizes complexity ("avoiding bad habits")
- Ensembling
- Data augmentation: alters the data to improve robustness

Hyperparameters

- Number of Epochs ("how many attempts before adjusting?")
- Batch Size ("solo vs group"): the number of samples processed before the model parameters are updated
- Learning Rate
- Regularization

Troubleshooting with hyperparameters

- Model is overfitting (memorizing, not generalizing on new data) -> increase regularization to penalize complexity.
- Model training is unstable (loss jumps, doesn't converge) -> reduce the learning rate for smoother optimization.
- Model training is too slow -> increase the batch size or the learning rate.
- Model updates are too noisy -> reduce the batch size for more stable updates.

DL Use Cases

- Computer Vision (CV): how machines interpret and understand digital images and videos; object detection, image segmentation.
- Natural Language Processing (NLP): text classification, language generation.
- Other use cases: Intelligent Document Processing (IDP), Fraud Detection.

Inference Parameters (randomness and diversity)

- Temperature: influences the likelihood of the model selecting lower-probability outputs; tunes creativity/abstraction. High temperature = more random responses -> hallucinations.
- Top K: limits the number of most-likely candidate words the model considers for the next token. High K = more probable words considered, more diverse and creative output.
- Top P: 0-1 (%). High P = broad range of possible words, possibly more creative and diverse output.
- Response Length: defines the minimum or maximum number of tokens in the output.
- Penalties: discourage repetition in the output.
- Stop sequences: strings at which the model stops generating.
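To make these parameters concrete, here is a minimal sketch of passing them to a hosted model, assuming boto3 with the Amazon Bedrock Converse API and an illustrative Anthropic model ID; Top K is not part of the standard inferenceConfig, so it is shown as a model-specific field.

```python
# Minimal sketch (assumptions: boto3 installed, AWS credentials configured,
# and access granted to the example model ID below in Amazon Bedrock).
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "Summarize what overfitting is."}]}],
    inferenceConfig={
        "temperature": 0.2,               # low = more deterministic, fewer hallucinations
        "topP": 0.9,                      # share of probability mass considered for the next token
        "maxTokens": 200,                 # response length limit
        "stopSequences": ["\n\nHuman:"],  # generation stops when this string appears
    },
    # Top K is model-specific, so it goes in additionalModelRequestFields (assumption).
    additionalModelRequestFields={"top_k": 50},
)

print(response["output"]["message"]["content"][0]["text"])
```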
Building blocks

- Algorithms: the mathematical relationship between the outputs and the inputs.
- Features: the inputs - columns in a table / pixels in an image.
- Training Data: known data, with the expected output specified for supervised learning (unspecified for unsupervised learning).
- New Data: unknown data fed to the trained model to produce predictions (the output).
- Model Artifacts: trained parameters, a model definition that describes how to compute inferences, and other metadata.
- Inference Code: software that implements the model by reading the artifacts.
- Model Parameters: learned and adjusted iteratively during the training process to find the best fit (minimise errors).
- Hyperparameters: user-defined to control the training process towards the expected output.
- Inference Parameters: user-defined to influence the model output.
- Prompt: a specific set of inputs (+ enrichments) to the model.
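A minimal sketch of how these building blocks relate, assuming scikit-learn and joblib are available; the dataset, hyperparameter values, and file name are illustrative only.

```python
# Training produces model artifacts; inference code later reads those artifacts.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)            # training data: features + specified output

# Hyperparameters: user-defined, control the training process
model = LogisticRegression(max_iter=200, C=1.0)

model.fit(X, y)                              # model parameters (coef_, intercept_) are learned iteratively
joblib.dump(model, "model.joblib")           # model artifact: parameters + model definition

# --- inference code: reads the artifact and computes predictions on new data ---
loaded = joblib.load("model.joblib")
new_data = [[5.1, 3.5, 1.4, 0.2]]            # unknown data
print(loaded.predict(new_data))              # prediction (output)
```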
AI landscape

- Artificial Intelligence (AI) > Machine Learning (ML) > Deep Learning (DL) / Neural Networks (NN) > Generative AI (Gen AI).

Model sources

- Open-source pre-trained model
- Training a custom model
- Hybrid: a pre-trained model with fine-tuning

Model Training

- The training split (typically ~80% of the data) and an ML framework are used to train the model, tune hyperparameters, and select the best model.
- Training produces the Model Artifacts (the custom model).

Deployment method (production)

- Managed API
- Self-hosted API (e.g., on EC2)

Feature Engineering

To create new input labels (features) for the model:
- Feature creation: creating new features from existing data.
- Feature transformation: replacing missing features or features that are not valid.
- Feature extraction: reducing the amount of data to be processed (dimensionality reduction).
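A small pandas sketch of these feature-engineering steps on a hypothetical DataFrame; all column names and values are made up.

```python
# Feature engineering sketch: transformation, creation, extraction (illustrative columns).
import pandas as pd

df = pd.DataFrame({
    "amount":  [12.0, 250.0, None, 8.5],
    "items":   [1, 5, 2, 1],
    "country": ["DE", "US", "US", None],
})

# Feature transformation: replace missing / invalid values
df["amount"] = df["amount"].fillna(df["amount"].median())
df["country"] = df["country"].fillna("unknown")

# Feature creation: a new feature derived from existing data
df["amount_per_item"] = df["amount"] / df["items"]

# Feature extraction / reduction: keep only the data the model needs
features = df[["amount", "amount_per_item"]]
print(features)
```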
Data types

- Structured (e.g., columns in a table)
- Semi-structured
- Unstructured (e.g., video, image, text)

Neural network architectures

- Neural Networks (NN): layers of mathematical transformations.
- CNN (Convolutional Neural Network): for grid-like / structured data; pattern recognition, image recognition and classification, video analysis.
- RNN (Recurrent Neural Network): for sequential data such as series or text; processes the data in one direction.
- LSTM (Long Short-Term Memory): an RNN variant that maintains context for dependencies across time.
- VAE (Variational Autoencoder) / Autoencoders: learn compact representations of data; can generate synthetic data.
- GAN (Generative Adversarial Network): a generator creates synthetic data from random input while a discriminator evaluates whether it is real or fake.
- Transformers: capture context and dependencies across the whole input; the basis of BERT and GPT.
- BERT: Bidirectional Encoder Representations from Transformers; trained by predicting missing words.
- GPT: Generative Pretrained Transformer.
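To illustrate how a transformer relates every token to every other token, here is a tiny NumPy sketch of scaled dot-product attention, the core operation inside a transformer layer; the vectors and dimensions are made up.

```python
# Scaled dot-product attention: the core operation inside a transformer layer.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # similarity of each token with every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ V                                         # context-aware mix of the value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                                        # 4 tokens, 8-dimensional vectors (illustrative)
Q = rng.normal(size=(seq_len, d_model))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))
print(attention(Q, K, V).shape)                                # (4, 8): one context vector per token
```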
Generative architectures in more detail

- Autoencoders: an Encoder learns a compact representation of the data called the "latent space"; a Decoder reconstructs data from the latent space. Used for anomaly detection and image synthesis.
- GAN: generates realistic images or artwork from random input (e.g., DeepFake images).

Model hosting / inference options

- Real-time: fast, near-instant; low latency (ms-s); processing time <= 60 s.
- Batch: batch processing of large datasets; infrequent use.
- Asynchronous: mid-high latency (near real-time); processing time <= 1 hour.
- Serverless: intermittent, short-term processing requests; you only pay when the endpoint is processing requests; higher initial latency.

Training paradigms

- Supervised: labeled data with known outputs.
  - Classification (binary or multi-class): logistic regression, decision tree (if-else structure), KNN algorithm.
  - Regression: the output is a continuous value.
- Unsupervised: unlabelled data.
  - Clustering (e.g., the K-Means algorithm): groups similar data points (see the K-Means sketch after this list).
  - Association rule learning: relationships between inputs in a dataset.
  - Anomaly detection (probability density): identification of rare items, events, or observations.
  - Dimensionality reduction (e.g., PCA): keeping the most relevant features.
  - Pattern discovery.
- Semi-supervised: labels a small set of examples (e.g., transactions with known fraud cases) before refining the model on unlabelled data; used for fraud identification and document classification.
- Self-supervised: creates labels from the input data (e.g., predicting missing words in large unlabelled text).
- Reinforcement learning (RL): "An agent learns to make actions (determined by feedback from previous iterations) in an environment to maximise cumulative rewards." The model continuously improves.
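A minimal scikit-learn sketch of the clustering idea above: K-Means groups similar data points without any labels. The data and the number of clusters are made up.

```python
# Unsupervised learning sketch: K-Means groups similar data points without labels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Two synthetic "groups" of 2-D points (illustrative data, no labels provided)
points = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.cluster_centers_)   # learned group centers, roughly near (0, 0) and (5, 5)
print(kmeans.labels_[:10])       # cluster assignment for the first few points
```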
Generative AI and Foundation Models (FM)

- Large Language Model (LLM): trained on massive text data (internet, books, etc.); generates human-like text; the output is non-deterministic.
- Token: the unit of text the model reads and generates.
- Multimodal models: integration of multiple types of data (text and image, etc.) into a unified representation; Multimodal Embeddings (e.g., Amazon Titan).
- FM sources: Amazon Bedrock (pre-built FMs only), Amazon SageMaker JumpStart (pre-trained models), Amazon SageMaker AI (custom build + train, FMs and other ML models).
- Inferencing at the Edge: runs on an edge device (e.g., a Raspberry Pi), close to where the data is generated, in places with limited internet connections; offline capability, local inference.
- Deployed on a remote server: the edge device connects via API; higher latency.
- Diffusion models: Forward Diffusion corrupts the data with noise; Reverse Diffusion denoises the data to generate new samples. Stable Diffusion uses a reduced-definition latent space.
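A tiny NumPy sketch of the forward-diffusion idea above (gradually corrupting data with Gaussian noise); the noise schedule and step count are made up, and the learned reverse (denoising) step is only hinted at in a comment.

```python
# Forward diffusion sketch: progressively add Gaussian noise to a data sample.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=(8, 8))                # stand-in for an image (8x8 "pixels")

betas = np.linspace(1e-4, 0.2, num=50)      # illustrative noise schedule
for beta in betas:
    noise = rng.normal(size=x.shape)
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise   # one forward-diffusion step

print(round(float(x.mean()), 3), round(float(x.std()), 3))  # approaches pure noise (mean ~0, std ~1)
# Reverse diffusion would train a model to undo these steps, denoising back to an image.
```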
FM lifecycle

Data Selection (massive dataset, diverse sources) -> Pre-training (self-supervised: create labels from the input data) -> Optimization -> Evaluation -> Deployment -> Feedback.

Evaluation Metrics

- Regression metrics: MAE, R Squared.
- Classification metrics (from the Confusion Matrix of TP / FP / TN / FN):
  - Accuracy: how many total predictions (both positives and negatives) are correct. Best when classes are balanced and errors are equally costly.
  - Specificity / TN Rate (TNR): best when correctly identifying negatives is important (e.g., quality control).
  - F1 Score: best when both false positives and false negatives matter (e.g., information retrieval).
- Multi-Threshold Metrics (threshold-independent, curve-based):
  - AUC-ROC: how well the model distinguishes between the positive and negative classes across various thresholds. Best for evaluating model performance across thresholds (e.g., binary classification) on balanced datasets.
  - AUC-PR: preferred for imbalanced datasets.
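A short sketch of how the confusion-matrix classification metrics above are computed, assuming scikit-learn; the labels are made up.

```python
# Confusion-matrix metrics sketch: accuracy, specificity (TNR), and F1 on toy labels.
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # illustrative ground truth
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]   # illustrative predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)   # (TP + TN) / all predictions
specificity = tn / (tn + fp)                # TN rate: how well negatives are identified
f1 = f1_score(y_true, y_pred)               # balances false positives and false negatives

print(f"accuracy={accuracy:.2f} specificity={specificity:.2f} f1={f1:.2f}")
```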
Customizing Foundation Models

- Prompt Engineering ($): does NOT change the weights of the FM.
- RAG (Retrieval-Augmented Generation) ($$): does NOT change the weights of the FM; the "Augmented" Prompt = Query + "Retrieval" Text.
- Fine-tuning ($$$): DOES change the weights of the FM; requires labeled data; Instruction-based Fine-tuning; training data formats include Single-Turn Messaging (does not remember the earlier conversation) and Multi-Turn Messaging.
- Continued Pre-training: uses unlabeled data.
- Transfer Learning: reusing a pre-trained model as the starting point for a new task.

Embeddings and Vector Databases (used by RAG)

- Embeddings Model: converts data (text, images, etc.) into numbers (vectors).
- Vector Databases: store these vectors and help find similar ones efficiently. Options on AWS:
  - Amazon OpenSearch Service (Serverless): kNN search capability, real-time similarity queries.
  - Amazon DocumentDB: NoSQL database.
  - Amazon Aurora: relational database, proprietary on AWS.
  - Amazon RDS for PostgreSQL: relational database, open-source.
  - Amazon Neptune: graph database.
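To make the embeddings-plus-vector-search idea concrete, here is a NumPy sketch of the retrieval step behind RAG, using made-up 3-dimensional vectors in place of a real embeddings model and vector database.

```python
# RAG retrieval sketch: cosine similarity between a query vector and stored document vectors.
import numpy as np

# Pretend an embeddings model already converted these documents into vectors (made-up values).
doc_vectors = {
    "refund policy":  np.array([0.9, 0.10, 0.00]),
    "shipping times": np.array([0.1, 0.80, 0.20]),
    "return address": np.array([0.7, 0.20, 0.10]),
}
query_vector = np.array([0.8, 0.15, 0.05])   # embedding of the user question (made up)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A vector database performs this similarity search at scale; here it is a simple loop.
best_doc = max(doc_vectors, key=lambda name: cosine(query_vector, doc_vectors[name]))
print(best_doc)   # the retrieved text would then be added to the prompt (the "augmented" prompt)
```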