GenAI Workshop
A World Within
Generative AI refers to advanced machine learning models, especially Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Transformer-based models like GPT (Generative Pre-trained Transformer).
How Generative AI Works
1 Training
The model is trained on large datasets of a specific domain (e.g., text, images).
2 Understanding Patterns
The model learns patterns, structures, and relationships within the data.
3 Generation
When prompted, it generates new content based on the learned patterns, often guided by parameters like context or user input.
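
To make the Generation step concrete, here is a minimal sketch that prompts a small pretrained model through the Hugging Face transformers pipeline; the model choice ("gpt2") and the prompt are illustrative assumptions, not part of the workshop material.

# Minimal sketch of the Generation step using the Hugging Face transformers pipeline.
# The model ("gpt2") and the prompt are illustrative choices.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The prompt plays the role of the "context or user input" that guides generation.
output = generator("Generative AI can be used to", max_new_tokens=30)
print(output[0]["generated_text"])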
Transformers
Transformers are a neural network architecture introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al. They perform various machine learning tasks, particularly in natural language processing (NLP).
Transformers Architecture
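
As an illustration of the architecture's core building block, below is a toy NumPy sketch of scaled dot-product attention, the operation at the heart of "Attention Is All You Need"; the inputs and dimensions are arbitrary placeholders, not workshop material.

# Toy sketch of scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
# Inputs and dimensions are arbitrary stand-ins for real token representations.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V                                   # weighted sum of values

Q = K = V = np.random.rand(3, 4)                         # 3 tokens, dimension 4
print(scaled_dot_product_attention(Q, K, V).shape)       # (3, 4)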
Vector Databases
Metadata Handling
Store associated metadata (e.g., text, tags) with vectors.
Use Cases
Image Retrieval
Find similar images by comparing their embeddings.
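
A minimal sketch of the image-retrieval use case, assuming the image embeddings have already been produced by some image model; cosine similarity is used to rank the stored vectors against a query vector. All values below are placeholders.

# Image retrieval sketch: rank stored images by cosine similarity of embeddings.
# The 512-dimensional vectors are random stand-ins for real image embeddings.
import numpy as np

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

stored = {f"image_{i}.jpg": np.random.rand(512) for i in range(5)}  # assumed precomputed
query_embedding = np.random.rand(512)                               # embedding of the query image

best_match = max(stored, key=lambda name: cosine_similarity(query_embedding, stored[name]))
print("Most similar image:", best_match)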
Examples of Vector Databases
Pinecone: Scalable vector database for real-time applications.
Weaviate: Open-source and schema-free vector database.
Milvus: High-performance vector database for massive datasets.
Purpose of Embeddings
Embeddings encode the semantic meaning and relationships between data points in a way that is machine-readable. Similar
concepts are represented by vectors that are close in the vector space.
Examples
Word Embeddings
Represent words like "king" and "queen" such that their relationship (e.g., "king - man + woman = queen") is preserved in the vector space (see the sketch below).
Image Embeddings
Encode features of an image (e.g., color, shape, objects).
Sentence Embeddings
Encode entire sentences to capture context and meaning.
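
The sketch below illustrates the "king - man + woman = queen" relationship with hand-made toy vectors; real word embeddings have hundreds of dimensions and are learned from data, so the numbers here are purely illustrative.

# Toy illustration of "king - man + woman ≈ queen" vector arithmetic.
# The 3-dimensional vectors are hand-made stand-ins, not learned embeddings.
import numpy as np

vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.1, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

result = vec["king"] - vec["man"] + vec["woman"]
closest = max(vec, key=lambda word: cosine(result, vec[word]))
print(closest)  # queen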
How Are Embeddings Generated?
Pretrained Models
Models like BERT, GPT, and CLIP generate embeddings (see the sketch below).
Custom Models
Fine-tune models to generate embeddings for specific tasks.
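
A minimal sketch of generating embeddings with a pretrained model, using the sentence-transformers library; the specific model name ("all-MiniLM-L6-v2") is an assumption chosen for illustration.

# Generating sentence embeddings with a pretrained model (sentence-transformers).
# The model name is an illustrative choice; any compatible model would work.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["Embeddings encode semantic meaning.", "Vectors represent data numerically."]

embeddings = model.encode(sentences)   # one fixed-length vector per sentence
print(embeddings.shape)                # (2, 384) for this particular model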
How Embeddings Are Stored
1 Embedding Creation
Input data (e.g., text, image) → Model → Embedding vector.
2 Metadata Storage
Alongside embeddings, relevant metadata (e.g., text, IDs) is stored for easy identification and retrieval.
3 Indexing
Vectors are indexed using ANN (Approximate Nearest Neighbor) algorithms, like KD-Tree or HNSW, to make similarity search efficient (see the sketch below).
4 Vector Database
Embeddings and metadata are stored in a vector database like Pinecone or Milvus.
Example Storage Structure
{
  "vector": [..., -0.34, ...],
  "metadata": { ..., "category": "Technology" }
}
Why Specialized Storage Is Needed
Embeddings are high-dimensional (e.g., 768 dimensions for BERT).
Examples of LLMs
RoBERTa
Enhanced version of BERT by Facebook AI Research.
BLOOM
First multilingual LLM developed through collaborative efforts.
Fine-Tuning Large Language Models (LLMs)
1 Text generation
2 Translation
3 Summarization
4 Question-answering
Fine-Tuning Process
3 Prepare Dataset
Structure data for training, ensuring compatibility with model requirements.
4 Training
Utilize frameworks (TensorFlow, PyTorch, or libraries like Transformers) to fine-tune the model.
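
A condensed sketch of the Prepare Dataset and Training steps using the Hugging Face Transformers Trainer; the model name, toy dataset, and hyperparameters are placeholders, not values from the workshop.

# Condensed sketch of dataset preparation and fine-tuning with the Transformers
# Trainer API. Model name, toy data, and hyperparameters are placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# Prepare Dataset: structure the data so it is compatible with the model.
raw = Dataset.from_dict({"text": ["great product", "terrible service"],
                         "label": [1, 0]})
dataset = raw.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                       padding="max_length", max_length=32))

# Training: fine-tune with a training framework.
args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=2)
Trainer(model=model, args=args, train_dataset=dataset).train()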
Retrieval-Augmented Generation (RAG)
Retriever
Fetches relevant information from a large corpus (e.g., BERT).
Generator
Generates coherent responses using the retrieved information (e.g., GPT-3, T5).
Steps
1 Query
User provides input (e.g., "What are embeddings?").
2 Retrieval
Relevant documents or embeddings are fetched from a database.
3 Augmentation
Retrieved data is appended to the query.
4 Generation
The augmented query is passed to an LLM to generate a response.
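
A toy end-to-end sketch of the four steps above; embed() and generate() are placeholders standing in for a real embedding model (e.g., BERT) and a real LLM (e.g., GPT-3 or T5).

# Toy sketch of the four RAG steps. embed() and generate() are placeholders
# standing in for a real embedding model and a real LLM.
import numpy as np

documents = [
    "Embeddings are numeric vectors that capture semantic meaning.",
    "Transformers were introduced in 'Attention Is All You Need'.",
]

def embed(text):                      # placeholder embedding model
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng.random(128)

def generate(prompt):                 # placeholder LLM call
    return f"(answer generated from prompt: {prompt[:60]}...)"

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def rag(query, k=1):
    q = embed(query)                                             # 1 Query
    scores = [cosine(q, embed(doc)) for doc in documents]
    retrieved = [documents[i] for i in np.argsort(scores)[-k:]]  # 2 Retrieval
    prompt = "Context: " + " ".join(retrieved) + "\nQuestion: " + query  # 3 Augmentation
    return generate(prompt)                                      # 4 Generation

print(rag("What are embeddings?"))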