GenAI-Unit1-3
Introduction to Generative AI
Generative AI focuses on creating new data that resembles the original data it was trained on.
It’s like a student who studies several art styles and creates a new piece that looks like it
belongs to one of those styles.
● Basic Example:
1. If you train a Generative AI model on images of dogs, it learns to generate new,
realistic dog images that weren’t part of the original dataset.
● Key Features:
1. Generates new content (text, images, videos, etc.).
2. Learns patterns and structures from existing data.
3. Doesn’t just memorize data but creates something new.
Aspect | Traditional AI | Generative AI
Input Data | Often labeled (e.g., image and category). | Unlabeled or raw data.
Examples | Predict house prices, classify images. | Create new house designs, generate new images.
Generative AI uses several approaches. Let’s break them down with examples, mathematical
intuition, and diagrams.
1. Generative Adversarial Networks (GANs)
Key Idea: GANs consist of two neural networks, a Generator that creates fake data from random noise and a Discriminator that tries to tell real data from fake. These two networks "compete" with each other, improving over time.
Structure of GANs:
Random Noise (z) → Generator → Fake Data → Discriminator
Real Data → Discriminator
Discriminator → Output (Real or Fake)
Mathematical Explanation:
The two networks play a minimax game with value function V(D, G):
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
● The Discriminator D maximizes the objective (correctly labeling real vs. fake).
● The Generator G minimizes it (trying to fool D).
Example:
Imagine training a GAN to generate handwritten digits (like those in the MNIST dataset): the Generator turns random noise into digit-like images, while the Discriminator learns to tell them apart from real MNIST digits. As training progresses, the generated digits become increasingly realistic.
2. Variational Autoencoders (VAEs)
Key Idea: VAEs are generative models that learn to encode data into a latent space and then
decode it back to reconstruct the data.
Structure of VAEs:
Encoder q(z|x) → Latent Space z → Decoder p(x|z)
Mathematical Explanation:
● Learn a probabilistic model p(x|z) that generates data x from latent variables z.
● Objective: Maximize the Evidence Lower Bound (ELBO):
\mathcal{L} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}(q(z|x) \| p(z))
○ First term: Reconstruction accuracy.
○ Second term: Ensures q(z|x) (the encoder) stays close to the prior p(z).
Applications:
● Image generation and reconstruction, anomaly detection, and data compression/denoising.
Diagram: VAE
Input Data --> [Encoder] --> Latent Space --> [Decoder] --> Reconstructed Data
3. Transformers
Key Idea: Transformers process sequences of data in parallel (e.g., text, images). They use a
mechanism called attention to focus on important parts of the input sequence.
Architecture:
An encoder-decoder design built from stacked multi-head self-attention and feedforward layers, with positional encodings added to the token embeddings to preserve word order.
Mathematics of Attention:
\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
where Q (queries), K (keys), and V (values) are projections of the input, and d_k is the key dimension.
Example:
● In translation, if the model sees "I eat an apple," attention ensures "eat" is linked to its
French counterpart "mange."
Transformers and Attention Mechanism in Detail
1. Self-Attention:
○ Allows the model to understand relationships between words in the same sequence.
○ Example: In "The cat chased the mouse, and it ran away," self-attention helps identify that "it" refers to "mouse."
2. Multi-Head Attention:
○ Instead of a single attention mechanism, multiple heads allow the model to focus
on different parts of the input.
Applications of Transformers:
● Machine translation and text generation (e.g., GPT).
● Language understanding tasks such as classification and question answering (e.g., BERT).
● Vision tasks with Vision Transformers (ViT).
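To make the attention formula concrete, here is a minimal single-head scaled dot-product attention sketch in NumPy (illustrative only: real transformers add learned Q/K/V projection matrices, multiple heads, masking, and batching):

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # attention weights sum to 1 per query
    return weights @ V                  # weighted sum of the values

# Toy self-attention: 3 tokens with 4-dimensional representations
tokens = np.random.randn(3, 4)
output = scaled_dot_product_attention(tokens, tokens, tokens)
print(output.shape)  # (3, 4)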
UNIT 2
1. Generative vs Discriminative Models
Definition
● Generative Models:
○ Learn the joint probability distribution p(x, y), where x is the input and y is the output.
○ They can generate new data samples resembling the original dataset.
○ Example: Generative Adversarial Networks (GANs), Variational Autoencoders
(VAEs).
● Discriminative Models:
○ Learn the conditional probability p(y|x), i.e., the decision boundary between classes, rather than how the data itself is distributed.
○ Example: Logistic Regression, Support Vector Machines (SVMs).
Comparison Table
Aspect | Generative Models | Discriminative Models
Goal | Model p(x, y); generate data x and predict y. | Model p(y|x); predict output y from input x.
1. Generative: A model trained on cat images generates a new, realistic cat image.
2. Discriminative: A model determines whether a given image contains a cat or not.
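To make the distinction concrete, here is a minimal sketch contrasting the two families on the same toy data (scikit-learn used for illustration; GaussianNB is generative because it models p(x|y) and p(y), LogisticRegression is discriminative because it models p(y|x) directly):

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Toy two-class dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

generative = GaussianNB().fit(X, y)              # models p(x|y) and p(y)
discriminative = LogisticRegression().fit(X, y)  # models p(y|x) directly

print(generative.predict(X[:3]))
print(discriminative.predict(X[:3]))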
2. Generative AI Architecture
Overview
Generative AI architectures are designed to learn patterns in the data to generate new, realistic
samples.
Components of Architecture
Applications:
Computers process numbers, not words. To handle natural language, we transform words into
numerical vectors while preserving their meaning.
Word Embeddings
1. Definition:
Word embeddings are dense vector representations of words in a continuous space,
where semantically similar words are closer.
Example:
vector("king") − vector("man") + vector("woman") ≈ vector("queen"), showing that embeddings capture semantic relationships.
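A minimal sketch of looking up word embeddings with a Keras Embedding layer (the vocabulary size, vector dimension, and token ids below are illustrative assumptions):

import tensorflow as tf

# Toy embedding table: 1,000-word vocabulary, 8-dimensional vectors
embedding = tf.keras.layers.Embedding(input_dim=1000, output_dim=8)

token_ids = tf.constant([[12, 47, 5]])  # a "sentence" of three token ids
vectors = embedding(token_ids)          # shape: (1, 3, 8)
print(vectors.shape)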
Steps in Training:
1. Preprocessing:
Pre-training:
Transfer Learning:
Diagram:
Random Noise --> [Generator] --> Fake Data --> [Discriminator] --> Real/Fake
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Generator Model: maps 100-dim random noise to a 784-pixel (28x28) image
def build_generator():
    model = tf.keras.Sequential([
        layers.Dense(128, activation="relu", input_dim=100),
        layers.Dense(784, activation="sigmoid")
    ])
    return model

# Discriminator Model: classifies a 784-pixel image as real (1) or fake (0)
def build_discriminator():
    model = tf.keras.Sequential([
        layers.Dense(128, activation="relu", input_dim=784),
        layers.Dense(1, activation="sigmoid")
    ])
    return model

# Compile GAN
generator = build_generator()
discriminator = build_discriminator()
discriminator.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# GAN Model: freeze the discriminator while the generator trains through it
discriminator.trainable = False
gan_input = layers.Input(shape=(100,))
generated_image = generator(gan_input)
gan_output = discriminator(generated_image)
gan = tf.keras.Model(gan_input, gan_output)
gan.compile(optimizer="adam", loss="binary_crossentropy")

# Real data: MNIST digits, flattened to 784 values and scaled to [0, 1]
(real_data, _), _ = tf.keras.datasets.mnist.load_data()
real_data = real_data.reshape(-1, 784).astype("float32") / 255.0

# Training Loop
def train_gan(generator, discriminator, gan, epochs=10000, batch_size=64):
    for epoch in range(epochs):
        # Fake data
        noise = np.random.normal(size=(batch_size, 100))
        fake_images = generator.predict(noise, verbose=0)
        fake_labels = np.zeros((batch_size, 1))

        # Real data batch
        idx = np.random.randint(0, real_data.shape[0], batch_size)
        real_images = real_data[idx]
        real_labels = np.ones((batch_size, 1))

        # Train Discriminator on real and fake batches
        d_loss_real = discriminator.train_on_batch(real_images, real_labels)
        d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels)

        # Train Generator via the combined model, labelling fakes as "real"
        noise = np.random.normal(size=(batch_size, 100))
        g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))

        if epoch % 1000 == 0:
            # train_on_batch returns [loss, accuracy] when metrics are set
            print(f"Epoch {epoch}: D Loss: {d_loss_real[0] + d_loss_fake[0]}, G Loss: {g_loss}")

train_gan(generator, discriminator, gan)
UNIT 3
1. Introduction to Large Language Models (LLMs)
Large Language Models (LLMs) are advanced AI systems trained on massive amounts of
textual data to understand, generate, and manipulate natural language. Examples include GPT
(Generative Pre-trained Transformer), BERT (Bidirectional Encoder Representations from
Transformers), and OpenAI's ChatGPT.
Key Features:
1. Trained on massive text corpora with billions of parameters.
2. Built on the Transformer architecture and its attention mechanism.
3. Can perform many tasks (translation, summarization, question answering) from prompts alone, without task-specific training.
Generative AI Overview
Generative AI models aim to create new data that resembles the training data, such as text,
images, music, and videos.
● Example Tasks: Generating text for writing, creating AI art, or simulating dialogue in
chatbots.
Language models predict the next word in a sequence based on the context.
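As a toy illustration of next-word prediction, here is a bigram count model (purely illustrative; real language models use neural networks over far larger contexts):

from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()
bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def predict_next(word):
    counts = bigrams[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(predict_next("the"))  # e.g., {'cat': 0.67, 'mat': 0.33}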
Foundation Models
Foundation models are large, pre-trained AI models designed to serve as a base for various
tasks.
Key Idea: Train once on large datasets, fine-tune later for specific tasks.
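The "train once, fine-tune later" pattern in miniature, using a pre-trained Keras vision backbone as a stand-in for a large foundation model (MobileNetV2 and the 10-class head are illustrative choices):

import tensorflow as tf

# Load a pre-trained backbone and freeze its weights ("train once")
base = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                         input_shape=(96, 96, 3))
base.trainable = False

# Attach a small task-specific head ("fine-tune later")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(10, activation="softmax")
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()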
Evolution of Language Models
1. Recurrent Neural Networks (RNNs): Used for sequential data but had difficulty
remembering long sequences (vanishing gradient problem).
2. LSTMs (Long Short-Term Memory) & GRUs: Improved memory handling in sequential
tasks.
3. Transformers: Revolutionized NLP by addressing sequence processing efficiently with
attention mechanisms.
5. Responsible AI
What Is Responsible AI?
Responsible AI ensures AI systems are developed and deployed ethically, fairly, and
transparently.
6. One-Shot Learning
Definition:
One-shot learning refers to a model's ability to learn a new task from just one or very few
examples.
● Language Models:
GPT models can perform translation after seeing just one example in the prompt (in-context learning), with no gradient updates.
Example:
Prompt: "Translate English to French. sea otter → loutre de mer. cheese →" Model output: "fromage"
Diagram:
Input Text --> Tokenization --> Embedding --> [Multi-Head Attention + Feedforward] --> Output
Generative AI vs Traditional AI
Aspect | Generative AI | Traditional AI
Input Data | Unlabeled or raw data. | Often labeled (e.g., image and category).
Examples | Create new house designs, generate new images. | Predict house prices, classify images.
1. Pre-training:
What is Prompt Engineering?
Prompt engineering is the process of crafting effective prompts (inputs) to elicit the desired response from a Large Language Model (LLM). Prompts guide LLMs to perform tasks accurately by providing clear instructions, examples, or context.
Why Prompt Engineering Matters:
1. Enhances Model Performance: Helps the model understand the task more precisely.
2. Task Flexibility: Enables diverse applications without retraining the model.
3. Cost Efficiency: Reduces the need for fine-tuning or additional data.
Prompt:
"Explain quantum mechanics in simple terms suitable for a 12-year-old."
Response:
"A quantum particle behaves like a wave and a particle at the same time…"
Zero-Shot Prompt:
"Write a haiku about the moon."
Few-Shot Prompt:
"Here are two haikus:
How an LLM Processes Input:
1. Input Layer: Accepts user queries and processes them into tokens (smaller units).
2. Embedding Layer: Converts tokens into dense numerical vectors.
3. Transformer Layers:
○ Uses attention mechanisms to focus on relevant parts of the input.
○ Each layer refines the representation of the input for better understanding.
4. Output Layer: Produces the response text by generating one token at a time.
Diagram:
User Input --> Tokenization --> Transformer Layers --> Output Tokens --> Response
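A small sketch of the tokenization step, using the Hugging Face tokenizer for GPT-2 (the model choice and sentence are illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer("Large language models are powerful.")["input_ids"]
print(ids)                                   # token ids fed to the model
print(tokenizer.convert_ids_to_tokens(ids))  # the corresponding subword tokens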
Code Example: CNN and RNN Text Models

import tensorflow as tf
from tensorflow.keras.layers import (Embedding, Conv1D, GlobalMaxPooling1D,
                                     Dense, SimpleRNN)

vocab_size = 10000  # example vocabulary size
max_length = 100    # example input sequence length

# CNN-based text classifier (binary output, e.g., sentiment)
cnn_model = tf.keras.Sequential([
    Embedding(input_dim=vocab_size, output_dim=64, input_length=max_length),
    Conv1D(128, 5, activation='relu'),
    GlobalMaxPooling1D(),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid')
])

# RNN-based language model (softmax over the vocabulary for the next token)
rnn_model = tf.keras.Sequential([
    Embedding(input_dim=vocab_size, output_dim=64),
    SimpleRNN(128, return_sequences=True),
    SimpleRNN(128),
    Dense(vocab_size, activation='softmax')
])
VAEs are generative models for creating new data similar to the training set.
Code Example: VAE (shown here on 784-dimensional flattened images, e.g., MNIST)
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras.models import Model

# Define encoder
input_dim = 784
latent_dim = 2
inputs = Input(shape=(input_dim,))
h = Dense(256, activation='relu')(inputs)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

# Sampling layer (reparameterization trick: z = mean + sigma * epsilon)
def sampling(args):
    z_mean, z_log_var = args
    epsilon = tf.random.normal(shape=(tf.shape(z_mean)[0], latent_dim))
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon

z = Lambda(sampling)([z_mean, z_log_var])

# Define decoder
decoder_h = Dense(256, activation='relu')
decoder_mean = Dense(input_dim, activation='sigmoid')
h_decoded = decoder_h(z)
x_decoded_mean = decoder_mean(h_decoded)

# VAE model (note: a full VAE loss also adds the KL term from the ELBO;
# plain binary cross-entropy covers only the reconstruction term)
vae = Model(inputs, x_decoded_mean)
vae.compile(optimizer='adam', loss='binary_crossentropy')
vae.summary()
What is RAG?
Retrieval-Augmented Generation (RAG) combines an information retriever with a generative model, so responses are grounded in retrieved documents rather than the model's parameters alone:
1. Retriever: Fetches relevant data from a knowledge base based on the query.
2. Generator: Uses the retrieved data to craft a detailed, accurate response.
Applications of RAG:
● Chatbots grounded in company documents, question answering over private knowledge bases, and reducing hallucinations by citing retrieved sources.
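A minimal sketch of the retrieve-then-generate pattern (TF-IDF stands in for a real embedding-based vector database, and generate() is a hypothetical placeholder for an LLM call):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Our refund policy allows returns within 30 days.",
    "Support is available Monday to Friday, 9am-5pm.",
    "Premium plans include priority support.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query, k=1):
    # Rank documents by similarity to the query and return the top k
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vectors)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

query = "When can I get a refund?"
context = retrieve(query)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
# response = generate(prompt)  # hypothetical LLM call
print(prompt)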
6. Fine-Tuning LLMs
What is Fine-Tuning?
Fine-tuning continues training a pre-trained LLM on a smaller, task-specific dataset so the model adapts to a particular domain or task.
Code Example:
# Fine-tuning with the Hugging Face Trainer API. Assumes `model`,
# `training_args` (a TrainingArguments object), and a tokenized
# `train_encodings` dataset have already been prepared.
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_encodings
)
trainer.train()
Stable Diffusion is a cutting-edge generative AI model that synthesizes images from textual
descriptions, noise, or incomplete data. Based on latent diffusion models, Stable Diffusion
offers a versatile framework for high-quality image generation and editing. It has gained
popularity due to its balance of computational efficiency and output quality.
Key Features
A. Variational Autoencoder (VAE)
● Purpose: Compress the input data (image) into a latent space using a Variational
Autoencoder (VAE).
● Working:
1. Encoder: Converts high-dimensional image data into a compressed latent
representation.
2. Decoder: Converts the latent representation back to image space after
processing.
B. Noise Prediction Model
● A U-Net architecture predicts the noise added during the forward diffusion process.
● Acts as the backbone of Stable Diffusion for denoising the corrupted latent space.
C. Text Encoder
● Converts the text prompt into embeddings (Stable Diffusion uses a CLIP text encoder) that condition the denoising process.
Training Process
1. Forward Diffusion:
○ Start with an image dataset.
○ Gradually add Gaussian noise to the images through multiple steps, converting them into pure noise.
2. Backward Diffusion:
○ The model is trained to predict the noise added at each step using the U-Net.
○ The training objective is to minimize the mean squared error (MSE) between the
predicted and actual noise.
3. Latent Space Training:
○ Diffusion runs in the VAE's compressed latent space rather than in pixel space, which greatly reduces computation.
Loss Function
The training objective is the mean squared error between the actual noise ε and the model's prediction:
\mathcal{L} = \mathbb{E}\left[\lVert \epsilon - \epsilon_\theta(z_t, t, c) \rVert^2\right]
where z_t is the noisy latent at step t and c is the text conditioning.
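A minimal sketch of this objective in TensorFlow (the noise schedule values and the placeholder unet are illustrative, not the real model):

import tensorflow as tf

alpha_bar = tf.linspace(0.99, 0.01, 1000)  # illustrative cumulative noise schedule

def diffusion_loss(unet, z0, t):
    noise = tf.random.normal(tf.shape(z0))
    # Forward diffusion: corrupt clean latents z0 toward pure noise at step t
    zt = tf.sqrt(alpha_bar[t]) * z0 + tf.sqrt(1.0 - alpha_bar[t]) * noise
    # MSE between the actual and the predicted noise
    return tf.reduce_mean(tf.square(noise - unet(zt, t)))

unet = lambda zt, t: tf.zeros_like(zt)  # placeholder for the real U-Net
z0 = tf.random.normal((4, 64))          # toy batch of clean latents
print(diffusion_loss(unet, z0, t=500).numpy())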
Inference in Stable Diffusion involves reversing the noise addition process (diffusion) and
generating a coherent image conditioned on a text prompt.
1. Text Conditioning:
The input text is encoded into a latent representation using the text encoder.
2. Latent Initialization:
Start from random Gaussian noise in the latent space.
3. Iterative Denoising:
The U-Net removes noise step by step, guided by the text embedding, refining the latent toward a coherent image.
4. Decoding:
Convert the refined latent representation into the final image using the decoder.
Input Prompt:
"A beautiful sunrise over a serene mountain lake, digital art style."
Output Steps:
The model denoises random latents conditioned on the prompt and decodes the result into the described image.
A. Libraries
1. Hugging Face Diffusers Library: A popular Python library for diffusion models.
2. CompVis Library: The official library for Stable Diffusion.
3. Automatic1111: A user-friendly UI for running Stable Diffusion locally.
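A minimal sketch of generating an image with the Diffusers library (the model id is an illustrative choice, and a CUDA GPU is assumed; CPU also works, just slower and with torch_dtype omitted):

import torch
from diffusers import StableDiffusionPipeline

# Load a pre-trained Stable Diffusion pipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA GPU is available

prompt = "A beautiful sunrise over a serene mountain lake, digital art style."
image = pipe(prompt).images[0]
image.save("sunrise.png")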
B. Techniques
Stable Diffusion v2
Customized Variants
A. Fine-Tuning
● Customize the model for specific domains (e.g., medical imaging, architectural designs).
● Requires additional training data and computing resources.
B. Textual Inversion
● Learn new embeddings for specific concepts (e.g., a person’s face or unique objects)
without modifying the base model.
C. Prompt Engineering
● Carefully worded prompts (style, artist, lighting, medium) steer the output without any retraining.
Applications of Generative AI Across Industries
Healthcare
● Drug Discovery: AI generates potential drug candidates using molecule simulation and
property prediction.
● Medical Imaging: Generative models enhance image quality and assist in diagnosis
(e.g., detecting brain strokes or tumors).
● Patient Records Synthesis: AI generates synthetic patient data for training machine
learning models without compromising privacy.
Automobile
Entertainment
● Content Creation: AI generates realistic visuals, music, scripts, or even video game
levels.
● Film Restoration: AI recreates damaged frames in old movies using Generative
Adversarial Networks (GANs).
Education
Manufacturing
Scenario:
Use an LLM to analyze 10-K filings (annual reports filed by public companies) and answer
financial questions.
Example Task:
How it Works:
Benefits:
Scenario:
Analyzing a company's financial health using profit/loss statements, balance sheets, and
market trends.
Example Task:
● Input: "Compare Microsoft and Google in terms of profitability for the fiscal year 2023."
● Output: A detailed analysis of key metrics like gross margin, operating margin, and net
profit.
What is LangChain?
LangChain is a framework for building LLM-powered applications; it provides building blocks for chaining prompts, managing memory, retrieving documents, and calling external tools.
Steps to Build a Domain-Specific Chatbot:
1. Data Collection: Gather domain-specific data (e.g., customer service FAQs, financial
data).
2. Embed Documents: Use embeddings to convert text into vector representations for
retrieval.
3. Integration with LLM: Combine a retrieval mechanism (e.g., vector databases) with a
generative model for responses.
4. Fine-Tuning: Customize the chatbot’s responses for specific needs using domain
examples.
Example:
● Healthcare Chatbot: Answers patient queries based on medical records and FAQs.
● Financial Chatbot: Provides financial advice based on company filings.
A. Text Generation
Example Task:
B. Sentiment Analysis
Example Task:
● Prompt:
"Analyze the sentiment of this statement: 'The product is amazing but delivery was
delayed.'"
● Response:
Sentiment: Mixed
Explanation: Positive sentiment about the product, but negative regarding delivery.
5. Video Creation
Examples:
1. Explainer Videos: Create educational videos by converting text scripts into visuals.
2. Synthetic Avatars: AI generates avatars delivering scripted content.
Example Workflow:
1. Input Text: "Create a 30-second video showing a bustling city street during sunset."
2. Output Video: AI generates a video clip with the specified features.
Tutorial Codes for Content Generation Tasks
# Text summarization via the OpenAI Completion API (note: this is the
# legacy pre-1.0 SDK interface; text-davinci-003 has since been retired)
import openai

openai.api_key = "your-api-key"
response = openai.Completion.create(
    engine="text-davinci-003",
    prompt="Summarize this speech: 'Education is the backbone of society...'",
    max_tokens=100
)
print(response['choices'][0]['text'])
# Analyze sentiment with a Hugging Face pipeline
from transformers import pipeline

sentiment_analyzer = pipeline("sentiment-analysis")

text = "The product is amazing but delivery was delayed."
result = sentiment_analyzer(text)
print(result)  # e.g., [{'label': 'POSITIVE', 'score': 0.95}]