GenAI-Unit1-3


UNIT 1

Introduction to Generative AI

What is Generative AI?

Generative AI focuses on creating new data that resembles the original data it was trained on.
It’s like a student who studies several art styles and creates a new piece that looks like it
belongs to one of those styles.

● Basic Example:
1. If you train a Generative AI model on images of dogs, it learns to generate new,
realistic dog images that weren’t part of the original dataset.
● Key Features:
1. Generates new content (text, images, videos, etc.).
2. Learns patterns and structures from existing data.
3. Doesn’t just memorize data but creates something new.

Difference Between Machine Learning and Generative AI Pipeline


Aspect | Machine Learning (ML) | Generative AI
Purpose | Predicts or classifies data. | Generates new, realistic data.
Input Data | Often labeled (e.g., image and category). | Unlabeled or raw data.
Output | Predictions (e.g., category label). | Realistic new data.
Examples | Predict house prices, classify images. | Create new house designs, generate new images.
Model Objective | Learn a mapping function f(x) = y. | Learn the data distribution p(x).

Types of Generative AI Models

Generative AI uses several approaches. Let’s break them down with examples, mathematical
intuition, and diagrams.

1. Generative Adversarial Networks (GANs)


Key Idea: GANs consist of two networks:

1. Generator: Creates fake data.


2. Discriminator: Tries to differentiate real data from fake data.

These two networks "compete" with each other, improving over time.

Structure of GANs:
Random Noise (z) → Generator → Fake Data → Discriminator
Real Data → Discriminator
Discriminator → Output (Real or Fake)

Mathematical Explanation:

● The Generator creates data G(z), where z is random noise.
● The Discriminator learns a function D(x) that outputs the probability that x is real.
● Objective: The Discriminator maximizes V(D, G) while the Generator minimizes it:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

Example:

Imagine training a GAN to generate handwritten digits (like those in the MNIST dataset):

1. The Generator creates fake images of digits.


2. The Discriminator tries to tell apart fake digits from real ones.
3. Over time, the Generator gets so good that its digits are indistinguishable from the real
ones.

Diagram: GAN Training Process


Real Data --> [Discriminator] --> Real/Fake
Generator --> Fake Data --> [Discriminator] --> Real/Fake

2. Variational Autoencoders (VAEs)

Key Idea: VAEs are generative models that learn to encode data into a latent space and then
decode it back to reconstruct the data.

Structure of VAEs:

1. Encoder: Compresses input x into a smaller representation z in a latent space.
2. Latent Space: Adds a bit of randomness to make z more flexible.
3. Decoder: Reconstructs x from z.

Mathematical Explanation:

● Learn a probabilistic model p(x|z) that generates data x from latent variables z.
● Objective: Maximize the Evidence Lower Bound (ELBO):

\mathcal{L} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}(q(z|x) \| p(z))

   ○ First term: Reconstruction accuracy.
   ○ Second term: Ensures q(z|x) (the encoder) stays close to the prior p(z).

Applications:

1. Image reconstruction (e.g., filling missing parts of an image).


2. Data generation with controlled variability.

Diagram: VAE
Input Data --> [Encoder] --> Latent Space --> [Decoder] --> Reconstructed Data

3. Transformers

Key Idea: Transformers process sequences of data in parallel (e.g., text, images). They use a
mechanism called attention to focus on important parts of the input sequence.

Architecture:

1. Encoder: Processes input and creates a representation.


2. Decoder: Generates output based on the representation.
3. Attention Mechanism: Helps the model "pay attention" to relevant parts of the data.

Mathematics of Attention:

Given a query Q, key K, and value V:

\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V

● QK^T: Measures similarity between the queries and keys.
● softmax: Converts the similarities into attention weights (probabilities).
● The output is a weighted sum of the value vectors V, using those weights.
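To make the formula concrete, here is a minimal NumPy sketch of scaled dot-product attention (the matrices are toy values, not taken from any real model):

import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    # Similarity of each query to each key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns the scores into attention weights that sum to 1 per query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted sum of the value vectors
    return weights @ V

# Toy example: 2 queries attending over 3 key/value pairs
Q = np.array([[1.0, 0.0], [0.0, 1.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(attention(Q, K, V))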

Example:

● In translation, if the model sees "I eat an apple," attention ensures "eat" is linked to its
French counterpart "mange."
Transformers and Attention Mechanism in Detail

1. Self-Attention:


○ Allows the model to understand relationships between words in the same sequence.
○ Example: In "The cat chased the mouse, and it ran away," self-attention helps
identify "it" refers to "mouse."
2. Multi-Head Attention:

○ Instead of a single attention mechanism, multiple heads allow the model to focus
on different parts of the input.

Diagram: Transformer Architecture


Input Sequence --> [Embedding Layer] --> [Multi-Head Attention] --> [Feedforward Network] -->
Output

Applications of Transformers:

1. Text Generation: GPT models like ChatGPT.


2. Translation: Google Translate uses transformers.
3. Image Processing: Vision Transformers (ViTs).

UNIT 2
1. Generative vs Discriminative Models

Definition

● Generative Models:

○ Learn the joint probability distribution p(x, y), where x is the input and y is the output.
○ They can generate new data samples resembling the original dataset.
○ Example: Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs).
● Discriminative Models:

○ Learn the conditional probability distribution p(y|x), predicting y (label) given x (input).
○ They classify or predict outcomes but cannot generate new data.
○ Example: Logistic Regression, Support Vector Machines (SVMs), Neural Networks.

Comparison Table
Aspect | Generative Models | Discriminative Models
Goal | Generate data x and predict y. | Predict output y from input x.
Output | New data resembling training data. | Predicted labels or classes.
Example Use | Data synthesis, image generation. | Spam email detection, image classification.
Models | GANs, VAEs. | Logistic Regression, SVMs.

Example of Generative vs Discriminative

1. Generative: A model trained on cat images generates a new, realistic cat image.
2. Discriminative: A model determines whether a given image contains a cat or not.
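The distinction can be illustrated with scikit-learn: Gaussian Naive Bayes is a simple generative classifier (it models p(x|y) and p(y), hence the joint p(x, y)), while Logistic Regression is discriminative (it models p(y|x) directly). A minimal sketch on synthetic data:

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Toy labeled dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Generative: models how each class generates its features
gen_model = GaussianNB().fit(X, y)

# Discriminative: models the decision boundary between classes directly
disc_model = LogisticRegression().fit(X, y)

print(gen_model.predict(X[:5]))
print(disc_model.predict(X[:5]))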

2. Generative AI Architecture

Overview

Generative AI architectures are designed to learn patterns in the data to generate new, realistic
samples.

● They generally consist of Encoders, Decoders, and Latent Spaces.


● Key architectures include:
1. GANs: Compete using generator-discriminator.
2. VAEs: Learn probabilistic latent representations.
3. Transformers: Use attention mechanisms to process data sequences.

General Architecture Diagram


Input Data --> [Encoder/Generator] --> Latent Representation --> [Decoder/Generator] --> Generated Data

Components of Architecture

1. Encoder: Compresses input into a lower-dimensional latent space.


2. Latent Space: Represents data features compactly.
3. Decoder: Reconstructs or generates output from the latent space.

Applications:

1. Image generation (GANs).


2. Text generation (Transformers).
3. Data augmentation for training.

3. Transforming Text to Numerical Representations - Word Embeddings

Why Transform Text to Numbers?

Computers process numbers, not words. To handle natural language, we transform words into
numerical vectors while preserving their meaning.

Word Embeddings

1. Definition:
Word embeddings are dense vector representations of words in a continuous space,
where semantically similar words are closer.

2. Popular Word Embedding Techniques:

○ Word2Vec: Learns word representations using neural networks.


○ GloVe (Global Vectors): Captures word co-occurrence statistics.
○ FastText: Represents subword information (useful for rare words).

Example:

● Words like "king" and "queen" might have embeddings like:


king=[0.5,0.8,0.2],queen=[0.5,0.9,0.1]\text{king} = [0.5, 0.8, 0.2], \quad \text{queen} =
[0.5, 0.9, 0.1]
3. Applications:
○ Text classification.
○ Sentiment analysis.
○ Machine translation.
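Similarity between embeddings is usually measured with cosine similarity. A quick check on the toy "king"/"queen" vectors above:

import numpy as np

king = np.array([0.5, 0.8, 0.2])
queen = np.array([0.5, 0.9, 0.1])

# Cosine similarity: 1.0 means identical direction, 0 means unrelated
similarity = king @ queen / (np.linalg.norm(king) * np.linalg.norm(queen))
print(f"similarity(king, queen) = {similarity:.3f}")  # close to 1, as expected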

Mathematical Insight: Word2Vec (CBOW and Skip-gram)


1. CBOW (Continuous Bag of Words):
   Predict a target word from its context.
   P(word | context) = softmax(W^T v_c)
2. Skip-gram:
   Predict surrounding context words from the target word.
   P(context | word) = softmax(W^T v_w)

4. Training a Large Language Model (LLM)

Steps in Training:

1. Preprocessing:

○ Clean and tokenize the data.


○ Example: Split "I love AI!" into ["I", "love", "AI"].
2. Embedding Layer:

○ Transform tokens into vectors.


3. Model Architecture:

○ Use Transformers (e.g., GPT, BERT).


4. Loss Function:

   ○ Use Cross-Entropy Loss to minimize the difference between the predicted and actual next words (see the sketch below).
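As a minimal sketch of step 4, cross-entropy compares the model's predicted distribution over the vocabulary with the actual next token (toy values, assuming a 5-token vocabulary):

import tensorflow as tf

# The model's raw scores (logits) over a 5-token vocabulary; the true next token has id 2
logits = tf.constant([[1.0, 0.5, 3.0, 0.2, -1.0]])
labels = tf.constant([2])

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(labels, logits)
print(float(loss))  # low loss, since token 2 already has the highest score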

5. Evaluation Metrics for Generative AI Tasks


Task | Metric | Description
Classification | Accuracy, Precision, Recall | Measures model performance on labeled data.
Summarization | ROUGE Score | Compares overlap between generated and reference summaries.
Question Answering | BLEU, F1 Score | Measures accuracy of predicted answers.
Text Generation | Perplexity, BLEU Score | Perplexity evaluates how well the model predicts sequences.
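Perplexity follows directly from cross-entropy: it is the exponential of the average per-token loss, so lower values mean the model is less "surprised" by the sequence. A minimal sketch with toy numbers:

import numpy as np

# Per-token cross-entropy losses (in nats) on a held-out sequence -- toy values
token_losses = np.array([2.1, 1.7, 0.9, 2.4])

# Perplexity = exp(average cross-entropy); lower is better
perplexity = np.exp(token_losses.mean())
print(f"Perplexity: {perplexity:.2f}")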

6. Pre-training and Transfer Learning

Pre-training:

Train a model on a large, general-purpose dataset to learn basic features.

Transfer Learning:

Fine-tune the pre-trained model on a specific task.
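In code, transfer learning usually means freezing the pre-trained weights and training only a small task-specific head. A minimal Keras sketch (an image model is used here as a stand-in; the same pattern applies to language models):

import tensorflow as tf

# Pre-trained base model; freeze its weights so only the new head is trained
base = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                         input_shape=(96, 96, 3))
base.trainable = False

# Add a small task-specific head and fine-tune only that
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")  # e.g., a binary task
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()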

7. Architecture and Training Process of GANs

Steps in Training GANs:

1. Train the Discriminator to distinguish between real and fake data.


2. Train the Generator to produce fake data that fools the Discriminator.

Diagram:
Random Noise --> [Generator] --> Fake Data --> [Discriminator] --> Real/Fake

8. Tutorial on GANs Using TensorFlow (Image Generation)

Code Example: Basic GAN


import tensorflow as tf
from tensorflow.keras import layers
import numpy as np

# Generator Model
def build_generator():
    model = tf.keras.Sequential([
        layers.Dense(128, activation="relu", input_dim=100),
        layers.Dense(784, activation="sigmoid")
    ])
    return model

# Discriminator Model
def build_discriminator():
    model = tf.keras.Sequential([
        layers.Dense(128, activation="relu", input_dim=784),
        layers.Dense(1, activation="sigmoid")
    ])
    return model

# Compile the Discriminator
generator = build_generator()
discriminator = build_discriminator()
discriminator.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Combined GAN model: freeze the Discriminator while training the Generator
discriminator.trainable = False
gan_input = layers.Input(shape=(100,))
generated_image = generator(gan_input)
gan_output = discriminator(generated_image)
gan = tf.keras.Model(gan_input, gan_output)
gan.compile(optimizer="adam", loss="binary_crossentropy")

# Training Loop
def train_gan(generator, discriminator, gan, epochs=10000, batch_size=32):
    for epoch in range(epochs):
        # "Real" data (placeholder noise here; substitute real samples, e.g. flattened MNIST images)
        real_images = np.random.normal(size=(batch_size, 784))
        real_labels = np.ones((batch_size, 1))

        # Fake data produced by the Generator
        noise = np.random.normal(size=(batch_size, 100))
        fake_images = generator.predict(noise, verbose=0)
        fake_labels = np.zeros((batch_size, 1))

        # Train the Discriminator on real and fake batches
        d_loss_real = discriminator.train_on_batch(real_images, real_labels)
        d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels)

        # Train the Generator (via the combined model) to fool the Discriminator
        noise = np.random.normal(size=(batch_size, 100))
        g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))

        if epoch % 1000 == 0:
            d_loss = 0.5 * (d_loss_real[0] + d_loss_fake[0])  # average the loss values
            print(f"Epoch {epoch}: D Loss: {d_loss:.4f}, G Loss: {g_loss:.4f}")

train_gan(generator, discriminator, gan)
UNIT 3
1. Introduction to Large Language Models (LLMs)

What Are LLMs?

Large Language Models (LLMs) are advanced AI systems trained on massive amounts of
textual data to understand, generate, and manipulate natural language. Examples include GPT
(Generative Pre-trained Transformer), BERT (Bidirectional Encoder Representations from
Transformers), and OpenAI's ChatGPT.

Key Features:

● Ability to generate coherent, contextually accurate text.


● Support for diverse tasks like translation, summarization, and answering questions.
● Learn patterns, grammar, and even reasoning from data.

2. Generative AI and Large Language Models

Generative AI Overview

Generative AI models aim to create new data that resembles the training data, such as text,
images, music, and videos.

● Example Tasks: Generating text for writing, creating AI art, or simulating dialogue in
chatbots.

Large Language Models in Generative AI

LLMs are the backbone of text-based generative AI systems.

● Example: ChatGPT generates human-like conversations using LLM technology.

3. Language Models and Foundation Models

What Are Language Models?

Language models predict the next word in a sequence based on the context.

● Given input: "The cat sat on the",


● Language Model predicts: "mat."

Applications of Language Models:

1. Autocomplete: Predicts the next word while typing.


2. Translation: Converts text between languages.
3. Speech Recognition: Transforms spoken words into text.

Foundation Models

Foundation models are large, pre-trained AI models designed to serve as a base for various
tasks.

● Example: GPT-3 is a foundation model that powers multiple NLP applications.

Key Idea: Train once on large datasets, fine-tune later for specific tasks.

4. Development of Neural Natural Language Processing (NLP)

How Did Neural NLP Evolve?

Before Neural NLP:

1. Rule-Based Systems: Models relied on human-designed rules.


2. Statistical Models: Probabilistic models like n-grams predicted word sequences but
struggled with long-term context.

Introduction of Neural Networks in NLP:

1. Recurrent Neural Networks (RNNs): Used for sequential data but had difficulty
remembering long sequences (vanishing gradient problem).
2. LSTMs (Long Short-Term Memory) & GRUs: Improved memory handling in sequential
tasks.
3. Transformers: Revolutionized NLP by addressing sequence processing efficiently with
attention mechanisms.

Impact of Neural NLP:

1. Enhanced context understanding.


2. Improved accuracy in tasks like translation and sentiment analysis.
3. Enabled the creation of LLMs like BERT and GPT.

5. Responsible AI
What Is Responsible AI?

Responsible AI ensures AI systems are developed and deployed ethically, fairly, and
transparently.

Core Principles of Responsible AI:

1. Fairness: Avoiding bias in predictions or decisions.


2. Transparency: Explaining how AI makes decisions.
3. Privacy: Protecting user data.
4. Accountability: Taking responsibility for AI's impact.

Importance in Generative AI and LLMs:

1. Generative AI can inadvertently spread misinformation.


2. LLMs may reflect biases present in their training data.
3. Ethical use is crucial for trust and adoption.

6. One-Shot Learning

Definition:

One-shot learning refers to a model's ability to learn a new task from just one or very few
examples.

● Contrast: Traditional models require thousands of examples to learn effectively.

How One-Shot Learning Works:

1. Leverages prior knowledge gained during pre-training.


2. Generalizes patterns to adapt to new tasks with minimal data.

Examples of One-Shot Learning:

● Language Models:
GPT models can perform translation after seeing just one example in the prompt (in-context learning).

Example:

● Input: "Translate to French: I love AI."


● Model learns from this single instance to handle other translations.

Techniques Supporting One-Shot Learning:


1. Meta-Learning:

○ Train models to learn how to learn.


○ Example: Matching Networks, Prototypical Networks.
2. Pre-training and Fine-tuning:

○ Pre-train on a general task (e.g., language understanding).


○ Fine-tune for a specific task (e.g., customer support chatbot).

Mathematics and Architecture Supporting LLMs

Transformer Model Architecture:

Diagram:

Input Text --> Tokenization --> Embedding --> [Multi-Head Attention + Feedforward] --> Output

1. Tokenization: Converts text into smaller units (tokens).


2. Embedding Layer: Maps tokens to vectors.
3. Attention Mechanism: Focuses on important parts of the sequence.

Attention Mechanism Formula:

\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V

● Q: Query matrix.
● K: Key matrix.
● V: Value matrix.
● d_k: Dimension of the key vectors.
4. Output Layer: Generates predictions based on learned representations.

Generative AI vs Traditional AI
Aspect | Generative AI | Traditional AI
Goal | Create new content. | Analyze or classify existing data.
Examples | Text generation, image synthesis. | Spam filtering, recommendation.
Key Models | GANs, VAEs, Transformers. | Decision Trees, Logistic Regression.

Training LLMs: From Basics to Fine-Tuning

1. Pre-training:

○ Train the model on a massive corpus of text to understand language patterns.


○ Example: OpenAI’s GPT models are pre-trained on diverse datasets.
2. Fine-tuning:

○ Adjust the model for specific tasks using labeled data.


○ Example: Fine-tune GPT for customer service by training on dialogue data.

Final Thoughts on Generative AI and LLMs

1. Revolutionary Impact: LLMs have transformed industries from healthcare (automating medical reports) to entertainment (creating scripts).
2. Challenges: High computational costs, ethical concerns, and bias mitigation remain
areas for improvement.
3. Future Directions: Enhanced one-shot learning, better Responsible AI practices, and
hybrid models combining generative and discriminative capabilities.
UNIT 4

1. Introduction to Prompt Engineering

What is Prompt Engineering?

Prompt engineering is the process of crafting effective prompts (inputs) to elicit the desired
response from a Large Language Model (LLM). Prompts guide LLMs to perform tasks
accurately by providing clear instructions, examples, or context.

Key Elements of a Prompt:

1. Instruction: Clearly specify the task (e.g., "Translate to French").


2. Context: Provide background information to guide the model.
3. Examples (Optional): Demonstrate how to complete the task.
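These elements can be assembled programmatically. A small sketch with a hypothetical build_prompt helper (the function name and format are illustrative, not a standard API):

def build_prompt(instruction, context="", examples=None):
    # Assemble instruction, context, and optional examples into one prompt string
    parts = [instruction]
    if context:
        parts.append(f"Context: {context}")
    for ex in examples or []:
        parts.append(f"Example: {ex}")
    return "\n".join(parts)

prompt = build_prompt(
    instruction="Translate to French: 'Good morning.'",
    context="Use informal, everyday phrasing.",
    examples=["'Thank you.' -> 'Merci.'"]
)
print(prompt)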

Why is Prompt Engineering Important?

1. Enhances Model Performance: Helps the model understand the task more precisely.
2. Task Flexibility: Enables diverse applications without retraining the model.
3. Cost Efficiency: Reduces the need for fine-tuning or additional data.

2. LLM Apps Using Prompt Engineering: BARD and ChatGPT (GenAI Chatbots)

BARD (Google's GenAI Chatbot):

● BARD is based on Google’s PaLM 2 (Pathways Language Model).


● It excels in multi-turn dialogue, reasoning, and language understanding.
● Uses prompt engineering for conversational responses and task-specific operations.

Example of a Prompt for BARD:

Prompt:
"Explain quantum mechanics in simple terms suitable for a 12-year-old."

Response:
"A quantum particle behaves like a wave and a particle at the same time…"

ChatGPT (OpenAI’s Chatbot):


● Built on GPT (Generative Pre-trained Transformer) models.
● Implements few-shot, one-shot, and zero-shot learning for task completion.

Prompt Engineering in ChatGPT:

Zero-Shot Prompt:
"Write a haiku about the moon."
Few-Shot Prompt:
"Here are two haikus:

● Morning dew glistens / on the petal of a rose / fleeting yet timeless.


● Gentle breeze whispers / through the tall grass of the field / a song of freedom.
Write one about the stars."

3. LLMs That Power Chatbots: Architecture and Training Data

Architecture of LLMs in Chatbots:

1. Input Layer: Accepts user queries and processes them into tokens (smaller units).
2. Embedding Layer: Converts tokens into dense numerical vectors.
3. Transformer Layers:
○ Uses attention mechanisms to focus on relevant parts of the input.
○ Each layer refines the representation of the input for better understanding.
4. Output Layer: Produces the response text by generating one token at a time.

Diagram:

User Input --> Tokenization --> Transformer Layers --> Output Tokens --> Response
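This pipeline can be reproduced with the Hugging Face transformers library; here GPT-2 serves as a small stand-in for a chatbot-scale LLM:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Tokenize the user input, run it through the transformer layers,
# and generate the response one token at a time
input_ids = tokenizer("The cat sat on the", return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=20,
                            pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))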

Training Data for Chatbots:

1. Public Datasets: Wikipedia, Common Crawl, BooksCorpus.


2. Domain-Specific Data: Custom data for specialized tasks (e.g., customer service).
3. Conversational Data: Chat history or dialogue datasets to train for multi-turn
conversations.

4. Tutorial Codes for CNNs, RNNs, and VAEs Using TensorFlow

A. Convolutional Neural Networks (CNNs):

Used in image recognition but also for text classification tasks.


Code Example: Text Classification with CNNs

import tensorflow as tf
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

# Sample text data preprocessing
vocab_size = 5000
max_length = 100

model = tf.keras.Sequential([
    Embedding(input_dim=vocab_size, output_dim=64, input_length=max_length),
    Conv1D(128, 5, activation='relu'),
    GlobalMaxPooling1D(),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

B. Recurrent Neural Networks (RNNs):

Used for sequential data like text or time-series.

Code Example: Text Generation with RNNs

import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN, Embedding, Dense

vocab_size = 5000  # same vocabulary size as the CNN example above

model = tf.keras.Sequential([
    Embedding(input_dim=vocab_size, output_dim=64),
    SimpleRNN(128, return_sequences=True),
    SimpleRNN(128),
    Dense(vocab_size, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

C. Variational Autoencoders (VAEs):

Generative models for creating new data similar to the training set.
Code Example: A Basic VAE (shown on 784-dimensional inputs, e.g., flattened images)

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras.models import Model

# Define encoder
input_dim = 784
latent_dim = 2

inputs = Input(shape=(input_dim,))
h = Dense(256, activation='relu')(inputs)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

# Sampling layer (reparameterization trick)
def sampling(args):
    z_mean, z_log_var = args
    epsilon = tf.random.normal(shape=(tf.shape(z_mean)[0], latent_dim))
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon

z = Lambda(sampling)([z_mean, z_log_var])

# Define decoder
decoder_h = Dense(256, activation='relu')
decoder_mean = Dense(input_dim, activation='sigmoid')
h_decoded = decoder_h(z)
x_decoded_mean = decoder_mean(h_decoded)

# VAE model: binary cross-entropy gives the reconstruction term of the ELBO;
# the KL divergence term is added explicitly below
vae = Model(inputs, x_decoded_mean)
kl_loss = -0.5 * tf.reduce_mean(
    tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1))
vae.add_loss(kl_loss)
vae.compile(optimizer='adam', loss='binary_crossentropy')
vae.summary()

5. Building RAG (Retrieval-Augmented Generation) Systems

What is RAG?

Retrieval-Augmented Generation combines a generative model (e.g., GPT) with an information retrieval system.

How RAG Works:

1. Retriever: Fetches relevant data from a knowledge base based on the query.
2. Generator: Uses the retrieved data to craft a detailed, accurate response.

Applications of RAG:

● Customer support systems.


● Scientific data summarization.

Example: OpenAI’s GPT combined with a company’s internal document database.
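A toy end-to-end sketch of the retriever + generator split (a word-overlap retriever stands in for real embedding search, and the final LLM call is left as a prompt):

# Tiny "knowledge base"
docs = [
    "Our support line is open 9am-5pm on weekdays.",
    "Refunds are processed within 14 days of the return.",
    "Premium plans include priority support.",
]

def retrieve(query, k=1):
    # Score documents by word overlap with the query; a real system would
    # use vector embeddings and a vector database instead
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

query = "How long do refunds take?"
context = retrieve(query)[0]

# Generator step: the retrieved context is placed into the LLM prompt
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this prompt would then be sent to a generative model such as GPT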

6. Fine-Tuning LLMs

What is Fine-Tuning?

Fine-tuning adapts a pre-trained LLM to a specific domain or task by retraining it on a smaller, task-specific dataset.

Steps for Fine-Tuning an LLM:

1. Prepare domain-specific labeled data.


2. Add a task-specific output layer to the LLM.
3. Retrain the model on the dataset.
4. Evaluate performance and adjust.

Tools for Fine-Tuning:

● Hugging Face Transformers Library.


● OpenAI Fine-tuning API.

Code Example:

from transformers import GPT2LMHeadModel, GPT2Tokenizer, Trainer, TrainingArguments
import torch

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Fine-tuning data preparation
train_text = ["Your fine-tuning dataset here."]
train_encodings = tokenizer(train_text, truncation=True, padding=True, max_length=512)

# Wrap the encodings in a torch Dataset; for language modeling the labels are the input ids
class TextDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings
    def __len__(self):
        return len(self.encodings["input_ids"])
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = item["input_ids"].clone()
        return item

train_dataset = TextDataset(train_encodings)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_steps=10_000,
    save_total_limit=2
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
)

trainer.train()

7. Optimizing LLM Performance

Techniques for Optimization:

1. Prompt Engineering: Craft better prompts for task clarity.


2. Model Pruning: Remove unnecessary parameters for efficiency.
3. Knowledge Distillation: Compress large models into smaller ones while retaining
performance.
4. Learning Rate Scheduling: Adjust learning rates during training.
5. Batch Size Optimization: Use appropriate batch sizes for computational resources.
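As a concrete example of technique 3, knowledge distillation trains the student to match the teacher's softened output distribution. A minimal sketch with toy logits:

import tensorflow as tf

# Toy logits from a large "teacher" and a small "student" over 4 output classes
teacher_logits = tf.constant([[4.0, 1.0, 0.5, -1.0]])
student_logits = tf.constant([[2.5, 0.8, 0.9, -0.5]])

T = 2.0  # temperature: softens both distributions
teacher_probs = tf.nn.softmax(teacher_logits / T)
student_log_probs = tf.nn.log_softmax(student_logits / T)

# Distillation loss: KL divergence between teacher and student distributions
kd_loss = tf.reduce_sum(teacher_probs * (tf.math.log(teacher_probs) - student_log_probs))
print(float(kd_loss))  # minimized during student training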
UNIT 5
Stable Diffusion: In-Depth Exploration

Stable Diffusion is a cutting-edge generative AI model that synthesizes images from textual
descriptions, noise, or incomplete data. Based on latent diffusion models, Stable Diffusion
offers a versatile framework for high-quality image generation and editing. It has gained
popularity due to its balance of computational efficiency and output quality.

1. Introduction to Stable Diffusion


Stable Diffusion is a deep learning model designed to generate high-resolution, realistic
images from textual prompts or random noise. It is rooted in diffusion probabilistic models,
which iteratively refine noisy data into meaningful images. It’s an open-source implementation
that democratizes access to generative AI for various applications like art generation, video
creation, and content editing.

Key Features

● Text-to-Image Generation: Generates photorealistic images from descriptive prompts.


● Image Editing: Modifies images by reapplying diffusion steps with altered conditions.
● Computational Efficiency: Uses latent representations to reduce computational costs
without sacrificing quality.
● Scalability: Applicable to multiple domains beyond imaging, including video and 3D
rendering.

2. Components of Stable Diffusion


The architecture of Stable Diffusion is built on three key components:

A. Encoder and Decoder

● Purpose: Compress the input data (image) into a latent space using a Variational
Autoencoder (VAE).
● Working:
1. Encoder: Converts high-dimensional image data into a compressed latent
representation.
2. Decoder: Converts the latent representation back to image space after
processing.
B. Noise Prediction Model

● A U-Net architecture predicts the noise added during the forward diffusion process.
● Acts as the backbone of Stable Diffusion for denoising the corrupted latent space.

C. Text Encoder

● Utilizes CLIP (Contrastive Language-Image Pretraining) or similar models to encode textual prompts into vector representations.
● These vectors condition the diffusion process to guide image generation in alignment with the prompt.

3. Training Stable Diffusion


Steps in Training

1. Forward Diffusion:

   ○ Start with an image dataset.
   ○ Gradually add Gaussian noise to the images through multiple steps, converting them into pure noise.
2. Backward Diffusion:

○ The model is trained to predict the noise added at each step using the U-Net.
○ The training objective is to minimize the mean squared error (MSE) between the
predicted and actual noise.
3. Latent Space Training:

   ○ Instead of directly processing high-dimensional images, they are encoded into a smaller latent space for computational efficiency.

Loss Function

The objective is to minimize the error in predicting the noise:

L = \mathbb{E}_{z, \epsilon, t}\left[\, \| \epsilon - \epsilon_{\theta}(z_t, t) \|^2 \,\right]

● z_t: Noisy latent representation at timestep t.
● \epsilon: True noise.
● \epsilon_{\theta}: Noise predicted by the model.
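A heavily simplified training step for this objective is sketched below: a tiny dense network stands in for the U-Net ε_θ, 8-dimensional vectors stand in for image latents, and the timestep conditioning is omitted. Real Stable Diffusion training follows the same MSE-on-noise pattern at much larger scale:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(32, activation="relu"),
                             tf.keras.layers.Dense(8)])  # stands in for the U-Net
optimizer = tf.keras.optimizers.Adam()

z0 = tf.random.normal((16, 8))       # "clean" latents from the encoder
t = 0.5                              # a fixed noise level for this sketch
noise = tf.random.normal((16, 8))    # the true noise epsilon
z_t = tf.sqrt(1.0 - t) * z0 + tf.sqrt(t) * noise  # forward diffusion: noisy latents

with tf.GradientTape() as tape:
    pred_noise = model(z_t)          # epsilon_theta(z_t); timestep input omitted
    loss = tf.reduce_mean(tf.square(noise - pred_noise))  # || eps - eps_theta ||^2

grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
print(float(loss))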
4. Stable Diffusion Inference: Generating Images
A. Process Overview

Inference in Stable Diffusion involves reversing the noise addition process (diffusion) and
generating a coherent image conditioned on a text prompt.

1. Text Conditioning:
The input text is encoded into a latent representation using the text encoder.

2. Latent Noise Sampling:


Start with a random noise sample in the latent space.

3. Denoising with U-Net:


Iteratively refine the noise sample over T timesteps using the trained U-Net model.

4. Decoding:
Convert the refined latent representation into the final image using the decoder.

B. Example: Prompt-to-Image Generation

Input Prompt:

"A beautiful sunrise over a serene mountain lake, digital art style."

Output Steps:

1. Encode the text with the text encoder (e.g., CLIP).


2. Initialize a noisy latent vector.
3. Perform diffusion steps, conditioned on the text encoding.
4. Decode the final refined latent vector into an image.

5. Methods and Tools for Stable Diffusion


A. Libraries

1. Hugging Face Diffusers Library: A popular Python library for diffusion models.
2. CompVis Library: The official library for Stable Diffusion.
3. Automatic1111: A user-friendly UI for running Stable Diffusion locally.

B. Techniques

● Latent Optimization: Optimize directly in latent space for faster computation.


● ControlNet: Conditions image generation with external guidance (e.g., edge detection,
depth maps).
● Inpainting: Modify specific regions of an image while preserving the rest.
● LoRA (Low-Rank Adaptation): Fine-tune Stable Diffusion efficiently for new styles.

6. Different Versions of Stable Diffusion


Stable Diffusion v1

● The original implementation.


● Focuses on general-purpose text-to-image generation.

Stable Diffusion v2

● Improved text encoder for better alignment with prompts.


● Enhanced diversity in generated images.
● Higher resolution support (e.g., 768x768).

Customized Variants

● DreamBooth: Fine-tuned Stable Diffusion for personalizing generation.


● ControlNet: Allows structured control over the generation process.

7. Advanced Stable Diffusion Techniques


A. Fine-Tuning Stable Diffusion

● Customize the model for specific domains (e.g., medical imaging, architectural designs).
● Requires additional training data and computing resources.

B. Textual Inversion
● Learn new embeddings for specific concepts (e.g., a person’s face or unique objects)
without modifying the base model.

C. Prompt Engineering

● Craft specific prompts to elicit desired styles or themes in generated images.


● Example: Adding terms like "hyper-realistic, cinematic lighting" enhances image realism.

8. Tutorial Code for Stable Diffusion


Text-to-Image Generation with Hugging Face Diffusers
from diffusers import StableDiffusionPipeline

# Load the Stable Diffusion pipeline


pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.to("cuda") # Use GPU for faster inference

# Generate an image from a prompt


prompt = "A majestic castle under a starry night, fantasy style"
image = pipe(prompt).images[0]

# Save the image


image.save("generated_image.png")

Image Inpainting Example


from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Load the inpainting pipeline (inpainting uses a dedicated checkpoint,
# e.g. "runwayml/stable-diffusion-inpainting")
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.to("cuda")

# Load the input image and mask (white mask pixels mark the region to repaint)
input_image = Image.open("input_image.png")
mask_image = Image.open("mask_image.png")

# Generate inpainted image
result = pipe(prompt="Add a glowing sun in the background",
              image=input_image,
              mask_image=mask_image).images[0]

# Save the inpainted image
result.save("inpainted_image.png")
UNIT 6
Generative AI Applications in Detail

Generative AI is a transformative technology that enables machines to create content, perform domain-specific tasks, and assist in decision-making across industries. Below is an in-depth explanation of its applications across different domains.

1. Applications of Generative AI in Various Industries

Healthcare

● Drug Discovery: AI generates potential drug candidates using molecule simulation and
property prediction.
● Medical Imaging: Generative models enhance image quality and assist in diagnosis
(e.g., detecting brain strokes or tumors).
● Patient Records Synthesis: AI generates synthetic patient data for training machine
learning models without compromising privacy.

Automobile

● Autonomous Driving: AI generates real-world-like driving scenarios for training


autonomous vehicles.
● Design Prototyping: Automating the creation of car design mockups.

Entertainment

● Content Creation: AI generates realistic visuals, music, scripts, or even video game
levels.
● Film Restoration: AI recreates damaged frames in old movies using Generative
Adversarial Networks (GANs).

Retail and E-Commerce

● Product Description Generation: Automated generation of product titles and


descriptions.
● Recommendation Systems: Dynamic personalized recommendations using generative
approaches.

Education

● Content Creation: Creating educational materials, quizzes, and summaries.


● Language Learning: Chatbots trained for conversational practice.

Manufacturing

● 3D Prototyping: AI generates 3D models of products, accelerating the prototyping


phase.
● Process Optimization: Predicting and generating optimal workflows in factories.

2. Generative AI Applications in Finance

Generative AI is particularly impactful in financial services, as it can process vast amounts of data, summarize insights, and assist in decision-making.

Case Study 1: Question & Answering on Financial Disclosures

Scenario:

Use an LLM to analyze 10-K filings (annual reports filed by public companies) and answer
financial questions.

Example Task:

● Input: "Summarize the key risks disclosed in Tesla's 10-K filing."


● Output: A concise summary of risks such as supply chain dependencies, market
competition, and technological challenges.

How it Works:

1. Preprocessing: The 10-K is tokenized and segmented into manageable chunks.


2. Query Understanding: LLM processes the question and retrieves relevant sections of
the document.
3. Answer Generation: The model generates a summarized response.

Benefits:

● Saves time in reading lengthy reports.


● Ensures precise and relevant financial insights.

Case Study 2: Fundamental Analysis Using LLMs

Scenario:

Analyzing a company's financial health using profit/loss statements, balance sheets, and
market trends.
Example Task:

● Input: "Compare Microsoft and Google in terms of profitability for the fiscal year 2023."
● Output: A detailed analysis of key metrics like gross margin, operating margin, and net
profit.

3. Creating Domain-Specific Chatbots Using LangChain

What is LangChain?

LangChain is a Python-based framework for building LLM-powered applications, particularly domain-specific chatbots.

Steps to Create a Domain-Specific Chatbot:

1. Data Collection: Gather domain-specific data (e.g., customer service FAQs, financial
data).
2. Embed Documents: Use embeddings to convert text into vector representations for
retrieval.
3. Integration with LLM: Combine a retrieval mechanism (e.g., vector databases) with a
generative model for responses.
4. Fine-Tuning: Customize the chatbot’s responses for specific needs using domain
examples.

Example:

● Healthcare Chatbot: Answers patient queries based on medical records and FAQs.
● Financial Chatbot: Provides financial advice based on company filings.

4. Content Creation Applications

A. Text Generation

● News Article Generation: AI generates entire articles based on a given headline.

Example Task:

● Input: "India launches Chandrayaan-3 to explore the Moon."


● Output:
"India’s Chandrayaan-3 mission aims to achieve a successful soft landing on the lunar
surface..."
B. Summarization of Speeches

Generative AI summarizes lengthy speeches into concise points.

Example Task:

● Input: Full text of a political leader’s speech.


● Output:
1. "Focusing on economic growth through technology."
2. "Improving education and healthcare access."

C. Sentiment Analysis with Few-Shot Prompting

Few-shot prompting allows sentiment analysis without large training datasets.

Example Task:

● Prompt:
"Analyze the sentiment of this statement: 'The product is amazing but delivery was
delayed.'"
● Response:
Sentiment: Mixed
Explanation: Positive sentiment about the product, but negative regarding delivery.

5. Video Creation

Generative AI can create videos using textual descriptions or existing footage.

Examples:

1. Explainer Videos: Create educational videos by converting text scripts into visuals.
2. Synthetic Avatars: AI generates avatars delivering scripted content.

Tools for Video Creation:

● DeepMotion: For avatar generation.


● RunwayML: Text-to-video generation.

Example Workflow for Text-to-Video Generation:

1. Input Text: "Create a 30-second video showing a bustling city street during sunset."
2. Output Video: AI generates a video clip with the specified features.
Tutorial Codes for Content Generation Tasks

A. Summarization Using OpenAI API


import openai

openai.api_key = "your-api-key"

# Uses the legacy Completions API (openai<1.0); newer SDK versions expose a different interface
response = openai.Completion.create(
    engine="text-davinci-003",
    prompt="Summarize this speech: 'Education is the backbone of society...'",
    max_tokens=100
)

print(response['choices'][0]['text'])

B. Sentiment Analysis Using Hugging Face Transformers


from transformers import pipeline

# Load sentiment analysis pipeline


sentiment_analyzer = pipeline("sentiment-analysis")

# Analyze sentiment
text = "The product is amazing but delivery was delayed."
result = sentiment_analyzer(text)
print(result)  # e.g. [{'label': 'POSITIVE', 'score': ...}] -- exact output depends on the model

C. LangChain Example for Chatbots


from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Load LLM and vector database (assumes documents were previously embedded into Chroma)
llm = ChatOpenAI(model="gpt-3.5-turbo")
vector_db = Chroma(persist_directory="domain-specific-data",
                   embedding_function=OpenAIEmbeddings())

# Build chatbot
chatbot = ConversationalRetrievalChain.from_llm(llm, retriever=vector_db.as_retriever())

# Query chatbot (the chain also tracks conversation history)
response = chatbot({"question": "What are the company's key risks?", "chat_history": []})
print(response["answer"])
