GenAI-Unit1-3


UNIT 1

Introduction to Generative AI

What is Generative AI?

Generative AI focuses on creating new data that resembles the original data it was trained on.
It’s like a student who studies several art styles and creates a new piece that looks like it
belongs to one of those styles.

● Basic Example:
1. If you train a Generative AI model on images of dogs, it learns to generate new,
realistic dog images that weren’t part of the original dataset.
● Key Features:
1. Generates new content (text, images, videos, etc.).
2. Learns patterns and structures from existing data.
3. Doesn’t just memorize data but creates something new.

Difference Between Machine Learning and Generative AI Pipeline


Aspect | Machine Learning (ML) | Generative AI
Purpose | Predicts or classifies data. | Generates new, realistic data.
Input Data | Often labeled (e.g., image and category). | Unlabeled or raw data.
Output | Predictions (e.g., category label). | Realistic new data.
Examples | Predict house prices, classify images. | Create new house designs, generate new images.
Model Objective | Learn a mapping function f(x) = y. | Learn the data distribution p(x).

Types of Generative AI Models

Generative AI uses several approaches. Let’s break them down with examples, mathematical
intuition, and diagrams.

1. Generative Adversarial Networks (GANs)


Key Idea: GANs consist of two networks:

1. Generator: Creates fake data.


2. Discriminator: Tries to differentiate real data from fake data.

These two networks "compete" with each other, improving over time.

Structure of GANs:
Random Noise (z) → Generator → Fake Data → Discriminator
Real Data → Discriminator
Discriminator → Output (Real or Fake)

Mathematical Explanation:

● The Generator creates data G(z), where z is random noise.
● The Discriminator learns a function D(x) that outputs the probability that x is real.
● Objective: The Discriminator maximizes V(D, G) while the Generator minimizes it:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

Example:

Imagine training a GAN to generate handwritten digits (like those in the MNIST dataset):

1. The Generator creates fake images of digits.


2. The Discriminator tries to tell apart fake digits from real ones.
3. Over time, the Generator gets so good that its digits are indistinguishable from the real
ones.

Diagram: GAN Training Process


Real Data --> [Discriminator] --> Real/Fake
Generator --> Fake Data --> [Discriminator] --> Real/Fake

2. Variational Autoencoders (VAEs)

Key Idea: VAEs are generative models that learn to encode data into a latent space and then
decode it back to reconstruct the data.

Structure of VAEs:

1. Encoder: Compresses input x into a smaller representation z in a latent space.
2. Latent Space: Adds a bit of randomness to make z more flexible.
3. Decoder: Reconstructs x from z.

Mathematical Explanation:

● Learn a probabilistic model p(x|z) that generates data x from latent variables z.
● Objective: Maximize the Evidence Lower Bound (ELBO):

\mathcal{L} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}(q(z|x) \| p(z))

   ○ First term: Reconstruction accuracy.
   ○ Second term: Ensures q(z|x) (the encoder) stays close to the prior p(z).

Applications:

1. Image reconstruction (e.g., filling missing parts of an image).


2. Data generation with controlled variability.

Diagram: VAE
Input Data --> [Encoder] --> Latent Space --> [Decoder] --> Reconstructed Data

3. Transformers

Key Idea: Transformers process sequences of data in parallel (e.g., text, images). They use a
mechanism called attention to focus on important parts of the input sequence.

Architecture:

1. Encoder: Processes input and creates a representation.


2. Decoder: Generates output based on the representation.
3. Attention Mechanism: Helps the model "pay attention" to relevant parts of the data.

Mathematics of Attention:

Given a query Q, key K, and value V:

\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V

● QK^T: Measures similarity between the queries and keys.
● softmax: Converts the similarities into attention weights (probabilities).
● The output is a weighted sum of the value vectors V, using those weights.
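To make the formula concrete, here is a minimal NumPy sketch of scaled dot-product attention (the matrices are toy values, not taken from any real model):

import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    # Similarity of each query to each key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns the scores into attention weights that sum to 1 per query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted sum of the value vectors
    return weights @ V

# Toy example: 2 queries attending over 3 key/value pairs
Q = np.array([[1.0, 0.0], [0.0, 1.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(attention(Q, K, V))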

Example:

● In translation, if the model sees "I eat an apple," attention ensures "eat" is linked to its
French counterpart "mange."
Transformers and Attention Mechanism in Detail

1. Self-Attention:


○ Allows the model to understand relationships between words in the same sequence.
○ Example: In "The cat chased the mouse, and it ran away," self-attention helps
identify "it" refers to "mouse."
2. Multi-Head Attention:

○ Instead of a single attention mechanism, multiple heads allow the model to focus
on different parts of the input.

Diagram: Transformer Architecture


Input Sequence --> [Embedding Layer] --> [Multi-Head Attention] --> [Feedforward Network] -->
Output

Applications of Transformers:

1. Text Generation: GPT models like ChatGPT.


2. Translation: Google Translate uses transformers.
3. Image Processing: Vision Transformers (ViTs).

UNIT 2
1. Generative vs Discriminative Models

Definition

● Generative Models:

○ Learn the joint probability distribution p(x, y), where x is the input and y is the output.
○ They can generate new data samples resembling the original dataset.
○ Example: Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs).
● Discriminative Models:

○ Learn the conditional probability distribution p(y|x), predicting y (label) given x (input).
○ They classify or predict outcomes but cannot generate new data.
○ Example: Logistic Regression, Support Vector Machines (SVMs), Neural Networks.

Comparison Table
Aspect | Generative Models | Discriminative Models
Goal | Generate data x and predict y. | Predict output y from input x.
Output | New data resembling training data. | Predicted labels or classes.
Example Use | Data synthesis, image generation. | Spam email detection, image classification.
Models | GANs, VAEs. | Logistic Regression, SVMs.

Example of Generative vs Discriminative

1. Generative: A model trained on cat images generates a new, realistic cat image.
2. Discriminative: A model determines whether a given image contains a cat or not.
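The distinction can be illustrated with scikit-learn: Gaussian Naive Bayes is a simple generative classifier (it models p(x|y) and p(y), hence the joint p(x, y)), while Logistic Regression is discriminative (it models p(y|x) directly). A minimal sketch on synthetic data:

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Toy labeled dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Generative: models how each class generates its features
gen_model = GaussianNB().fit(X, y)

# Discriminative: models the decision boundary between classes directly
disc_model = LogisticRegression().fit(X, y)

print(gen_model.predict(X[:5]))
print(disc_model.predict(X[:5]))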

2. Generative AI Architecture

Overview

Generative AI architectures are designed to learn patterns in the data to generate new, realistic
samples.

● They generally consist of Encoders, Decoders, and Latent Spaces.


● Key architectures include:
1. GANs: Compete using generator-discriminator.
2. VAEs: Learn probabilistic latent representations.
3. Transformers: Use attention mechanisms to process data sequences.

General Architecture Diagram


Input Data --> [Encoder/Generator] --> Latent Representation --> [Decoder/Generator] --> Generated Data

Components of Architecture

1. Encoder: Compresses input into a lower-dimensional latent space.


2. Latent Space: Represents data features compactly.
3. Decoder: Reconstructs or generates output from the latent space.

Applications:

1. Image generation (GANs).


2. Text generation (Transformers).
3. Data augmentation for training.

3. Transforming Text to Numerical Representations - Word Embeddings

Why Transform Text to Numbers?

Computers process numbers, not words. To handle natural language, we transform words into
numerical vectors while preserving their meaning.

Word Embeddings

1. Definition:
Word embeddings are dense vector representations of words in a continuous space,
where semantically similar words are closer.

2. Popular Word Embedding Techniques:

○ Word2Vec: Learns word representations using neural networks.


○ GloVe (Global Vectors): Captures word co-occurrence statistics.
○ FastText: Represents subword information (useful for rare words).

Example:

● Words like "king" and "queen" might have embeddings like:


king=[0.5,0.8,0.2],queen=[0.5,0.9,0.1]\text{king} = [0.5, 0.8, 0.2], \quad \text{queen} =
[0.5, 0.9, 0.1]
3. Applications:
○ Text classification.
○ Sentiment analysis.
○ Machine translation.
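Similarity between embeddings is usually measured with cosine similarity. A quick check on the toy "king"/"queen" vectors above:

import numpy as np

king = np.array([0.5, 0.8, 0.2])
queen = np.array([0.5, 0.9, 0.1])

# Cosine similarity: 1.0 means identical direction, 0 means unrelated
similarity = king @ queen / (np.linalg.norm(king) * np.linalg.norm(queen))
print(f"similarity(king, queen) = {similarity:.3f}")  # close to 1, as expected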

Mathematical Insight: Word2Vec (CBOW and Skip-gram)


1. CBOW (Continuous Bag of Words):
   Predict a target word from its context.
   P(word | context) = softmax(W^T v_c)
2. Skip-gram:
   Predict surrounding context words from the target word.
   P(context | word) = softmax(W^T v_w)

4. Training a Large Language Model (LLM)

Steps in Training:

1. Preprocessing:

○ Clean and tokenize the data.


○ Example: Split "I love AI!" into ["I", "love", "AI"].
2. Embedding Layer:

○ Transform tokens into vectors.


3. Model Architecture:

○ Use Transformers (e.g., GPT, BERT).


4. Loss Function:

   ○ Use Cross-Entropy Loss to minimize the difference between the predicted and actual next words (see the sketch below).
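As a minimal sketch of step 4, cross-entropy compares the model's predicted distribution over the vocabulary with the actual next token (toy values, assuming a 5-token vocabulary):

import tensorflow as tf

# The model's raw scores (logits) over a 5-token vocabulary; the true next token has id 2
logits = tf.constant([[1.0, 0.5, 3.0, 0.2, -1.0]])
labels = tf.constant([2])

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(labels, logits)
print(float(loss))  # low loss, since token 2 already has the highest score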

5. Evaluation Metrics for Generative AI Tasks


Task | Metric | Description
Classification | Accuracy, Precision, Recall | Measures model performance on labeled data.
Summarization | ROUGE Score | Compares overlap between generated and reference summaries.
Question Answering | BLEU, F1 Score | Measures accuracy of predicted answers.
Text Generation | Perplexity, BLEU Score | Perplexity evaluates how well the model predicts sequences.
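Perplexity follows directly from cross-entropy: it is the exponential of the average per-token loss, so lower values mean the model is less "surprised" by the sequence. A minimal sketch with toy numbers:

import numpy as np

# Per-token cross-entropy losses (in nats) on a held-out sequence -- toy values
token_losses = np.array([2.1, 1.7, 0.9, 2.4])

# Perplexity = exp(average cross-entropy); lower is better
perplexity = np.exp(token_losses.mean())
print(f"Perplexity: {perplexity:.2f}")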

6. Pre-training and Transfer Learning

Pre-training:

Train a model on a large, general-purpose dataset to learn basic features.

Transfer Learning:

Fine-tune the pre-trained model on a specific task.
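In code, transfer learning usually means freezing the pre-trained weights and training only a small task-specific head. A minimal Keras sketch (an image model is used here as a stand-in; the same pattern applies to language models):

import tensorflow as tf

# Pre-trained base model; freeze its weights so only the new head is trained
base = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                         input_shape=(96, 96, 3))
base.trainable = False

# Add a small task-specific head and fine-tune only that
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")  # e.g., a binary task
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()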

7. Architecture and Training Process of GANs

Steps in Training GANs:

1. Train the Discriminator to distinguish between real and fake data.


2. Train the Generator to produce fake data that fools the Discriminator.

Diagram:
Random Noise --> [Generator] --> Fake Data --> [Discriminator] --> Real/Fake

8. Tutorial on GANs Using TensorFlow (Image Generation)

Code Example: Basic GAN


import tensorflow as tf
from tensorflow.keras import layers
import numpy as np

# Generator Model
def build_generator():
    model = tf.keras.Sequential([
        layers.Dense(128, activation="relu", input_dim=100),
        layers.Dense(784, activation="sigmoid")
    ])
    return model

# Discriminator Model
def build_discriminator():
    model = tf.keras.Sequential([
        layers.Dense(128, activation="relu", input_dim=784),
        layers.Dense(1, activation="sigmoid")
    ])
    return model

# Compile the Discriminator
generator = build_generator()
discriminator = build_discriminator()
discriminator.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Combined GAN model: freeze the Discriminator while training the Generator
discriminator.trainable = False
gan_input = layers.Input(shape=(100,))
generated_image = generator(gan_input)
gan_output = discriminator(generated_image)
gan = tf.keras.Model(gan_input, gan_output)
gan.compile(optimizer="adam", loss="binary_crossentropy")

# Training Loop
def train_gan(generator, discriminator, gan, epochs=10000, batch_size=32):
    for epoch in range(epochs):
        # "Real" data (placeholder noise here; substitute real samples, e.g. flattened MNIST images)
        real_images = np.random.normal(size=(batch_size, 784))
        real_labels = np.ones((batch_size, 1))

        # Fake data produced by the Generator
        noise = np.random.normal(size=(batch_size, 100))
        fake_images = generator.predict(noise, verbose=0)
        fake_labels = np.zeros((batch_size, 1))

        # Train the Discriminator on real and fake batches
        d_loss_real = discriminator.train_on_batch(real_images, real_labels)
        d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels)

        # Train the Generator (via the combined model) to fool the Discriminator
        noise = np.random.normal(size=(batch_size, 100))
        g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))

        if epoch % 1000 == 0:
            d_loss = 0.5 * (d_loss_real[0] + d_loss_fake[0])  # average the loss values
            print(f"Epoch {epoch}: D Loss: {d_loss:.4f}, G Loss: {g_loss:.4f}")

train_gan(generator, discriminator, gan)
UNIT 3
1. Introduction to Large Language Models (LLMs)

What Are LLMs?

Large Language Models (LLMs) are advanced AI systems trained on massive amounts of
textual data to understand, generate, and manipulate natural language. Examples include GPT
(Generative Pre-trained Transformer), BERT (Bidirectional Encoder Representations from
Transformers), and OpenAI's ChatGPT.

Key Features:

● Ability to generate coherent, contextually accurate text.


● Support for diverse tasks like translation, summarization, and answering questions.
● Learn patterns, grammar, and even reasoning from data.

2. Generative AI and Large Language Models

Generative AI Overview

Generative AI models aim to create new data that resembles the training data, such as text,
images, music, and videos.

● Example Tasks: Generating text for writing, creating AI art, or simulating dialogue in
chatbots.

Large Language Models in Generative AI

LLMs are the backbone of text-based generative AI systems.

● Example: ChatGPT generates human-like conversations using LLM technology.

3. Language Models and Foundation Models

What Are Language Models?

Language models predict the next word in a sequence based on the context.

● Given input: "The cat sat on the",


● Language Model predicts: "mat."

Applications of Language Models:

1. Autocomplete: Predicts the next word while typing.


2. Translation: Converts text between languages.
3. Speech Recognition: Transforms spoken words into text.

Foundation Models

Foundation models are large, pre-trained AI models designed to serve as a base for various
tasks.

● Example: GPT-3 is a foundation model that powers multiple NLP applications.

Key Idea: Train once on large datasets, fine-tune later for specific tasks.

4. Development of Neural Natural Language Processing (NLP)

How Did Neural NLP Evolve?

Before Neural NLP:

1. Rule-Based Systems: Models relied on human-designed rules.


2. Statistical Models: Probabilistic models like n-grams predicted word sequences but
struggled with long-term context.

Introduction of Neural Networks in NLP:

1. Recurrent Neural Networks (RNNs): Used for sequential data but had difficulty
remembering long sequences (vanishing gradient problem).
2. LSTMs (Long Short-Term Memory) & GRUs: Improved memory handling in sequential
tasks.
3. Transformers: Revolutionized NLP by addressing sequence processing efficiently with
attention mechanisms.

Impact of Neural NLP:

1. Enhanced context understanding.


2. Improved accuracy in tasks like translation and sentiment analysis.
3. Enabled the creation of LLMs like BERT and GPT.

5. Responsible AI
What Is Responsible AI?

Responsible AI ensures AI systems are developed and deployed ethically, fairly, and
transparently.

Core Principles of Responsible AI:

1. Fairness: Avoiding bias in predictions or decisions.


2. Transparency: Explaining how AI makes decisions.
3. Privacy: Protecting user data.
4. Accountability: Taking responsibility for AI's impact.

Importance in Generative AI and LLMs:

1. Generative AI can inadvertently spread misinformation.


2. LLMs may reflect biases present in their training data.
3. Ethical use is crucial for trust and adoption.

6. One-Shot Learning

Definition:

One-shot learning refers to a model's ability to learn a new task from just one or very few
examples.

● Contrast: Traditional models require thousands of examples to learn effectively.

How One-Shot Learning Works:

1. Leverages prior knowledge gained during pre-training.


2. Generalizes patterns to adapt to new tasks with minimal data.

Examples of One-Shot Learning:

● Language Models:
GPT models can perform translation after seeing just one example in the prompt (in-context learning).

Example:

● Input: "Translate to French: I love AI."


● Model learns from this single instance to handle other translations.

Techniques Supporting One-Shot Learning:


1. Meta-Learning:

○ Train models to learn how to learn.


○ Example: Matching Networks, Prototypical Networks.
2. Pre-training and Fine-tuning:

○ Pre-train on a general task (e.g., language understanding).


○ Fine-tune for a specific task (e.g., customer support chatbot).

Mathematics and Architecture Supporting LLMs

Transformer Model Architecture:

Diagram:

Input Text --> Tokenization --> Embedding --> [Multi-Head Attention + Feedforward] --> Output

1. Tokenization: Converts text into smaller units (tokens).


2. Embedding Layer: Maps tokens to vectors.
3. Attention Mechanism: Focuses on important parts of the sequence.

Attention Mechanism Formula:

\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V

● Q: Query matrix.
● K: Key matrix.
● V: Value matrix.
● d_k: Dimension of the key vectors.
4. Output Layer: Generates predictions based on learned representations.

Generative AI vs Traditional AI
Aspect | Generative AI | Traditional AI
Goal | Create new content. | Analyze or classify existing data.
Examples | Text generation, image synthesis. | Spam filtering, recommendation.
Key Models | GANs, VAEs, Transformers. | Decision Trees, Logistic Regression.

Training LLMs: From Basics to Fine-Tuning

1. Pre-training:

○ Train the model on a massive corpus of text to understand language patterns.


○ Example: OpenAI’s GPT models are pre-trained on diverse datasets.
2. Fine-tuning:

○ Adjust the model for specific tasks using labeled data.


○ Example: Fine-tune GPT for customer service by training on dialogue data.

Final Thoughts on Generative AI and LLMs

1. Revolutionary Impact: LLMs have transformed industries from healthcare (automating medical reports) to entertainment (creating scripts).
2. Challenges: High computational costs, ethical concerns, and bias mitigation remain
areas for improvement.
3. Future Directions: Enhanced one-shot learning, better Responsible AI practices, and
hybrid models combining generative and discriminative capabilities.
UNIT 4

1. Introduction to Prompt Engineering

What is Prompt Engineering?

Prompt engineering is the process of crafting effective prompts (inputs) to elicit the desired
response from a Large Language Model (LLM). Prompts guide LLMs to perform tasks
accurately by providing clear instructions, examples, or context.

Key Elements of a Prompt:

1. Instruction: Clearly specify the task (e.g., "Translate to French").


2. Context: Provide background information to guide the model.
3. Examples (Optional): Demonstrate how to complete the task.
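These elements can be assembled programmatically. A small sketch with a hypothetical build_prompt helper (the function name and format are illustrative, not a standard API):

def build_prompt(instruction, context="", examples=None):
    # Assemble instruction, context, and optional examples into one prompt string
    parts = [instruction]
    if context:
        parts.append(f"Context: {context}")
    for ex in examples or []:
        parts.append(f"Example: {ex}")
    return "\n".join(parts)

prompt = build_prompt(
    instruction="Translate to French: 'Good morning.'",
    context="Use informal, everyday phrasing.",
    examples=["'Thank you.' -> 'Merci.'"]
)
print(prompt)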

Why is Prompt Engineering Important?

1. Enhances Model Performance: Helps the model understand the task more precisely.
2. Task Flexibility: Enables diverse applications without retraining the model.
3. Cost Efficiency: Reduces the need for fine-tuning or additional data.

2. LLM Apps Using Prompt Engineering: BARD and ChatGPT (GenAI Chatbots)

BARD (Google's GenAI Chatbot):

● BARD is based on Google’s PaLM 2 (Pathways Language Model).


● It excels in multi-turn dialogue, reasoning, and language understanding.
● Uses prompt engineering for conversational responses and task-specific operations.

Example of a Prompt for BARD:

Prompt:
"Explain quantum mechanics in simple terms suitable for a 12-year-old."

Response:
"A quantum particle behaves like a wave and a particle at the same time…"

ChatGPT (OpenAI’s Chatbot):


● Built on GPT (Generative Pre-trained Transformer) models.
● Implements few-shot, one-shot, and zero-shot learning for task completion.

Prompt Engineering in ChatGPT:

Zero-Shot Prompt:
"Write a haiku about the moon."
Few-Shot Prompt:
"Here are two haikus:

● Morning dew glistens / on the petal of a rose / fleeting yet timeless.


● Gentle breeze whispers / through the tall grass of the field / a song of freedom.
Write one about the stars."

3. LLMs That Power Chatbots: Architecture and Training Data

Architecture of LLMs in Chatbots:

1. Input Layer: Accepts user queries and processes them into tokens (smaller units).
2. Embedding Layer: Converts tokens into dense numerical vectors.
3. Transformer Layers:
○ Uses attention mechanisms to focus on relevant parts of the input.
○ Each layer refines the representation of the input for better understanding.
4. Output Layer: Produces the response text by generating one token at a time.

Diagram:

User Input --> Tokenization --> Transformer Layers --> Output Tokens --> Response
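This pipeline can be reproduced with the Hugging Face transformers library; here GPT-2 serves as a small stand-in for a chatbot-scale LLM:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Tokenize the user input, run it through the transformer layers,
# and generate the response one token at a time
input_ids = tokenizer("The cat sat on the", return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=20,
                            pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))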

Training Data for Chatbots:

1. Public Datasets: Wikipedia, Common Crawl, BooksCorpus.


2. Domain-Specific Data: Custom data for specialized tasks (e.g., customer service).
3. Conversational Data: Chat history or dialogue datasets to train for multi-turn
conversations.

4. Tutorial Codes for CNNs, RNNs, and VAEs Using TensorFlow

A. Convolutional Neural Networks (CNNs):

Used in image recognition but also for text classification tasks.


Code Example: Text Classification with CNNs

import tensorflow as tf
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

# Sample text data preprocessing
vocab_size = 5000
max_length = 100

model = tf.keras.Sequential([
    Embedding(input_dim=vocab_size, output_dim=64, input_length=max_length),
    Conv1D(128, 5, activation='relu'),
    GlobalMaxPooling1D(),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

B. Recurrent Neural Networks (RNNs):

Used for sequential data like text or time-series.

Code Example: Text Generation with RNNs

import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN, Embedding, Dense

vocab_size = 5000  # same vocabulary size as the CNN example above

model = tf.keras.Sequential([
    Embedding(input_dim=vocab_size, output_dim=64),
    SimpleRNN(128, return_sequences=True),
    SimpleRNN(128),
    Dense(vocab_size, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

C. Variational Autoencoders (VAEs):

Generative models for creating new data similar to the training set.
Code Example: A Basic VAE (shown on 784-dimensional inputs, e.g., flattened images)

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras.models import Model

# Define encoder
input_dim = 784
latent_dim = 2

inputs = Input(shape=(input_dim,))
h = Dense(256, activation='relu')(inputs)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

# Sampling layer (reparameterization trick)
def sampling(args):
    z_mean, z_log_var = args
    epsilon = tf.random.normal(shape=(tf.shape(z_mean)[0], latent_dim))
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon

z = Lambda(sampling)([z_mean, z_log_var])

# Define decoder
decoder_h = Dense(256, activation='relu')
decoder_mean = Dense(input_dim, activation='sigmoid')
h_decoded = decoder_h(z)
x_decoded_mean = decoder_mean(h_decoded)

# VAE model: binary cross-entropy gives the reconstruction term of the ELBO;
# the KL divergence term is added explicitly below
vae = Model(inputs, x_decoded_mean)
kl_loss = -0.5 * tf.reduce_mean(
    tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1))
vae.add_loss(kl_loss)
vae.compile(optimizer='adam', loss='binary_crossentropy')
vae.summary()

5. Building RAG (Retrieval-Augmented Generation) Systems

What is RAG?

Retrieval-Augmented Generation combines a generative model (e.g., GPT) with an information retrieval system.

How RAG Works:

1. Retriever: Fetches relevant data from a knowledge base based on the query.
2. Generator: Uses the retrieved data to craft a detailed, accurate response.

Applications of RAG:

● Customer support systems.


● Scientific data summarization.

Example: OpenAI’s GPT combined with a company’s internal document database.
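A toy end-to-end sketch of the retriever + generator split (a word-overlap retriever stands in for real embedding search, and the final LLM call is left as a prompt):

# Tiny "knowledge base"
docs = [
    "Our support line is open 9am-5pm on weekdays.",
    "Refunds are processed within 14 days of the return.",
    "Premium plans include priority support.",
]

def retrieve(query, k=1):
    # Score documents by word overlap with the query; a real system would
    # use vector embeddings and a vector database instead
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

query = "How long do refunds take?"
context = retrieve(query)[0]

# Generator step: the retrieved context is placed into the LLM prompt
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this prompt would then be sent to a generative model such as GPT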

6. Fine-Tuning LLMs

What is Fine-Tuning?

Fine-tuning adapts a pre-trained LLM to a specific domain or task by retraining it on a smaller, task-specific dataset.

Steps for Fine-Tuning an LLM:

1. Prepare domain-specific labeled data.


2. Add a task-specific output layer to the LLM.
3. Retrain the model on the dataset.
4. Evaluate performance and adjust.

Tools for Fine-Tuning:

● Hugging Face Transformers Library.


● OpenAI Fine-tuning API.

Code Example:

from transformers import GPT2LMHeadModel, GPT2Tokenizer, Trainer, TrainingArguments
import torch

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Fine-tuning data preparation
train_text = ["Your fine-tuning dataset here."]
train_encodings = tokenizer(train_text, truncation=True, padding=True, max_length=512)

# Wrap the encodings in a torch Dataset; for language modeling the labels are the input ids
class TextDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings
    def __len__(self):
        return len(self.encodings["input_ids"])
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = item["input_ids"].clone()
        return item

train_dataset = TextDataset(train_encodings)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_steps=10_000,
    save_total_limit=2
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
)

trainer.train()

7. Optimizing LLM Performance

Techniques for Optimization:

1. Prompt Engineering: Craft better prompts for task clarity.


2. Model Pruning: Remove unnecessary parameters for efficiency.
3. Knowledge Distillation: Compress large models into smaller ones while retaining
performance.
4. Learning Rate Scheduling: Adjust learning rates during training.
5. Batch Size Optimization: Use appropriate batch sizes for computational resources.
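As a concrete example of technique 3, knowledge distillation trains the student to match the teacher's softened output distribution. A minimal sketch with toy logits:

import tensorflow as tf

# Toy logits from a large "teacher" and a small "student" over 4 output classes
teacher_logits = tf.constant([[4.0, 1.0, 0.5, -1.0]])
student_logits = tf.constant([[2.5, 0.8, 0.9, -0.5]])

T = 2.0  # temperature: softens both distributions
teacher_probs = tf.nn.softmax(teacher_logits / T)
student_log_probs = tf.nn.log_softmax(student_logits / T)

# Distillation loss: KL divergence between teacher and student distributions
kd_loss = tf.reduce_sum(teacher_probs * (tf.math.log(teacher_probs) - student_log_probs))
print(float(kd_loss))  # minimized during student training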
UNIT 5
Stable Diffusion: In-Depth Exploration

Stable Diffusion is a cutting-edge generative AI model that synthesizes images from textual
descriptions, noise, or incomplete data. Based on latent diffusion models, Stable Diffusion
offers a versatile framework for high-quality image generation and editing. It has gained
popularity due to its balance of computational efficiency and output quality.

1. Introduction to Stable Diffusion


Stable Diffusion is a deep learning model designed to generate high-resolution, realistic
images from textual prompts or random noise. It is rooted in diffusion probabilistic models,
which iteratively refine noisy data into meaningful images. It’s an open-source implementation
that democratizes access to generative AI for various applications like art generation, video
creation, and content editing.

Key Features

● Text-to-Image Generation: Generates photorealistic images from descriptive prompts.


● Image Editing: Modifies images by reapplying diffusion steps with altered conditions.
● Computational Efficiency: Uses latent representations to reduce computational costs
without sacrificing quality.
● Scalability: Applicable to multiple domains beyond imaging, including video and 3D
rendering.

2. Components of Stable Diffusion


The architecture of Stable Diffusion is built on three key components:

A. Encoder and Decoder

● Purpose: Compress the input data (image) into a latent space using a Variational
Autoencoder (VAE).
● Working:
1. Encoder: Converts high-dimensional image data into a compressed latent
representation.
2. Decoder: Converts the latent representation back to image space after
processing.
B. Noise Prediction Model

● A U-Net architecture predicts the noise added during the forward diffusion process.
● Acts as the backbone of Stable Diffusion for denoising the corrupted latent space.

C. Text Encoder

● Utilizes CLIP (Contrastive Language-Image Pretraining) or similar models to encode textual prompts into vector representations.
● These vectors condition the diffusion process to guide image generation in alignment with the prompt.

3. Training Stable Diffusion


Steps in Training

1. Forward Diffusion:

   ○ Start with an image dataset.
   ○ Gradually add Gaussian noise to the images through multiple steps, converting them into pure noise.
2. Backward Diffusion:

○ The model is trained to predict the noise added at each step using the U-Net.
○ The training objective is to minimize the mean squared error (MSE) between the
predicted and actual noise.
3. Latent Space Training:

   ○ Instead of directly processing high-dimensional images, they are encoded into a smaller latent space for computational efficiency.

Loss Function

The objective is to minimize the error in predicting the noise:

L = \mathbb{E}_{z, \epsilon, t}\left[\, \| \epsilon - \epsilon_{\theta}(z_t, t) \|^2 \,\right]

● z_t: Noisy latent representation at timestep t.
● \epsilon: True noise.
● \epsilon_{\theta}: Noise predicted by the model.
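A heavily simplified training step for this objective is sketched below: a tiny dense network stands in for the U-Net ε_θ, 8-dimensional vectors stand in for image latents, and the timestep conditioning is omitted. Real Stable Diffusion training follows the same MSE-on-noise pattern at much larger scale:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(32, activation="relu"),
                             tf.keras.layers.Dense(8)])  # stands in for the U-Net
optimizer = tf.keras.optimizers.Adam()

z0 = tf.random.normal((16, 8))       # "clean" latents from the encoder
t = 0.5                              # a fixed noise level for this sketch
noise = tf.random.normal((16, 8))    # the true noise epsilon
z_t = tf.sqrt(1.0 - t) * z0 + tf.sqrt(t) * noise  # forward diffusion: noisy latents

with tf.GradientTape() as tape:
    pred_noise = model(z_t)          # epsilon_theta(z_t); timestep input omitted
    loss = tf.reduce_mean(tf.square(noise - pred_noise))  # || eps - eps_theta ||^2

grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
print(float(loss))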
4. Stable Diffusion Inference: Generating Images
A. Process Overview

Inference in Stable Diffusion involves reversing the noise addition process (diffusion) and
generating a coherent image conditioned on a text prompt.

1. Text Conditioning:
The input text is encoded into a latent representation using the text encoder.

2. Latent Noise Sampling:


Start with a random noise sample in the latent space.

3. Denoising with U-Net:


Iteratively refine the noise sample over T timesteps using the trained U-Net model.

4. Decoding:
Convert the refined latent representation into the final image using the decoder.

B. Example: Prompt-to-Image Generation

Input Prompt:

"A beautiful sunrise over a serene mountain lake, digital art style."

Output Steps:

1. Encode the text with the text encoder (e.g., CLIP).


2. Initialize a noisy latent vector.
3. Perform diffusion steps, conditioned on the text encoding.
4. Decode the final refined latent vector into an image.

5. Methods and Tools for Stable Diffusion


A. Libraries

1. Hugging Face Diffusers Library: A popular Python library for diffusion models.
2. CompVis Library: The official library for Stable Diffusion.
3. Automatic1111: A user-friendly UI for running Stable Diffusion locally.

B. Techniques

● Latent Optimization: Optimize directly in latent space for faster computation.


● ControlNet: Conditions image generation with external guidance (e.g., edge detection,
depth maps).
● Inpainting: Modify specific regions of an image while preserving the rest.
● LoRA (Low-Rank Adaptation): Fine-tune Stable Diffusion efficiently for new styles.

6. Different Versions of Stable Diffusion


Stable Diffusion v1

● The original implementation.


● Focuses on general-purpose text-to-image generation.

Stable Diffusion v2

● Improved text encoder for better alignment with prompts.


● Enhanced diversity in generated images.
● Higher resolution support (e.g., 768x768).

Customized Variants

● DreamBooth: Fine-tuned Stable Diffusion for personalizing generation.


● ControlNet: Allows structured control over the generation process.

7. Advanced Stable Diffusion Techniques


A. Fine-Tuning Stable Diffusion

● Customize the model for specific domains (e.g., medical imaging, architectural designs).
● Requires additional training data and computing resources.

B. Textual Inversion
● Learn new embeddings for specific concepts (e.g., a person’s face or unique objects)
without modifying the base model.

C. Prompt Engineering

● Craft specific prompts to elicit desired styles or themes in generated images.


● Example: Adding terms like "hyper-realistic, cinematic lighting" enhances image realism.

8. Tutorial Code for Stable Diffusion


Text-to-Image Generation with Hugging Face Diffusers
from diffusers import StableDiffusionPipeline

# Load the Stable Diffusion pipeline


pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.to("cuda") # Use GPU for faster inference

# Generate an image from a prompt


prompt = "A majestic castle under a starry night, fantasy style"
image = pipe(prompt).images[0]

# Save the image


image.save("generated_image.png")

Image Inpainting Example


from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Load the inpainting pipeline (inpainting uses a dedicated checkpoint,
# e.g. "runwayml/stable-diffusion-inpainting")
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.to("cuda")

# Load the input image and mask (white mask pixels mark the region to repaint)
input_image = Image.open("input_image.png")
mask_image = Image.open("mask_image.png")

# Generate inpainted image
result = pipe(prompt="Add a glowing sun in the background",
              image=input_image,
              mask_image=mask_image).images[0]

# Save the inpainted image
result.save("inpainted_image.png")
UNIT 6
Generative AI Applications in Detail

Generative AI is a transformative technology that enables machines to create content, perform domain-specific tasks, and assist in decision-making across industries. Below is an in-depth explanation of its applications across different domains.

1. Applications of Generative AI in Various Industries

Healthcare

● Drug Discovery: AI generates potential drug candidates using molecule simulation and
property prediction.
● Medical Imaging: Generative models enhance image quality and assist in diagnosis
(e.g., detecting brain strokes or tumors).
● Patient Records Synthesis: AI generates synthetic patient data for training machine
learning models without compromising privacy.

Automobile

● Autonomous Driving: AI generates real-world-like driving scenarios for training


autonomous vehicles.
● Design Prototyping: Automating the creation of car design mockups.

Entertainment

● Content Creation: AI generates realistic visuals, music, scripts, or even video game
levels.
● Film Restoration: AI recreates damaged frames in old movies using Generative
Adversarial Networks (GANs).

Retail and E-Commerce

● Product Description Generation: Automated generation of product titles and


descriptions.
● Recommendation Systems: Dynamic personalized recommendations using generative
approaches.

Education

● Content Creation: Creating educational materials, quizzes, and summaries.


● Language Learning: Chatbots trained for conversational practice.

Manufacturing

● 3D Prototyping: AI generates 3D models of products, accelerating the prototyping


phase.
● Process Optimization: Predicting and generating optimal workflows in factories.

2. Generative AI Applications in Finance

Generative AI is particularly impactful in financial services, as it can process vast amounts of data, summarize insights, and assist in decision-making.

Case Study 1: Question & Answering on Financial Disclosures

Scenario:

Use an LLM to analyze 10-K filings (annual reports filed by public companies) and answer
financial questions.

Example Task:

● Input: "Summarize the key risks disclosed in Tesla's 10-K filing."


● Output: A concise summary of risks such as supply chain dependencies, market
competition, and technological challenges.

How it Works:

1. Preprocessing: The 10-K is tokenized and segmented into manageable chunks.


2. Query Understanding: LLM processes the question and retrieves relevant sections of
the document.
3. Answer Generation: The model generates a summarized response.

Benefits:

● Saves time in reading lengthy reports.


● Ensures precise and relevant financial insights.

Case Study 2: Fundamental Analysis Using LLMs

Scenario:

Analyzing a company's financial health using profit/loss statements, balance sheets, and
market trends.
Example Task:

● Input: "Compare Microsoft and Google in terms of profitability for the fiscal year 2023."
● Output: A detailed analysis of key metrics like gross margin, operating margin, and net
profit.

3. Creating Domain-Specific Chatbots Using LangChain

What is LangChain?

LangChain is a Python-based framework for building LLM-powered applications, particularly domain-specific chatbots.

Steps to Create a Domain-Specific Chatbot:

1. Data Collection: Gather domain-specific data (e.g., customer service FAQs, financial
data).
2. Embed Documents: Use embeddings to convert text into vector representations for
retrieval.
3. Integration with LLM: Combine a retrieval mechanism (e.g., vector databases) with a
generative model for responses.
4. Fine-Tuning: Customize the chatbot’s responses for specific needs using domain
examples.

Example:

● Healthcare Chatbot: Answers patient queries based on medical records and FAQs.
● Financial Chatbot: Provides financial advice based on company filings.

4. Content Creation Applications

A. Text Generation

● News Article Generation: AI generates entire articles based on a given headline.

Example Task:

● Input: "India launches Chandrayaan-3 to explore the Moon."


● Output:
"India’s Chandrayaan-3 mission aims to achieve a successful soft landing on the lunar
surface..."
B. Summarization of Speeches

Generative AI summarizes lengthy speeches into concise points.

Example Task:

● Input: Full text of a political leader’s speech.


● Output:
1. "Focusing on economic growth through technology."
2. "Improving education and healthcare access."

C. Sentiment Analysis with Few-Shot Prompting

Few-shot prompting allows sentiment analysis without large training datasets.

Example Task:

● Prompt:
"Analyze the sentiment of this statement: 'The product is amazing but delivery was
delayed.'"
● Response:
Sentiment: Mixed
Explanation: Positive sentiment about the product, but negative regarding delivery.

5. Video Creation

Generative AI can create videos using textual descriptions or existing footage.

Examples:

1. Explainer Videos: Create educational videos by converting text scripts into visuals.
2. Synthetic Avatars: AI generates avatars delivering scripted content.

Tools for Video Creation:

● DeepMotion: For avatar generation.


● RunwayML: Text-to-video generation.

Example Workflow for Text-to-Video Generation:

1. Input Text: "Create a 30-second video showing a bustling city street during sunset."
2. Output Video: AI generates a video clip with the specified features.
Tutorial Codes for Content Generation Tasks

A. Summarization Using OpenAI API


import openai

openai.api_key = "your-api-key"

# Uses the legacy Completions API (openai<1.0); newer SDK versions expose a different interface
response = openai.Completion.create(
    engine="text-davinci-003",
    prompt="Summarize this speech: 'Education is the backbone of society...'",
    max_tokens=100
)

print(response['choices'][0]['text'])

B. Sentiment Analysis Using Hugging Face Transformers


from transformers import pipeline

# Load sentiment analysis pipeline


sentiment_analyzer = pipeline("sentiment-analysis")

# Analyze sentiment
text = "The product is amazing but delivery was delayed."
result = sentiment_analyzer(text)
print(result)  # e.g. [{'label': 'POSITIVE', 'score': ...}] -- exact output depends on the model

C. LangChain Example for Chatbots


from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Load LLM and vector database (assumes documents were previously embedded into Chroma)
llm = ChatOpenAI(model="gpt-3.5-turbo")
vector_db = Chroma(persist_directory="domain-specific-data",
                   embedding_function=OpenAIEmbeddings())

# Build chatbot
chatbot = ConversationalRetrievalChain.from_llm(llm, retriever=vector_db.as_retriever())

# Query chatbot (the chain also tracks conversation history)
response = chatbot({"question": "What are the company's key risks?", "chat_history": []})
print(response["answer"])
