
Gen AI Unit 1

The document outlines a course on Generative AI, focusing on fundamental concepts, models, and applications in various fields such as image, text, audio, and video generation. It details course objectives and outcomes, emphasizing the understanding and application of generative models, transformers, and APIs. Additionally, it discusses the workings of different generative models like GANs and VAEs, and their applications in data augmentation and real-world problem-solving.

Uploaded by

23adl05

U21AD603 – Generative AI

Google Classroom: 47ugmsr

1
COURSE OBJECTIVES:
• To learn the fundamental concepts of generative AI
• To acquire the knowledge of encoders, decoders and autoregressive models
• To acquire the knowledge of various generative models for image generation,
style transfer and text generation
• To learn to apply transformers, prompt engineering and APIs for real world
problems
• To learn to develop applications using ChatGPT and open APIs

2
COURSE OUTCOMES:
Upon completion of the course, the student will be able to:

CO1: Understand the fundamental concepts of generative AI (Understand)
CO2: Understand encoders, decoders and autoregressive models
(Understand)
CO3: Apply various generative models for image generation, style transfer and
text generation (Apply)
CO4: Apply transformers, prompt engineering and APIs to real-world problems
(Apply)
CO5: Develop applications using ChatGPT and open APIs (Apply)

3
Unit - 1
An Introduction to Generative AI – Applications of AI – The
rules of probability – Why use generative models – Unique
challenges of generative models

4
An Introduction to Generative AI

5
But what is artificial intelligence?
AI is a branch of computer science that deals with the creation
of intelligent agents, which are systems that can reason, learn,
and act autonomously.
Essentially, AI has to do with the theory and methods to build
machines that think and act like humans.
Within this discipline sits machine learning, which is a subfield
of AI.
Machine learning is a program or system that trains a model from input
data. That trained model can make useful predictions from new
or never-before-seen data drawn from the same distribution as the data
used to train the model.
Machine learning gives the computer the ability to learn
without explicit programming.

6
Deep Learning

9
• In transformers, hallucinations are words or phrases generated by the
model that are often nonsensical or grammatically incorrect.
• Hallucinations can be caused by a number of factors: the model is not
trained on enough data, the model is trained on noisy or dirty data, the
model is not given enough context, or the model is not given enough constraints.
• Hallucinations are a problem for transformers because they can make the
output text difficult to understand.

25
Applications of Generative AI
• Generative artificial intelligence allows machines to not just learn from data, but
to generate new information that’s similar to the input used to train it. The
implications are multi-dimensional, as the technology can be used in design,
music, art, and more.

41
• In an effort to understand why the technology is mostly used for text
applications, we’ll explore the main applications in detail.
• Audio applications
• Text applications
• Conversational applications
• Data augmentation
• Video/visual applications

42
Audio applications
• Generative audio models create new sounds from existing data. After the
models are trained, they can create new audio that’s original and unique.
• Each model uses different types of prompts to generate audio content, which
can be:
• Environmental data
• MIDI data
https://jukebox.openai.com/
• User input in real-time
• Text prompts
• Existing audio recordings

43
• There are several applications of generative AI audio models:
1. Data sonification
2. Interactive audio experiences
3. Music generation and composition https://www.aiva.ai/
4. Audio enhancement and restoration
5. Sound effects creation and synthesis
6. Audio captioning and transcription (speech-to-text)
7. Speech synthesis and voice cloning
8. Personalized audio content

44
How do generative AI audio models work?
• WaveNet
• Created by Google DeepMind, WaveNet is a generative audio model grounded on
deep neural networks. Using dilated convolutions, it creates great-quality audio
by referencing previous audio samples.
• Its operational flow consists of:
• Waveform sampling. WaveNet starts with an input waveform, usually a
sequence of audio samples, processed through multiple convolutional layers.
• Dilated convolution. To recognize long-spanning dependencies in audio
waveforms, WaveNet employs dilated convolutional layers. The dilation
magnitude sets the receptive field's size in the convolutional layer, helping the
model distinguish extended patterns.

https://iq.opengenus.org/dilated-convolution/

45
[Figure: standard convolution vs. dilated convolution]
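To make the dilation idea concrete, here is a minimal sketch in plain Python. It is not WaveNet's implementation: the function names `dilated_causal_conv1d` and `receptive_field`, the kernel weights, and the dilation schedule are illustrative assumptions. It shows how a dilation of d makes each output look back d steps per kernel tap, and how stacking layers widens the receptive field.

```python
def dilated_causal_conv1d(signal, kernel, dilation=1):
    """Causal 1-D convolution: output[t] uses signal[t], signal[t - d], signal[t - 2d], ..."""
    out = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation  # look back k * dilation steps
            if idx >= 0:            # samples before the start count as zero padding
                total += weight * signal[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(dilated_causal_conv1d(signal, kernel=[1.0, 1.0], dilation=2))
# -> [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))
# -> 16
```

With a kernel of size 2, doubling the dilation at each layer doubles the receptive field while the number of weights per layer stays the same.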

46
• Autoregressive model. WaveNet generates audio samples sequentially, each
conditioned on its predecessors: at every step it predicts a probability
distribution over the next sample given the previous ones.

• Sampling mechanism. The output layer is a softmax over quantized sample
values. WaveNet draws the next sample from this predicted distribution, which
gives varied and realistic audio output.

• Training protocol. The model is trained by maximum likelihood estimation,
which adjusts the model's parameters to maximize the probability of the
training data.
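The autoregressive loop and the softmax sampling step can be sketched as follows. This is a toy under stated assumptions: `toy_logits` is a hypothetical stand-in for the trained network (it simply favours values near the last sample), and `num_levels=8` is an arbitrary number of quantization levels chosen for readability.

```python
import math
import random


def softmax(logits):
    """Turn arbitrary scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(history, num_levels):
    """Hypothetical stand-in for the network: favours levels near the last sample."""
    last = history[-1]
    return [-abs(level - last) for level in range(num_levels)]


def generate(seed, steps, num_levels=8, rng=None):
    """Generate `steps` new samples, each drawn conditionally on all previous ones."""
    rng = rng or random.Random(0)
    samples = list(seed)
    for _ in range(steps):
        probs = softmax(toy_logits(samples, num_levels))
        # Draw the next quantized sample from the predicted distribution.
        nxt = rng.choices(range(num_levels), weights=probs, k=1)[0]
        samples.append(nxt)
    return samples


print(generate(seed=[3], steps=10))
```

Sampling from the distribution, rather than always taking the most likely value, is what gives the output its variety.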

47
48
• Generative Adversarial Networks (GANs)
• A GAN comprises two neural networks: a generator that creates audio
samples and a discriminator that judges their authenticity.
• Architecture. The generator takes a random noise vector as input and
outputs an audio sample, while the discriminator estimates whether a given
sample is real or generated.
• Training dynamics. During training the generator creates audio samples from
random noise and the discriminator's task is to classify samples as real or
generated. The two networks are trained against each other: the discriminator
is updated to reduce the binary cross-entropy loss between its predictions and
the actual labels of each sample, while the generator is updated so that its
output appears genuine to the discriminator.

49
• Adversarial loss. GAN training is driven by the adversarial loss, which
reflects the gap between the distribution of real audio samples and that of
generated ones. Training alternates between updating the generator to produce
more authentic output and updating the discriminator to better differentiate
real from generated audio.

• Audio applications. GANs have various audio uses, such as music generation,
audio style transfer, and audio restoration. For music generation, the generator
learns to produce new musical output. For style transfer, it carries the
style of one sample over to another. For restoration, it is trained to remove
noise or imperfections.
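The binary cross-entropy objectives behind the adversarial game can be written out in a few lines. This is a sketch, not a training loop: the inputs are assumed to be discriminator outputs in (0, 1), the function names are illustrative, and the generator loss shown is the commonly used "non-saturating" variant, which is an assumption since the slides do not say which form is meant.

```python
import math


def bce(prediction, label, eps=1e-12):
    """Binary cross-entropy for one prediction in (0, 1) and a 0/1 label."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """The discriminator wants real samples scored 1 and generated samples scored 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating form: the generator wants its samples scored 1."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)


# Discriminator scores for two real and two generated clips (made-up numbers).
print(discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.2, 0.1]))
print(generator_loss(d_fake=[0.2, 0.1]))
```

When the discriminator is easily fooled (scores near 1 on generated clips) the generator loss is small and the discriminator loss is large, which is the tension that alternating updates exploit.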

50
51
Text applications
• Artificial intelligence text generators use AI to create written copy, which can be
helpful for applications like website content creation, report and article
generation, social media post creation, and more.
• There are several applications of generative AI text models:
1. Language translation
2. Content creation
3. Summarization
4. Chatbot and virtual assistants
5. SEO-optimized content

52
How do generative AI text models work?
• AI-driven content generators use natural language processing (NLP) and natural
language generation (NLG) techniques to create text.
Algorithmic structure and training
• NLG-based content is produced and structured by algorithms. These are
typically text-generation algorithms that first undergo a phase of unsupervised
learning, during which a transformer language model is trained on vast
datasets and extracts a variety of patterns from them.
• By training on extensive data, the model learns precise
vector representations. These help it predict words, phrases, and larger
blocks of text with greater context awareness.

53
Evolution from RNNs to transformers
• Data collection and pre-processing. Text data is gathered, cleaned, and tokenized
into smaller units for model input.
• Model training. The model is trained on token sequences, and it adjusts its
parameters in order to predict the next token in a sequence from the previous
ones.
• Generation. After the model is trained, it can create new text by predicting one token
at a time based on the provided seed sequence and on tokens that were previously
generated.
• Decoding strategies. You can use different strategies, such as beam search,
top-k/top-p sampling, or greedy decoding, to choose the next token.
• Fine-tuning. Pre-trained models are often further trained on particular tasks or
domains to improve performance.
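The decoding strategies named above differ only in how they turn the model's next-token distribution into a choice. Below is a minimal sketch of greedy decoding, top-k filtering, and top-p (nucleus) filtering; beam search is left out for brevity. The distribution `next_token_probs` and all function names are made up for illustration.

```python
import random


def greedy(probs):
    """Greedy decoding: pick the single most probable token."""
    return max(probs, key=probs.get)


def top_k_filter(probs, k):
    """Top-k: keep the k most probable tokens and renormalize."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs, p):
    """Top-p: keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


def sample(probs, rng=random):
    """Draw one token at random according to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical model output for the next token.
next_token_probs = {"cat": 0.5, "dog": 0.3, "car": 0.15, "sky": 0.05}

print(greedy(next_token_probs))
print(sample(top_k_filter(next_token_probs, k=2)))
print(sample(top_p_filter(next_token_probs, p=0.9)))
```

Greedy decoding is deterministic; top-k and top-p trade some of that determinism for variety by sampling from a truncated distribution.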

54
Conversational applications
1. Natural Language Understanding (NLU)
• Conversational AI uses sophisticated NLU techniques to understand and interpret
the meanings behind user statements and queries. Through analyzing intent,
context, and entities in user inputs, conversational AI can then extract important
information to generate appropriate answers.

2. Speech recognition
• Conversational AI systems use advanced algorithms to transform spoken
language into text. This lets the systems understand and process user inputs in
the form of voice or speech commands.

55
3. Natural language generation (NLG)
• To generate human-like answers in real time, conversational AI systems use NLG
techniques. By taking advantage of pre-defined templates, neural networks, or
machine learning models, the systems can create meaningful and contextually
appropriate answers to queries.

4. Dialogue management
• Using strong dialogue management algorithms, conversational AI systems can
maintain a context-aware and coherent conversation. The algorithms allow AI
systems to understand and answer user inputs in a natural and human-like way.

56
How do generative AI conversational models
work?

57
Data augmentation
• Through AI, you can create new, synthetic data points that can be added to an
already existing dataset. This is typically used in machine learning and deep
learning applications to enhance model performance, achieved by increasing
both the size and the diversity of the training data.
• Data augmentation can help to overcome the challenges of imbalanced or limited
datasets. By creating new data points similar to the original data, data scientists
can make sure that models are more robust and better at generalizing to unseen data.
• Generative AI models like Variational Autoencoders (VAEs) and Generative
Adversarial Networks (GANs) are promising for the generation of high-quality
synthetic data.

58
• Variational Autoencoders (VAEs)
• A VAE is a type of generative model that uses an encoder-decoder architecture. The
encoder learns a lower-dimensional representation (latent space) of the input
data and the decoder rebuilds the input data from the latent space.
• VAEs impose a probabilistic structure on the latent space that lets them create new
data points by sampling from the learned distribution. These models are useful for
data augmentation tasks with input data that has a complex structure, like text or
images.

59
• Generative Adversarial Networks (GANs)
• A GAN consists of two neural networks, a discriminator and a generator, that are
trained simultaneously. The generator creates synthetic data points and the
discriminator assesses the quality of the created data by comparing it to the
original data.
• Both the generator and the discriminator compete against each other, with the
generator attempting to create realistic data points to deceive the discriminator.
The discriminator tries to accurately tell apart real and generated data, and as
the training progresses, the generator gets better at producing high-quality
synthetic data.

60
• There are several applications of generative AI data augmentation models:
1. Medical imaging
• The generation of synthetic medical imaging like MRI scans or X-rays helps to
increase the size of training datasets and enhance diagnostic model performance.
2. Natural language processing (NLP)
• Creating new text samples by changing existing sentences, like replacing words
with synonyms, adding noise, or changing word order. This can help enhance the
performance of machine translation models, text classification, and sentiment
analysis.
3. Computer vision
• Enhancing image datasets by creating new images with different
transformations, like translations, rotations, and scaling. This can help to enhance the
performance of object detection, image classification, and segmentation models.

61
4. Time series analysis
• Generating synthetic time series data by modeling underlying patterns and
creating new sequences with similar characteristics, which can help enhance the
performance of anomaly detection, time series forecasting, and classification
models.
5. Autonomous systems
• Creating synthetic sensor data for autonomous vehicles and drones allows the
safe and extensive training of artificial intelligence systems without incurring
real-world risks.
6. Robotics
• Generating both synthetic objects and scenes lets robots be trained for tasks like
navigation and manipulation in virtual environments before they’re deployed
into the real world.

62
How do generative AI data augmentation models
work?

Text data augmentation


• Sentence or word shuffling. Change the position of a sentence or word
randomly.
• Word replacement. You can replace words with synonyms.
• Syntax-tree manipulation. Paraphrase the sentence while keeping the same words.
• Random word insertion. Add words at random.
• Random word deletion. Remove words at random.
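Most of the text operations listed above fit in a few lines each. The sketch below works on a tokenized sentence (a list of words); the function names, the tiny synonym table and the vocabulary are illustrative assumptions, and syntax-tree manipulation is omitted because it needs a parser.

```python
import random


def shuffle_words(words, rng):
    """Word shuffling: swap two randomly chosen positions."""
    words = list(words)
    if len(words) >= 2:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def replace_with_synonyms(words, synonyms):
    """Word replacement: substitute every word that has an entry in the synonym table."""
    return [synonyms.get(w, w) for w in words]


def insert_random_word(words, vocabulary, rng):
    """Random word insertion: add one vocabulary word at a random position."""
    words = list(words)
    words.insert(rng.randrange(len(words) + 1), rng.choice(vocabulary))
    return words


def delete_random_words(words, drop_prob, rng):
    """Random word deletion: drop each word independently with probability drop_prob."""
    kept = [w for w in words if rng.random() >= drop_prob]
    return kept or list(words[:1])  # never return an empty sentence


rng = random.Random(42)
sentence = "the quick brown fox jumps".split()
print(shuffle_words(sentence, rng))
print(replace_with_synonyms(sentence, {"quick": "fast"}))
print(insert_random_word(sentence, ["very", "really"], rng))
print(delete_random_words(sentence, 0.2, rng))
```

Each function returns a new list and leaves the input untouched, so several augmented variants can be produced from one original sentence.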

63
Audio data augmentation
• Noise injection. Add random or Gaussian noise to audio datasets to enhance
model performance.
• Shifting. Shift the audio left or right by a random number of seconds.
• Changing speed. Stretch the time series by a fixed rate.
• Changing pitch. Change the audio pitch randomly.
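A rough sketch of three of these operations on a waveform stored as a non-empty list of floats. The function names are illustrative, and `change_speed` uses plain linear-interpolation resampling, which is an assumption and also shifts the pitch; real pipelines usually rely on a dedicated audio library for pitch-preserving stretching and for pitch shifting, so those are not shown.

```python
import random


def inject_noise(samples, sigma, rng):
    """Noise injection: add zero-mean Gaussian noise to every sample."""
    return [s + rng.gauss(0.0, sigma) for s in samples]


def shift(samples, offset):
    """Shifting: move the signal right (offset > 0) or left (offset < 0), padding with silence."""
    n = len(samples)
    if offset >= 0:
        return ([0.0] * offset + list(samples))[:n]
    return (list(samples) + [0.0] * (-offset))[-offset:]


def change_speed(samples, rate):
    """Changing speed: resample by linear interpolation; rate > 1 shortens the clip."""
    new_len = max(1, int(len(samples) / rate))
    out = []
    for i in range(new_len):
        pos = i * rate                       # position in the original clip
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)   # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


rng = random.Random(0)
clip = [0.0, 1.0, 2.0, 3.0]
print(inject_noise(clip, sigma=0.01, rng=rng))
print(shift(clip, rng.randint(-2, 2)))   # random shift, in samples
print(change_speed(clip, rate=2.0))
```

The shift is expressed in samples here; a shift in seconds is the same operation with the offset multiplied by the sampling rate.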

64
Image data augmentation
• Color space transformations. Change the RGB color channels, brightness, and
contrast randomly.
• Image mixing. Blend and mix multiple images.
• Geometric transformations. Crop, zoom, flip, rotate, and stretch images
randomly; however, be careful when applying various transformations on the
same images, as it can reduce the model’s performance.
• Random erasing. Remove part of the original image.
• Kernel filters. Change the blurring or sharpness of the image randomly.
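A few of these image operations can be illustrated on a grayscale image stored as a list of rows of 0..255 integers. This is a teaching sketch with made-up function names; real pipelines normally use an image library rather than nested lists.

```python
import random


def horizontal_flip(image):
    """Geometric transformation: mirror every row."""
    return [row[::-1] for row in image]


def adjust_brightness(image, delta):
    """Color-space transformation: shift every pixel by delta and clamp to 0..255."""
    return [[min(255, max(0, px + delta)) for px in row] for row in image]


def random_erase(image, height, width, rng, fill=0):
    """Random erasing: overwrite a randomly placed height x width patch with `fill`."""
    rows, cols = len(image), len(image[0])
    top = rng.randrange(rows - height + 1)
    left = rng.randrange(cols - width + 1)
    out = [list(row) for row in image]  # copy so the original stays intact
    for r in range(top, top + height):
        for c in range(left, left + width):
            out[r][c] = fill
    return out


rng = random.Random(0)
image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
print(horizontal_flip(image))
print(adjust_brightness(image, rng.randint(-50, 50)))
print(random_erase(image, height=2, width=2, rng=rng))
```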

65
Visual/video applications
1. Content creation
• Generative models can be used to create original video content, such as
animations, visual effects, or entire scenes.
2. Video enhancement
• Generative models can upscale low-resolution videos to higher resolutions, fill in
missing frames to smooth out videos, or restore old or damaged video footage.
3. Personalized content
• Generative AI can change videos to fit individual preferences or requirements.
For example, a scene could be adjusted to show a viewer's name on a signboard,
or a product that the viewer had previously expressed interest in.

66
4. Virtual reality and gaming
• Generative AI can be used to generate realistic, interactive environments or
characters. This offers the potential for more dynamic and responsive worlds in
games or virtual reality experiences.
5. Training
• Due to its ability to create diverse and realistic scenarios, generative AI is great
for training purposes. It can generate various road scenarios for driver training or
medical scenarios for training healthcare professionals.
6. Data augmentation
• For video-based machine learning projects, sometimes there isn't enough data.
Generative models can create additional video data that's similar but not
identical to the existing dataset, which enhances the robustness of the trained
models.

67
7. Video compression
• Generative models can help in executing more efficient video compression
techniques by learning to reproduce high-quality videos from compressed
representations.
8. Interactive content
• Generative models can be used in interactive video installations or experiences,
where the video content responds to user inputs in real time.
9. Marketing and advertising
• Companies can use generative AI to create personalized video ads for viewers or
to quickly generate multiple versions of a video advertisement for A/B testing.
10. Video synthesis from other inputs
• Generative AI can produce video clips from textual descriptions or other types of
inputs, allowing for new ways of storytelling or visualization techniques.

68
How do generative AI video models work?
• Variational Autoencoders (VAEs). These models learn a latent representation of
videos and then generate new sequences by sampling from this learned latent
space.
• Generative Adversarial Networks (GANs). These models consist of a generator
and a discriminator that are trained against each other to produce lifelike videos.
• Recurrent Neural Networks (RNNs). These models are adept at recognizing time-based
patterns in videos and produce sequences grounded in those patterns.
• Conditional generative models. These models create videos based on specific
given attributes or data.
• Factors like computational needs, complexity, and project-specific demands
need to be taken into account when selecting a model.

69
The rules of probability
• When implementing machine learning algorithms, you may have come across
situations where the environment that your algorithm operates in is non-deterministic,
i.e., you cannot guarantee that the same input always produces the same output. Similarly,
in the real world there are scenarios where the behavior can vary
even though the input remains the same.

• In ML, we think of separating the variables of our dataset into two broad classes:
• Independent data
• Dependent data

70
• Independent data
• Input (X)
• Categorical data
• Continuous data
• Ordinal data
• Dependent data
• Output (Y)
• Categorical data
• Continuous data
• Ordinal data
• Tensor

71
• Certainty is the rate that you would assign to an event happening. Say you are
rolling a die and you say that the certainty with which a 6 shows up on the die
is ⅙. It means there’s a 16.67% chance that a 6 shows up on the die. That’s the
certainty you allot to that particular event. This, in turn, is known as probability.
P(X=6) = 1/6
• Probability axioms:
• The probability of an event is a nonnegative, finite number between 0 and 1 (for example, each output of a softmax layer).
• The probability of the sample space is equal to 1.
• For every collection of mutually exclusive events, the probability of their union is the sum of
the individual probabilities.
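The die example and the three axioms can be checked directly. The sketch below assumes a fair six-sided die and uses exact fractions so that the sums come out exactly; the names `die` and `prob` are illustrative.

```python
from fractions import Fraction

# Fair die: every face gets probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}


def prob(event, distribution):
    """Probability of an event, given as a set of outcomes."""
    return sum((distribution[o] for o in event), Fraction(0))


six = {6}
even = {2, 4, 6}
odd = {1, 3, 5}

print(prob(six, die))                      # P(X=6)
print(float(prob(six, die)))               # roughly 0.1667, i.e. 16.67%

# Axiom 1: every probability lies between 0 and 1.
print(all(0 <= p <= 1 for p in die.values()))
# Axiom 2: the whole sample space has probability 1.
print(prob(set(die), die) == 1)
# Axiom 3: for mutually exclusive events, P(A or B) = P(A) + P(B).
print(prob(even | odd, die) == prob(even, die) + prob(odd, die))
```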

72
• Conditional probability
• ML developers use the conditional probability P(Y|X), the probability of the output Y given the input X
• Joint probability
• P(X,Y) = P(Y|X)P(X)
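A small worked example of the product rule, with made-up numbers: X is the weather and Y is whether a commuter is late. The tables and function names are assumptions for illustration only. The joint is built from P(Y|X) and P(X), and the same joint then yields the marginal P(Y) and, by Bayes' rule, P(X|Y).

```python
from fractions import Fraction

# Hypothetical numbers: P(X) and P(Y|X).
p_x = {"sunny": Fraction(7, 10), "rainy": Fraction(3, 10)}
p_y_given_x = {
    "sunny": {"late": Fraction(1, 10), "on_time": Fraction(9, 10)},
    "rainy": {"late": Fraction(1, 2), "on_time": Fraction(1, 2)},
}


def joint(x, y):
    """Product rule: P(X=x, Y=y) = P(Y=y | X=x) * P(X=x)."""
    return p_y_given_x[x][y] * p_x[x]


def marginal_y(y):
    """Sum rule: P(Y=y) = sum over x of P(X=x, Y=y)."""
    return sum(joint(x, y) for x in p_x)


def posterior(x, y):
    """Bayes' rule: P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y)."""
    return joint(x, y) / marginal_y(y)


print(joint("rainy", "late"))      # 3/20
print(marginal_y("late"))          # 11/50
print(posterior("rainy", "late"))  # 15/22
```

A generative classifier such as Naïve Bayes models the joint P(X,Y) in this way and derives predictions from it, whereas a discriminative model learns P(Y|X) directly.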

• Generative models include:
• Naïve Bayes classifier
• Gaussian Mixture Models
• Hidden Markov Models
• Deep Boltzmann machines
• GANs

73
1. Generative Models as Probability Distributions
• Generative models estimate the probability distribution of data, p(x),
where x represents real-world samples (e.g., text, images, speech).
Once the distribution is learned, the model can generate new
samples that resemble real data.

• Explicit Density Models: Learn an explicit probability distribution over data
(e.g., Variational Autoencoders, Normalizing Flows).
• Implicit Density Models: Do not explicitly learn p(x) but generate samples
through transformations (e.g., GANs).

74
2. Maximum Likelihood Estimation (MLE)
• Many generative models are trained using Maximum Likelihood Estimation
(MLE), which aims to maximize the probability of observed training data under
the learned model:

$$\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{N} \log P_{\theta}(x_i)$$

where:
• θ represents model parameters,
• x_i are the N training samples,
• Pθ(x) is the estimated probability distribution.
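As a concrete instance of MLE, consider the simplest possible model: a biased coin with a single parameter θ = P(x = 1). The data below are made up, and the grid search is only there to show that maximizing the log-likelihood lands on the closed-form answer (the fraction of ones); real models use gradient-based optimization instead.

```python
import math


def log_likelihood(theta, data):
    """Sum over samples of log P_theta(x_i) for a Bernoulli model."""
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in data)


def mle_grid(data, steps=999):
    """Search theta in {0.001, 0.002, ..., 0.999} for the highest log-likelihood."""
    candidates = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(candidates, key=lambda t: log_likelihood(t, data))


data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]   # 7 ones out of 10

print(mle_grid(data))            # grid-search estimate
print(sum(data) / len(data))     # closed-form MLE: the sample mean
```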

75
3. Bayesian Inference and Latent Variables
• Some generative models introduce latent variables z to explain the observed
data x. These models define a joint probability distribution:

$$p(x, z) = p(x \mid z)\, p(z)$$

where:
• p(z) is the prior distribution over latent variables,
• p(x∣z) is the likelihood function modeling data given latent factors.
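A two-component Gaussian mixture is about the smallest latent-variable model there is, and it makes the factorization concrete: draw z from the prior p(z), then draw x from p(x|z). The prior weights, means and standard deviations below are arbitrary illustrative values, and the names are assumptions.

```python
import math
import random

PRIOR = {0: 0.3, 1: 0.7}                      # p(z)
COMPONENTS = {0: (-2.0, 0.5), 1: (3.0, 1.0)}  # (mean, std) of p(x | z)


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def joint_density(x, z):
    """p(x, z) = p(x | z) * p(z)."""
    mean, std = COMPONENTS[z]
    return gaussian_pdf(x, mean, std) * PRIOR[z]


def marginal_density(x):
    """p(x) = sum over z of p(x, z): the latent variable is summed out."""
    return sum(joint_density(x, z) for z in PRIOR)


def sample(rng):
    """Ancestral sampling: first z ~ p(z), then x ~ p(x | z)."""
    z = rng.choices(list(PRIOR), weights=list(PRIOR.values()), k=1)[0]
    mean, std = COMPONENTS[z]
    return z, rng.gauss(mean, std)


rng = random.Random(0)
print([sample(rng) for _ in range(3)])
print(marginal_density(3.0))
```

In a VAE the same structure appears with a continuous z and a neural network in place of the lookup table for p(x|z).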
4. Probability Distributions in Generative AI
• Different generative models use specific probability distributions:
• Gaussian Distribution: Commonly used in VAEs for latent variable modeling.
• Categorical Distribution: Used in language models for word generation.
• Bernoulli/Beta Distribution: Used in binary image generation tasks.
• Exponential & Poisson Distributions: Used in event modeling.
76
5. Probabilistic Loss Functions
• Generative models optimize probability-based loss functions:
• Negative Log-Likelihood (NLL): Used in autoregressive models.
• Kullback-Leibler (KL) Divergence: Measures the difference between two
probability distributions, used in VAEs.
• Adversarial Loss (GANs): The discriminator is trained to distinguish real from
fake samples, while the generator is trained to minimize the probability that the
discriminator does so correctly.
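The first two losses are short enough to compute by hand. The sketch below assumes discrete distributions given as lists of probabilities; the function names are illustrative. The last function is the standard closed form for the KL divergence between a one-dimensional Gaussian and the standard normal, which is the regularization term commonly used in VAEs.

```python
import math


def negative_log_likelihood(token_probs):
    """NLL of a sequence: minus the sum of log-probabilities the model gave each true token."""
    return -sum(math.log(p) for p in token_probs)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_gaussian_to_standard_normal(mean, log_var):
    """KL(N(mean, exp(log_var)) || N(0, 1)) for one latent dimension."""
    return 0.5 * (math.exp(log_var) + mean ** 2 - 1.0 - log_var)


# Probabilities a hypothetical autoregressive model assigned to the correct next tokens.
print(negative_log_likelihood([0.9, 0.6, 0.8]))
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
print(kl_gaussian_to_standard_normal(mean=0.5, log_var=0.0))
```

Both quantities are zero only in the ideal case (the model assigns probability 1 to every true token, or the two distributions coincide) and grow as the model moves away from it.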

77
Why use generative models?
• Deep learning models are used for image classification, NLP, NLU and reinforcement
learning.
• MNIST dataset: predict the most likely digit (Y) given an image (X)
• Train a network that generates images
• Then classify the digits

78
Generating images
• No labeled data like MNIST is needed; instead, the model maps a space of random
numbers to a set of artificial images using a latent vector Z.

https://this-person-does-not-exist.com/en
79
Style transfer and image transformation

• In addition to mapping a space of random numbers to artificial images, we can
also use generative models to learn a mapping between one kind of image and a
second. This kind of model can, for example, be used to convert an image of a
horse into that of a zebra.
• Such models can also create deepfake videos in which one actor's face has been
replaced with another's, or transform a photo into a painting.

80
81
• Chatbots
• Transformer
• NLP
• Sound composition

82
Unique challenges of generative models
• Most of these models utilize complex data, requiring us to fit large models to
capture all the nuances of their features and distribution.
• This has implications both for the number of examples that we must collect to
adequately represent the kind of data we are trying to generate, and the
computational resources needed to build the model.
• A further consequence of having complex data, and of the fact that we are trying
to generate data rather than a numerical label or value, is that our notion of
model accuracy is much more complicated: we cannot simply calculate the distance
to a single label or score.

83
• Generative models, as a whole, face several broad challenges that impact
their development, deployment, and real-world applicability.
1. Training Complexity and Stability
• Generative models require extensive computational resources and time to
train effectively.
• Instabilities in training, such as imbalanced learning dynamics, make
optimization difficult.
• Selecting appropriate loss functions and architectures is non-trivial and
often problem-specific.
2. Evaluation Challenges
• There are no universal metrics to assess the quality of generated outputs.
• Common evaluation methods, such as statistical similarity or perceptual
metrics, may not always align with human judgment.
• Measuring diversity, realism, and novelty remains an open research
problem.
84
3. Generalization vs. Overfitting
• Generative models may memorize training data instead of learning generalizable
patterns.
• Overfitting can limit their ability to generate novel and diverse outputs.
• Balancing memorization and creativity is a key challenge in generative learning.
4. Bias and Ethical Concerns
• Models trained on biased datasets can amplify and perpetuate societal biases.
• Unchecked generative AI can lead to the production of misleading, offensive, or
harmful content.
• Ensuring fairness and transparency in generative outputs is an ongoing challenge.
5. Interpretability and Control
• Understanding how generative models make decisions is difficult due to their
black-box nature.
• Controlling model outputs to align with human intent requires fine-tuning and
reinforcement techniques.
• Unintended outputs can arise, making filtering mechanisms necessary.
85
6. Data Dependency and Quality
• The effectiveness of generative models depends heavily on the quality and diversity of
training data.
• Data scarcity and the need for high-quality labeled datasets can be a limiting factor.
• Ethical concerns arise when using copyrighted or sensitive data.
7. Security and Misuse Risks
• Generative models can be exploited for malicious activities, such as deepfakes,
misinformation, and phishing attacks.
• Ensuring that these models are used responsibly requires strict content moderation and
regulatory frameworks.
• Protecting against adversarial attacks and unintended biases is crucial.
8. Computational and Scalability Constraints
• Generative models, especially large-scale architectures, demand significant computational
power.
• Deployment at scale requires efficiency optimizations, such as model compression and
quantization.
• Real-time generative applications face latency and cost challenges.

86
