Module 1 (L1, L2)
What is GenAI?
• Generative AI refers to a set of algorithms that can generate new content in any medium, such as images, text, audio or video.
• This generated content is similar to the content that the algorithm is
trained on.
• A prominent type of generative AI is the large language model (LLM),
which generates natural language texts based on prompts.
• The GPT (Generative Pre-trained Transformer) series is a well-known example of generative AI.
• ChatGPT is a renowned example of an LLM.
What is GenAI?
• GenAI:
• algorithms that generate novel content
• unlike traditional predictive ML, they do not just analyse or act on existing data; they create new data
Generative vs. Discriminative Modeling [T2 pg. 1–5]
• Discriminative modeling is like supervised learning.
What is Generative modeling? [T2 pg. 1–5]
What is Generative modeling?
• Any generative modeling process has:
• Training data: examples of the kind of entity the model has to generate.
• Observation: one of the examples from the training data.
• Each observation is described using many features.
• Ex: an image of a horse has its individual pixel values as features.
• A generative model has to be probabilistic, not deterministic.
• The model should include some randomness, so that the sample it generates differs each time.
• The model has to learn the unknown probability distribution that explains the images present in the training data and distinguishes them from those not in the training set.
• If the model mimics this distribution well, it can generate new observations that look realistic by sampling from it (a toy sketch follows below).
• Discriminative modeling is done on labelled data.
• Generative modeling is usually done on unlabelled data (like unsupervised learning).
• It can also be used to generate samples of a distinct class in the training data.
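To make the fit-then-sample idea concrete, here is a toy sketch that assumes the unknown distribution is a simple 1-D Gaussian; real generative models learn far richer distributions, but the same loop of estimating a distribution and then sampling from it applies.

```python
import numpy as np

# Toy "training data": observations drawn from an unknown distribution
rng = np.random.default_rng(seed=0)
training_data = rng.normal(loc=5.0, scale=2.0, size=1000)

# "Training": estimate the distribution's parameters from the data
mu, sigma = training_data.mean(), training_data.std()

# "Generation": sampling injects randomness, so every call
# produces new observations that resemble the training data
new_observations = rng.normal(loc=mu, scale=sigma, size=5)
print(new_observations)
```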
Generative Modeling projects - Examples
• StyleGAN by NVIDIA – generates hyper-realistic images of human faces.
Generative Modeling projects – Examples [T1 pg 4]
• OpenAI:
• A US-based AI research company that promotes and develops friendly AI applications.
• Started as a non-profit organisation in 2015.
• In 2019, it became a for-profit organisation.
• Significant achievements:
• the Gym library for training reinforcement learning algorithms
• more recently, the GPT-n models and the DALL-E generative models, which generate images from text
Generative models [T1 pg 4]
• Generative models:
• a powerful type of AI that can generate new data that resembles the training data.
• They handle different data modalities and are used in different domains – text, image, music and video.
• They synthesise new data rather than just making predictions or decisions.
• When real data is too scarce to train an AI model, generative models can be used to create synthetic data.
OpenAI’s generative models https://platform.openai.com/docs/models
Evolution of Generative AI
• 1948: Claude Shannon wrote a paper called “A Mathematical Theory
of Communication“. In this paper, he introduced the idea of n-grams,
a statistical model that can generate new text based on existing text.
• 1950: Alan Turing wrote a paper called “Computing Machinery and
Intelligence“. In this paper, he introduced the Turing Test, which is a
way to determine if a machine can behave intelligently like a human.
• 1952: A.L. Hodgkin and A.F. Huxley created a mathematical model
that explained how the brain uses neurons to create an electrical
network. This model inspired the development of artificial neural
networks, which are used in generative AI.
• 1965: Alexey Ivakhnenko and Valentin Lapa developed the first
learning algorithm for feedforward neural networks. This algorithm
enabled the networks to learn complex nonlinear functions from data.
• 1979: Kunihiko Fukushima introduced the neocognitron, a powerful
type of neural network known as a deep convolutional neural
network. It was specifically designed to identify and recognize
handwritten digits and various other patterns.
• 1986: David Rumelhart, Geoffrey Hinton, and Ronald Williams wrote a
paper called “Learning Representations by Back-propagating Errors.”
This paper introduced the backpropagation algorithm, which is
commonly used to train neural networks.
• 1997: Sepp Hochreiter and Jürgen Schmidhuber introduced the long short-term memory (LSTM) network. It is a type of recurrent neural network that can learn long-term relationships in sequential data.
• 2001: Yoshua Bengio and his colleagues created a neural network
called the Neural Probabilistic Language Model (NPLM). This model
can learn how words are used in natural language.
• 2014: Diederik Kingma and Max Welling introduced the variational
autoencoder (VAE). It is a type of model that can learn
representations of data and generate new data based on those
learned representations.
• 2014: Ian Goodfellow and his colleagues introduced the generative
adversarial network (GAN). It is a type of generative model that
comprises two neural networks: a generator and a discriminator. The
generator aims to generate realistic data, while the discriminator aims
to differentiate between real and fake data.
• 2015: Jascha Sohl-Dickstein and his colleagues proposed the diffusion model. It is a generative model that learns to reverse a process that gradually transforms data into noise.
• 2016: Aaron van den Oord and his team introduced WaveNet, a
powerful neural network that can create lifelike speech and music
waveforms.
• 2017: Ashish Vaswani and his team introduced the Transformer, a
neural network design that leverages attention mechanisms to learn
from sequential information, like language or speech.
• 2018: Alec Radford and his team introduced Generative Pre-trained
Transformer (GPT). This is a big model that uses the Transformer
architecture to create different kinds of text on different subjects.
• 2018: Jacob Devlin and his team introduced BERT, a powerful model that learns the meaning of words and sentences from their context. It uses the Transformer architecture to learn from large amounts of text without needing explicit labels.
• 2019: Tero Karras and his team introduced StyleGAN, an enhanced type of GAN (generative adversarial network) that can create a wide range of detailed and realistic images, including faces, animals, landscapes, and more.
• 2020: Large Language Models Take Center Stage: OpenAI’s GPT-3
(Generative Pre-trained Transformer 3) with 175 billion parameters
pushes the boundaries of language generation, demonstrating
impressive capabilities in text creation, translation, and code writing.
• 2020: A team led by Alexei Baevski introduced wav2vec 2.0, a model that learns speech representations directly from raw audio; it achieved excellent performance in speech recognition tasks.
• 2021: Aditya Ramesh and his team created DALL-E, a powerful model
that can create lifelike images based on written descriptions.
• 2021: Focus on Control and Explainability: Researchers grapple with
the “black box” nature of large language models, seeking methods to
improve control over generated outputs and explain the reasoning
behind their creations.
• 2022: Diffusion Models Gain Traction: Diffusion models, known for
their ability to create realistic images, experience a surge in
popularity. Applications in image generation, editing, and inpainting
become prominent.
• 2023: Multimodal Generative AI Takes Shape: Models capable of
generating across different modalities, like text and image
combinations, start to emerge. This opens doors for more interactive
and immersive experiences.
• 2023: Ethical Considerations Mount: Concerns around bias,
misinformation, and potential misuse of generative AI lead to
discussions on responsible development and deployment practices.
• 2024: Focus on Real-World Integration: A growing trend towards
integrating generative AI tools into real-world applications across
various industries like customer service, product design, and
marketing.
Advantages of generative modeling
Types of generative models [T1 pg 6]
• Different types of generative models exist for different data modalities:
1) Text-to-text:
• models that generate text from input text, like conversational agents. Ex: Llama 2, GPT-4, Claude, PaLM 2 (a minimal sketch follows below)
• A conversational agent is a program designed to converse with humans in natural language.
• It can talk to people on phones, computers, and other devices, allowing them to order food or perform other tasks through voice, text, or chat.
• It achieves this using technologies like natural language processing (NLP), machine learning (ML), speech recognition, text-to-speech synthesis, and dialog management to interact with people through various mediums.
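As a concrete illustration of text-to-text generation, here is a minimal sketch using the Hugging Face transformers library; the gpt2 model name is only an example, and any causal language model from the Hub could be substituted.

```python
# pip install transformers torch
from transformers import pipeline

# Load a small causal language model; "gpt2" is just an example choice
generator = pipeline("text-generation", model="gpt2")

prompt = "User: Suggest a quick dinner I can order.\nAssistant:"
result = generator(prompt, max_new_tokens=40, do_sample=True)
print(result[0]["generated_text"])
```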
• Llama 2 is a family of pre-trained and fine-tuned
large language models (LLMs) released by Meta AI in 2023.
• Released free of charge for research and commercial use, Llama 2 AI
models are capable of a variety of
natural language processing (NLP) tasks, from text generation to
programming code.
• GPT-n by OpenAI:
• Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020.
• It is a decoder-only transformer model: a deep neural network that supersedes recurrence- and convolution-based architectures with a technique known as "attention".
• 175 billion parameters.
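To give a feel for the "attention" technique named above, here is a minimal sketch of scaled dot-product attention, the core operation inside a Transformer; the tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k) query, key, and value projections
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # pairwise token similarities
    weights = F.softmax(scores, dim=-1)            # attention distribution per token
    return weights @ v                             # weighted sum of the values

# Tiny usage example with random tensors
q = k = v = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 4, 8])
```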
2) Text-to-Image:
• Models that generate images from text captions. Ex: DALL-E 2, Stable Diffusion, Midjourney and Imagen (a usage sketch follows below).
• Dall-E 2 : https://openai.com/index/dall-e-2/
• DALL·E is a 12-billion parameter version of GPT-3 trained to generate images
from text descriptions, using a dataset of text–image pairs.
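As a hands-on sketch of text-to-image generation, the snippet below uses the open-source diffusers library with a Stable Diffusion checkpoint; the model id is an example choice and a CUDA-capable GPU is assumed.

```python
# pip install diffusers transformers accelerate torch
import torch
from diffusers import StableDiffusionPipeline

# Example checkpoint; assumes a CUDA-capable GPU is available
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a photorealistic horse galloping on a beach at sunset").images[0]
image.save("horse.png")
```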
3) Text-to-Audio:
• Models that generate audio clips and music from text. Ex: Jukebox, AudioLM and
MusicGen
• Jukebox is a neural network-based tool that uses artificial intelligence to
generate music. https://www.youtube.com/watch?v=AS1l4Xlgm_k
• Developed by OpenAI, Jukebox is capable of composing original songs in different
genres and styles.
• uses a combination of deep learning techniques, including generative modeling
and reinforcement learning, to create music that is both coherent and creative.
• Applications of Jukebox:
• music generation
• song completion – completing a song given a short melody
• music style transfer – generating new songs in the style of a given artist
4) Text-to-video:
• Models that generate video content from text descriptions. Ex:
Phenaki and Emu Video
• Phenaki : A model for generating videos from text, with prompts that
can change over time, and videos that can be as long as multiple
minutes. https://phenaki.video/
5) Text-to-Speech: Models that synthesize speech audio from input
text. Ex: WaveNet and Tacotron
6) Speech-to-text: Models that transcribe speech to text [also called Automatic Speech Recognition (ASR)]. Ex: Whisper and SpeechGPT
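As a quick speech-to-text illustration, the snippet below uses OpenAI's open-source whisper package; the audio file path is a placeholder.

```python
# pip install openai-whisper
import whisper

model = whisper.load_model("base")        # small multilingual checkpoint
result = model.transcribe("speech.mp3")   # placeholder path to an audio file
print(result["text"])
```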
7) Image-to-text: Models that generate image captions from images. Ex: CLIP and DALL-E 3.
8) Image-to-image: Applications –
• data augmentation
• neural style transfer (NST) – manipulating digital images or videos so that they adopt the appearance or visual style of another image (a sketch of the core style loss follows below)
• NST generates a new image by combining the content of one image with the style of another image
• the goal of style transfer is to create an image that preserves the content of the original image while applying the visual style of another image
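At the heart of NST is a style loss that compares feature statistics between the generated and style images, typically via Gram matrices of CNN activations; below is a minimal sketch of that loss, assuming the feature maps come from a pretrained network such as VGG.

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features: (batch, channels, height, width) activations from a CNN layer
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    # Channel-to-channel correlations, normalised by the number of entries
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def style_loss(generated_features, style_features):
    # The generated image's feature statistics should match the style image's
    return F.mse_loss(gram_matrix(generated_features),
                      gram_matrix(style_features))

# Tiny usage example with random "feature maps"
gen = torch.randn(1, 64, 32, 32)
sty = torch.randn(1, 64, 32, 32)
print(style_loss(gen, sty).item())
```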
• Inpainting: removing defects or filling in missing regions of an image.
Ex: the right arm missing in the original image is reconstructed.
9) Text-to-code: models that generate programming code from text. Ex: GPT-3.5 (a sketch follows at the end of this list)
10) Video-to-audio: models that analyse video and generate matching audio. Ex: Soundify
11) Text-to-Math: models that generate mathematical expressions from text.
• Many other combinations of data modalities exist.
• Text is the common modality.
• OpenAI's GPT-4V model (Sep 2023) takes both text and images as input, which among other things enables it to read text from images (OCR).
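Returning to text-to-code (item 9), here is a minimal sketch using the openai Python client; the model name and prompt are examples, and an OPENAI_API_KEY environment variable is assumed.

```python
# pip install openai  (assumes OPENAI_API_KEY is set in the environment)
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # example model name
    messages=[{"role": "user",
               "content": "Write a Python function that reverses a string."}],
)
print(response.choices[0].message.content)
```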