Module1 L1 L2

Generative AI

Subject Code & Subject Name: CSE3348 Generative AI
Students: School of Computer Science and Engineering
Faculty Name: Dr J Alamelu Mangai
Designation: Professor
Department: CSE

What is GenAI?
• Generative AI refers to a set of algorithms that can generate new
content in media such as images, text, audio, or video.
• The generated content is similar to the content the algorithm was
trained on.
• A prominent type of generative AI is the large language model (LLM),
which generates natural-language text in response to prompts.
• The GPT (Generative Pre-trained Transformer) series is a well-known
example of generative AI.
• ChatGPT is a renowned example of an LLM.
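To make the idea of prompting an LLM concrete, here is a minimal Python sketch using OpenAI's API client. This is an illustrative assumption, not part of the course material: the model name is a placeholder, and an API key is assumed to be set in the environment.

    # Minimal sketch: asking an LLM to generate text from a prompt.
    # Assumes the openai package is installed and OPENAI_API_KEY is set.
    from openai import OpenAI

    client = OpenAI()  # reads the API key from the environment

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user",
                   "content": "Write a haiku about generative AI."}],
    )
    print(response.choices[0].message.content)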

What is GenAI?

• GenAI:
• algorithms that generate novel content
• unlike traditional predictive ML, which analyses existing data to make
predictions or decisions, GenAI models create new data
• GenAI models can generate text, images, and other creative content
that is often indistinguishable from human-generated content.

Generative vs. Discriminative Modeling [T2 pg. 1–5]
• Discriminative modeling is analogous to supervised learning: it learns to
predict a label from input data rather than to generate new data.

What is Generative modeling? [T2 pg. 1–5]

• A generative model describes how a data set is generated, in terms of a
probabilistic model.
• By sampling this model, new data can be generated.
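As a toy illustration of this definition (an assumed one-dimensional example, not from the textbook): fit a simple probabilistic model to training data, then sample it to generate new observations.

    # Toy generative model: estimate a probability distribution from
    # training data, then sample it to generate new data points.
    import numpy as np

    rng = np.random.default_rng(seed=0)

    # "Training data": 1-D observations from some unknown process.
    train_data = rng.normal(loc=5.0, scale=2.0, size=1000)

    # Fit a simple probabilistic model (a Gaussian) to the data.
    mu, sigma = train_data.mean(), train_data.std()

    # Sampling the fitted model generates new, realistic-looking data;
    # the built-in randomness gives a different sample every time.
    new_samples = rng.normal(loc=mu, scale=sigma, size=10)
    print(new_samples)

Real generative models (GANs, VAEs, LLMs) replace the Gaussian with far more expressive distributions, but the fit-then-sample structure is the same.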

What is Generative modeling?
• Any generative modeling process has:
• Training data: examples of the entity the model has to generate.
• Observation: one of the examples from the training data.
• Each observation is described by many features.
• Ex: for an image of a horse, the individual pixel values are the features.
• A generative model has to be probabilistic, not deterministic.
• The model should include some randomness, so that a different sample
can be generated each time the model is sampled.
• The model has to estimate the unknown probability distribution that
explains why the examples in the training data look the way they do,
and that distinguishes them from data outside the training set.

• If the model mimics this distribution well, sampling it can generate new
observations that look realistic.
• Discriminative modeling is done on labelled data.
• Generative modeling is usually done on unlabelled data (as in
unsupervised learning).
• It can also be used to generate samples of a distinct class in the
training data.

Generative Modeling projects – Examples
• StyleGAN by NVIDIA: generates hyper-realistic images of human faces.
• GPT by OpenAI: given a short introductory passage, the model completes
the passage.

Generative Modeling projects – Examples [T1 pg. 4 ff.]
• OpenAI:
• A US-based AI research company that promotes and develops friendly AI
applications.
• Started as a non-profit organisation in 2015.
• In 2019, it became a for-profit organisation.
• Significant achievements:
• The Gym library for training reinforcement learning algorithms.
• More recently, the GPT-n models and the DALL-E generative models, which
generate images from text.

Generative models [T1 pg. 4]

• Artificial Intelligence (AI): a broad field of CS focused on creating
intelligent agents that can reason, learn, and act autonomously.
• Machine Learning (ML): a subset of AI, focused on developing algorithms
that can learn from data.
• Deep Learning (DL): uses deep neural networks with many layers, as a
mechanism of ML, to learn complex patterns from data.
• Generative models are a type of ML model that can generate new data
based on patterns learnt from the input data.
• Language Models (LMs): statistical models used to predict words in a
sequence of natural-language text, e.g. "The sky is ____" (see the toy
sketch after this list).
• Large Language Models (LLMs): use deep learning and are trained on
massive data sets.
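To make the "The sky is ____" idea concrete, here is a toy count-based bigram language model (an illustrative sketch only; modern LLMs are trained very differently):

    # Toy bigram language model: predict the next word from counts
    # of word pairs seen in a tiny training corpus.
    from collections import Counter, defaultdict

    corpus = "the sky is blue . the sky is clear . the sea is blue .".split()

    # Count how often each word follows each other word.
    bigram_counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        bigram_counts[prev][nxt] += 1

    # Predict the most likely word after "is".
    print(bigram_counts["is"].most_common(1))  # -> [('blue', 2)]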

• Generative models:
• a powerful type of AI that can generate new data that resembles the
training data.
• They handle different data modalities and are used in many domains:
text, images, music, and video.
• They synthesise new data rather than just making predictions or
decisions.
• When real data is too scarce to train an AI model, generative models
can be used to create synthetic data.

OpenAI’s generative models: https://platform.openai.com/docs/models

Evolution of Generative AI
• 1948: Claude Shannon published “A Mathematical Theory of
Communication”, which introduced the idea of n-grams, a statistical
model that can generate new text based on existing text.
• 1950: Alan Turing published “Computing Machinery and Intelligence”,
which introduced the Turing Test, a way to determine whether a machine
can behave intelligently like a human.
• 1952: A.L. Hodgkin and A.F. Huxley created a mathematical model of how
neurons generate electrical signals. This model inspired the development
of artificial neural networks, which are used in generative AI.

• 1965: Alexey Ivakhnenko and Valentin Lapa developed the first
learning algorithm for feedforward neural networks. This algorithm
enabled the networks to learn complex nonlinear functions from data.
• 1979: Kunihiko Fukushima introduced the neocognitron, an early deep
convolutional neural network. It was designed to identify and recognize
handwritten digits and various other patterns.
• 1986: David Rumelhart, Geoffrey Hinton, and Ronald Williams wrote a
paper called “Learning Representations by Back-propagating Errors.”
This paper introduced the backpropagation algorithm, which is
commonly used to train neural networks.

• 1997: Sepp Hochreiter and Jürgen Schmidhuber introduced the long
short-term memory (LSTM) network, a type of recurrent neural network
that can learn long-term relationships in sequential data. (Hochreiter
had identified the underlying vanishing-gradient problem in 1991.)
• 2001: Yoshua Bengio and his colleagues created a neural network
called the Neural Probabilistic Language Model (NPLM). This model
can learn how words are used in natural language.
• 2014: Diederik Kingma and Max Welling introduced the variational
autoencoder (VAE). It is a type of model that can learn
representations of data and generate new data based on those
learned representations.

• 2014: Ian Goodfellow and his colleagues introduced the generative
adversarial network (GAN), a generative model comprising two neural
networks: a generator and a discriminator. The generator aims to
generate realistic data, while the discriminator aims to differentiate
between real and fake data (see the toy sketch after this slide).
• 2015: Jascha Sohl-Dickstein and his colleagues proposed the diffusion
model, a generative model that learns to reverse a process that
gradually transforms data into noise.
• 2016: Aaron van den Oord and his team introduced WaveNet, a
powerful neural network that can create lifelike speech and music
waveforms.
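As a rough illustration of the 2014 GAN idea above, here is a minimal PyTorch sketch on toy 1-D data (an assumed example, not any published model):

    # Minimal GAN sketch: a generator learns to mimic 1-D Gaussian data
    # while a discriminator learns to tell real samples from fakes.
    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
    D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # discriminator
    loss_fn = nn.BCEWithLogitsLoss()
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

    for step in range(2000):
        real = torch.randn(64, 1) * 2.0 + 5.0  # "real" data from N(5, 2)
        fake = G(torch.randn(64, 8))           # generator output from noise

        # Discriminator step: label real as 1, fake as 0.
        d_loss = (loss_fn(D(real), torch.ones(64, 1)) +
                  loss_fn(D(fake.detach()), torch.zeros(64, 1)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator step: try to make the discriminator say 1 on fakes.
        g_loss = loss_fn(D(fake), torch.ones(64, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # After training, generated samples should drift toward N(5, 2).
    print(G(torch.randn(5, 8)).detach().squeeze())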

• 2017: Ashish Vaswani and his team introduced the Transformer, a
neural network design that leverages attention mechanisms to learn
from sequential information, like language or speech.
• 2018: Alec Radford and his team introduced the Generative Pre-trained
Transformer (GPT), a large model that uses the Transformer architecture
to generate many kinds of text on many subjects.
• 2018: Jacob Devlin and his team introduced BERT, a powerful model that
learns the meaning of words and sentences in context. It uses the
Transformer architecture to learn from large amounts of text without
needing explicit labels.

• 2019: Tero Karras and his team introduced StyleGAN, an enhanced type of
GAN that can create a wide range of detailed and realistic images,
including faces, animals, landscapes, and more.
• 2020: Large Language Models Take Center Stage: OpenAI’s GPT-3
(Generative Pre-trained Transformer 3) with 175 billion parameters
pushes the boundaries of language generation, demonstrating
impressive capabilities in text creation, translation, and code writing.
• 2020: a team led by Alexei Baevski introduced wav2vec 2.0, a model that
learns speech representations directly from raw audio and achieved
excellent performance on speech recognition tasks.

• 2021: Aditya Ramesh and his team created DALL-E, a powerful model
that can create lifelike images based on written descriptions.
• 2021: Focus on Control and Explainability: Researchers grapple with
the “black box” nature of large language models, seeking methods to
improve control over generated outputs and explain the reasoning
behind their creations.
• 2022: Diffusion Models Gain Traction: Diffusion models, known for
their ability to create realistic images, experience a surge in
popularity. Applications in image generation, editing, and inpainting
become prominent.

• 2023: Multimodal Generative AI Takes Shape: Models capable of
generating across different modalities, like text and image
combinations, start to emerge. This opens doors for more interactive
and immersive experiences.
• 2023: Ethical Considerations Mount: Concerns around bias,
misinformation, and potential misuse of generative AI lead to
discussions on responsible development and deployment practices.
• 2024: Focus on Real-World Integration: A growing trend towards
integrating generative AI tools into real-world applications across
various industries like customer service, product design, and
marketing.

Advantages of generative modeling

• Synthetic data generation using generative models reduces the cost of
labelling and improves training efficiency.
• Microsoft Research trained its LLM phi-1 for basic Python coding using
generative modelling.
• phi-1 was trained on code from The Stack, Q&A content from
StackOverflow, and synthetic code generated by GPT-3.5.
• It is a transformer with 1.3 billion parameters.
• “Textbooks Are All You Need”, June 2023:
https://www.microsoft.com/en-us/research/publication/textbooks-are-all-you-need/

Types of generative models [T1 pg. 6]
• Different types of generative models exist for different data modalities:
1) Text-to-text:
• models that generate text from input text, such as conversational
agents. Ex: Llama 2, GPT-4, Claude, PaLM 2.
• A conversational agent is a program designed to converse with humans in
natural language.
• It can talk to people on phones, computers, and other devices, allowing
them to order food or perform other tasks through voice, text, or chat.
• It achieves this using technologies such as natural language processing
(NLP), machine learning (ML), speech recognition, text-to-speech
synthesis, and dialog management.
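A conversational agent can be sketched as a loop that keeps the dialog history and sends it back to the model on every turn, which is a simple form of dialog management (an assumed example using OpenAI's client; any chat-capable LLM would work):

    # Minimal conversational-agent loop: the full message history is
    # resent each turn so the model can use prior context.
    from openai import OpenAI

    client = OpenAI()
    history = [{"role": "system", "content": "You are a helpful assistant."}]

    while True:
        user_text = input("You: ")
        if user_text.lower() in {"quit", "exit"}:
            break
        history.append({"role": "user", "content": user_text})
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=history,
        ).choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        print("Agent:", reply)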

• Llama 2 is a family of pre-trained and fine-tuned large language models
(LLMs) released by Meta AI in 2023.
• Released free of charge for research and commercial use, Llama 2 models
are capable of a variety of natural language processing (NLP) tasks,
from text generation to programming code.

• GPT-n by OpenAI:
• Generative Pre-trained Transformer 3 (GPT-3) is a large language model
released by OpenAI in 2020.
• It is a decoder-only transformer: a deep neural network that relies on
a mechanism known as “attention” rather than recurrence or convolution.
• It has 175 billion parameters.

2) Text-to-Image:
• Models that generate images from text captions. Ex: DALL-E 2, Stable
Diffusion, Midjourney, and Imagen.
• DALL-E 2: https://openai.com/index/dall-e-2/
• The original DALL·E is a 12-billion-parameter version of GPT-3 trained
to generate images from text descriptions, using a dataset of
text–image pairs.
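A minimal text-to-image call against OpenAI's images API might look as follows (a sketch; model name and prompt are placeholders, and an API key is assumed to be set in the environment):

    # Minimal sketch: generating an image from a text caption.
    from openai import OpenAI

    client = OpenAI()
    result = client.images.generate(
        model="dall-e-3",  # placeholder model name
        prompt="an astronaut riding a horse, photorealistic",
        size="1024x1024",
        n=1,
    )
    print(result.data[0].url)  # URL of the generated image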

3) Text-to-Audio:
• Models that generate audio clips and music from text. Ex: Jukebox,
AudioLM, and MusicGen.
• Jukebox is a neural-network-based tool that uses artificial intelligence
to generate music. https://www.youtube.com/watch?v=AS1l4Xlgm_k
• Developed by OpenAI, Jukebox is capable of composing original songs in
different genres and styles.
• It uses a combination of deep learning techniques, including generative
modeling, to create music that is both coherent and creative.
• Applications of Jukebox:
• music generation
• song completion: completing a song given a short melody
• music style transfer: generating new songs in the style of a given artist

4) Text-to-Video:
• Models that generate video content from text descriptions. Ex: Phenaki
and Emu Video.
• Phenaki: a model for generating videos from text, with prompts that can
change over time and videos that can be as long as multiple minutes.
https://phenaki.video/
5) Text-to-Speech: Models that synthesize speech audio from input text.
Ex: WaveNet and Tacotron.
6) Speech-to-Text: Models that transcribe speech to text (also called
Automatic Speech Recognition, ASR). Ex: Whisper and SpeechGPT.
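For speech-to-text, the open-source whisper package gives a short path from an audio file to a transcript (a minimal sketch; the audio file name is a placeholder):

    # Minimal speech-to-text sketch with OpenAI's open-source Whisper
    # package (pip install openai-whisper; ffmpeg must be installed).
    import whisper

    model = whisper.load_model("base")        # small pretrained checkpoint
    result = model.transcribe("speech.mp3")   # placeholder audio file
    print(result["text"])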

7) Image-to-Text: Models that generate image captions from images.
Ex: CLIP and DALL-E 3.
8) Image-to-Image: Applications include:
• data augmentation
• Neural style transfer (NST): manipulating digital images or videos so
that they adopt the appearance or visual style of another image.
• NST generates a new image by combining the content of one image with
the style of another: the goal is to preserve the content of the
original image while applying the visual style of the other (see the
sketch below).
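The core of neural style transfer is a style loss that compares Gram matrices of CNN feature maps. Below is a conceptual PyTorch sketch; the random tensors stand in for activations that would normally come from a pretrained network such as VGG:

    # Conceptual sketch of the NST style loss: the Gram matrix captures
    # feature correlations ("style"), and the loss compares the Gram
    # matrices of the generated image and the style image.
    import torch
    import torch.nn.functional as F

    def gram_matrix(features: torch.Tensor) -> torch.Tensor:
        # features: (channels, height, width) activations from one CNN layer
        c, h, w = features.shape
        flat = features.view(c, h * w)
        return (flat @ flat.t()) / (c * h * w)  # normalized correlations

    # Stand-ins for CNN activations of the style and generated images.
    style_feats = torch.randn(64, 32, 32)
    generated_feats = torch.randn(64, 32, 32, requires_grad=True)

    style_loss = F.mse_loss(gram_matrix(generated_feats),
                            gram_matrix(style_feats))
    style_loss.backward()  # gradients flow back to the generated image
    print(style_loss.item())

In a full NST implementation this style loss is combined with a content loss and minimized by gradient descent on the pixels of the generated image.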

• Inpainting: removing or repairing defects in an image.
Ex: the right arm missing in the original image is filled in.

9) Text-to-Code: models that generate programming code from text.
Ex: GPT-3.5.
10) Video-to-Audio: models that analyse video and generate matching audio.
Ex: Soundify.
11) Text-to-Math: models that generate mathematical expressions from text.
• Many other combinations of data modalities exist.
• Text is the most common modality.
• OpenAI’s GPT-4V model (Sep 2023) accepts both text and images as input;
among other things, it can read text from images (OCR).
