
National University of Computer and Emerging Sciences

Probabilistic Models

AI-4009 Generative AI

Dr. Akhtar Jamil


Department of Computer Science



Goals
• Review of Previous Lecture
• Today’s Lecture
– Bayesian Networks
– Terminologies: loss functions, linear regression, gradient descent,
overfitting, underfitting, generalization, regularization, cross-validation



Review of Previous Lecture



Discriminative vs Generative Models
• Generative models learn the joint probability distribution P(X, Y), where
X is the input data and Y is the output label.
• Discriminative models learn the conditional probability P(Y | X),
which is the probability of the output label Y given the input data X.



What are Generative Models?
Generative machine learning algorithms model complex, high-dimensional objects.

(Figure: Discriminative Models vs. Generative Models)



Learning a Generative Model
We are given a training set of examples, e.g., images of dogs.

We want to learn a probability distribution p(x) over images x such that:
• Generation: If we sample xnew ∼ p(x), xnew should look like a dog (sampling)
• Representation learning: We should be able to learn what these images have in common, e.g., ears, tail, etc. (features)

First step: how to represent p(x)
Learning a Generative Model
• Defining Probabilistic Models of the Data
• Examples of Probabilistic Models
– The Curse of Dimensionality
• Parameter-Efficient Models through Conditional
Independence
– Bayesian Networks: An Example of Shallow Generative Models
Probabilistic Models: Basic Discrete Distributions
• Bernoulli distribution: (biased) coin flip
– Domain: {Heads, Tails}
– Specify P(X = Heads) = p. Then P(X = Tails) = 1 − p.
– Write: X ∼ Ber(p): only one parameter p
– Sampling: flip a (biased) coin
• Categorical distribution: (biased) m-sided die
– Domain: {1, · · · , m}
– Specify P(Y = i) = pi, such that Σi pi = 1
– Write: Y ∼ Cat(p1, · · · , pm): m − 1 parameters
– Sampling: roll a (biased) die
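
As a quick illustration, a minimal Python sketch of sampling from these two distributions (the probability values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Bernoulli: a single parameter p
p = 0.7                                  # P(X = Heads)
coin = rng.random() < p                  # flip a (biased) coin
print("Heads" if coin else "Tails")

# Categorical: m - 1 free parameters (the probabilities must sum to 1)
probs = [0.1, 0.2, 0.3, 0.4]             # a biased 4-sided die
side = rng.choice(len(probs), p=probs)   # roll the (biased) die
print("rolled side", side + 1)
```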
Probabilistic Models: A Multi-Variate Joint Distribution
• Suppose we want to define a distribution over one pixel in an image. We use three discrete random variables:
– Red Channel R. Val(R) = {0, · · · , 255}
– Green Channel G. Val(G) = {0, · · · , 255}
– Blue Channel B. Val(B) = {0, · · · , 255}
• Sampling from the joint distribution (r, g, b) ∼ p(R, G, B) randomly generates a color for the pixel.
• How many parameters do we need to specify the joint distribution p(R = r, G = g, B = b)?
256 · 256 · 256 − 1
The Curse of Dimensionality in Probabilistic Models
• Suppose we want to model a BW image of a digit with n = 28 · 28 pixels.
• Pixels X1, . . . , Xn are modeled as binary (Bernoulli) random variables, i.e., Val(Xi) = {0, 1} = {Black, White}.
• How many possible states?
2 × 2 × · · · × 2 (n times) = 2^n
• Sampling from p(x1, . . . , xn) generates an image.
• How many parameters to specify the joint distribution p(x1, . . . , xn) over n binary pixels?
2^n − 1 (exponential) ⇒ curse of dimensionality
Parameter-Efficient Models Through Independence
• If X1, . . . , Xn are independent, then
p(x1, . . . , xn) = p(x1) p(x2) · · · p(xn)
• How many possible states? 2^n
• How many parameters to specify the joint distribution p(x1, . . . , xn)? n
• How many to specify the marginal distribution p(x1)? 1
• So 2^n entries can be described by just n numbers (if |Val(Xi)| = 2)!
• However, the independence assumption is too strong, so the model is unlikely to be useful: for example, each pixel is chosen independently of the others when we sample from it.
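
To make the parameter counts concrete, a small sketch using the 28 × 28 binary-pixel example above:

```python
import numpy as np

n = 28 * 28                          # number of binary pixels

# Full joint distribution: 2**n - 1 free parameters
full_joint_params = 2 ** n - 1
# Fully independent (factorized) model: n free parameters
independent_params = n
print(f"full joint:  {float(full_joint_params):.3e} parameters")
print(f"independent: {independent_params} parameters")

# Sampling from the independent model: each pixel is drawn on its own,
# so samples look like noise rather than digits.
rng = np.random.default_rng(0)
p = np.full(n, 0.5)                  # one Bernoulli parameter per pixel
image = (rng.random(n) < p).astype(int).reshape(28, 28)
```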
Key Notion: Conditional Independence
• Two events A, B are conditionally independent given event C if
p(A ∩ B | C) = p(A | C) p(B | C)
• Random variables X, Y are conditionally independent given Z if for all values x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z)
p(X = x ∩ Y = y | Z = z) = p(X = x | Z = z) p(Y = y | Z = z)
• We will also write p(X, Y | Z) = p(X | Z) p(Y | Z). Note the more compact notation.
• Equivalent definition: p(X | Y, Z) = p(X | Z). We write X ⊥ Y | Z.
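
A tiny numeric check of the definition, using a hand-picked joint distribution over three binary variables that is built so X ⊥ Y | Z holds:

```python
import numpy as np

# Build a joint p(x, y, z) that factorizes as p(z) p(x|z) p(y|z),
# so X and Y are conditionally independent given Z (all variables binary).
p_z = np.array([0.6, 0.4])
p_x_given_z = np.array([[0.9, 0.1],    # p(x | z = 0)
                        [0.2, 0.8]])   # p(x | z = 1)
p_y_given_z = np.array([[0.7, 0.3],
                        [0.5, 0.5]])

joint = np.einsum('z,zx,zy->xyz', p_z, p_x_given_z, p_y_given_z)

# Check the definition: p(x, y | z) == p(x | z) * p(y | z) for every z
p_xy_given_z = joint / joint.sum(axis=(0, 1), keepdims=True)   # normalize over x, y
for z in (0, 1):
    lhs = p_xy_given_z[:, :, z]
    rhs = np.outer(p_x_given_z[z], p_y_given_z[z])
    assert np.allclose(lhs, rhs)
```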
Today’s Lecture



Two Important Rules in Probability
1. Chain rule: Let S1, . . . , Sn be events, p(Si) > 0.
p(S1 ∩ S2 ∩ · · · ∩ Sn) = p(S1) p(S2 | S1) · · · p(Sn | S1 ∩ · · · ∩ Sn−1)
2. Bayes' rule: Let S1, S2 be events, p(S1) > 0 and p(S2) > 0.
p(S1 | S2) = p(S1 ∩ S2) / p(S2) = p(S2 | S1) p(S1) / p(S2)
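
A worked numeric instance of Bayes' rule (the event probabilities below are made up purely for illustration):

```python
# Bayes' rule on a small example: S1 = "has condition", S2 = "test positive".
p_s1 = 0.01                 # prior p(S1)
p_s2_given_s1 = 0.95        # p(S2 | S1)
p_s2_given_not_s1 = 0.05    # p(S2 | not S1)

# p(S2) by the law of total probability
p_s2 = p_s2_given_s1 * p_s1 + p_s2_given_not_s1 * (1 - p_s1)

# Posterior p(S1 | S2) = p(S2 | S1) p(S1) / p(S2)
p_s1_given_s2 = p_s2_given_s1 * p_s1 / p_s2
print(round(p_s1_given_s2, 3))   # ≈ 0.161
```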
Assumption with conditional independence



Bayesian Networks: General Idea
• Use conditional parameterization (instead of joint parameterization).
• For each random variable Xi, specify p(xi | xAi) for a set XAi of random variables.
• Then get the joint parameterization as the product of these conditionals:
p(x1, . . . , xn) = ∏i p(xi | xAi)
• This is a Bayesian Network.
• It is a classical approach for data generation.
• Need to guarantee it is a valid probability distribution.
• Choosing those variable sets is important. How?
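
A minimal sketch of this conditional parameterization, using a chain-structured network over three binary variables (the probability tables are illustrative):

```python
import numpy as np

# p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x2)
p_x1 = np.array([0.7, 0.3])
p_x2_given_x1 = np.array([[0.9, 0.1],
                          [0.4, 0.6]])   # rows indexed by x1
p_x3_given_x2 = np.array([[0.8, 0.2],
                          [0.3, 0.7]])   # rows indexed by x2

def joint(x1, x2, x3):
    return p_x1[x1] * p_x2_given_x1[x1, x2] * p_x3_given_x2[x2, x3]

# Sanity check: the factorized model is a valid probability distribution (sums to 1)
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
assert abs(total - 1.0) < 1e-12
```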
Bayesian Networks: Formal Definition

What is a Directed Acyclic Graph?
• DAG stands for Directed Acyclic Graph: a directed graph that contains no directed cycles.
Bayesian Networks: An Example

Graph Structure Encodes Conditional Independencies

Bayesian Networks: An Example 2
• Consider a Bayesian Network with five variables.
– Exercise (E): Whether the person exercises regularly (Yes or No).
– Diet (D): Whether the person has a healthy diet (Yes or No).
– Body Weight (BW): Categorized as Underweight, Normal, Overweight.
– Blood Pressure (BP): Categorized as Low, Normal, High.
– Heart Disease Risk (HR): Risk level of heart disease, categorized as Low,
Medium, High.



Bayesian Networks: An Example 2
• We'll assume the following dependencies:
– Exercise (E) and Diet (D) are independent variables.
– Body Weight (BW) depends on both Exercise (E) and Diet (D).
– Blood Pressure (BP) is influenced by Body Weight (BW).
– Heart Disease Risk (HR) is influenced by Blood Pressure (BP) and directly by Body Weight (BW).
• Draw a possible Bayesian network (one way to encode this structure is sketched below).
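
A minimal sketch of one possible structure; the conditional probability tables themselves are left unspecified:

```python
# Parents of each node in the DAG, following the stated dependencies.
parents = {
    "E": [],            # Exercise
    "D": [],            # Diet
    "BW": ["E", "D"],   # Body Weight depends on Exercise and Diet
    "BP": ["BW"],       # Blood Pressure depends on Body Weight
    "HR": ["BP", "BW"], # Heart Disease Risk depends on Blood Pressure and Body Weight
}

# The joint then factorizes as a product of one conditional per node.
factorization = " ".join(
    f"p({node})" if not pa else f"p({node} | {', '.join(pa)})"
    for node, pa in parents.items()
)
print(factorization)  # p(E) p(D) p(BW | E, D) p(BP | BW) p(HR | BP, BW)
```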



Naive Bayes: A Generative Classification Algorithm



Naive Bayes: A Generative Classification Algorithm

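As a rough sketch of Naive Bayes used as a generative classifier (binary features, two classes; the toy data and Laplace smoothing constant are illustrative, not taken from the slides):

```python
import numpy as np

# Naive Bayes models p(y) and p(x_j | y), then classifies with Bayes' rule:
# argmax_y  p(y) * prod_j p(x_j | y)
def fit(X, y, alpha=1.0):
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    # Laplace-smoothed estimates of p(x_j = 1 | y = c)
    likelihoods = np.array([(X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                            for c in classes])
    return classes, priors, likelihoods

def predict(X, classes, priors, likelihoods):
    # log p(y) + sum_j log p(x_j | y), evaluated for every class
    log_post = (np.log(priors)
                + X @ np.log(likelihoods).T
                + (1 - X) @ np.log(1 - likelihoods).T)
    return classes[np.argmax(log_post, axis=1)]

X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]])
y = np.array([1, 1, 0, 0])
print(predict(X, *fit(X, y)))   # recovers the training labels on this tiny example
```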


Discriminative Models



Machine Learning Fundamentals



Workflow of ML tasks



Hyperparameters vs Parameters
• Hyperparameters and parameters are both essential components of a
machine learning model.
– They serve different purposes and have distinct characteristics.
• Parameters:
– Parameters are the internal variables of a machine learning model that are
learned during the training process.
– The model adjusts them to fit the training data and capture the relationships in the data.
– For example, in a linear regression model, the parameters are the coefficients
assigned to each feature; in a neural network, the parameters include the
weights and biases of the network's neurons.
– Training keeps updating these parameters iteratively to minimize a chosen loss function.



Training, Validation and Testing Data



Train, Test and Evaluate Model
• Cross-Validation
– Set aside some portion of the data for validation and train on the rest of it.
• LOOCV (Leave-One-Out Cross-Validation)
– Perform training on the whole training data set but leave out only
one sample for validation.
• K-Fold Cross-Validation
– The data set is split into k subsets (folds).
– Perform training on k − 1 of the subsets and leave one fold out for validation.
– Iterate over all folds.
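
A minimal sketch of k-fold splitting; the model-fitting step is left as a placeholder comment:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Split sample indices into k folds; yield (train_idx, val_idx) pairs."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Usage: train on k-1 folds, validate on the held-out fold, then average the scores.
X, y = np.random.rand(20, 3), np.random.rand(20)
scores = []
for train_idx, val_idx in k_fold_indices(len(X), k=5):
    X_train, y_train = X[train_idx], y[train_idx]
    X_val, y_val = X[val_idx], y[val_idx]
    # model.fit(X_train, y_train); scores.append(model.score(X_val, y_val))
    scores.append(len(val_idx))          # placeholder "score" so the sketch runs as-is
print(scores)
```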
Cost function
• The cost function helps find optimal model parameters
– e.g., the best-fit line for the data points.
• Searching for these parameters is a minimization problem
– Find the model with minimum error between the predicted
value and the actual value.
• One such cost function is the Mean Squared Error (MSE):
MSE = (1/n) Σi (yi − ŷi)²
• ŷi: predicted label
• yi: original label
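
The same cost function written out in a few lines of Python (the sample values are arbitrary):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error between original labels and predicted labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

print(mse([3.0, 5.0, 2.5], [2.5, 5.0, 3.0]))   # (0.25 + 0 + 0.25) / 3 ≈ 0.167
```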
Gradient Descent
• Gradient descent is an optimization algorithm.
• It helps search for the optimal model parameters.
• Parameters are updated according to the gradient values.
• A gradient measures how much the output of a function changes if
you change the parameter values.



Gradient Descent
• Initialize w (e.g., randomly).
• Update the values of w based on the gradient:
w := w − α ∂J(w)/∂w
• where α is the learning rate.
• To find the gradient, take the derivative of the cost function J(w) with respect to w.



Gradient Descent
• To find the gradient, take the derivative of the cost function with respect to each parameter.
• For a best-fit line ŷi = m·xi + b with the MSE cost, solving for the two parameters gives:
∂J/∂m = −(2/n) Σi xi (yi − ŷi)
∂J/∂b = −(2/n) Σi (yi − ŷi)
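Putting the update rule together for the two line parameters, a small sketch on synthetic data (the learning rate and step count are arbitrary choices):

```python
import numpy as np

# Gradient descent for a best-fit line y ≈ m*x + b under the MSE cost.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100)
y = 2.0 * x + 1.0 + 0.05 * rng.normal(size=100)   # synthetic data, true m = 2, b = 1

m, b = 0.0, 0.0
alpha = 0.1                                        # learning rate
for _ in range(2000):
    y_pred = m * x + b
    grad_m = -(2 / len(x)) * np.sum(x * (y - y_pred))   # dJ/dm
    grad_b = -(2 / len(x)) * np.sum(y - y_pred)          # dJ/db
    m -= alpha * grad_m
    b -= alpha * grad_b

print(round(m, 2), round(b, 2))   # should land close to 2.0 and 1.0
```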


Gradient Descent



Gradient Descent



Gradient Descent



Thought Provoking Question
• How can we evaluate the performance on the test data set when
we can observe only the training set?



References
• Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, Chapter 20, MIT Press.
• Lecture slides from https://www.cs.cornell.edu/~kuleshov/



Thank You!

