
National University of Computer and Emerging Sciences

Probabilistic Models

AI-4009 Generative AI

Dr. Akhtar Jamil


Department of Computer Science

09/09/2024 Presented by Dr. AKHTAR JAMIL 1


Goals
• Review of Previous Lecture
• Today’s Lecture
– Bayesian Networks
– Terminologies: Loss functions, linear regression, gradient descent,
overfitting, underfitting, generalization, regularization, cross-validation

09/09/2024 Presented by Dr. AKHTAR JAMIL 2


Review of Previous Lecture

09/09/2024 Presented by Dr. AKHTAR JAMIL 3


Discriminative vs. Generative Models
• Generative models learn the joint probability distribution P(X, Y), where
X is the input data and Y is the output label.
• Discriminative models learn the conditional probability P(Y | X),
which is the probability of the output label Y given the input data X.

09/09/2024 Presented by Dr. AKHTAR JAMIL 4


What are Generative Models?
Generative machine learning algorithms model complex, high-dimensional objects.

(Figure: Discriminative Models vs. Generative Models)

09/09/2024 Presented by Dr. AKHTAR JAMIL 5


Learning a Generative Model
We are given a training set of examples, e.g., images of dogs.
We want to learn a probability distribution p(x) over images x such that:
• Generation: If we sample xnew ∼ p(x), xnew should look like a dog (sampling)
• Representation learning: We should be able to learn what these images have in common, e.g., ears, tail, etc. (features)
• First step: how to represent p(x)

09/09/2024 Presented by Dr. AKHTAR JAMIL 6 / 31
Learning a Generative Model
• Defining Probabilistic Models of the Data
• Examples of Probabilistic Models
– The Curse of Dimensionality
• Parameter-Efficient Models through Conditional
Independence
– Bayesian Networks: An Example of Shallow Generative Models
09/09/2024 Presented by Dr. AKHTAR JAMIL 7 / 31
Probabilistic Models: Basic Discrete Distributions
Bernoulli distribution: (biased) coin flip
• Domain: {Heads, Tails}
• Specify P(X = Heads) = p. Then P(X = Tails) = 1 − p.
• Write: X ∼ Ber(p): only one parameter p
• Sampling: flip a (biased) coin

Categorical distribution: (biased) m-sided die
• Domain: {1, · · · , m}
• Specify P(Y = i) = pi, such that Σ pi = 1
• Write: Y ∼ Cat(p1, · · · , pm): m − 1 parameters
• Sampling: roll a (biased) die

09/09/2024 Presented by Dr. AKHTAR JAMIL 8 / 31
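
Both distributions above are easy to sample in code. A minimal sketch using NumPy (the parameter values are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Bernoulli: a single parameter p = P(X = Heads)
p = 0.7
flips = rng.random(10) < p            # True ~ Heads with probability p
print(flips.astype(int))

# Categorical: m-sided die with m - 1 free parameters
# (the probabilities must sum to 1, so the last one is determined)
probs = np.array([0.1, 0.2, 0.3, 0.4])
rolls = rng.choice(len(probs), size=10, p=probs)
print(rolls)
```
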
Probabilistic Models: A Multi-Variate Joint Distribution
• Suppose we want to define a distribution over one pixel in an image. We use three discrete random variables:
– Red Channel R. Val(R) = {0, · · · , 255}
– Green Channel G. Val(G) = {0, · · · , 255}
– Blue Channel B. Val(B) = {0, · · · , 255}
• Sampling from the joint distribution (r, g, b) ∼ p(R, G, B) randomly generates a color for the pixel.
• How many parameters do we need to specify the joint distribution p(R = r, G = g, B = b)? 256 · 256 · 256 − 1

09/09/2024 Presented by Dr. AKHTAR JAMIL 9 / 31
The Curse of Dimensionality in Probabilistic Models
Suppose we want to model a black-and-white image of a digit with n = 28 · 28 pixels.
• Pixels X1, . . . , Xn are modeled as binary (Bernoulli) random variables, i.e., Val(Xi) = {0, 1} = {Black, White}.
• How many possible states? 2 × 2 × · · · × 2 (n times) = 2^n
• Sampling from p(x1, . . . , xn) generates an image.
• How many parameters to specify the joint distribution p(x1, . . . , xn) over n binary pixels? 2^n − 1 (exponential) => curse of dimensionality

09/09/2024 Presented by Dr. AKHTAR JAMIL 10 / 31
Parameter-Efficient Models Through Independence
If X1, . . . , Xn are independent, then

p(x1, . . . , xn) = p(x1) p(x2) · · · p(xn)

• How many possible states? 2^n
• How many parameters to specify the joint distribution p(x1, . . . , xn)? n
• How many to specify the marginal distribution p(x1)? 1
• 2^n entries can be described by just n numbers (if |Val(Xi)| = 2)!
• However, the independence assumption is too strong, so the model is not likely to be useful: for example, each pixel is chosen independently when we sample from it.

09/09/2024 Presented by Dr. AKHTAR JAMIL 11 / 31
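
To make the parameter-count comparison concrete, here is a small illustrative check for the n = 28 · 28 binary-pixel example (a sketch, not from the slides):

```python
n = 28 * 28

# Full joint distribution: one probability per state, minus 1
# because the probabilities must sum to 1.
full_joint_params = 2 ** n - 1

# Fully independent model: one parameter p(X_i = 1) per pixel.
independent_params = n

print("full joint parameter count has", len(str(full_joint_params)), "decimal digits")
print("independent model parameters:", independent_params)
```
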
Key Notion: Conditional Independence
Two events A, B are conditionally independent given event C if

p(A ∩ B | C) = p(A | C) p(B | C)

Random variables X, Y are conditionally independent given Z if for all values
x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z):

p(X = x ∩ Y = y | Z = z) = p(X = x | Z = z) p(Y = y | Z = z)

We will also write p(X, Y | Z) = p(X | Z) p(Y | Z). Note the more compact notation.
Equivalent definition: p(X | Y, Z) = p(X | Z). We write X ⊥ Y | Z.

09/09/2024 Presented by Dr. AKHTAR JAMIL 12 / 31
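
A small numerical sanity check can make this definition tangible. The sketch below (with assumed, illustrative conditional tables) builds a joint distribution as p(z) p(x|z) p(y|z) and then verifies that p(x, y | z) = p(x | z) p(y | z):

```python
import numpy as np

p_z = np.array([0.6, 0.4])                 # p(Z = z)
p_x_given_z = np.array([[0.9, 0.1],        # rows indexed by z, columns by x
                        [0.3, 0.7]])
p_y_given_z = np.array([[0.2, 0.8],
                        [0.5, 0.5]])

# joint[x, y, z] = p(z) * p(x | z) * p(y | z)
joint = np.einsum('z,zx,zy->xyz', p_z, p_x_given_z, p_y_given_z)

for z in range(2):
    cond_xy = joint[:, :, z] / joint[:, :, z].sum()    # p(x, y | z)
    prod = np.outer(p_x_given_z[z], p_y_given_z[z])    # p(x | z) p(y | z)
    assert np.allclose(cond_xy, prod)
print("X and Y are conditionally independent given Z")
```
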
Today’s Lecture

09/09/2024 Presented by Dr. AKHTAR JAMIL 13


Two Important Rules in Probability

1. Chain rule: Let S1, . . . , Sn be events, p(Si) > 0.

p(S1 ∩ S2 ∩ · · · ∩ Sn) = p(S1) p(S2 | S1) · · · p(Sn | S1 ∩ · · · ∩ Sn−1)

2. Bayes' rule: Let S1, S2 be events, p(S1) > 0 and p(S2) > 0.

p(S1 | S2) = p(S1 ∩ S2) / p(S2) = p(S2 | S1) p(S1) / p(S2)

09/09/2024 Presented by Dr. AKHTAR JAMIL 14 / 31
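
A tiny worked example of Bayes' rule, with made-up numbers for the prior and likelihoods (purely illustrative):

```python
# S1 = "person has the condition", S2 = "test is positive" (hypothetical framing)
p_s1 = 0.01                 # prior p(S1)
p_s2_given_s1 = 0.95        # p(S2 | S1)
p_s2_given_not_s1 = 0.05    # p(S2 | not S1)

# Total probability: p(S2) = p(S2 | S1) p(S1) + p(S2 | not S1) p(not S1)
p_s2 = p_s2_given_s1 * p_s1 + p_s2_given_not_s1 * (1 - p_s1)

# Bayes' rule: p(S1 | S2) = p(S2 | S1) p(S1) / p(S2)
p_s1_given_s2 = p_s2_given_s1 * p_s1 / p_s2
print(round(p_s1_given_s2, 3))   # ~0.161
```
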
Assumption with conditional independence

09/09/2024 Presented by Dr. AKHTAR JAMIL 15 / 31


Bayesian Networks: General Idea
• Use conditional parameterization (instead of joint parameterization).
• For each random variable Xi, specify p(xi | x_Ai) for a set X_Ai of random variables.
• Then get the joint parameterization as p(x1, . . . , xn) = ∏i p(xi | x_Ai).
• This is a Bayesian Network.
• It is a classical approach for data generation.
• Need to guarantee it is a valid probability distribution.
• Choosing those variables is important. How?

09/09/2024 Presented by Dr. AKHTAR JAMIL 16 / 31
Bayesian Networks: Formal Definition

09/09/2024

What is a Directed Acyclic Graph?
• DAG stands for Directed Acyclic Graph.

09/09/2024 Presented by Dr. AKHTAR JAMIL 18 / 31
Bayesian Networks: An Example

09/09/2024
Graph Structure Encodes Conditional Independencies

09/09/2024
Bayesian Networks: An Example 2
• Consider a Bayesian Network with five variables.
– Exercise (E): Whether the person exercises regularly (Yes or No).
– Diet (D): Whether the person has a healthy diet (Yes or No).
– Body Weight (BW): Categorized as Underweight, Normal, Overweight.
– Blood Pressure (BP): Categorized as Low, Normal, High.
– Heart Disease Risk (HR): Risk level of heart disease, categorized as Low,
Medium, High.

09/09/2024 Presented by Dr. AKHTAR JAMIL 21


Bayesian Networks: An Example 2
• We'll assume the following dependencies:
– Exercise (E) and Diet (D) are independent variables.
– Body Weight (BW) depends on both Exercise (E) and Diet (D).
– Blood Pressure (BP) is influenced by Body Weight (BW).
– Heart Disease Risk (HR) is influenced by Blood Pressure (BP) and directly
by Body Weight (BW).
• Draw a possible Bayesian network for these dependencies (a sampling sketch follows below).

09/09/2024 Presented by Dr. AKHTAR JAMIL 22
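
One way to realize this network in code is ancestral sampling: sample each variable given its parents, following the dependencies listed above. The sketch below uses made-up conditional probability tables (the lecture does not specify numbers), so it only illustrates the structure:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(probs, values):
    # Draw one value according to the given categorical probabilities.
    return values[rng.choice(len(values), p=probs)]

def sample_person():
    e = sample([0.5, 0.5], ["Yes", "No"])     # Exercise (no parents)
    d = sample([0.6, 0.4], ["Yes", "No"])     # Diet (no parents)
    # Body Weight depends on (E, D); all numbers are placeholders
    bw_cpt = {("Yes", "Yes"): [0.10, 0.80, 0.10],
              ("Yes", "No"):  [0.05, 0.60, 0.35],
              ("No", "Yes"):  [0.05, 0.60, 0.35],
              ("No", "No"):   [0.02, 0.38, 0.60]}
    bw = sample(bw_cpt[(e, d)], ["Underweight", "Normal", "Overweight"])
    # Blood Pressure depends on BW
    bp_cpt = {"Underweight": [0.40, 0.50, 0.10],
              "Normal":      [0.10, 0.80, 0.10],
              "Overweight":  [0.05, 0.45, 0.50]}
    bp = sample(bp_cpt[bw], ["Low", "Normal", "High"])
    # Heart Disease Risk depends on (BP, BW); one combination spelled out,
    # everything else falls back to a default row (a simplification)
    hr_cpt = {("High", "Overweight"): [0.1, 0.3, 0.6]}
    hr = sample(hr_cpt.get((bp, bw), [0.6, 0.3, 0.1]),
                ["Low", "Medium", "High"])
    return e, d, bw, bp, hr

print(sample_person())
```
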


Naive Bayes: A Generative Classification Algorithm

09/09/2024 Presented by Dr. AKHTAR JAMIL 23


Naive Bayes: A Generative Classification Algorithm

09/09/2024 Presented by Dr. AKHTAR JAMIL 24


Discriminative Models

09/09/2024 Presented by Dr. AKHTAR JAMIL 25


Machine Learning Fundamentals

09/09/2024 Presented by Dr. AKHTAR JAMIL 26


Workflow of ML tasks

09/09/2024 Presented by Dr. AKHTAR JAMIL 27


Hyperparameters vs Parameters
• Hyperparameters and parameters are both essential components of a
machine learning model.
– They have different purposes and distinct characteristics.
• Parameters:
– Parameters are the internal variables of a machine learning model that are
learned during the training process.
– The model adjusts them to fit the training data and capture the relationships in it.
– For example, in a linear regression model, the parameters are the coefficients
assigned to each feature, and in a neural network, the parameters include the
weights and biases of the network's neurons.
– The model keeps updating these parameters iteratively to minimize a chosen loss function (see the sketch below).

09/09/2024 Presented by Dr. AKHTAR JAMIL 28
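
As a concrete illustration of the distinction, in a scikit-learn ridge regression the regularization strength alpha is a hyperparameter chosen before training, while the coefficients and intercept are parameters learned from the data (synthetic data; a sketch, not from the slides):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3 + rng.normal(scale=0.1, size=100)

model = Ridge(alpha=1.0)      # alpha: hyperparameter, set by us before training
model.fit(X, y)

print(model.coef_)            # parameters: learned coefficients
print(model.intercept_)       # parameter: learned bias term
```
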


Training, Validation and Testing Data

09/09/2024 Presented by Dr. AKHTAR JAMIL 29


Train, Test and Evaluate model
• Cross-Validation
• Set aside some portion of the data for validation and train on the rest of it.
• LOOCV (Leave-One-Out Cross-Validation)
– Train on the whole training data set except one sample, which is left out for validation; repeat for every sample.
• K-Fold Cross-Validation
– The data set is split into k subsets (folds).
– Training is performed on k − 1 of the subsets, leaving one fold out for validation.
– Iterate over all folds (a sketch follows below).
09/09/2024 Presented by Dr. AKHTAR JAMIL 30
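
A minimal K-fold cross-validation sketch using scikit-learn utilities on synthetic data (the model choice is just for illustration; LOOCV works the same way via sklearn.model_selection.LeaveOneOut):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kf.split(X):
    # Train on k-1 folds, validate on the held-out fold
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    scores.append(mean_squared_error(y[val_idx], pred))

print("mean validation MSE:", np.mean(scores))
```
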
Cost function
• The cost function helps find optimal model parameters
– e.g., the best-fit line for the data points.
• Searching for these parameters is a minimization problem
– i.e., find the model with minimum error between the predicted value and the actual value.
• One such cost function is Mean Squared Error (MSE):
– MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²
• ŷᵢ: predicted label
• yᵢ: original (true) label
09/09/2024 Presented by Dr. AKHTAR JAMIL 31
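
Computed directly, MSE is just the mean of squared differences between predicted and original labels (illustrative numbers):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # original labels
y_pred = np.array([2.8, 5.4, 2.0, 6.5])   # predicted labels

mse = np.mean((y_true - y_pred) ** 2)      # (1/n) * sum of squared errors
print(mse)                                 # 0.175
```
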
Gradient Descent
• Gradient descent is an optimization algorithm
• It helps search for the optimal model parameters.
• Update parameters according to the gradient values.
• A gradient measures how much the output of a function changes if
you change the parameter values.

09/09/2024 Presented by Dr. AKHTAR JAMIL 32


Gradient Descent
• Initialize w (e.g., randomly)
• Update the values of w based on the gradient:

w := w − α ∂J/∂w

• where α is the learning rate

• To find the gradient, take the derivative of the cost function with respect to w:

09/09/2024 Presented by Dr. AKHTAR JAMIL 33


Gradient Descent
• To find the gradient, take the derivative of the cost function with respect to each parameter.

• After solving for the two parameters (slope and intercept), we get the corresponding update rules (a sketch follows below):

09/09/2024 Presented by Dr. AKHTAR JAMIL 34
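
The following sketch puts the pieces together for a one-feature linear model ŷ = w·x + b with MSE loss. The gradient expressions are the standard ones for MSE; the exact equations from the slide are not reproduced here, and the data and learning rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=50)   # synthetic data

w, b = 0.0, 0.0          # parameters, initialized (here to zero)
alpha = 0.01             # learning rate (hyperparameter)

for _ in range(2000):
    y_hat = w * x + b
    error = y_hat - y
    # Gradients of MSE: dJ/dw = (2/n) * sum(error * x), dJ/db = (2/n) * sum(error)
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Update step: move parameters opposite to the gradient
    w -= alpha * grad_w
    b -= alpha * grad_b

print(w, b)   # should approach roughly 3 and 2
```
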


Gradient Descent

09/09/2024 Presented by Dr. AKHTAR JAMIL 35


Gradient Descent

09/09/2024 Presented by Dr. AKHTAR JAMIL 36


Gradient Descent

09/09/2024 Presented by Dr. AKHTAR JAMIL 37


Thought Provoking Question
• How can we evaluate the performance on the test data set when
we can observe only the training set?

09/09/2024 Presented by Dr. AKHTAR JAMIL 38


References
• Chapter 20, Deep Learning, MIT Press, Ian Goodfellow, Yoshua
Bengio, Aaron Courville
• Lecture slides of https://www.cs.cornell.edu/~kuleshov/

09/09/2024 Presented by Dr. AKHTAR JAMIL 39


Thank You

09/09/2024 Presented by Dr. AKHTAR JAMIL 40
