Advanced Machine Learning

The document provides an overview of graphical models in machine learning, detailing both directed and undirected models, their representations, and applications in various fields such as vision and text mining. It explains the fundamentals of machine learning, including supervised, unsupervised, and reinforcement learning, along with the importance and classification of these methods. Additionally, it covers Bayesian networks, their structure, and how they efficiently represent joint probability distributions.


UNIT I GRAPHICAL MODEL REPRESENTATION

Directed Graphical Model: overview, representation of probability distributions and conditional independence statements. Undirected Graphical Model: potentials, conditional independence and graph separability, factorization. Constructing undirected models from distributions. Relationship between directed and undirected models. Common undirected graphical models: factor models, Ising and Potts models, Gibbs distributions, log-linear models, CRFs. Feature-based potentials for flexible deployment in many applications. Applications in vision and text mining.
Introduction to Machine Learning
Machine learning
● Machine learning is a branch of artificial intelligence (AI) and computer science that focuses on the use of data and algorithms to imitate the way humans learn, gradually improving accuracy.

● Machine learning is a growing technology that enables computers to learn automatically from past data. It uses various algorithms to build mathematical models and make predictions from historical data or information.
How does Machine Learning work?
● A Machine Learning system learns from historical data, builds the
prediction models, and whenever it receives new data, predicts the
output for it.
● The accuracy of predicted output depends upon the amount of data,
as the huge amount of data helps to build a better model which
predicts the output more accurately.
Apply a prediction function to a feature representation of the image to get the desired output:

f(image of an apple) = "apple"
f(image of a tomato) = "tomato"
f(image of a cow) = "cow"

y = f(x), where y is the output prediction, f is the prediction function, and x is the image feature representation.
● Training: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set.
● Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x).
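The train/test workflow above maps directly onto code. Below is a minimal sketch (not from the slides) using scikit-learn; the arrays X and y are placeholder feature vectors and labels standing in for real image features.

```python
# Minimal sketch of the training/testing loop described above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))                   # placeholder feature vectors
y = rng.choice(["apple", "tomato", "cow"], 300)  # placeholder labels

# Training: estimate the prediction function f on labeled examples
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
f = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Testing: apply f to never-before-seen examples
y_pred = f.predict(X_test)
print("test accuracy:", (y_pred == y_test).mean())
```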
Need for Machine Learning
● Humans cannot process huge amounts of data manually.
● We can train machine learning algorithms by providing them with large amounts of data, letting them explore the data, construct models, and predict the required output automatically.
● With the help of machine learning, we can save both time and money.
● Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestion by Facebook, etc.
● Top companies such as Netflix and Amazon have built machine learning models that use vast amounts of data to analyze user interests and recommend products accordingly.
Importance of Machine Learning:

● Rapid increase in the production of data
● Solving complex problems that are difficult for humans
● Decision making in various sectors, including finance
● Finding hidden patterns and extracting useful information from data
Classification of Machine Learning

● Supervised learning
● Unsupervised learning
● Reinforcement learning

● Supervised learning
 ○ regression: predict numerical values
 ○ classification: predict categorical values, i.e., labels
● Unsupervised learning
 ○ clustering: group data according to "distance"
 ○ association: find frequent co-occurrences
 ○ link prediction: discover relationships in data
 ○ data reduction: project features to fewer features
● Reinforcement learning
Supervised learning
● Supervised learning is a type of machine learning method in which we provide sample labeled data to the machine learning system in order to train it; on that basis, it predicts the output.
● An example of supervised learning is spam filtering.
● Supervised learning can be grouped further into two categories of algorithms:
 ○ Classification: predict categorical values, i.e., labels
 ○ Regression: predict numerical values


Supervised Learning Steps
Unsupervised learning

● Unsupervised learning is a learning method in which a machine learns without any supervision.
● The machine is trained on a set of data that has not been labeled, classified, or categorized, and the algorithm needs to act on that data without any supervision.
● It can be further classified into two categories of algorithms (a minimal clustering sketch follows the list):
 ○ Clustering
 ○ Association
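To make the clustering idea concrete, here is a small illustrative sketch (our own, not from the slides) using scikit-learn's KMeans on synthetic 2-D points; the model never sees any labels.

```python
# Group unlabeled points by distance, as described above.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two unlabeled blobs of points; no labels are ever provided.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
print("centers:\n", km.cluster_centers_)
```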
Reinforcement learning

● Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action.
● The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it.
● The goal of the agent is to collect the most reward points and thereby improve its performance (see the sketch below).
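As a concrete illustration of the reward/penalty loop, here is a tiny tabular Q-learning sketch (an assumption of ours, not part of the slides) on a hypothetical five-state corridor where the agent must learn to move right.

```python
# Tiny tabular Q-learning demo of reward/penalty feedback.
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1
rng = np.random.default_rng(0)

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy: mostly exploit, sometimes explore
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else -0.01   # reward / small penalty
        # learning update driven by the environment's feedback
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print("greedy policy (0=left, 1=right):", Q.argmax(axis=1))
```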

Directed Graphical Model

● The graphical model (GM) is a branch of ML which uses a graph to represent a domain problem. Many ML & DL algorithms, including the Naive Bayes algorithm, the Hidden Markov Model, the Restricted Boltzmann Machine, and Neural Networks, belong to the GM family.

● Probabilistic graphical modeling combines probability theory and graph theory. The probabilistic part reasons under uncertainty, so we can use probability theory to model and reason about real-world problems better. The graph part models the dependencies or correlations.

Probability basics

Axioms of Probability
For any propositions a, b:
1. 0 ≤ P(a) ≤ 1
2. P(true) = 1 and P(false) = 0
3. P(a ∨ b) = P(a) + P(b) − P(a ∧ b)

These three axioms are often called Kolmogorov's axioms, named after the Russian mathematician who showed how to build up the rest of probability theory from this simple foundation.
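As a quick sanity check of axiom 3 (inclusion-exclusion), the snippet below (illustrative, not from the slides) verifies it on a fair six-sided die with a = "even" and b = "greater than 3".

```python
# Verify P(a ∨ b) = P(a) + P(b) − P(a ∧ b) on a fair die.
from fractions import Fraction

omega = range(1, 7)
P = lambda event: Fraction(sum(1 for w in omega if event(w)), 6)

a = lambda w: w % 2 == 0          # {2, 4, 6}
b = lambda w: w > 3               # {4, 5, 6}

lhs = P(lambda w: a(w) or b(w))   # P(a ∨ b)
rhs = P(a) + P(b) - P(lambda w: a(w) and b(w))
print(lhs, rhs, lhs == rhs)       # 2/3 2/3 True
```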

Bayes’ Rule and its Use
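Bayes' rule lets us invert a conditional probability: P(a | b) = P(b | a) P(a) / P(b). In practice it updates a prior P(a) into a posterior P(a | b) once evidence b is observed; the related formulas are collected in the Remembrance slide below.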



Remembrance

● Conditional Probability: P(a | b) = P(a ∧ b) / P(b)

● Product Rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)

● Bayes' Rule: P(a | b) = P(b | a) P(a) / P(b)
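A short worked example of Bayes' rule (our own illustration, with assumed numbers): a diagnostic test with 99% sensitivity and 95% specificity for a condition with 1% prevalence.

```python
# Worked Bayes' rule example with assumed (hypothetical) numbers.
p_d = 0.01                      # P(disease): prevalence
p_pos_given_d = 0.99            # P(positive | disease): sensitivity
p_pos_given_not_d = 0.05        # P(positive | no disease) = 1 - specificity

# Total probability: P(positive) sums over both causes of a positive test
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' rule: P(disease | positive) = P(positive | disease) P(disease) / P(positive)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(f"P(disease | positive) = {p_d_given_pos:.3f}")   # about 0.167
```

Note how the posterior (about 17%) is far below the test's 99% sensitivity, because the low prior dominates; this is exactly the kind of update Bayes' rule formalizes.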

Bayesian Networks

Bayesian Network: other names

– Belief networks
– Probabilistic networks
– Causal networks
– Decision networks
– Bayesian model

Definition
"A Bayesian network is a probabilistic graphical model which represents a set of random variables and their conditional dependencies using a directed acyclic graph (DAG)."
• Real-world applications are probabilistic in nature, and to represent the relationships between multiple events, we need a Bayesian network.

Syntax:
• a set of nodes, one per random variable
• a directed, acyclic graph (link ≈ "directly influences")
• a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
• In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values

A Bayesian Network consists of:
• a directed acyclic graph
• a table of conditional probabilities for each node

Conditional probabilities can be computed from the full joint probability distribution, but representing the entire joint is intractable.

A Bayesian network represents the joint probability distribution of the variables efficiently.

Bayesian network
A Bayesian network graph is made up of nodes and arcs (directed links).

Each node corresponds to a random variable, and a variable can be continuous or discrete.

Arcs (directed arrows) represent the causal relationships or probabilistic dependencies between nodes.

The lack of a link signifies conditional independence:
• Weather is independent of the other variables
• Toothache and Catch are conditionally independent, given Cavity

Compactness
• A CPT for Boolean Xi with k Boolean parents has 2^k rows, one for each combination of parent values.

• Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p).

• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution.

• For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31 in the full joint distribution).
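The counting argument above is easy to reproduce; the small helper below (illustrative, not from the slides) sums 2^k over the nodes of the burglary net.

```python
# Each Boolean node with k Boolean parents contributes 2**k numbers
# (one per row of its CPT).
def cpt_numbers(parent_counts):
    return sum(2 ** k for k in parent_counts)

# Burglary net: B and E have no parents, A has two parents (B, E),
# and each of the two callers has one parent (A).
print(cpt_numbers([0, 0, 2, 1, 1]))   # 1 + 1 + 4 + 2 + 2 = 10
print(2 ** 5 - 1)                     # 31 numbers for the full joint
```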

Global Semantics
Global semantics defines the full joint distribution as the product of the local conditional distributions:

P(x1, x2, …, xn) = ∏ i=1..n P(xi | parents(Xi))

Ex:
P(J, M, A, B, E) = P(J | A) × P(M | A) × P(A | B, E) × P(B) × P(E)
(i.e., P(J ∧ M ∧ A ∧ B ∧ E))
where P(B) and P(E) are independent (prior) probabilities, and P(J | A), P(M | A) depend on the parent A.
Local Semantics
Local semantics: each node is conditionally independent of its non-descendants given its parents.

Ex 1: What is independent of C, given A? B and D (non-descendants) are conditionally independent of C, given the value of A.

Ex 2: For E, given C and D, then A and B are independent of E.
Markov Blanket
● Each node is conditionally independent of all others given its parents, children, and children's parents.

Ex 1: Node C's Markov blanket includes its parent A, its child E, and its child's parent D; given A, D, and E, node C becomes conditionally independent of B.

Ex 2: B is independent of J and M, given A and E.

Constructing Bayesian networks
1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n:
 – add Xi to the network
 – select parents from X1, …, Xi−1 such that
  P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1)

This choice of parents guarantees:

P(X1, …, Xn) = ∏ i=1..n P(Xi | X1, …, Xi−1) (chain rule)
       = ∏ i=1..n P(Xi | Parents(Xi)) (by construction)

Example
• I'm at work; neighbor David calls to say my alarm is ringing, but neighbor Sophia doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?
• Variables: Burglary, Earthquake, Alarm, DavidCalls, SophiaCalls
• Network topology reflects "causal" knowledge:
 – A burglar can set the alarm off
 – An earthquake can set the alarm off
 – The alarm can cause David to call
 – The alarm can cause Sophia to call

Burglary Example

Burglary Example (contd.)

Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia called Harry.
Write the problem statement in the form of a probability distribution:
P(S, D, A, ¬B, ¬E) = P(S | A) × P(D | A) × P(A | ¬B ∧ ¬E) × P(¬B) × P(¬E)
 = 0.75 × 0.91 × 0.001 × 0.998 × 0.999
 = 0.00068045
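The arithmetic is easy to check directly; the snippet below (a sketch of ours) just multiplies the five local CPT entries quoted above, following the global-semantics factorization.

```python
# Verify the burglary-network calculation by multiplying the CPT entries.
p_s_given_a = 0.75              # P(S | A)
p_d_given_a = 0.91              # P(D | A)
p_a_given_not_b_not_e = 0.001   # P(A | ¬B, ¬E)
p_not_b = 0.998                 # P(¬B)
p_not_e = 0.999                 # P(¬E)

p = p_s_given_a * p_d_given_a * p_a_given_not_b_not_e * p_not_b * p_not_e
print(f"P(S, D, A, ¬B, ¬E) = {p:.8f}")   # ≈ 0.00068045
```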

Applications
It can be used in various tasks, including
