
Machine learning models and theories
Content

1. Bayesian Belief Network
2. Neyman–Pearson lemma
3. Error bounds for normal densities
4. Expectation maximization

Reference: Michael Mitzenmacher and Eli Upfal, Probability and Computing.
Bayesian Belief Network
Characteristics

Common names:
• Bayesian network
• Bayes network
• Belief network
• Bayes(ian) model
• Probabilistic directed acyclic graphical model
Bayesian Belief Network

• An important aspect of a Bayesian network is that it represents the relationships
between variables in terms of probabilities.
• To represent these probabilistic relations among different variables/parameters/
hypotheses, a BBN uses a directed acyclic graph (DAG), i.e. a graph with no cycles.
• Each node is associated with a probability function that takes, as input, a
particular set of values for the node's parent variables, and gives (as output) the
probability (or probability distribution, if applicable) of the variable
represented by the node. A minimal sketch of such a table is shown below.
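As a minimal sketch (not from the slides), a node's conditional probability table (CPT)
can be stored as a mapping from the parents' values to a distribution over the node's own
values; the node name GrassWet and all numbers below are illustrative assumptions.

# Hypothetical CPT for a node "GrassWet" with parents (Sprinkler, Rain).
grass_wet_cpt = {
    # (sprinkler, rain): {grass_wet_value: probability}
    (False, False): {True: 0.00, False: 1.00},
    (False, True):  {True: 0.80, False: 0.20},
    (True,  False): {True: 0.90, False: 0.10},
    (True,  True):  {True: 0.99, False: 0.01},
}

def node_probability(cpt, parent_values, value):
    """Return P(node = value | parents = parent_values)."""
    return cpt[parent_values][value]

print(node_probability(grass_wet_cpt, (True, True), True))  # 0.99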
Where can you use it in your research?

• In problems where decisions must be made under uncertainty and reported as part of
your results.
• For quantifying the uncertainty in the behaviour of a proposed model.
• Similar ideas may be applied to undirected, and possibly cyclic, graphs,
such as Markov networks.
Mathematical background of BBN

• Interpretation of P(a): the probability (chance) that event a occurs.
• Interpretation of P(a | b): the probability that event a occurs given that event b
has occurred.
• Example: P(Rain | last day was sunny) is the probability of rain today given that
yesterday was sunny.

What conclusions can be extracted from the given graph?

• Wet grass can be caused by either the Sprinkler or Rain.
• The Sprinkler state can be influenced by Rain.
• Rain has no parent, so its value cannot be decided from any other node.
• This can be represented as:
• P(Grass wet | Rain) = some value
• P(Grass wet | Sprinkler) = some value
• P(Rain | Grass wet) = a decision (inference) problem
• P(Rain | Sprinkler) = a decision (inference) problem
Joint probability density function
• Suppose that there are two events which could cause grass to be wet: either the sprinkler is
on or it's raining. Also, suppose that the rain has a direct effect on the use of the sprinkler
(namely that when it rains, the sprinkler is usually not turned on). Then the situation can
be modeled with a Bayesian network. All three variables have two possible values,
T (for true) and F (for false); see the table below.
Test case   Sprinkler   Rain   Grass wet   Decision
    1           0         0        0          0
    2           0         0        1          0
    3           0         1        0          0
    4           0         1        1          1
    5           1         0        0          0
    6           1         0        1          1
    7           1         1        0          1
    8           1         1        1          1
Joint probability density function (adopted from the
chain rule of probability)
• P(A | B) = P(A, B) / P(B)
• P(A | B) · P(B) = P(B, A) = P(A, B)
• P(A, B, C) = P(A | B, C) · P(B | C) · P(C)
• Read right to left: first C occurs with probability P(C), then B occurs given C, and
only then A occurs given both B and C.
So how to solve this?

Query 1: What is the probability that Rain occurs, the Sprinkler is on, and the grass is
wet?

[Figure: tree with Rain occurring at the root, branching to Sprinkler on / Sprinkler off,
then to Grass wet / Grass not wet]
Solution
• Probability of Rain: P(Rain) = 0.2
• Probability that the Sprinkler is on given Rain: P(Sprinkler on | Rain) = 0.01
• Probability of wet grass given Rain and the Sprinkler on: P(Grass wet | Rain, Sprinkler on) = 0.99

Ans = 0.2 × 0.01 × 0.99 = 0.00198

A minimal sketch of this computation is given below.
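The following is a minimal sketch (values taken from the solution above) of the
chain-rule computation P(Rain, Sprinkler on, Grass wet) =
P(Rain) · P(Sprinkler on | Rain) · P(Grass wet | Rain, Sprinkler on).

p_rain = 0.2                       # P(Rain)
p_sprinkler_given_rain = 0.01      # P(Sprinkler on | Rain)
p_wet_given_rain_sprinkler = 0.99  # P(Grass wet | Rain, Sprinkler on)

p_joint = p_rain * p_sprinkler_given_rain * p_wet_given_rain_sprinkler
print(p_joint)  # ≈ 0.00198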


Another example
Query: What is the probability that the alarm has sounded but neither a
burglary nor an earthquake has occurred, and both Ali and Veli call?

P(¬B, ¬E, A, VC, AC) = P(¬B) · P(¬E) · P(A | ¬B, ¬E) · P(VC | A) · P(AC | A)
                     = 0.999 × 0.998 × 0.001 × 0.90 × 0.70 ≈ 0.00062

Follow-up query: P(B | AC) = ?
Likelihood ratio test (LR test)

• In statistics, a likelihood ratio test (LR test) is a statistical test used for comparing the
goodness of fit of two statistical models — a null model against an alternative model.
The test is based on the likelihood ratio, which expresses how many times more likely
the data are under one model than the other. This likelihood ratio, or equivalently its
logarithm, can then be used to compute a p-value, or compared to a critical value to
decide whether or not to reject the null model.
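
As a minimal sketch (my own illustration, not from the slides), the following tests whether
a coin is fair (null model) against a model whose bias is fitted from the data (alternative
model); the counts of 62 heads in 100 flips are an assumed example.

from scipy.stats import binom, chi2

heads, flips = 62, 100

# Log-likelihood under the null model (p = 0.5) and under the alternative
# model (p = maximum-likelihood estimate heads/flips).
ll_null = binom.logpmf(heads, flips, 0.5)
ll_alt = binom.logpmf(heads, flips, heads / flips)

# Test statistic: 2 * log(likelihood ratio).  By Wilks' theorem it is
# approximately chi-squared with 1 degree of freedom under the null model.
lr_stat = 2 * (ll_alt - ll_null)
p_value = chi2.sf(lr_stat, df=1)
print(lr_stat, p_value)  # reject the null model if p_value is below the chosen level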
Discrete probability distribution

Number rolled   1     2     3     4     5     6
Probability     1/6   1/6   1/6   1/6   1/6   1/6

P(getting an even number) = 1/6 + 1/6 + 1/6 = 1/2
P(getting an even number greater than 2) = 1/6 + 1/6 = 1/3
P(y | x) means x is the limiting (conditioning) event and y is the desirable event.
Neyman–Pearson lemma

Objective: deciding between two simple hypotheses θ0 and θ1 using the
likelihood-ratio test with threshold η; the lemma states that this test is the most
powerful test at its significance level. A statement of the test follows.
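The statement below is the standard form of the likelihood-ratio test referred to by the
lemma (added for completeness; the notation L, η and α is assumed, not taken from the
slide).

\[
\Lambda(x) \;=\; \frac{L(\theta_0 \mid x)}{L(\theta_1 \mid x)},
\qquad \text{reject } H_0 \text{ in favour of } H_1 \text{ when } \Lambda(x) \le \eta,
\]
where the threshold \(\eta\) is chosen so that \(P(\Lambda(X) \le \eta \mid \theta_0) = \alpha\),
the desired significance level; the lemma says this test has the highest power among all
tests of level \(\alpha\).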
Normal density function

1: It is a continuous density function.
2: A random variable with a Gaussian distribution is said to be normally distributed and is
called a normal deviate.
3: The probability density of the normal distribution is
   f(x) = (1 / (σ √(2π))) · exp( −(x − μ)² / (2σ²) ),
   where μ is the mean and σ² is the variance.
Expectation in the normal density function

• If X is a random variable whose cumulative distribution function admits a
density f(x), then the expected value is defined as the following Lebesgue
integral:
• E[X] = ∫ x f(x) dx, taken over the whole real line.
Concept of decision making in Bayesian decision theory

• Suppose that we want to classify two kinds of fish: (A) sea bass and (B) salmon.
We have a fish with some properties resembling class w1 (sea bass) and some
resembling class w2 (salmon), and we do not know whether it is actually a sea bass
or a salmon.

• So how do we take the correct decision in this kind of situation?
Simple phenomenon of estimation

• Calculate P(w1 | x) and P(w2 | x) from previously observed samples of fish.
• Decision rule: if P(w1 | x) > P(w2 | x) then choose w1, else choose w2.

• Let us generalise this case, where x is the observed condition (feature value) and ω is the
correct class/decision.
• We note first that the (joint) probability density of finding a pattern that is
in category ωj and has feature value x can be written two ways: p(ωj, x) =
P(ωj | x) p(x) = p(x | ωj) P(ωj). Rearranging these leads us to the answer to our
question, which is called Bayes' formula:
   P(ωj | x) = p(x | ωj) P(ωj) / p(x),  where p(x) = Σj p(x | ωj) P(ωj).
A minimal sketch of this decision rule is given below.
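The following sketch (my own illustration, not the slides' code) applies the two-class rule
with 1-D Gaussian class-conditional densities; the priors, means and spreads are assumed
numbers, not values from the slides.

from scipy.stats import norm

prior = {"w1": 0.6, "w2": 0.4}                 # P(wj)
likelihood = {"w1": norm(loc=4.0, scale=1.0),  # p(x | w1), e.g. a length feature
              "w2": norm(loc=6.0, scale=1.5)}  # p(x | w2)

def posterior(x, w):
    """P(wj | x) by Bayes' formula; p(x) is the sum over both classes."""
    evidence = sum(likelihood[c].pdf(x) * prior[c] for c in prior)
    return likelihood[w].pdf(x) * prior[w] / evidence

def decide(x):
    return "w1" if posterior(x, "w1") > posterior(x, "w2") else "w2"

print(decide(4.5), decide(6.5))  # -> w1 w2 for these illustrative parameters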
To estimate it correctly, we have to find the error value for both w1 and w2 in continuous
space.
What is probabilistic error estimation with respect to x?

LOC     Function point   Adjacent feature   Effort (ground truth of decision)
1000    103.7            52                 5500
1700    90.50            17                 2000
3310    103.87           91                 10000
3500    144.90           103                9000

The error function (also called the cost function or objective function) is the analyst's
responsibility to develop, for example:
Error function = (Oi − Ei | feature value x)
A minimal sketch of evaluating such an error function is given below.
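The following sketch (my own illustration) evaluates a simple per-row error Oi − Ei over
the table above; the estimator estimate_effort is a hypothetical placeholder, since the
slide only gives the ground-truth effort.

rows = [
    # (LOC, function_point, adjacent_feature, observed_effort)
    (1000, 103.7, 52, 5500),
    (1700, 90.50, 17, 2000),
    (3310, 103.87, 91, 10000),
    (3500, 144.90, 103, 9000),
]

def estimate_effort(loc):
    """Hypothetical estimator: effort proportional to lines of code."""
    return 2.8 * loc

for loc, fp, adj, observed in rows:
    error = observed - estimate_effort(loc)   # Oi - Ei
    print(f"LOC={loc:5d}  observed={observed:6d}  error={error:8.1f}")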
So the selection criterion will be

Select the region that shows the minimum error among the given decision regions.
So the problem is again the same, "How to find min[P(w1 | x), P(w2 | x)]", and can we get a
specific bound for this problem?

Different methods are available in statistics to solve this optimization problem.
Traditionally it is solved by global minimisation or dynamic programming, but here we use
the Chernoff bound to estimate it.
Chernoff bound

• Lemma: for a, b ≥ 0 and 0 ≤ β ≤ 1,  min[a, b] ≤ a^β · b^(1−β).

Let us discuss it. There are two cases: either a is greater than b, or b is greater than a.
Assume a > b; then we only have to prove

   b ≤ a^β · b^(1−β)
   1 ≤ a^β · b^(−β)
   1 ≤ (a / b)^β,

which holds because a/b > 1 and β ≥ 0.
Bhattacharyya bound (an extension of the Chernoff bound which is slightly less tight and
is obtained for β = 1/2)
Error probabilities in probabilistic estimations (confusion matrix)

• We can obtain additional insight into the operation of a general classifier
— Bayes or otherwise — if we consider the sources of its error.
• Consider first the two-category case, and suppose the dichotomizer has
divided the space into two regions R1 and R2 in a possibly non-optimal
way.
• There are two ways in which a classification error can occur; either an
observation x falls in R2 and the true state of nature is ω1, or x falls in R1
and the true state of nature is ω2. Since these events are mutually exclusive
and exhaustive, the probability of error is
   P(error) = P(x ∈ R2, ω1) + P(x ∈ R1, ω2)
            = ∫R2 p(x | ω1) P(ω1) dx + ∫R1 p(x | ω2) P(ω2) dx.
A numerical sketch of this error for two 1-D Gaussian classes is given below.
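The following sketch (my own illustration; all numbers are assumed) evaluates
P(error) = ∫R2 p(x|ω1)P(ω1) dx + ∫R1 p(x|ω2)P(ω2) dx for two 1-D Gaussian classes and a
decision threshold t, with R1 = {x < t} and R2 = {x ≥ t}.

from scipy.stats import norm

p1, p2 = 0.5, 0.5                        # priors P(w1), P(w2)
c1, c2 = norm(0.0, 1.0), norm(2.0, 1.0)  # class-conditional densities p(x|w1), p(x|w2)
t = 1.0                                  # decision boundary between R1 and R2

# Mass of class 1 falling in R2 plus mass of class 2 falling in R1.
p_error = c1.sf(t) * p1 + c2.cdf(t) * p2
print(p_error)  # ≈ 0.1587 for these illustrative parameters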
Error Bounds for Normal Densities

1: Chernoff bound
2: Bhattacharyya bound
The concept of these bounds is taken directly from the moment generating function of a
random variable X.
Chernoff Bound

• Suppose X1, ..., Xn are independent random variables taking values in {0, 1}. Let X
denote their sum and let μ = E[X] denote the sum's expected value. Then for any δ > 0

   P(X ≥ (1 + δ)μ) ≤ ( e^δ / (1 + δ)^(1+δ) )^μ.

A similar proof strategy can be used to show that, for 0 < δ < 1,

   P(X ≤ (1 − δ)μ) ≤ ( e^(−δ) / (1 − δ)^(1−δ) )^μ ≤ e^(−μδ²/2).
Probabilistic details of the moment generating function

The moment generating function of a random variable X is M_X(t) = E[e^(tX)], and its n-th
derivative at t = 0 gives the n-th moment E[X^n]:
For n = 1: expectation (first-order moment), E[X].
For n = 2: second-order moment, E[X²].
For n = 3: third-order moment, E[X³].

Consider a geometric random variable X with parameter p, i.e. Pr(X = n) = (1 − p)^(n−1) p
for n = 1, 2, ...; its moment generating function is M_X(t) = p e^t / (1 − (1 − p) e^t) for
t < −ln(1 − p).
Chernoff bounds
Problem 1:

• Consider a biased coin with probability p = 1/3 of landing heads and probability 2/3 of
landing tails. Suppose the coin is flipped some number n of times, and let Xi be a
random variable denoting the ith flip, where Xi = 1 means heads, and Xi = 0 means
tails. Use the Chernoff bound to determine a value for n so that the probability that
more than half of the coin flips come out heads is less than 0.001.
Solution
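One way to work this out (a sketch using the simplified upper-tail bound
P(X ≥ (1 + δ)μ) ≤ e^(−μδ²/3) for 0 < δ ≤ 1; the intended solution on the slide may use a
different form of the bound): here μ = E[X] = n/3, and "more than half heads" means
X > n/2 = (1 + 1/2)(n/3), so δ = 1/2. The bound gives
P(X > n/2) ≤ e^(−(n/3)(1/2)²/3) = e^(−n/36),
and requiring e^(−n/36) < 0.001 gives n > 36 ln(1000) ≈ 248.7, so n ≥ 249 flips suffice.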
Expectation–maximization (EM) algorithm

• In statistics, an expectation–maximization (EM) algorithm is an iterative method to


find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in
statistical models, where the model depends on unobserved latent variables.
• The EM iteration alternates between performing an expectation (E) step, which creates
a function for the expectation of the log-likelihood evaluated using the current estimate
for the parameters, and a maximization (M) step, which computes parameters
maximizing the expected log-likelihood found on the E step.
Objective of EM algorithm
• Probabilistic models, such as hidden Markov models or Bayesian networks,
are commonly used to model biological data. Much of their popularity can be
attributed to the existence of efficient and robust procedures for learning
parameters from observations.
• Often, however, the only data available for training a probabilistic model are
incomplete. Missing values can occur, for example, in medical diagnosis,
where patient histories generally include results from a limited battery of tests.
• Alternatively, in gene expression clustering, incomplete data arise from the
intentional omission of gene-to-cluster assignments in the probabilistic model.
The expectation maximization algorithm enables parameter estimation in
probabilistic models with incomplete data.
A coin-flipping experiment
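A minimal sketch (my own illustration, not the slides' code) of EM for the classic two-coin
experiment: each data point is the number of heads in a set of 10 flips made with one of
two coins of unknown biases, and the coin identities are the hidden variables; the data and
initial guesses below are assumed for illustration.

import numpy as np
from scipy.stats import binom

heads = np.array([5, 9, 8, 4, 7])   # heads observed in five sets of 10 flips
n = 10

theta_a, theta_b = 0.6, 0.5         # initial guesses for the two coin biases

for _ in range(20):
    # E-step: posterior probability that each set came from coin A,
    # assuming a uniform prior over the two coins.
    like_a = binom.pmf(heads, n, theta_a)
    like_b = binom.pmf(heads, n, theta_b)
    resp_a = like_a / (like_a + like_b)
    resp_b = 1.0 - resp_a

    # M-step: re-estimate each bias from the expected heads and total
    # flips attributed to that coin.
    theta_a = np.sum(resp_a * heads) / np.sum(resp_a * n)
    theta_b = np.sum(resp_b * heads) / np.sum(resp_b * n)

print(theta_a, theta_b)  # converges to roughly 0.80 and 0.52 for this data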
