Lec 10
吉建民
USTC
[email protected]
April 29, 2023
Used Materials
Course Outline
Table of Contents
Uncertainty
Probability
Inference
Bayesian network
Graphical models
Bayesian networks
Inference in Bayesian networks
Uncertainty
Problems:
! partial observability (e.g., road state, other drivers' plans, etc.)
! noisy sensors (e.g., traffic reports)
! uncertainty in action outcomes (e.g., flat tire, etc.)
! immense complexity of modeling and predicting traffic
Method for handling uncertainty
Probability
! Models the agent's degree of belief
! Given the available evidence,
A25 will get me there in time with probability 0.04
Probability
Probability theory provides a way to summarize the uncertainty that comes from our laziness and ignorance. Probabilistic assertions summarize the effects of
! Laziness: failure to enumerate exceptions, qualifications, etc.
! Ignorance: lack of relevant facts, initial conditions, etc.
Subjective probability:
! Probabilities relate propositions to the agent's own state of knowledge.
e.g., P(A25 | no reported accidents) = 0.06
These are not assertions about the world.
Probabilities of propositions change with new evidence:
e.g., P(A25 | no reported accidents, 5 a.m.) = 0.15
Making decisions under uncertainty
Table of Contents
Uncertainty
Probability
Inference
Bayesian network
Graphical models
Bayesian networks
Inference in Bayesian networks
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Syntax
Axioms of probability
Prior probability
Probability for continuous variables
Express the distribution as a parameterized function of value:
P(X = x) = U[18, 26](x) = uniform density between 18 and 26
Normal distribution:
P(x) = (1 / (σ√(2π))) e^(−(x−μ)² / (2σ²))
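As a quick sketch, the parameterized uniform density above can be evaluated directly (the function name and default bounds are just illustrative):

```python
# Uniform density U[18, 26](x): constant 1/(b - a) inside the interval, 0 outside.
def uniform_density(x, a=18.0, b=26.0):
    return 1.0 / (b - a) if a <= x <= b else 0.0

print(uniform_density(20.5))  # 0.125 (= 1/8)
print(uniform_density(30.0))  # 0.0
```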
Conditional probability
Inference by enumeration
Normalization
Inference by enumeration, contd.
Obvious problems:
! Worst-case time complexity O(d^n), where d is the largest arity
! Space complexity O(d^n) to store the joint distribution
! How to find the numbers for O(d^n) entries?
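To make the cost concrete, here is inference by enumerating a full joint table, a minimal sketch using the standard textbook dentist example (Toothache, Catch, Cavity); even with n = 3 Boolean variables the table already has 2^3 entries:

```python
# Inference by enumeration over the full joint distribution of the
# standard dentist example (numbers from the usual textbook table).
joint = {  # (toothache, catch, cavity) -> probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.016,
    (True,  False, True):  0.012, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.144,
    (False, False, True):  0.008, (False, False, False): 0.576,
}

def p_cavity_given_toothache():
    """P(Cavity | toothache): sum out the hidden variable Catch, then normalize."""
    num = sum(p for (t, c, cav), p in joint.items() if t and cav)
    den = sum(p for (t, c, cav), p in joint.items() if t)
    return num / den

print(p_cavity_given_toothache())  # 0.12 / 0.2 ≈ 0.6
```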
Independence
A and B are independent iff
P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A)P(B)
Equivalent statements of conditional independence:
P(Toothache|Catch, Cavity) = P(Toothache|Cavity)
P(Toothache, Catch|Cavity) = P(Toothache|Cavity)P(Catch|Cavity)
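The second equivalence can be checked numerically; a sketch using the standard textbook dentist joint distribution:

```python
# Check P(toothache, catch | cavity) = P(toothache | cavity) P(catch | cavity)
# on the standard dentist joint distribution.
joint = {  # (toothache, catch, cavity) -> probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.016,
    (True,  False, True):  0.012, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.144,
    (False, False, True):  0.008, (False, False, False): 0.576,
}
p_cavity = sum(p for (t, c, cav), p in joint.items() if cav)           # P(cavity)
p_t_and_cav = sum(p for (t, c, cav), p in joint.items() if t and cav)  # P(t, cavity)
p_c_and_cav = sum(p for (t, c, cav), p in joint.items() if c and cav)  # P(catch, cavity)

lhs = joint[(True, True, True)] / p_cavity                 # P(t, catch | cavity)
rhs = (p_t_and_cav / p_cavity) * (p_c_and_cav / p_cavity)  # product of conditionals
print(abs(lhs - rhs) < 1e-9)  # the identity holds for these numbers
```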
Conditional independence contd.
Bayes' Rule
Probabilistic Inference Using Bayes’ Rule
H = “having a headache”
F = “coming down with Flu”
! P(H)=1/10
! P(F)=1/40
! P(H|F)=1/2
One day you wake up with a headache, and you reason as follows: "since 50% of flus are associated with headaches, I must have a 50% chance of coming down with flu."
Probabilistic Inference Using Bayes’ Rule
The Problem:
P(F|H)
= P(H|F)P(F)/P(H)
= (1/2 × 1/40) / (1/10)
= 1/8
≠ P(H|F)
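The same computation in a few lines, using the numbers given above:

```python
# Headache/flu example: P(F|H) via Bayes' rule.
p_h, p_f, p_h_given_f = 1 / 10, 1 / 40, 1 / 2

p_f_given_h = p_h_given_f * p_f / p_h
print(p_f_given_h)  # 1/8 = 0.125, far below P(H|F) = 0.5
```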
Bayes’ Rule and conditional independence
P(Cavity|toothache ∧ catch)
= αP(toothache ∧ catch|Cavity)P(Cavity)
= αP(toothache|Cavity)P(catch|Cavity)P(Cavity)
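A minimal sketch of this normalization step, with CPT numbers consistent with the standard textbook dentist example (assumed here, not stated on the slide):

```python
# alpha * P(toothache|Cavity) * P(catch|Cavity) * P(Cavity), normalized over Cavity.
p_cavity = 0.2
p_t = {True: 0.6, False: 0.1}  # P(toothache | Cavity)
p_c = {True: 0.9, False: 0.2}  # P(catch | Cavity)

score = {cav: p_t[cav] * p_c[cav] * (p_cavity if cav else 1 - p_cavity)
         for cav in (True, False)}
alpha = 1.0 / sum(score.values())
posterior = {cav: alpha * s for cav, s in score.items()}
print(round(posterior[True], 4))  # 0.108 / 0.124 ≈ 0.871
```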
Where do probability distributions come from?
Summary of Uncertainty
Homework
Frequentist vs. Bayesian
Objective vs. subjective
Frequentist: probability is the long-run expected frequency of occurrence. P(A) = n/N, where n is the number of times event A occurs in N opportunities.
"The probability of an event is 0.1" means that 0.1 is the proportion that would be observed in the limit of infinitely many samples.
But in many scenarios repeated trials are impossible,
e.g., what is the probability of a third world war?
Probability
Independence/Conditional Independence
In most cases, exploiting conditional independence reduces the representation of the full joint distribution from exponential in n to linear in n.
Conditional independence is our most basic and robust form of knowledge about uncertain environments.
Probability Theory
Outline
What are Graphical Models?
What is a Graph?
Graphical Models in CS
Why are Graphical Models useful
Graphical models: Unifying Framework
Role of Graphical Models in Machine Learning
Graph Directionality
! Directed graphical models
—Directionality associated with arrows
! Bayesian networks
—Express causal relationships between random variables
! More popular in AI and statistics
! Undirected graphical models
—links without arrows
! Markov random fields
—Better suited to express soft constraints between variables
! More popular in Vision and physics
Bayesian networks
A simple, graphical data structure for representing dependencies (conditional independence) among variables, giving a concise specification of any full joint probability distribution.
Syntax:
! a set of nodes, one per variable
! a directed, acyclic graph (link ≈ "directly influences")
! a conditional distribution for each node given its parents:
P(Xi |Parents(Xi)), quantifying the influence of the parents on the node
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.
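A Bayesian network can be written down as plain data. The sketch below uses a hypothetical two-node Rain → WetGrass network (not from the slides) just to illustrate the nodes/parents/CPT syntax:

```python
# A Bayesian network as plain data: parents plus a CPT per node.
# Hypothetical Rain -> WetGrass network, numbers made up for illustration.
parents = {"Rain": [], "WetGrass": ["Rain"]}
cpt = {
    "Rain": {(): 0.2},                          # P(Rain = true)
    "WetGrass": {(True,): 0.9, (False,): 0.1},  # P(WetGrass = true | Rain)
}

def prob(var, value, assignment):
    """Look up P(var = value | parents(var)) in the CPT."""
    key = tuple(assignment[p] for p in parents[var])
    p_true = cpt[var][key]
    return p_true if value else 1.0 - p_true

# One joint entry: P(Rain = true, WetGrass = true) = 0.2 * 0.9
a = {"Rain": True, "WetGrass": True}
print(prob("Rain", True, a) * prob("WetGrass", True, a))  # ≈ 0.18
```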
Example
Example contd.
Compactness
A CPT for a Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values
(so such a CPT has 2^k independently specifiable probabilities)
Global semantics
Local semantics
Constructing Bayesian networks
Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.
1. Choose an ordering of variables X1, . . . , Xn
2. For i = 1 to n
add Xi to the network
select parents from X1, . . . , Xi−1 such that
P(Xi |Parents(Xi)) = P(Xi |X1, . . . , Xi−1)
This choice of parents guarantees the global semantics:
P(X1, . . . , Xn) = ∏_{i=1}^n P(Xi |X1, . . . , Xi−1)  (chain rule)  (1)
                 = ∏_{i=1}^n P(Xi |Parents(Xi))  (by construction)
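The global semantics in action: one entry of the full joint is just a product of CPT entries. A sketch using the classic textbook burglary-network numbers (assumed here; the slide's figures are not shown):

```python
# P(j, m, a, ¬b, ¬e) = P(j|a) P(m|a) P(a|¬b, ¬e) P(¬b) P(¬e),
# with the classic burglary-network CPT numbers.
p = 0.90 * 0.70 * 0.001 * 0.999 * 0.998
print(p)  # ≈ 0.000628
```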
Constructing Bayesian networks
The network topology must genuinely reflect the direct influences of an appropriate parent set on each variable.
The correct order in which to add nodes is to add the "root causes" first, then the variables they directly influence, and so on.
Example
Suppose we choose the ordering M, J, A, B, E
! P(J|M) = P(J)? No
! P(A|J, M) = P(A|J)? P(A|J, M) = P(A)? No
! P(B|A, J, M) = P(B|A)? Yes
! P(B|A, J, M) = P(B)? No
! P(E|B, A, J, M) = P(E|A)? No
! P(E|B, A, J, M) = P(E|A, B)? Yes
Example contd.
Conjunctive queries:
P(Xi, Xj |E = e) = P(Xi |E = e)P(Xj |Xi, E = e)
Inference by enumeration
Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation:
P(X|e) = αP(X, e) = α Σ_y P(X, e, y)
Note: in a Bayesian network the full joint distribution can be written as a product of conditional probabilities,
P(X1, . . . , Xn) = ∏_{i=1}^n P(Xi |Parents(Xi))
so queries can be answered in a Bayesian network by computing sums of products of conditional probabilities.
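A compact sketch of this sum-of-products query on the classic burglary network (J, M, A, B, E as in the earlier ordering example); the CPT numbers are the usual textbook ones, assumed here:

```python
from itertools import product

# Burglary network: B, E are roots; A depends on B, E; J and M depend on A.
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
cpt = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}

def prob(var, value, assignment):
    """P(var = value | parents(var)) from the CPT."""
    key = tuple(assignment[p] for p in parents[var])
    p_true = cpt[var][key]
    return p_true if value else 1.0 - p_true

def enumeration_ask(query, evidence, order=("B", "E", "A", "J", "M")):
    """P(query | evidence) by summing products of CPT entries over hidden variables."""
    hidden = [v for v in order if v != query and v not in evidence]
    dist = {}
    for qval in (True, False):
        total = 0.0
        for values in product((True, False), repeat=len(hidden)):
            a = dict(evidence)
            a[query] = qval
            a.update(zip(hidden, values))
            term = 1.0
            for v in order:
                term *= prob(v, a[v], a)
            total += term
        dist[qval] = total
    alpha = sum(dist.values())
    return {k: v / alpha for k, v in dist.items()}

posterior = enumeration_ask("B", {"J": True, "M": True})
print(round(posterior[True], 3))  # ≈ 0.284
```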
Evaluation tree
Inference by variable elimination
Complexity of exact inference
Singly connected networks (or polytrees; the corresponding undirected graph is a tree):
! any two nodes are connected by at most one (undirected) path
Example: Naïve Bayes model
P(X1 = x1 , . . . , Xn = xn )
= P(X1 = x1 )P(X2 = x2 |X1 = x1 ) . . . P(Xn = xn |X1 = x1 )
Naïve Bayes model
P(Cause, Effect1, . . . , Effectn) = P(Cause) ∏_i P(Effecti |Cause)
P(Cause|Effect1, . . . , Effectn) = P(Effects, Cause)/P(Effects)
= αP(Cause, Effects) = αP(Cause) ∏_i P(Effecti |Cause)
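As a concrete instance of this product, a spam-flavored sketch with hypothetical word probabilities (all numbers made up for illustration):

```python
# Tiny Naive Bayes spam model: Cause = spam/ham, Effects = word occurrences.
# All probabilities are hypothetical.
p_spam = 0.3
p_word_spam = {"offer": 0.40, "meeting": 0.05}  # P(word | spam)
p_word_ham = {"offer": 0.02, "meeting": 0.30}   # P(word | ham)

def posterior_spam(words):
    """alpha * P(Cause) * prod_i P(Effect_i | Cause), normalized over spam/ham."""
    ps, ph = p_spam, 1.0 - p_spam
    for w in words:
        ps *= p_word_spam[w]
        ph *= p_word_ham[w]
    return ps / (ps + ph)

print(round(posterior_spam(["offer"]), 3))  # ≈ 0.896
```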
Example: Spam detection
Example: Learning to classify text documents
Text classification is the task of assigning a given document to one of a fixed set of categories on the basis of the text it contains. Naïve Bayes models are often used for this task. In these models, the query variable is the document category, and the "effect" variables are the presence or absence of each word in the language; the assumption is that words occur independently in documents, with frequencies determined by the document category.
! Explain precisely how such a model can be constructed, given a set of documents with known categories as "training data".
! Explain precisely how to categorize a new document.
! Is the independence assumption reasonable here? Discuss.
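A minimal sketch of both steps (training on labeled documents, then classifying a new one) on made-up toy data, with Laplace smoothing to avoid zero counts:

```python
from collections import Counter, defaultdict
import math

# Toy training data: (category, document text). All data is hypothetical.
train = [("sports", "great game win"), ("sports", "team game score"),
         ("politics", "vote election win"), ("politics", "election debate vote")]

# Training: count classes and per-class word occurrences.
class_counts = Counter()
word_counts = defaultdict(Counter)
vocab = set()
for label, text in train:
    class_counts[label] += 1
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def classify(text):
    """argmax_c log P(c) + sum_w log P(w|c), with Laplace (add-one) smoothing."""
    best, best_lp = None, -math.inf
    for c in class_counts:
        lp = math.log(class_counts[c] / sum(class_counts.values()))
        total = sum(word_counts[c].values())
        for w in text.split():
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

print(classify("election win"))  # "politics"
```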
Twenty Newsgroups
Learning Curve for 20 Newsgroups
TFIDF
! TFIDF (tf-idf)
! Term Frequency: TF_w = (number of occurrences of term w in a class) / (total number of terms in that class)
! Inverse Document Frequency: IDF_w = log( (total number of documents in the corpus) / (number of documents containing term w + 1) )
! TFIDF_w = TF_w × IDF_w : a high frequency of a term within a particular document, combined with a low frequency of documents containing it across the whole collection, yields a high TFIDF weight
! PRTFIDF (A Probabilistic Classifier Derived from TFIDF)
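The two factors can be computed directly. The slide defines TF per class; the sketch below applies the same formula to a single document on toy data, keeping the slide's "+1" in the IDF denominator:

```python
import math

def tfidf(term, doc_tokens, corpus):
    tf = doc_tokens.count(term) / len(doc_tokens)  # term frequency in this document
    df = sum(1 for d in corpus if term in d)       # documents containing the term
    idf = math.log(len(corpus) / (df + 1))         # the slide's "+1" variant
    return tf * idf

docs = [["cat", "sat", "mat"], ["cat", "cat", "dog"], ["dog", "runs"]]
print(tfidf("runs", docs[2], docs))  # rare term -> positive weight
print(tfidf("cat", docs[1], docs))   # widespread term -> weight 0 here
```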
Example: A Digit Recognizer
Naïve Bayes for Digits
Simple version:
! One feature Fij for each grid position ⟨i, j⟩
! Possible feature values are on/off, based on whether intensity is more or less than 0.5 in the underlying image
! Each input maps to a feature vector, e.g.,
→ (F0,0 = 0, F0,1 = 0, F0,2 = 1, . . . , F15,15 = 0)
! Here: lots of features, each is binary
Naïve Bayes model:
P(Y|F0,0 , . . . , F15,15 ) ∝ P(Y) ∏_{i,j} P(Fi,j |Y)
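On a toy 2×2 grid with made-up per-pixel probabilities (everything here is hypothetical, standing in for the 16×16 grid), the posterior is just the normalized product:

```python
# Pixel-wise Naive Bayes posterior on a tiny 2x2 grid, two classes only.
p_class = {0: 0.5, 1: 0.5}
p_on = {  # P(F_ij = 1 | Y) per grid cell, assumed numbers
    0: {(0, 0): 0.8, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.8},
    1: {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.1, (1, 1): 0.9},
}

def posterior(features):
    """P(Y | F) ∝ P(Y) * prod_ij P(F_ij | Y), then normalize."""
    scores = {}
    for y in p_class:
        s = p_class[y]
        for cell, f in features.items():
            s *= p_on[y][cell] if f else 1.0 - p_on[y][cell]
        scores[y] = s
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

post = posterior({(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1})
print(round(post[0], 3))  # class 0 favored when all pixels are on
```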
Examples: CPTs
Comments on Naïve Bayes
Summary
Homework
! 3rd edition: 14.12, 14.13