Illustration of The K2 Algorithm For Learning Bayes Net Structures
The purpose of this handout is to illustrate the use of the K2 algorithm to learn the topology
of a Bayes Net. The algorithm is taken from [CH93].
Consider the dataset given in [CH93]:
case x1 x2 x3
1 1 0 0
2 1 1 1
3 0 0 1
4 1 1 1
5 0 0 0
6 0 1 1
7 1 1 1
8 0 0 0
9 1 1 1
10 0 0 0
The K2 algorithm taken from [CH93] is included below. This algorithm heuristically searches
for the most probable belief–network structure given a database of cases.
procedure K2;
{Input: A set of n nodes, an ordering on the nodes, an upper bound u on the
 number of parents a node may have, and a database D containing m cases.}
{Output: For each node, a printout of the parents of the node.}
for i := 1 to n do
    πi := ∅;
    Pold := f(i, πi); {This function is computed using Equation 20.}
    OKToProceed := true;
    while OKToProceed and |πi| < u do
        let z be the node in Pred(xi) − πi that maximizes f(i, πi ∪ {z});
        Pnew := f(i, πi ∪ {z});
        if Pnew > Pold then
            Pold := Pnew;
            πi := πi ∪ {z};
        else OKToProceed := false;
    end {while};
    write('Node: ', xi, ' Parent of xi: ', πi);
end {for};
end {K2};
where:
πi: the set of parents of node xi
φi: the list of all distinct instantiations of the parents πi relative to the database D
qi = |φi|: the number of such parent instantiations
Vi: the list of possible values of the attribute xi
ri = |Vi|
αijk: the number of cases (i.e., instances) in D in which the attribute xi is instantiated with its kth value, and the parents of xi in πi are instantiated with the jth instantiation in φi
Nij = Σ_{k=1}^{ri} αijk, that is, the number of cases in D in which the parents of xi in πi are instantiated with the jth instantiation in φi
With these, Equation 20 of [CH93] reads
f(i, πi) = ∏_{j=1}^{qi} [(ri − 1)!/(Nij + ri − 1)!] ∏_{k=1}^{ri} αijk!
The informal intuition here is that f(i, πi) is the probability of the database D given that the
parents of xi are πi.
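The computation of f(i, πi) can be sketched directly from these definitions. Below is a minimal Python sketch (the function name `f` and the 0-based indexing are our own choices, not part of [CH93]; the database D is the one tabulated above):

```python
from fractions import Fraction
from itertools import product
from math import factorial

# The ten cases (x1, x2, x3) from the table above.
D = [(1,0,0),(1,1,1),(0,0,1),(1,1,1),(0,0,0),
     (0,1,1),(1,1,1),(0,0,0),(1,1,1),(0,0,0)]
VALUES = (0, 1)  # every attribute here is binary, so r_i = 2

def f(i, parents):
    """Exact value of f(i, pi_i) as a Fraction; i and parents are 0-based column indices."""
    r = len(VALUES)
    score = Fraction(1)
    # One factor per instantiation j of the parent set (q_i = r^|parents| of them).
    for inst in product(VALUES, repeat=len(parents)):
        rows = [c for c in D if all(c[p] == v for p, v in zip(parents, inst))]
        alphas = [sum(1 for c in rows if c[i] == v) for v in VALUES]  # alpha_ijk
        N = sum(alphas)                                               # N_ij
        score *= Fraction(factorial(r - 1), factorial(N + r - 1))
        for a in alphas:
            score *= factorial(a)
    return score
```

For instance, `f(0, [])` gives the score of x1 with no parents, and `f(2, [1])` the score of x3 with parent x2; both match the hand computations that follow.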
Below, we follow the K2 algorithm over the database above.
Inputs:
• the ordering on the nodes: x1, x2, x3. We assume that x1 is the classification target; as such,
the Weka system would place it first in the node ordering so that it can be a parent of each
of the predicting attributes,
• the upper bound u = 2 on the number of parents a node may have, and
• the database D of 10 cases given above.
K2 Algorithm.
i = 1: Note that for i = 1, the attribute under consideration is x1 . Here, r1 = 2 since x1 has two
possible values {0,1}.
1. π1 := ∅
2. Pold := f(1, ∅) = ∏_{j=1}^{q1} [(r1 − 1)!/(N1j + r1 − 1)!] ∏_{k=1}^{r1} α1jk!
Let's compute the necessary values for this formula. Since π1 = ∅, there is a single (empty)
parent instantiation, so q1 = 1; x1 = 0 in five cases (3, 5, 6, 8, 10) and x1 = 1 in five cases
(1, 2, 4, 7, 9), so α111 = 5, α112 = 5, and N11 = 10.
Hence,
Pold := f(1, ∅) = [(r1 − 1)!/(N11 + r1 − 1)!] · α111! · α112! = [(2 − 1)!/(10 + 2 − 1)!] · 5! · 5! = (1/11!) · 5! · 5! = 1/2772
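This arithmetic can be double-checked with exact rationals (a quick sanity check, not part of the original handout):

```python
from fractions import Fraction
from math import factorial

# f(1, emptyset) = (2-1)!/(10+2-1)! * 5! * 5!
p_old = Fraction(factorial(1), factorial(11)) * factorial(5) * factorial(5)
print(p_old)  # 1/2772
```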
i = 2: Note that for i = 2, the attribute under consideration is x2 . Here, r2 = 2 since x2 has two
possible values {0,1}.
1. π2 := ∅
2. Pold := f(2, ∅) = ∏_{j=1}^{q2} [(r2 − 1)!/(N2j + r2 − 1)!] ∏_{k=1}^{r2} α2jk!
Hence, since x2 = 0 in five cases (1, 3, 5, 8, 10) and x2 = 1 in five cases (2, 4, 6, 7, 9),
Pold := f(2, ∅) = [(2 − 1)!/(10 + 2 − 1)!] · 5! · 5! = (1/11!) · 5! · 5! = 1/2772
3. Since P red(x2 ) = {x1 }, then the only iteration for i = 2 goes with z = x1 .
Pnew := f(2, π2 ∪ {x1}) = f(2, {x1}) = ∏_{j=1}^{q2} [(r2 − 1)!/(N2j + r2 − 1)!] ∏_{k=1}^{r2} α2jk!
Counting from the database: among the five cases with x1 = 0, x2 = 0 four times and x2 = 1
once (α211 = 4, α212 = 1, N21 = 5); among the five cases with x1 = 1, x2 = 0 once and x2 = 1
four times (α221 = 1, α222 = 4, N22 = 5). Hence,
Pnew = [(1/6!) · 4! · 1!] · [(1/6!) · 1! · 4!] = (1/30) · (1/30) = 1/900
4. Since Pnew = 1/900 > Pold = 1/2772, we set π2 := {x1} and Pold := 1/900. As Pred(x2) − π2 is now empty, the iteration for i = 2 ends with π2 = {x1}.
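The counts behind the 1/900 figure can be rebuilt from the database directly (a quick check; the variable names are ours):

```python
from fractions import Fraction
from math import factorial

# The ten cases (x1, x2, x3) from the table above.
D = [(1,0,0),(1,1,1),(0,0,1),(1,1,1),(0,0,0),
     (0,1,1),(1,1,1),(0,0,0),(1,1,1),(0,0,0)]

p_new = Fraction(1)
for x1 in (0, 1):  # one factor per instantiation of the parent x1
    alphas = [sum(1 for c in D if c[0] == x1 and c[1] == v) for v in (0, 1)]  # alpha_2jk
    n = sum(alphas)                                                           # N_2j
    p_new *= Fraction(factorial(1), factorial(n + 1)) * factorial(alphas[0]) * factorial(alphas[1])

print(p_new)  # 1/900
```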
i = 3: Note that for i = 3, the attribute under consideration is x3 . Here, r3 = 2 since x3 has two
possible values {0,1}.
1. π3 := ∅
2. Pold := f(3, ∅) = ∏_{j=1}^{q3} [(r3 − 1)!/(N3j + r3 − 1)!] ∏_{k=1}^{r3} α3jk!
Hence, since x3 = 0 in four cases (1, 5, 8, 10) and x3 = 1 in six cases (2, 3, 4, 6, 7, 9),
Pold := f(3, ∅) = [(2 − 1)!/(10 + 2 − 1)!] · 4! · 6! = (1/11!) · 4! · 6! = 1/2310
3. Since Pred(x3) = {x1, x2}, we compute f(3, {x1}) and f(3, {x2}). For f(3, {x2}), q3 = 2
(j = 1 for x2 = 0, j = 2 for x2 = 1), and the counts are:
– α311 = 4: # of cases with x2 = 0 and x3 = 0 (cases 1, 5, 8, 10)
– α312 = 1: # of cases with x2 = 0 and x3 = 1 (case 3)
– α321 = 0: # of cases with x2 = 1 and x3 = 0 (no case)
– α322 = 5: # of cases with x2 = 1 and x3 = 1 (cases 2, 4, 6, 7, 9)
– N31 = α311 + α312 = 5
– N32 = α321 + α322 = 5
f(3, {x2}) = ∏_{j=1}^{q3} [(r3 − 1)!/(N3j + r3 − 1)!] ∏_{k=1}^{r3} α3jk!
= [(2 − 1)!/(5 + 2 − 1)!] · α311! · α312! · [(2 − 1)!/(5 + 2 − 1)!] · α321! · α322!
= (1/6!) · 4! · 1! · (1/6!) · 0! · 5! = (1/30) · (1/6) = 1/180
(Recall that 0! = 1.)
4. Since f (3, {x2 }) = 1/180 > f (3, {x1 }) = 1/1800 then z = x2 . Also, since f (3, {x2 }) =
1/180 > Pold = f (3, ∅) = 1/2310, then π3 = {x2 }, Pold := Pnew = 1/180.
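The three candidate scores compared in this step can be recomputed with exact arithmetic. The (x3 = 0, x3 = 1) splits used for f(3, {x1}) are counted from the database; they are not spelled out in the text:

```python
from fractions import Fraction
from math import factorial

# Each factor is alpha_j1! * alpha_j2! / (N_3j + 1)!, one per parent instantiation.
f3_empty = Fraction(factorial(4) * factorial(6), factorial(11))   # no parents: split (4, 6)
f3_x1 = (Fraction(factorial(3) * factorial(2), factorial(6))      # x1 = 0: split (3, 2)
         * Fraction(factorial(1) * factorial(4), factorial(6)))   # x1 = 1: split (1, 4)
f3_x2 = (Fraction(factorial(4) * factorial(1), factorial(6))      # x2 = 0: split (4, 1)
         * Fraction(factorial(0) * factorial(5), factorial(6)))   # x2 = 1: split (0, 5)

print(f3_empty, f3_x1, f3_x2)  # 1/2310 1/1800 1/180
```

The largest of the three is f(3, {x2}), which is why x2 is chosen as the parent of x3.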
5. Now, the next iteration of the algorithm for i = 3, considers adding the remaining predecessor
of x3 , namely x1 , to the parents of x3 .
f(3, π3 ∪ {x1}) = f(3, {x1, x2}) = ∏_{j=1}^{q3} [(r3 − 1)!/(N3j + r3 − 1)!] ∏_{k=1}^{r3} α3jk!
Here q3 = 4, one instantiation per combination of (x1, x2). Counting from the database:
(0, 0) covers cases 3, 5, 8, 10 (x3 = 0 three times, x3 = 1 once); (0, 1) covers case 6 (x3 = 1);
(1, 0) covers case 1 (x3 = 0); (1, 1) covers cases 2, 4, 7, 9 (x3 = 1 four times). Hence,
Pnew = [(1/5!) · 3! · 1!] · [(1/2!) · 0! · 1!] · [(1/2!) · 1! · 0!] · [(1/5!) · 0! · 4!] = (1/20) · (1/2) · (1/2) · (1/5) = 1/400
6. Since Pnew = 1/400 < Pold = 1/180, the iteration for i = 3 ends with π3 = {x2}.
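The 1/400 figure can likewise be verified from the four parent instantiations (the per-instantiation counts are derived from D; the dictionary below is our own bookkeeping):

```python
from fractions import Fraction
from math import factorial

# alpha_3jk splits (x3 = 0, x3 = 1) per parent instantiation (x1, x2), counted from D.
alphas = {(0, 0): (3, 1), (0, 1): (0, 1), (1, 0): (1, 0), (1, 1): (0, 4)}

p_new = Fraction(1)
for a0, a1 in alphas.values():
    n = a0 + a1  # N_3j
    p_new *= Fraction(factorial(1), factorial(n + 1)) * factorial(a0) * factorial(a1)

print(p_new)  # 1/400
```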
Outputs: For each node, a printout of the parents of the node.
Node: x1 , Parent of x1 : π1 = ∅
Node: x2 , Parent of x2 : π2 = {x1 }
Node: x3 , Parent of x3 : π3 = {x2 }
This concludes the run of K2 over the database D. The learned topology is
x1 → x2 → x3
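The whole trace above can be reproduced by a direct transcription of the pseudocode. The sketch below assumes binary attributes and uses our own function and variable names; it is a minimal illustration, not the Weka implementation:

```python
from fractions import Fraction
from itertools import product
from math import factorial

# The ten cases (x1, x2, x3) from the table; attributes are 0-indexed here.
D = [(1,0,0),(1,1,1),(0,0,1),(1,1,1),(0,0,0),
     (0,1,1),(1,1,1),(0,0,0),(1,1,1),(0,0,0)]

def f(i, parents):
    """Equation 20 of [CH93] for binary attributes, as an exact Fraction."""
    score = Fraction(1)
    for inst in product((0, 1), repeat=len(parents)):  # the q_i parent instantiations
        rows = [c for c in D if all(c[p] == v for p, v in zip(parents, inst))]
        alphas = [sum(1 for c in rows if c[i] == v) for v in (0, 1)]  # alpha_ijk
        score *= Fraction(factorial(1), factorial(sum(alphas) + 1))   # (r-1)!/(N_ij+r-1)!
        score *= factorial(alphas[0]) * factorial(alphas[1])
    return score

def k2(n, u):
    """Greedy K2 search over nodes 0..n-1 in the given order, at most u parents each."""
    parents = []
    for i in range(n):
        pi, p_old = [], f(i, [])
        ok = True
        while ok and len(pi) < u:
            candidates = [z for z in range(i) if z not in pi]  # Pred(x_i) - pi_i
            if not candidates:
                break
            z = max(candidates, key=lambda c: f(i, pi + [c]))
            p_new = f(i, pi + [z])
            if p_new > p_old:
                p_old, pi = p_new, pi + [z]
            else:
                ok = False
        parents.append(pi)
    return parents

print(k2(3, 2))  # [[], [0], [1]]: pi_1 = {}, pi_2 = {x1}, pi_3 = {x2}
```

The output matches the printout above: x1 has no parents, x2 has parent x1, and x3 has parent x2.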
References
[CH93] Gregory F. Cooper and Edward Herskovits. A Bayesian method for the induction
of probabilistic networks from data. Technical Report KSL-91-02, Knowledge Systems
Laboratory, Medical Computer Science, Stanford University School of Medicine, Stanford,
CA 94305-5479. Updated Nov. 1993. Available at:
http://www.springerlink.com/content/85c6f40ef659d8b2/fulltext.pdf