Seminar PPT 1
2018-2019
SEMINAR PRESENTATION
ON
OPTIMIZING PHYLOGENETIC QUERIES FOR
PERFORMANCE
1. Java 1.6.
2. Stable eXist-DB 2.2, chosen for its superior performance and
suitability in our current setting.
3. A virtual machine equipped with a four-threaded Intel
Xeon 3.00 GHz CPU and 16 GB RAM, running
Windows Server 2008 (64-bit).
4. A stable version of SWI-Prolog (7.2.3) for Windows.
5. Meta-data
Pruning Aids
LCA queries can be computed in many different ways. Although more efficient
procedural approaches probably exist, a rule-based deductive evaluation is
probably the most intuitive and computationally simple.
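% ca(I,J,K): K is a common ancestor of I and J.
ca(I,I,I).
ca(I,J,K) :- ancs(K,I), ancs(K,J).
% nlca(I,J,K): K is a common ancestor of I and J, but not the lowest one.
nlca(I,J,K) :- ca(I,J,K), ca(I,J,L), ancs(K,L).
% lca(I,J,K): K is the lowest common ancestor of I and J.
lca(I,J,K) :- ca(I,J,K), \+ nlca(I,J,K).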
(The ancs predicate represents the ancestor relation: its first argument is an
ancestor of its second. The lca rule returns, as its third argument, the LCA of
a pair of nodes in a phylogeny.)
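The rules above can be tried out in SWI-Prolog on a small tree. The sketch below is illustrative only: the tree, the parent/2 facts, and the definition of ancs/2 as the proper-ancestor relation are assumptions made here, not part of the slides. The four LCA rules are repeated in Prolog syntax so that the block is self-contained.

```prolog
% Hypothetical example tree (not from the slides):
%         r
%        / \
%       a   b
%      / \
%     c   d
% parent(P, C): P is the parent of C.
parent(r, a).
parent(r, b).
parent(a, c).
parent(a, d).

% ancs(A, D): A is a proper ancestor of D (assumed definition).
ancs(A, D) :- parent(A, D).
ancs(A, D) :- parent(P, D), ancs(A, P).

% The LCA rules in Prolog syntax.
ca(I, I, I).
ca(I, J, K) :- ancs(K, I), ancs(K, J).
nlca(I, J, K) :- ca(I, J, K), ca(I, J, L), ancs(K, L).
lca(I, J, K) :- ca(I, J, K), \+ nlca(I, J, K).
```

With these facts, the query ?- lca(c, d, K). yields K = a, and ?- lca(c, b, K). yields K = r. Note that every ca/3 candidate is generated first and then checked against every other candidate by nlca/3, which hints at where the overhead comes from.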
Unfortunately, there are several problems with these rules that lead to
unusual computational overheads.
We can use the rules below to compute the ancestor list for each node, where edge(x,y)
means that y is the parent of x, and root(r) states that r is the root node of the tree T.
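% lca(X,Y,H): H is the first node shared by the root paths of X and Y.
lca(X,Y,H) :- root(R),
    alist(X,R,[X],P1), alist(Y,R,[Y],P2), intersect(P1,P2,[H|_]).
% alist(Start,End,Visited,Path): Path lists the nodes from Start up to End.
alist(Node,Node,_,[Node]).
alist(Start,End,Visited,[Start|Path]) :-
    edge(Start,X), \+ member(X,Visited),
    alist(X,End,[X|Visited],Path).
% intersect(L1,L2,L): L holds the elements of L1 that also occur in L2.
intersect(_,[],[]).
intersect([],_,[]).
intersect([H1|T1],L2,[H1|L]) :-
    member(H1,L2), !, intersect(T1,L2,L).
intersect([_|T1],L2,L) :- intersect(T1,L2,L).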
• The LCA rule uses the alist and intersect rules to return the head of the
intersection list as the LCA. To make this rule work for a set of n nodes, we
need to invoke the intersection rule n−1 times and the ancestor list rule n times.
While it is possible to design smarter rules that compute the intersection and avoid
membership tests once the first one has failed, we still need to compute the
ancestor lists for all n nodes, which alone is expensive; the cost of computing the
intersections is additional. We believe that the analytical discussion
presented above is reason enough in favor of our choice.
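The cost argument above can be made concrete with a rough sketch that extends the list-based rule to a set of nodes. The predicate names lca_set/2 and common_path/4 and the example tree are hypothetical, introduced only for illustration; foldl/4 is the SWI-Prolog library(apply) predicate. The alist and intersect rules are restated so that the block runs on its own.

```prolog
% Hypothetical example tree: edge(X, Y) means Y is the parent of X.
edge(a, r).
edge(b, r).
edge(c, a).
edge(d, a).
edge(e, c).
root(r).

% alist(Start, End, Visited, Path): Path runs from Start up to End.
alist(Node, Node, _, [Node]).
alist(Start, End, Visited, [Start|Path]) :-
    edge(Start, X), \+ member(X, Visited),
    alist(X, End, [X|Visited], Path).

% intersect(L1, L2, L): elements of L1 that also occur in L2, in L1 order.
intersect([], _, []).
intersect([H|T], L2, [H|L]) :- memberchk(H, L2), !, intersect(T, L2, L).
intersect([_|T], L2, L) :- intersect(T, L2, L).

% lca_set(Nodes, H): H is the LCA of a non-empty list of nodes.
% It makes one alist call per node (n calls) and one intersect call
% per additional node (n-1 calls).
lca_set([N|Ns], H) :-
    root(R),
    alist(N, R, [N], P0),
    foldl(common_path(R), Ns, P0, [H|_]).

% common_path(R, N, Acc0, Acc): narrow Acc0 to the nodes that also lie
% on the path from N to the root R.
common_path(R, N, Acc0, Acc) :-
    alist(N, R, [N], P),
    intersect(Acc0, P, Acc).
```

For example, ?- lca_set([e, d], H). gives H = a, and ?- lca_set([e, d, b], H). gives H = r. Every node contributes a full path to the root before any intersection is taken, which is the expense the bullet refers to.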
Reachability Index for Candidate Pruning
• For the tree in the figure below, if we already knew that node a, or any other
member of the LCA subquery list, is not reachable from r, we could fail the
subquery involving LCA without computing it, and thus would not need to compute the
entire query.
• For the same reason we could fail any subquery involving the operator any.
• But for all other operators, we could leverage the idea of k-hop reachability to
see if two nodes are connected via exactly k nodes.
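The pruning idea in the bullets above might be sketched as follows. The predicates reach/2, prunable/2, and khop/3 and the example edges are hypothetical names chosen for illustration. The sketch also makes two assumptions: hops are counted in edges, and reachability is derived by traversal here, whereas an actual reachability index would presumably be precomputed and looked up.

```prolog
% Hypothetical example data: edge(X, Y) means Y is the parent of X.
edge(a, r).
edge(b, r).
edge(c, a).
edge(z, y).   % y and z are not connected to the root r
root(r).

% reach(X, Y): Y can be reached from X by following zero or more
% parent edges.
reach(X, X).
reach(X, Y) :- edge(X, P), reach(P, Y).

% prunable(Nodes, R): some node in Nodes is not connected to R, so an
% LCA subquery over Nodes can be failed without building ancestor lists.
prunable(Nodes, R) :- member(N, Nodes), \+ reach(N, R), !.

% khop(X, Y, K): Y lies exactly K parent edges above X.
% K must be bound to a non-negative integer when khop/3 is called.
khop(X, X, 0).
khop(X, Y, K) :- K > 0, edge(X, P), K1 is K - 1, khop(P, Y, K1).
```

Here ?- prunable([c, z], r). succeeds because z is not connected to r, so the LCA subquery can be rejected early, while ?- prunable([c, b], r). fails and the subquery must be evaluated. Similarly, ?- khop(c, r, 2). succeeds and ?- khop(c, r, 1). fails.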