Complex Network Report
Complex Network
Comparing Motif Detection Algorithms/tools
Imran 10CS30021
4/24/2013
Under the Guidance of:
Rishiraj Saha Roy
Prof. Animesh Mukherjee
Introduction
All networks, including biological networks (e.g., metabolic networks, transcription regulatory networks, protein-protein interaction networks, protein structure networks, neural networks, ecological networks), social networks, technological networks (e.g., computer networks, electrical circuits), etc., can be represented as graphs, which contain a wide variety of subgraphs. One important local property of networks is the so-called network motif, defined as a recurrent and statistically significant subgraph or pattern. Motifs, subgraphs that repeat themselves within a specific network or even across different networks, would be consistent with the tenets of evolutionary theory. Each of these subgraphs, defined by a particular pattern of interactions between vertices, may reflect a framework in which particular functions are achieved efficiently. Indeed, motifs are of notable importance largely because they may reflect functional properties. They have recently gathered much attention as a useful concept for uncovering the structural design principles of complex networks.[1] Although network motifs may provide deep insight into a network's functional abilities, their detection is computationally challenging.
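Statistical significance is commonly quantified, for example in the output of mfinder and FANMOD, by a z-score: the motif's count in the real network minus its mean count over an ensemble of randomized networks, divided by the standard deviation of those random counts. The sketch below assumes the random-network counts have already been obtained (the randomization step is not shown); the function name `motif_z_score` and the sample numbers are hypothetical.

```python
import statistics


def motif_z_score(real_count, random_counts):
    """Z-score of one motif: how many standard deviations its count in the
    real network lies above its mean count in randomized networks.

    random_counts: counts of the same motif in randomized networks, assumed
    to be generated elsewhere (e.g. by a degree-preserving edge-switching step).
    """
    mean = statistics.mean(random_counts)
    sd = statistics.pstdev(random_counts)  # population standard deviation
    if sd == 0:
        raise ValueError("zero variance in random counts; z-score undefined")
    return (real_count - mean) / sd


# Hypothetical numbers, for illustration only.
print(round(motif_z_score(10, [4, 6, 5, 5]), 3))  # 7.071
```

A large positive z-score marks the subgraph as over-represented relative to the chosen null model; the result depends heavily on how the random networks are generated.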
Objective
In this term project, we wish to explore several state-of-the-art motif-finding algorithms and compare them on aspects such as scalability, ease of integration with existing code modules, and apparent drawbacks. A comparative interpretation of the results can also help us understand how these algorithms work. Apart from this, we will try to gain some insight into the semantic properties of the network motifs in the query-log network.
Algorithms and Tools
FANMOD
Kavosh
Biemann
MAVisto
Kashtan (mfinder)
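FANMOD is built on the ESU (Enumerate SubGraphs) algorithm of Wernicke [2], which lists every connected induced k-node subgraph exactly once. Below is a minimal pure-Python sketch for undirected graphs, assuming the graph is given as a dict mapping each node (nodes must be mutually comparable, e.g. ints) to the set of its neighbours. The names `esu` and `_extend` are hypothetical; this is not FANMOD's code, which also supports the sampling variant RAND-ESU and classifies subgraphs by isomorphism class.

```python
def esu(adj, k):
    """Yield every connected induced k-node subgraph exactly once (ESU).

    adj: dict mapping each node to the set of its neighbours (undirected).
    """
    for v in adj:
        # Grow subgraphs rooted at v only through nodes labelled above v.
        ext = {u for u in adj[v] if u > v}
        yield from _extend({v}, ext, v, adj, k)


def _extend(sub, ext, v, adj, k):
    if len(sub) == k:
        yield frozenset(sub)
        return
    # Closed neighbourhood of the current subgraph: sub plus all its neighbours.
    closed = set(sub)
    for s in sub:
        closed |= adj[s]
    ext = set(ext)
    while ext:
        w = ext.pop()
        # Add w's exclusive neighbours (not in sub, not adjacent to sub) above v.
        new_ext = ext | {u for u in adj[w] if u > v and u not in closed}
        yield from _extend(sub | {w}, new_ext, v, adj, k)


# Triangle 0-1-2 with a pendant node 3 attached to node 0.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(sorted(sorted(s) for s in esu(graph, 3)))  # [[0, 1, 2], [0, 1, 3], [0, 2, 3]]
```

Because a root v only extends through higher-labelled nodes and only adds exclusive neighbours, no subgraph is produced twice, so no deduplication step is needed.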
Motifs Considered
Size 3 Motifs:
3-Clique
3-semi-clique
Size 4 Motifs:
4-Chain
4-Box
4-semi-clique
4-loop-out
4-clique
4-star
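For undirected graphs, each motif listed above is identified uniquely by the sorted degree sequence of its induced subgraph; for example, a 4-star has degrees 1, 1, 1, 3 while a 4-chain has 1, 1, 2, 2. The sketch below uses this to label a node set. The table and function names are hypothetical, "3-chain" is used for the 3-semi-clique (open triad), and the graph is assumed to be a dict of neighbour sets without self-loops.

```python
# Sorted degree sequence of the induced subgraph -> motif name used in this report.
MOTIF_BY_DEGREES = {
    (1, 1, 2): "3-chain",          # open triad (3-semi-clique)
    (2, 2, 2): "3-clique",         # triangle
    (1, 1, 2, 2): "4-chain",       # path on four nodes
    (1, 1, 1, 3): "4-star",        # one hub, three leaves
    (2, 2, 2, 2): "4-box",         # 4-cycle
    (1, 2, 2, 3): "4-loopout",     # triangle plus one pendant node
    (2, 2, 3, 3): "4-semiclique",  # 4-clique minus one edge
    (3, 3, 3, 3): "4-clique",
}


def classify(nodes, adj):
    """Motif name of the subgraph induced by `nodes`, or None if it is
    disconnected (disconnected degree sequences are not in the table)."""
    nodes = set(nodes)
    degrees = tuple(sorted(len(adj[n] & nodes) for n in nodes))
    return MOTIF_BY_DEGREES.get(degrees)


graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(classify({0, 1, 2}, graph))     # 3-clique
print(classify({0, 1, 3}, graph))     # 3-chain
print(classify({0, 1, 2, 3}, graph))  # 4-loopout
print(classify({1, 2, 3}, graph))     # None (disconnected)
```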
) = loge
Results
We present here the raw counts of the motifs and the LNMC (normalized count) for networks of various sizes, together with the running times of the five tools/algorithms used. Kavosh reports motif frequencies as percentages rather than raw counts.
Raw Count:
Tool     3-chain  3-clique  4-chain  4-star  4-loopout  4-box  4-semiclique  4-clique
Biemann  39397    1882      283595   732187  76309      1504   6187          512
mfinder  39397    1882      283595   732187  76309      1504   6187          512
Kavosh   95.44%   4.56%     25.77%   66.54%  6.94%      0.08%  0.539219%     0.069618%
FANMOD   39397    1882      283595   732187  76309      1504   6187          512
MAVisto  45043    1882      485495   822918  107201     9227   9259          512
Table 1
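The Kavosh row can be related to the raw counts by assuming that Kavosh expresses each motif's frequency as a percentage of all connected subgraphs of the same size. The small conversion below (the function name is hypothetical) reproduces the size-3 Kavosh entries of Table 1 from the counts of the other tools, although Kavosh's figures for the rarer size-4 motifs do not match those counts.

```python
def size_percentages(counts):
    """Convert raw counts of motifs of one size into percentages of the
    total number of connected subgraphs of that size."""
    total = sum(counts.values())
    return {name: 100.0 * c / total for name, c in counts.items()}


# Size-3 raw counts reported by Biemann, mfinder and FANMOD in Table 1.
size3 = {"3-chain": 39397, "3-clique": 1882}
for name, pct in size_percentages(size3).items():
    print(f"{name}: {pct:.2f}%")  # 3-chain: 95.44%, then 3-clique: 4.56%
```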
Running Time:
Tool     Size 3         Size 4
Biemann  _______        _______
mfinder  5.67 min       1.08 hrs
Kavosh   9.09 sec       90.67 sec
FANMOD   _______        _______
MAVisto  12 min 40 sec  2 hr 12 sec
Table 2
Normalized Count:
Tool     3-chain    3-clique  4-star     4-loopout  4-chain    4-box      4-semiclique  4-clique
Biemann  15.918035  6.461399  18.180713  15.919468  11.627872  11.992805  19.640088     24.256635
mfinder  15.918035  6.461399  18.180713  15.919468  11.627872  11.992805  19.640088     24.256635
Kavosh   15.918027  6.46157   18.181255  15.920754  11.628311  11.457723  19.598743     24.659851
FANMOD   15.918035  6.461399  18.180713  15.919468  11.627872  11.992805  19.640088     24.256635
MAVisto  15.923766  6.333202  18.032217  15.994066  11.900178  13.541494  19.777916     23.991318
Table 3
Raw Count:
Tool     3-chain    3-clique   4-star     4-loopout  4-chain   4-box     4-semiclique  4-clique
Biemann  52097      979        2934041    73010      226949    675       3107          85
mfinder  52097      979        2934041    73010      226949    675       3107          85
Kavosh   98.15547%  1.844525%  90.61647%  2.02154%   7.00921%  0.01655%  0.087897%     0.010686%
FANMOD   52097      979        2934041    73010      226949    675       3107          85
Table 4
Running Time:
Tool     Size 3       Size 4
Biemann  _______      _______
mfinder  3 min 9 sec  1 hr 2 min
Kavosh   2.48 sec
FANMOD   _______      _______
Table 5
Normalized Count:
Tool     3-chain   3-clique   4-star     4-loopout  4-chain    4-box     4-semiclique  4-clique
Biemann  9.035738  11.227082  12.144607  13.805812  14.939941  9.122173  16.632226     19.892286
mfinder  9.035738  11.227082  12.144607  13.805812  14.939941  9.122173  16.632226     19.892286
Kavosh   9.035738  11.227082  12.145177  13.782722  14.940511  8.892158  16.545049     21.296640
FANMOD   9.035738  11.227082  12.144607  13.805812  14.939941  9.122173  16.632226     19.892286
Table 6
Raw Count:
Tool     3-chain    3-clique   4-star     4-loopout  4-chain    4-box     4-semiclique  4-clique
Biemann  1837412    10411      565434017  3930501    12192850   29157     51143         1436
mfinder  1837412    10411      565434017  3930501    12192850   29157     51143         1436
Kavosh   97.43658%  0.563420%  96.27968%  0.635352%  2.096291%  0.03304%  0.4264%       0.001044%
FANMOD   1837412    10411      565434017  3930501    12192850   29157     51143         1436
Table 7
Running Time:
Tool     Size 3            Size 4
Biemann  88 sec            733 sec
mfinder  3 hrs 12 min      1.54 days
Kavosh   12.804947222 hrs
FANMOD   1 sec             601 sec
Table 8
Normalized Count:
Tool     3-chain    3-clique   4-star     4-loopout  4-chain    4-box      4-clique   4-semiclique
Biemann  13.045956  16.045157  18.202608  20.595303  21.727386  15.691476  24.243536  29.536371
mfinder  13.045956  16.045157  18.202608  20.595303  21.727386  15.691476  24.243536  29.536371
Kavosh   9.048590   10.061333  18.202800  21.73723   20.54348   15.302440  23.5296    30.98809
FANMOD   13.045956  16.045157  18.202608  20.595303  21.727386  15.691476  24.243536  29.536371
Table 9
Raw Count:
Tool     3-chain   3-clique  4-star      4-loopout  4-chain    4-box    4-semiclique  4-clique
Biemann  24360076  145737    8194654585  106982250  934755823  3535399  1832721       39777
mfinder  24360076  145737    8194654585  106982250  934755823  3535399  1832721       39777
Kavosh
FANMOD   24360076  145737    8194654585  106982250  934755823  3535399  1832721       39777
Table 10
Running Time:
Tool     Size 3       Size 4
Biemann  1 min 7 sec  5 hrs 26 min
mfinder  1.132 days   12.474 days
Kavosh   16.603 min   7.359 days
FANMOD   26 sec       3.694 hrs
Table 11
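The running times in Tables 2, 5, 8 and 11 mix seconds, minutes, hours and days. To compare tools on a single scale, a small helper can convert them to seconds; it is hypothetical and written only for the unit spellings that appear in these tables. On the network of Table 11, for instance, it shows mfinder's size-3 run (1.132 days) taking several thousand times longer than FANMOD's (26 sec).

```python
import re

# Seconds per unit, covering the spellings used in the running-time tables.
UNIT_SECONDS = {
    "sec": 1, "min": 60, "hr": 3600, "hrs": 3600, "day": 86400, "days": 86400,
}


def to_seconds(duration):
    """Parse strings such as '5 hrs 26 min' or '1.132 Days' into seconds.
    Raises ValueError for blank entries and KeyError for unknown units."""
    parts = re.findall(r"([\d.]+)\s*([A-Za-z]+)", duration)
    if not parts:
        raise ValueError(f"no duration found in {duration!r}")
    return sum(float(value) * UNIT_SECONDS[unit.lower()] for value, unit in parts)


mfinder = to_seconds("1.132 days")  # size-3 run, Table 11
fanmod = to_seconds("26 sec")
print(f"{mfinder:.1f} s vs {fanmod:.1f} s: {mfinder / fanmod:.0f}x slower")
```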
Normalized Count:
Tool     3-chain    3-clique   4-star     4-loopout  4-chain    4-box      4-semiclique  4-clique
Biemann  15.111847  19.199322  21.209101  25.265969  27.433591  21.856132  30.223162     36.292410
mfinder  15.111847  19.199322  21.209101  25.265969  27.433591  21.856132  30.223162     36.292410
Kavosh
FANMOD   15.111847  19.199322  21.209101  25.265969  27.433591  21.856132  30.223162     36.292410
Table 12
Analysis
We expected the results of the algorithms/tools used in this project to differ for the larger query-log sizes. However, the counts reported by Biemann, mfinder and FANMOD matched exactly on every network, so a still larger network may be needed to reveal any difference. Here is a summary of our conclusions about the algorithms/tools used:
Kavosh ran on the network generated from the query log of size 100,000 but gave meaningless results (percentages greater than 100).
FANMOD, Biemann and Kashtan (mfinder) scale at least up to the network generated from a query log of size 100,000.
The motif counts given by FANMOD, Biemann and Kashtan (mfinder) are identical for networks generated from query logs of size up to 100,000.
We now summarize the comparison of the algorithms/tools along the points specified in the objective:
Accuracy
For the network sizes we considered, Biemann, mfinder and FANMOD gave identical results.
Runtime
Scalability
FANMOD, Biemann and Kashtan (mfinder) scale at least up to a network of size (30853, 143306).
Kavosh scales at least up to a network of size (6331, 16957).
Ease of Integration
FANMOD:
Implemented in popular graph libraries such as igraph and NetworkX.
Can be used from various languages such as C, Python and R.
Kashtan (mfinder):
Implemented in C.
Easy to use.
Kavosh:
Implemented in C.
Easy to use.
Biemann:
Implemented in Java.
The code has to be modified to find motifs belonging to different classes.
MAVisto:
Available as a Java Web Start application.
Difficult to use in some environments.
# 4-chain: 16
# 3-semi-clique: 8
This kind of structure is expected to be abundant in a real network where community structure is prevalent.
2. #3-clique > #4-loop-out
Since a single 4-loop-out motif contains one 3-clique (triangle), one would expect the count of 3-cliques to exceed the count of 4-loop-outs. However, a single triangle can be shared by many 4-loop-outs, so this expectation can fail:
# 4-loop-out: 9
# 3-clique: 1
This kind of structure would also be common in a real network with pronounced community structure.
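The counterexample above can be reproduced by brute force. The sketch assumes a hypothetical structure consistent with those counts: one triangle with nine pendant nodes, each attached to a single triangle vertex (the structure in the original example may differ). Each pendant forms its own 4-loop-out with the same triangle, so there are 9 loop-outs but only 1 triangle. The helper names are hypothetical.

```python
from itertools import combinations


def induced_degrees(nodes, adj):
    """Sorted degree sequence of the subgraph induced by `nodes`."""
    s = set(nodes)
    return tuple(sorted(len(adj[n] & s) for n in s))


def triangle_with_pendants(n_pendants):
    """Hypothetical test graph: triangle 0-1-2 plus pendant leaves
    attached round-robin to the three triangle vertices."""
    adj = {i: set() for i in range(3 + n_pendants)}
    edges = [(0, 1), (1, 2), (0, 2)]
    edges += [(leaf, leaf % 3) for leaf in range(3, 3 + n_pendants)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


adj = triangle_with_pendants(9)
# Brute force over all 3- and 4-node subsets (only feasible for tiny graphs).
triangles = sum(induced_degrees(c, adj) == (2, 2, 2) for c in combinations(adj, 3))
loopouts = sum(induced_degrees(c, adj) == (1, 2, 2, 3) for c in combinations(adj, 4))
print(triangles, loopouts)  # 1 9
```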
3. #3-chain > #4-star
Since a single 4-star motif contains three 3-chains, one would expect the count of 3-chains to be at least three times the count of 4-stars. However, around a high-degree node a single 3-chain is shared by many 4-stars, so this expectation can fail:
# 4-star: 56
# 3-semi-clique: 28
This kind of structure will be prevalent in a real network where some nodes have very high degree. In our QLN, the nodes corresponding to conjunctions play this role.
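The star counts follow from simple combinatorics. Assuming the example is a single hub of degree d whose neighbours are not connected to each other (a star K_{1,d}), every pair of neighbours forms a 3-chain with the hub and every triple forms a 4-star, so the counts are C(d, 2) and C(d, 3). With d = 8 this gives exactly the 28 and 56 above, and 4-stars outnumber 3-chains whenever d > 5. The function name is hypothetical.

```python
from math import comb


def star_counts(d):
    """(#3-chains, #4-stars) in a star whose hub has d leaves."""
    return comb(d, 2), comb(d, 3)


for d in range(3, 10):
    chains, stars = star_counts(d)
    print(d, chains, stars, stars > chains)  # d = 8 gives 28 chains, 56 stars
```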
# 4-semi-clique: 9
# 3-semi-clique: 6
This kind of structure arises in a real network where some nodes have very high degree. In our QLN, the nodes corresponding to conjunctions play this role.
As the motif structure suggests, the middle node connects two words that one would not expect to occur together in a query. For example, in the motif upgrade-computer-worm, computer connects the two words upgrade and worm, which would not normally occur together in a query.
3-Clique:
As the network structure of a 3-clique suggests, it represents a set of three words that have a high probability of occurring together in a query. For example, in the motif mayo-clinic-ocd, all three words are strongly related to each other: Mayo is a clinic in the USA with facilities for treating OCD (obsessive-compulsive disorder).
4-Chain:
Semantic polysemy refers to the phenomenon that a word, denoted as a string of characters, can have different denotations in different contexts. In co-occurrence networks, polysemy leads to chains: ambiguous words connect words that are not connected to each other, and act as a bridge between different topical word clusters. In a chain of length four, one more word from a topical cluster is observed, which does not connect to the polysemous word, apparently because their occurrences are deemed rather independent by the significance measure. For example, in the motif patriot-point-south-Korea, south is used in two senses: with point it signifies a direction, while South Korea is a name in its own right.
4-Box Motifs
Synonymy means that different words refer to the same concept. Two words are synonyms if they can be used interchangeably without changing the meaning, but there are also rather syntactic variants of words that refer to the same concept, such as nominalizations of adjectives or verb forms of different inflections. For example, in the motif annuity-index-annuities-indexed, annuity and annuities can be used interchangeably depending on the usage, and the same holds for index and indexed.
4-Star
A query can be logically divided into two parts, content and intent. The content represents the theme of the query, and the intent part adds the specific information required about the content. Initially we thought that a 4-star represents a content word surrounded by three intent words, as in the motif Poker-online-live-result, where Poker is the content part and online, live and result are the intent part. However, we observed that an intent word can also be the center of a 4-star, surrounded by three content words, as in cheap-access-isp-car, where cheap is the intent part and access, isp and car are the content parts.
4-Loop Out:
A 4-loop-out represents a set of words in which the first three are connected to each other and the last one is connected to exactly one of the first three. In a QLN, this means that the first three words have a high probability of occurring together in a query, while the last word occurs only with the word it is connected to. For example, in the motif score-espn-football-sat, score, espn and football are likely to occur together in a query, while sat occurs only with score and not with any other word.
4-SemiClique
A 4-semi-clique represents a structure in which every pair of nodes except one is connected. In our examples (e.g., works-michigan-job-listing), the only pair of unrelated words is the second and the last word: (mouse, customers), (outdoor, calculation), (locations, kids) and (michigan, listing).
4-Clique
A 4-clique represents a group of words that are very likely to occur together in a single query. The examples we obtained support this point.
References
1. Chris Biemann, Stefanie Roos, Karsten Weihe, Quantifying Semantics using Complex Network Analysis.
2. Sebastian Wernicke, A Faster Algorithm for Detecting Network Motifs.
3. Rishiraj Saha Roy, Niloy Ganguly, Smith Agarwal, Monojit Choudhury, Structural Complexity of Web Search Queries through the Lenses of Language Models, Networks and Users.
4. Falk Schreiber and Henning Schwobbermeyer, MAVisto: A tool for the exploration of network motifs.
5. N. Kashtan, S. Itzkovitz, R. Milo and U. Alon, Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs.
6. Zahra Razaghi Moghadam Kashani, Hayedeh Ahrabian, Elahe Elahi, Abbas Nowzari-Dalini, Elnaz Saberi Ansari, Sahar Asadi, Shahin Mohammadi, Falk Schreiber and Ali Masoudi-Nejad, Kavosh: a new algorithm for finding network motifs.
7. MAVisto, http://mavisto.ipk-gatersleben.de/
8. www.wikipedia.org