GraphMining 01 Introduction
Course Information
Konstantina Lazaridou
PhD Candidate @ Information Systems Research - Web Science
§ Learn
• How to efficiently query and store a graph using graph mining techniques
• Analyze networks to understand the properties and behaviors of individuals
• Think from a research perspective (novelty, clarity, …)
• Solve practical problems
Text:
✗
Dear Dr. Davide Mottin,
Thanks,
[First Name-Last Name]
• Power laws
• Influence propagation
§ Graph querying and indexing:
• Exact and approximate queries
• Reachability queries
• Frequent subgraph mining
• Graph indexing
§ Node classification and node similarity
§ Link prediction
§ Communities and anomalies
Second part
August 2016
>= 50 billion pages
At least 4.73 billion pages indexed by search engines
Source: http://www.worldwidewebsize.com/
Twitter
313 million users
100 million relationships
2500 types of relationships
Complex
Social Networks
Ubiquitous
Large Road Networks
Valuable
§ Usefulness
• Analysis discovers non-trivial patterns and allows simple, smooth exploration
• They reveal user behaviors
• They are valuable (Facebook, Twitter, Amazon, ... are all based on graphs!)
Graph mining
SPARQL
g.V().hasLabel('movie').as('a','b').
  where(inE('rated').count().is(gt(10))).
  select('a','b').
    by('name').
    by(inE('rated').values('stars').mean()).
  order().
    by(select('b'),decr).
  limit(10)
GREMLIN
But: not user friendly, not interactive
MATCH (node1:Label1)-->(node2:Label2)
WHERE node1.propertyA = {value}
RETURN node2.propertyA, node2.propertyB
CYPHER
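To make concrete what the Gremlin traversal above computes, here is a minimal Python sketch of the same logic: keep movies with more than 10 ratings, average their stars, and return the 10 best. The function `top_rated`, its parameters, and the flat list of `(movie, stars)` pairs are assumptions made for illustration. No graph database is involved, and the sample data is invented.

```python
from collections import defaultdict
from statistics import mean


def top_rated(ratings, min_ratings=10, k=10):
    """Return up to k (movie, mean stars) pairs, best first.

    ratings: iterable of (movie_name, stars) pairs (hypothetical input format).
    Only movies with strictly more than min_ratings ratings are kept,
    mirroring count().is(gt(10)) in the traversal.
    """
    stars_by_movie = defaultdict(list)
    for movie, stars in ratings:
        stars_by_movie[movie].append(stars)

    averages = [
        (movie, mean(stars))
        for movie, stars in stars_by_movie.items()
        if len(stars) > min_ratings
    ]
    # Sort by mean stars, descending, and keep the first k entries.
    return sorted(averages, key=lambda pair: pair[1], reverse=True)[:k]


# Invented sample data: "C" has too few ratings and is filtered out.
ratings = [("A", 5)] * 11 + [("B", 3)] * 12 + [("C", 5)] * 2
result = top_rated(ratings)
```

Even this small query needs grouping, filtering, aggregation, and sorting, which illustrates why writing such queries by hand is not user friendly.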
GRAPH MINING WS 2016
Lecture road
BUT
we often use both without distinction
[Figure: example graph database with graphs G1, G2, G3, …, whose elements carry labels such as a, b, c, d]
$D = \{G_1, G_2, \dots, G_n\}$, $G_i = \langle V_i, E_i, l_i \rangle$, $l_i : E_i \cup V_i \to \Sigma$
§ Adjacent node:
• A node u is adjacent to a node v if there is an edge between u and v, i.e. (𝑢, 𝑣) ∈ 𝐸
§ Path:
• Sequence of adjacent, non-repeating nodes in a graph
• Length of a path = number of edges
§ Diameter of a graph:
• Length of the longest shortest path between any pair of nodes
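The definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected, connected, unweighted graph stored as a dictionary of neighbor sets. The function names and the example graph are hypothetical and not part of the lecture material.

```python
from collections import deque


def is_adjacent(adj, u, v):
    """u is adjacent to v if there is an edge between them."""
    return v in adj[u]


def bfs_distances(adj, source):
    """Shortest-path length (number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Longest shortest path over all node pairs (assumes a connected graph)."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


# Hypothetical undirected graph: triangle 1-2-3 with node 4 attached to node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
d = diameter(adj)
```

Running one breadth-first search per node costs O(|V| · (|V| + |E|)) in total, which is fine for small graphs but motivates approximate methods on very large ones.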
Adjacency list
1 => {2}
2 => {4}
3 => {1,2,4,6}
Static graph: adjacency matrix A
Dynamic, temporal graph: 3D matrix (tensor)
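The representations above can be related with a short Python sketch. It converts the adjacency-list entries shown on the slide (1 => {2}, 2 => {4}, 3 => {1,2,4,6}) into an adjacency matrix and stacks one matrix per time step into a 3D structure. Several things here are assumptions: the edges are treated as directed, there are 6 nodes, nodes without a listed entry have no outgoing edges, and the second snapshot is invented to illustrate a temporal graph.

```python
def to_matrix(adj_list, n):
    """Adjacency matrix A with A[i-1][j-1] = 1 iff there is an edge i -> j."""
    A = [[0] * n for _ in range(n)]
    for u, neighbors in adj_list.items():
        for v in neighbors:
            A[u - 1][v - 1] = 1
    return A


# Entries taken from the slide; nodes are numbered from 1.
adj_list = {1: {2}, 2: {4}, 3: {1, 2, 4, 6}}
A = to_matrix(adj_list, n=6)

# A temporal graph as a list of snapshots; the second snapshot is hypothetical.
snapshots = [adj_list, {1: {2}, 2: {4}}]

# "Tensor" indexed as T[t][i][j]: edge i+1 -> j+1 exists at time step t.
T = [to_matrix(snapshot, n=6) for snapshot in snapshots]
```

The list form stores only existing edges, while the matrix and tensor use O(|V|²) space per snapshot in exchange for constant-time edge lookups.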
[Figure: graphs Q and G’]
Suppose 𝜎 = 2, the frequent subgraphs are (only edge labels)
• a, b, c
• a-a, a-c, b-c, c-c
• a-c-a …
Exponential number of patterns!
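Support counting with a threshold 𝜎 can be illustrated with a deliberately restricted Python sketch. It is not a full frequent subgraph miner: it enumerates only patterns of one edge label or of two edge labels on edges that share a node, and it counts in how many graphs of the database each pattern occurs. The database below is invented (the figure's graphs are not recoverable), and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def patterns_in_graph(edges):
    """Set of 1-edge and 2-edge label patterns occurring in one graph.

    edges: list of (u, v, label) triples for an undirected graph.
    """
    found = {(label,) for _, _, label in edges}
    for (u1, v1, l1), (u2, v2, l2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:  # the two edges share at least one node
            found.add(tuple(sorted((l1, l2))))
    return found


def frequent_patterns(database, sigma):
    """Patterns whose support (number of graphs containing them) is >= sigma."""
    support = Counter()
    for edges in database:
        support.update(patterns_in_graph(edges))  # each graph counts once
    return {p: s for p, s in support.items() if s >= sigma}


# Invented database of three edge-labeled graphs.
database = [
    [(1, 2, "a"), (2, 3, "c"), (3, 4, "b")],
    [(1, 2, "a"), (2, 3, "c")],
    [(1, 2, "b"), (2, 3, "b")],
]
frequent = frequent_patterns(database, sigma=2)
```

Already at size two the candidate set grows with every pair of touching edges. Extending the enumeration to larger patterns multiplies the candidates at each step, which is the exponential blow-up noted above.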