
Graph Mining: Introduction

Davide Mottin, Konstantina Lazaridou


Hasso Plattner Institute

Graph Mining course Winter Semester 2016


Lecture road

Course Information

Introduction to graph mining

Graphs: models and basic concepts

GRAPH MINING WS 2016 2


Organization of the lecture
• Lecture and slides in English
• Tuesday 15.15 – 16.45
• Two mandatory assignments:
• [Individual] One presentation of a paper of choice among a list of papers about topics covered in the lectures. Two slots: 13/12 (first part) – 07/02 (second part)
• [Individual] One small graph analytics project, to be completed before the end of the course
• Examination (in English!):
• Oral exam (in the first three weeks after the lecture period)
• Grading scheme:
• 20%: Presentation
• 10%: Project
• 70%: Exam
• Lectures will be recorded and online (tele-task)
• There is no official textbook for the course
• Registration is required for this lecture; notify the Studienreferat and me: [email protected]


About the lecturers
Davide Mottin
Postdoctoral Researcher @ Knowledge Discovery and Data Mining

PhD in 2015, University of Trento


Research Interests: Graph Mining, Data Mining, Graph databases,
Preference models, Query paradigms

Konstantina Lazaridou
PhD Candidate @ Information Systems Research - Web Science

MSc in 2015, University of Ioannina


Research Interests: Graph Mining, Social Network Analysis, Web data
Mining, Opinion and Sentiment Analysis, Data Stream Mining



Course Web site
Lecture material (slides, papers, books, tutorials, assignments, …) available online:
https://hpi.de/en/mueller/teaching/aktuelle-vorlesung/ws-1617/graph-mining.html

§ The slides are also available in the intranet!


Objectives
§ Understand
• Where graphs are, why they are important, and what the new applications are
• The main challenges from a data mining perspective

§ Learn
• How to efficiently query and store a graph using graph mining techniques
• How to analyze networks to understand the properties and the behaviors of individuals
• How to think from a research perspective (novelty, clarity, …)
• How to solve practical problems

§ Work on real-scale data and existing tools


Prerequisites
§ Basic computer science and programming.
§ Data-mining knowledge is a plus but is not strictly required.
§ Basic probability theory and linear algebra are beneficial, although a short recap of the main concepts will be given at the beginning of the lectures that need them.


Schedule (tentative)
18.10 Introduction to graph mining
25.10 Social network analysis - Diffusion
01.11 Graph Querying: exact, approximate, and reachability
08.11 Frequent subgraph mining
15.11 Graph indexing
17.11 HPI-Kolloquium – Invited speaker: prof. Danai Koutra
22.11 Node classification
29.11 Some practical graph mining framework; Project assignment
06.12 Link prediction
13.12 Student paper presentation [first part]
20.12 Christmas break
27.12 Christmas break
03.01 Non overlapping communities
10.01 Overlapping communities
17.01 Anomaly detection
24.01 Graph summarization; Report handover
31.01 Summary of algorithms for different graph models
07.02 Student paper presentation [second part]



Course Material - 1
There is no official book for the course. However, the slides are based on materials from these books:
§ Aggarwal, C.C. and Wang, H. eds., 2010. Managing and mining graph data (Vol. 40). New York: Springer.
§ Chakrabarti, D. and Faloutsos, C., 2012. Graph mining: laws, tools, and case studies. Synthesis Lectures on Data Mining and Knowledge Discovery, 7(1), pp.1-207.
§ Easley, D. and Kleinberg, J., 2010. Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press.


Course Material - 2
Some material is inspired by, imported from, or adapted from several existing courses.
§ Graph Mining and Exploration at Scale (prof. Danai Koutra)
• http://web.eecs.umich.edu/~dkoutra/courses/F15_598/

§ Social and Information Network Analysis (prof. Jure Leskovec)
• http://web.stanford.edu/class/cs224w/

§ Online Social Networks and Media (prof. Evaggelia Pitoura, prof. Panayotis Tsaparas)
• http://www.cs.uoi.gr/~tsap/teaching/cs-l14/

§ Data Mining meets Graph Mining (prof. Leman Akoglu)
• http://www3.cs.stonybrook.edu/~leman/courses/14CSE590/index.htm


How to send emails
Not like this:
To: [email protected]
Subject: Problem – Help
Text:
Dear Dr. Davide Mottin,
I’m a student at the third year, attending the course, number of shoes, quantity of food eaten yesterday
The slides are not clear. I don’t understand the things theres.
Your sincerely,
BigBug92

✔ Like this:
To: [email protected]
Subject: [GraphMining] Subgraph isomorphism
Text:
Hi Davide,
the subgraph isomorphism concept is not entirely clear to me. Why is the function bijective?
Thanks,
[First Name-Last Name]


Some rules of thumb
§ I’m available for any kind of concern
§ Use the mailing list: https://lists.hpi.uni-potsdam.de/listinfo/graphmining-ws1617
§ Email me directly only if it is a very important concern
§ Be quick and precise in your emails
§ Ask me questions during the lecture, or right before/after it. If the question requires more time, ask for a meeting with me:
• Better if you come as a group instead of alone, so I can answer many questions at the same time

§ If you think the course load/organization is unfair, please let me know before the end of the semester. After that there will be NO possibility for discussion.


Feedback
§ The course is taught for the first time:
• Any feedback is appreciated
• Any comments on slides and clarity as well
• There might be some mistakes here and there (but we will do our best)
• Ask questions if you don’t understand something. Better a question in class than a doubt during the exam!


(There's) no such thing as a stupid question



Content of the course
§ Background concepts: probability theory/statistics, basic linear algebra, basic graph concepts (morphisms, degrees, matrix representation, ...)

[First part]
§ Social network analysis:
• Diffusion
• Power laws
• Influence propagation
§ Graph querying and indexing:
• Exact and approximate queries
• Reachability queries
• Frequent subgraph mining
• Graph indexing
§ Node classification and node similarity
§ Link prediction

[Second part]
§ Communities and anomalies
• Overlapping/Non overlapping communities
• Anomaly detection
§ Graph summarization
§ Summary of algorithms for different models (graph streams, evolving graphs, probabilistic graphs, colored graphs)
§ Graph mining frameworks


About the presentations
§ The presentation will be 15 mins in total
• 10 minutes presentation
• 5 minutes questions

§ The group will be divided into two halves:
• One half will present papers on the first part of the course on December 13
• The other half will present papers on the second part of the course on February 7

§ Every person presents one paper

§ First come, first served
• If two people ask to present the same paper, the second has to choose another one

Paper list for the first part of the course: https://goo.gl/YMR0wD


Questions?



Lecture road

Course Information

Introduction to graph mining

Graphs: models and basic concepts



The web

August 2016
>= 50 billion pages
At least 4.73 billion pages indexed by search engines
Source: http://www.worldwidewebsize.com/


Social graphs

Facebook
• 1.5 Bln users
• 450 Bln relationships
• 600 Mln groups
• 10.5 USD per user

Twitter
• 313 Mln users
• 500 Mln tweets/day
• Avg 208 followers/user

They are complex: groups, links, preferences, attributes


Knowledge graphs
20 Mln entities
100 Mln relationships
2500 types of relationships

Other knowledge graphs:
• YAGO
• DBpedia
• DBLP
• PubMed
• LinkedMDB
• …

Connect entities such as persons, organizations, countries, objects through semantic relationships (e.g. owns a company)


Biological networks

Protein-protein interaction networks
• Nodes: Proteins
• Edges: Physical interactions

Metabolic networks
• Nodes: Metabolites and enzymes
• Edges: Chemical reactions


What else?

Source: http://screenrant.com/game-thrones-protagonist-tyrion-math/
Source: http://phys.org/news/2016-02-math-reveals-unseen-worlds-star.html

Anything that involves relationships (implicit or explicit) can be modeled as a graph!


Graphs are everywhere

Graphs are: Complex, Ubiquitous, Large, Valuable
Examples: Social Networks, Road Networks, Recommendation Graphs, Knowledge Graphs


Why Graphs? Why now?
§ Describe complex data with a simple structure
• Nature, social, concepts, roads, circuits …

§ Same representation for many disciplines
• Computer science, biology, physics, economics, ...

§ Availability of (BIG) data
• Large networks are now available and require complex algorithms
• Networks evolve over time (e.g., new users/friends on Facebook)

§ Usefulness
• Analysis discovers non-trivial patterns and allows simple, smooth exploration
• They reveal user behaviors
• They are valuable (Facebook, Twitter, Amazon ... all of them based on graphs!)


“Graph mining is the process of discovering, retrieving and analyzing non-trivial patterns in graph-shaped data”


What can we do with graph mining?
§ Compressing graphs without losing information
§ Finding complex structures fast
§ Recognizing communities and social patterns
§ Studying the propagation of viruses
§ Predicting whether two people will become friends
§ Understanding which nodes are important
§ Showing how the network will evolve
§ Helping the visualization of complex structures
§ Finding roles, predicting positive and negative influence
§ …


What is involved in graph mining?
§ Basic graph algorithms (shortest paths, BFS, DFS, isomorphisms, traversals, random walks …)
§ Storage and indexing
§ Smart representations for compactness
§ Modeling of problems as graphs
§ Distance metrics and similarity measures
§ Exact, approximate, and heuristic algorithms
§ Evolving structures
§ Interactivity and online updates
§ Complexity (most of the problems are NP-hard, i.e. no polynomial-time algorithm is known for them)


Practical applications of graph mining

Finding substructures

Community detection

Influence propagation

Link prediction

Graph evolution

Detecting frauds


Visualization

Several visualization tools:
• General: Gephi, GraphViz, …
• Biological: Cytoscape, Network Workbench
• Social: EgoNet, NodeXL, ...
• Relational: Tulip


Lost in the graph?

Hopefully not after this course ;)



Current: Query languages
SPARQL
    SELECT ?name ?email
    WHERE
    {
      ?person a foaf:Person .
      ?person foaf:name ?name .
      ?person foaf:mbox ?email .
    }

GREMLIN
    g.V().hasLabel('movie').as('a','b').
      where(inE('rated').count().is(gt(10))).
      select('a','b').
        by('name').
        by(inE('rated').values('stars').mean()).
      order().
        by(select('b'),decr).
      limit(10)

CYPHER
    MATCH (node1:Label1)-->(node2:Label2)
    WHERE node1.propertyA = {value}
    RETURN node2.propertyA, node2.propertyB

Query languages ARE:
• Expressive
• Powerful
• Scalable
• Compact
but
• Not user friendly
• Not interactive
Lecture road

Course Information

Introduction to graph mining

Graphs: models and basic concepts



Network or graphs?
§ Network refers to real systems
• Web, Social, Biological, …
• Terminology: Network, node, link/relationship

§ Graph is an abstract mathematical model of a network
• Web graph, Social graph
• Terminology: Graph, vertex/node, edge

BUT we often use both without distinction


Graphs
[Figure: example graphs with nodes labeled a, b, c and edge probabilities between 0.1 and 0.9]

$G = (V, E)$, with vertices $V$ and edges $E \subseteq V \times V$
$G = (V, E, l)$, with labeling function $l : V \cup E \to \Sigma$
$G = (V, E, p)$, with a probability function $p$

• Undirected graphs
  • Co-authorship, Roads, Biological
• Directed graphs
  • Follows, …
• Labeled (or colored) graphs
  • Knowledge graphs, …
• Probabilistic graphs
  • Causal graphs
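
The three graph models can be written down directly as plain Python sets and dictionaries. This is only a minimal sketch: the node names, labels, and probabilities are made up, and treating p as a map from each edge to a value in [0, 1] is an assumption, since the slide does not state the domain of p.

```python
# Hypothetical toy graph; all names and values are illustrative only.
V = {"u1", "u2", "u3"}
E = {("u1", "u2"), ("u2", "u3")}  # directed edges as ordered pairs

# Labeling function l: V ∪ E -> Σ, stored as a dictionary.
l = {
    "u1": "a", "u2": "b", "u3": "c",
    ("u1", "u2"): "follows", ("u2", "u3"): "follows",
}

# Assumption: p assigns an existence probability to every edge.
p = {("u1", "u2"): 0.9, ("u2", "u3"): 0.2}


def is_graph(V, E):
    """Check E ⊆ V × V."""
    return all(u in V and v in V for u, v in E)


def is_labeling(V, E, l):
    """Check that l is defined on every vertex and every edge."""
    return all(x in l for x in V | E)


def is_probability(E, p):
    """Check that p covers exactly the edges with values in [0, 1]."""
    return set(p) == set(E) and all(0.0 <= x <= 1.0 for x in p.values())


def as_undirected(E):
    """Forget edge direction: (u, v) and (v, u) become the same edge."""
    return {frozenset(e) for e in E}
```

An undirected graph is obtained here by dropping the order inside each pair, which is why `as_undirected` merges (u, v) and (v, u).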



Graph databases (set of graphs)
[Figure: three small labeled graphs G1, G2, G3 with node labels a, b, c, d]

$D = \{G_1, G_2, \dots, G_n\}$, $G_i = (V_i, E_i, l_i)$, $l_i : E_i \cup V_i \to \Sigma$

Set of small labeled graphs

Chemical compounds, Business models, 3D objects


An example?

Give me an example of a network you know.

What are the nodes?
What are the edges?
What shape?


Important Terminology
§ Degree of a node:
• Number of ”neighbors” of a node
• In directed graphs
  ⁃ In-degree: number of inbound links
  ⁃ Out-degree: number of outgoing links
• Example (figure): node v has degree 3, in-degree 1, out-degree 2

§ Adjacent node:
• A node u is adjacent to a node v if there is an edge between u and v, i.e. $(u, v) \in E$

§ Path:
• Sequence of adjacent, non-repeating nodes in a graph
• Length of a path = number of edges

§ Diameter of a graph:
• Length of the longest shortest path between any two nodes
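
The definitions above can be checked on a tiny example. The sketch below uses a hypothetical directed graph in which node v has one inbound and two outgoing links, matching the numbers in the figure. The diameter is computed with one breadth-first search per node on the undirected version of the graph, assuming that graph is connected.

```python
from collections import deque

# Hypothetical example: v has in-degree 1 and out-degree 2.
edges = [("a", "v"), ("v", "b"), ("v", "c")]


def degrees(edges, node):
    """Return (in-degree, out-degree, degree) of a node in a directed graph."""
    in_deg = sum(1 for _, target in edges if target == node)
    out_deg = sum(1 for source, _ in edges if source == node)
    return in_deg, out_deg, in_deg + out_deg


def undirected_adjacency(edges):
    """Build neighbor sets, ignoring edge direction."""
    adj = {}
    for s, t in edges:
        adj.setdefault(s, set()).add(t)
        adj.setdefault(t, set()).add(s)
    return adj


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """Longest shortest path over all pairs; assumes a connected graph."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


v_degrees = degrees(edges, "v")
graph_diameter = diameter(undirected_adjacency(edges))
```

Running one BFS per node costs O(n · (n + m)) overall, which is fine for small graphs but too slow for the large networks discussed earlier.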



Graph representation
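[Figure: directed graph with nodes 1 to 6]

Adjacency matrix
$$
A = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix},
\qquad
a_{ij} = \begin{cases} 1 & (i, j) \in E \\ 0 & \text{otherwise} \end{cases}
$$

Adjacency list
1 => {2}
2 => {4}
3 => {1,2,4,6}

What are the advantages/disadvantages of one representation or the other?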
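
As a starting point for that question, here is a small sketch that builds both representations from the same edge list. The edges are read off the slide's adjacency matrix; the function names are made up for illustration.

```python
# Edges read off the slide's adjacency matrix (nodes numbered 1..6).
edges = [(1, 2), (2, 4), (3, 1), (3, 2), (3, 4), (3, 6), (5, 3)]
n = 6


def to_matrix(edges, n):
    """n x n matrix with A[i-1][j-1] = 1 iff (i, j) is an edge."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1
    return A


def to_adjacency_list(edges, n):
    """Map every node to the set of nodes it points to."""
    adj = {i: set() for i in range(1, n + 1)}
    for i, j in edges:
        adj[i].add(j)
    return adj


A = to_matrix(edges, n)
adj = to_adjacency_list(edges, n)

# Storage: the matrix always holds n * n cells,
# while the list holds one entry per node plus one per edge.
matrix_cells = n * n
list_entries = n + len(edges)
```

The matrix answers "is (i, j) an edge?" with a single lookup but stores n² cells even when the graph is sparse. The adjacency list stores only what exists and lets you iterate over the neighbors of a node in time proportional to its degree.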



Static vs Evolving graph
Static graph: adjacency matrix A

Dynamic, temporal graph: 3D matrix (tensor), one adjacency matrix per time step t1, …, tn


Graph Isomorphism
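[Figure: two example graphs, G1 and G2]

Given two graphs $G_1 = \langle V_1, E_1, l_1 \rangle$ and $G_2 = \langle V_2, E_2, l_2 \rangle$, $G_1$ is isomorphic to $G_2$ iff there exists a bijective function $f : V_1 \to V_2$ s.t.:
1. For each $v_1 \in V_1$, $l_1(v_1) = l_2(f(v_1))$
2. $(v_1, u_1) \in E_1$ iff $(f(v_1), f(u_1)) \in E_2$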
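
The definition can be turned into a brute-force check that tries every bijection f and tests the two conditions. This is only a sketch for very small graphs, since there are n! candidate bijections; the example graphs and names are hypothetical, and edges are treated as directed pairs.

```python
from itertools import permutations


def is_isomorphic(V1, E1, l1, V2, E2, l2):
    """Try every bijection f: V1 -> V2 and test conditions 1 and 2."""
    if len(V1) != len(V2) or len(E1) != len(E2):
        return False
    V1 = list(V1)
    E1, E2 = set(E1), set(E2)
    for image in permutations(V2):
        f = dict(zip(V1, image))
        # Condition 1: node labels are preserved.
        if any(l1[v] != l2[f[v]] for v in V1):
            continue
        # Condition 2: (v, u) is an edge iff (f(v), f(u)) is an edge.
        if all(((v, u) in E1) == ((f[v], f[u]) in E2) for v in V1 for u in V1):
            return True
    return False


# Hypothetical example: a labeled directed path a -> b -> a.
V1, E1 = {1, 2, 3}, {(1, 2), (2, 3)}
l1 = {1: "a", 2: "b", 3: "a"}

# Same structure with renamed nodes.
V2, E2 = {"x", "y", "z"}, {("z", "y"), ("y", "x")}
l2 = {"x": "a", "y": "b", "z": "a"}

# Same edges as G2, but the label b sits on an end node.
l3 = {"x": "a", "y": "a", "z": "b"}

same = is_isomorphic(V1, E1, l1, V2, E2, l2)
different = is_isomorphic(V1, E1, l1, V2, E2, l3)
```

Bijectivity matters here: without it, two different nodes of G1 could collapse onto one node of G2 and the "iff" in condition 2 would no longer compare like with like.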



Subgraph Isomorphism

[Figure: query graph Q and its match G' inside G]

A graph $Q = \langle V_Q, E_Q, l_Q \rangle$ is subgraph isomorphic to a graph $G = \langle V, E, l \rangle$ if there exists a subgraph $G' \sqsubseteq G$ isomorphic to $Q$
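
A brute-force sketch of this definition: look for an injective mapping from the nodes of Q to the nodes of G that preserves labels and maps every edge of Q onto an edge of G. It assumes that G' does not have to be an induced subgraph, so extra edges in G between the matched nodes are allowed. The example graph and all names are hypothetical.

```python
from itertools import permutations


def is_subgraph_isomorphic(VQ, EQ, lQ, V, E, l):
    """Search for an injective, label- and edge-preserving map VQ -> V."""
    VQ = list(VQ)
    E = set(E)
    # permutations(V, k) enumerates every injective assignment of k nodes.
    for image in permutations(V, len(VQ)):
        f = dict(zip(VQ, image))
        labels_ok = all(lQ[q] == l[f[q]] for q in VQ)
        edges_ok = all((f[u], f[v]) in E for u, v in EQ)
        if labels_ok and edges_ok:
            return True
    return False


# Hypothetical data graph G.
V = {1, 2, 3, 4}
E = {(1, 2), (3, 2), (2, 4)}
l = {1: "a", 2: "c", 3: "a", 4: "b"}

# Query: an edge from an a-node to a c-node (present in G).
found = is_subgraph_isomorphic({"p", "q"}, {("p", "q")}, {"p": "a", "q": "c"}, V, E, l)

# Query: an edge from a b-node to an a-node (absent in G).
missing = is_subgraph_isomorphic({"p", "q"}, {("p", "q")}, {"p": "b", "q": "a"}, V, E, l)
```

The search space grows as |V|! / (|V| - |V_Q|)!, so this only illustrates the definition and is not a practical matcher.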



Frequent Subgraph Mining
[Figure: example graph G with nodes labeled a, b, c]

Problem
Find all subgraphs of G that appear at least $\sigma$ times

Suppose $\sigma = 2$, the frequent subgraphs are (only edge labels)
• a, b, c
• a-a, a-c, b-c, c-c
• a-c-a …

Exponential number of patterns!!!
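
To make the first two levels of the search concrete, the sketch below counts single-label and single-edge patterns in one undirected graph and keeps those that occur at least σ times. The graph is hypothetical (it is not the graph in the figure, so its frequent edges differ from the slide's list), and larger patterns such as a-c-a are deliberately left out.

```python
from collections import Counter

# Hypothetical undirected labeled graph; not the graph from the figure.
labels = {1: "a", 2: "a", 3: "c", 4: "c", 5: "b", 6: "b"}
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 6)]
sigma = 2


def frequent_nodes_and_edges(labels, edges, sigma):
    """Return the node labels and edge label pairs occurring >= sigma times."""
    node_counts = Counter(labels.values())
    # Sort the two labels so that a-c and c-a count as the same pattern.
    edge_counts = Counter(
        "-".join(sorted((labels[u], labels[v]))) for u, v in edges
    )
    frequent_nodes = {pat for pat, c in node_counts.items() if c >= sigma}
    frequent_edges = {pat for pat, c in edge_counts.items() if c >= sigma}
    return frequent_nodes, frequent_edges


frequent_nodes, frequent_edges = frequent_nodes_and_edges(labels, edges, sigma)
```

Each further level extends the surviving patterns by one edge, which is where the exponential number of candidates comes from.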



Questions?
