Graph Mining
Andrea Marino
Node Properties
5 Clustering Coefficient: counting the number of triangles.
6 Centrality: computing closeness and betweenness.
Grouping Nodes
7 Clustering: overview, algorithms and spectral clustering.
8 Finding patterns in graphs with applications to community detection: listing cliques.
Andrea Marino Graph Mining Algorithms
Social Networks
The figure does not help us much, since its center looks like
a tangle of nodes and edges.
It is now time to take advantage of the notions and
algorithms developed in graph theory.
Is this a “dense” graph? That is, are the nodes connected (almost)
as much as they could be?
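One common way to make this question precise is the density of a simple undirected graph. This measure is an assumption of this sketch and is not defined in the text above. It is the number of edges m divided by the largest possible number of edges, n(n-1)/2, so it lies between 0 (no edges) and 1 (every pair of nodes is adjacent). The edge-list input format and the function name `density` are hypothetical choices for illustration.

```python
def density(n, edges):
    """Density of a simple undirected graph with n nodes.

    edges: list of (u, v) pairs, each undirected edge listed once
    (an assumed input format). Returns m / (n(n-1)/2), a value in [0, 1].
    """
    if n < 2:
        return 0.0  # no pair of nodes exists, so no edge is possible
    max_edges = n * (n - 1) / 2
    return len(edges) / max_edges


# A triangle plus one isolated node: 3 edges out of 6 possible ones
print(density(4, [(0, 1), (1, 2), (0, 2)]))  # 0.5
```

A value close to 1 means the nodes are connected almost as much as they could be. Large real-world networks typically have a density close to 0.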
Definition
The degree of a node is the number of edges incident to it. In a
graph with no loops and no multiedges, this is the number of its neighbors.
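A minimal sketch of how degrees can be computed from an edge list. It assumes an undirected graph given as a list of (u, v) pairs, a hypothetical input format. Every edge contributes one to the degree of each of its two endpoints. As a consequence, the degrees always sum to twice the number of edges.

```python
from collections import defaultdict


def degrees(edges):
    """Degree of every node of an undirected graph given as an edge list.

    Nodes that appear in no edge are not included in the result.
    """
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1  # the edge is incident to u ...
        deg[v] += 1  # ... and to v
    return dict(deg)


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
print(degrees(edges))  # {'a': 2, 'b': 2, 'c': 3, 'd': 1}
```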
Definition
If a graph has no loops (edges joining a node to itself) and no multiedges
(several edges joining the same pair of nodes), it is said to be simple.
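The definition translates directly into a check on an undirected edge list. This is a sketch under the assumption that edges are given as (u, v) pairs, and `is_simple` is a hypothetical name. Storing each edge as an unordered pair (a `frozenset`) makes {u, v} and {v, u} count as the same edge.

```python
def is_simple(edges):
    """Check whether an undirected edge list describes a simple graph.

    A loop is an edge (u, u); a multiedge is the same unordered pair
    appearing more than once.
    """
    seen = set()
    for u, v in edges:
        if u == v:
            return False  # loop
        key = frozenset((u, v))  # {u, v} is the same edge as {v, u}
        if key in seen:
            return False  # multiedge
        seen.add(key)
    return True


print(is_simple([(1, 2), (2, 3)]))  # True
print(is_simple([(1, 2), (2, 1)]))  # False: the edge {1, 2} appears twice
```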
Definition
If, for any pair of nodes, there exists a path between them,
then the graph is said to be connected.
For directed graphs, a graph is strongly connected if, for any
pair of nodes x and y , there exists a path from x to y and
vice versa.
If a directed graph is not strongly connected, but the undirected
graph obtained by ignoring the directions of its edges is connected,
then we say that the graph is weakly connected.
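These definitions can be checked with graph searches. The sketch below relies on a standard fact. A directed graph is strongly connected exactly when, from one arbitrary node s, every node is reachable both in the graph and in the graph with all arcs reversed. The second search checks that s is reachable from every node. If this fails, connectivity of the underlying undirected graph decides between "weakly connected" and "not connected". The representation is an assumption: a non-empty dict mapping every node to the list of its successors. All function names are hypothetical.

```python
def reachable(adj, source):
    """Set of nodes reachable from source (iterative depth-first search)."""
    seen = {source}
    stack = [source]
    while stack:
        x = stack.pop()
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def reverse(adj):
    """Same graph with every arc u -> v replaced by v -> u."""
    rev = {u: [] for u in adj}
    for u, succ in adj.items():
        for v in succ:
            rev.setdefault(v, []).append(u)
    return rev


def undirected(adj):
    """Underlying undirected graph: the direction of every arc is dropped."""
    und = {u: set() for u in adj}
    for u, succ in adj.items():
        for v in succ:
            und[u].add(v)
            und.setdefault(v, set()).add(u)
    return und


def connectivity(adj):
    """Return 'strongly', 'weakly' or 'not connected' for a directed graph.

    Assumes a non-empty graph in which every node is a key of adj.
    """
    nodes = set(adj)
    s = next(iter(nodes))  # any node works as the starting point
    # s reaches every node, and every node reaches s (search on reversed arcs)
    if reachable(adj, s) == nodes and reachable(reverse(adj), s) == nodes:
        return "strongly"
    if reachable(undirected(adj), s) == nodes:
        return "weakly"
    return "not connected"


print(connectivity({0: [1], 1: [2], 2: [0]}))  # strongly (a directed cycle)
print(connectivity({0: [1], 1: [2], 2: []}))   # weakly (a directed path)
print(connectivity({0: [1], 1: [], 2: []}))    # not connected
```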
Definition
A giant component is a (strongly) connected component of a given
(directed) graph that contains a constant fraction of the entire
graph’s nodes.
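Strictly speaking, "a constant fraction" refers to a family of graphs of growing size. On a single finite graph, one can only measure which fraction of the nodes lies in the largest component. The sketch below does this for an undirected graph. It assumes a dict that maps every node to its neighbors, with symmetric adjacency. The function names are hypothetical. Each component is found by a breadth-first search started from a node not yet seen.

```python
from collections import deque


def components(adj):
    """Connected components of an undirected graph (adj: node -> neighbors)."""
    seen = set()
    comps = []
    for s in adj:
        if s in seen:
            continue
        comp = [s]
        seen.add(s)
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    comp.append(y)
                    queue.append(y)
        comps.append(comp)
    return comps


def largest_component_fraction(adj):
    """Fraction of all nodes that lie in the largest connected component."""
    return max(len(c) for c in components(adj)) / len(adj)


adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
print([len(c) for c in components(adj)])  # [3, 2, 1]
print(largest_component_fraction(adj))    # 0.5
```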
Representing a graph
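A common way to store a large sparse graph is the adjacency list. Each node keeps the list of its neighbors, so the space is proportional to n + m. An n x n adjacency matrix, by contrast, always needs space proportional to n^2. This sketch assumes nodes numbered 0..n-1 and an edge-list input, and it builds both representations for comparison. The function names are hypothetical.

```python
def adjacency_list(n, edges, directed=False):
    """Build adjacency lists for nodes 0..n-1 from an edge list.

    For an undirected graph each edge {u, v} is stored twice,
    once in the list of u and once in the list of v.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    return adj


def adjacency_matrix(n, edges, directed=False):
    """Same graph as an n x n 0/1 matrix: n^2 entries whatever the number of edges."""
    mat = [[0] * n for _ in range(n)]
    for u, v in edges:
        mat[u][v] = 1
        if not directed:
            mat[v][u] = 1
    return mat


edges = [(0, 1), (0, 2), (2, 3)]
print(adjacency_list(4, edges))  # [[1, 2], [0], [0, 3], [2]]
```

Adjacency lists let a traversal scan the neighbors of a node in time proportional to its degree. With a matrix, the same scan takes time proportional to n.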
Traversing a Graph
The key idea of the breadth-first search (BFS) is to mark each node that
has already been visited. Nodes that are visited for the first time
are put into a queue, which is a first-in first-out (FIFO) data structure.
At the beginning, the source node is marked and inserted into
the queue.
As long as the queue is not empty, we “serve” the first node in the
queue, that is, we remove it from the queue and:
- we examine all its neighbors;
- for each neighbor that is not marked, we mark it and insert it
into the queue.
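The procedure above can be sketched in Python, using `collections.deque` as the FIFO queue. The graph representation is an assumption: a dict mapping each node to the list of its neighbors. The sketch returns the order in which nodes are served. A node is marked when it is inserted, so it enters the queue at most once. With adjacency lists the whole search therefore takes time proportional to n + m.

```python
from collections import deque


def bfs(adj, source):
    """Breadth-first search from source; returns nodes in the order they are served."""
    marked = {source}          # the source is marked ...
    queue = deque([source])    # ... and inserted into the queue
    order = []
    while queue:               # as long as the queue is not empty
        x = queue.popleft()    # serve the first node (FIFO)
        order.append(x)
        for y in adj[x]:       # examine all its neighbors
            if y not in marked:
                marked.add(y)  # mark y the first time it is seen
                queue.append(y)
    return order


adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
print(bfs(adj, "a"))  # ['a', 'b', 'c', 'd']
```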
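Applications:
Since a BFS visits nodes in order of nondecreasing distance from the
source, a single BFS computes the distance from the source to every
other node. By executing a BFS from every node of a connected graph,
we can therefore compute both the diameter of the graph, that is, the
maximum distance between two nodes, and its average distance, that is,
the average of the distances over all pairs of nodes.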
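A minimal sketch of this "one BFS per node" approach. It assumes an unweighted, undirected, connected graph with at least two nodes, stored as a dict from each node to its neighbors. Distances count edges on a shortest path. A node's distance is recorded when the node is first marked, one more than the distance of the node being served. The function names are hypothetical. The total cost is n searches, that is, time proportional to n(n + m), which becomes expensive on very large graphs.

```python
from collections import deque


def bfs_distances(adj, source):
    """Distance (number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:           # first visit: y is marked now
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def diameter_and_average_distance(adj):
    """Run one BFS per node; assumes a connected graph with at least 2 nodes."""
    diameter, total, pairs = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                diameter = max(diameter, d)
                total += d
                pairs += 1
    return diameter, total / pairs


# Path 0 - 1 - 2 - 3: the diameter is 3, the average distance is 20/12
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(diameter_and_average_distance(path))
```

Each unordered pair is counted twice, once from each endpoint. In an undirected graph both distances are equal, so the average is unaffected.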