Large Scale Distributed Graph Processing: Data Mining (CS6720)

The document summarizes models for large scale distributed graph processing including the massively parallel computation (MPC) model. It describes how the MPC model works with distributed memory across machines, synchronous communication rounds, and limitations on memory and message sizes. It then provides examples of how fundamental graph algorithms like broadcasting and finding a maximal matching can be implemented in the MPC model.



Large Scale Distributed Graph Processing
Data Mining (CS6720)
John Augustine
Jan 16, 2020

Parallel & Distributed Computing Models

[Overview figure: shared-memory models (PRAM); programming models (MapReduce, "think like a vertex"); message-passing models (Massively Parallel Computation, 𝑘-machine model).]

Massively Parallel Computation (MPC) Model

• Input data size 𝑁 words; each word = 𝑂(log 𝑁) bits.
• The number of machines 𝑘. (Machines identified by {1, 2, …, 𝑘}.)
• Memory size per machine 𝑆 words.
  • 𝑆 ≥ 𝑁 is uninteresting. Assume 𝑆 is sublinear: 𝑆 = 𝑂(𝑁^(1−𝜖)) for some 𝜖 ∈ (0,1].
  • Also, require 𝑆𝑘 ≥ 𝑁.
• Synchronous communication rounds:
  • Local computation within each machine.
  • Create messages for other machines. Sum of message sizes ≤ 𝑆.
  • Send… Receive. Ensure no machine requires more than 𝑆 words of memory.
• Goal: Solve the problem in as few rounds as possible.
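As a concrete illustration of the round structure just described, here is a minimal Python sketch of an MPC-style simulator. The class and function names (MPCMachine, run_mpc, compute_round) are illustrative choices, not part of any standard library, and the memory check simply counts words.

```python
# Minimal sketch of the MPC round structure described above (illustrative only).
# Each "machine" holds at most S words; in every synchronous round it computes
# locally, emits messages whose total size must not exceed S, and then receives.

class MPCMachine:
    def __init__(self, machine_id, data, S):
        self.id = machine_id
        self.memory = list(data)   # local words
        self.S = S                 # memory budget in words

    def compute_round(self, round_number):
        """Return {destination_machine_id: list_of_words}. Problem-specific."""
        return {}

def run_mpc(machines, num_rounds):
    for r in range(num_rounds):
        outboxes = {m.id: m.compute_round(r) for m in machines}
        # Enforce the model's constraint: total outgoing size <= S per machine.
        for m in machines:
            sent = sum(len(words) for words in outboxes[m.id].values())
            assert sent <= m.S, f"machine {m.id} exceeded its message budget"
        # Synchronous delivery.
        inboxes = {m.id: [] for m in machines}
        for src, out in outboxes.items():
            for dst, words in out.items():
                inboxes[dst].extend(words)
        for m in machines:
            m.memory.extend(inboxes[m.id])
            assert len(m.memory) <= m.S, f"machine {m.id} exceeded its memory"
```

A problem-specific algorithm would subclass MPCMachine and override compute_round; the assertions make the model's two constraints (outgoing message budget and local memory) explicit.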

Initial Data Distribution

• Typically, data is split into words (often as ⟨𝑘𝑒𝑦, 𝑣𝑎𝑙𝑢𝑒⟩ pairs).
• The words could be either randomly distributed or arbitrarily distributed.
• Load balanced, so that no machine has much more than the other machines.
• Output: usually distributed & depends on the problem.
• Questions:
  • How to achieve a random, load-balanced distribution?
  • How to remove duplicates?

On Graphs

[Figure: total input size 𝑁 = 𝑂(𝑛 + 𝑚) = 𝑂(𝑚); memory size 𝑆 per machine shown on a scale with three regimes: (strongly) superlinear 𝑆 = 𝑛^(1+𝜖), near-linear 𝑆 = Õ(𝑛), and (strongly) sublinear 𝑆 = 𝑛^𝛼 for 𝛼 ∈ (0,1).]

Broadcasting

• Let 𝑆 = 𝑛^(1+𝜖) for some constant 𝜖 > 0.
• One machine src needs to broadcast 𝑛 words.
• Approach 1: the machine sends 𝑘 messages of size 𝑛. If 𝑘 > 𝑛^𝜖, the total outgoing message size 𝑘𝑛 exceeds 𝑆, so this approach does not fit the model.
• Approach 2: Build an 𝑛^𝜖-ary tree with src as root.
  • Broadcast takes 𝑂(ℎ𝑒𝑖𝑔ℎ𝑡) rounds.
  • ℎ𝑒𝑖𝑔ℎ𝑡 = 𝑂(log_(𝑛^𝜖) 𝑘) = 𝑂(1/𝜖), since 𝑁 = 𝑝𝑜𝑙𝑦(𝑆) (𝑁 = 𝑂(𝑛²) for graphs), so log 𝑘 = 𝑂(log 𝑛).

Maximal Matching

• A matching in a graph 𝐺 = (𝑉, 𝐸) is a set of edges that don't share common vertices.
• A maximum matching is a matching of maximum possible cardinality.
• A maximal matching is a matching that ceases to be one when any edge is added to it.
• A maximal matching has cardinality at least half of a maximum matching. Homework: Prove this.
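Returning to Approach 2 on the Broadcasting slide above, here is a small Python sketch (illustrative; the function name and the fan-out choice S // n are assumptions consistent with the slide, not stated there verbatim) that simulates the tree broadcast and counts rounds.

```python
def broadcast_rounds(k, n, S, src=0):
    """Simulate the tree broadcast: in each round, every machine that already
    holds the n words forwards them to up to S // n new machines (so its total
    outgoing message size stays within S). Returns the number of rounds."""
    fanout = max(1, S // n)        # children per node in the broadcast tree
    have = {src}
    pending = [m for m in range(k) if m != src]
    rounds = 0
    while pending:
        rounds += 1
        capacity = len(have) * fanout          # new machines reachable this round
        newly, pending = pending[:capacity], pending[capacity:]
        have.update(newly)
    return rounds

# With S = n^(1+eps), the fan-out is n^eps, so the number of rounds is
# O(log_{n^eps} k), i.e. O(1/eps) assuming k = poly(n).
print(broadcast_rounds(k=10_000, n=1_000, S=1_000_000))   # fan-out 1000 -> 2 rounds
```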

Sequential Algorithm for finding a maximal matching

1. Let 𝑋 = ∅.
2. For each 𝑒 = (𝑢, 𝑣) ∈ 𝐸:
   1. If neither 𝑢 nor 𝑣 is an endpoint of any edge in 𝑋, then 𝑋 = 𝑋 ∪ {𝑒}.
3. Output 𝑋.

Correctness:
• Invariant: 𝑋 is a matching at all times.
• Suppose 𝑋 is not maximal at the end. Then some edge 𝑒 can be added to it and it will remain a matching. But why was 𝑒 rejected? (It can only have been rejected because one of its endpoints was already matched, a contradiction.)

Filtering: Idea to find a maximal matching in the superlinear memory regime

Preprocessing: Let ℓ be a designated "leader" machine (say, machine 0). Assume it doesn't hold any edge at the beginning. (Why is this OK?) During the course of the algorithm, ℓ maintains a matching (initially empty). Other machines are called regular machines. 𝐺_𝑟 = (𝑉_𝑟, 𝐸_𝑟) denotes the graph during phase 𝑟. We use 𝑚_𝑟 for the number of edges in 𝐺_𝑟. 𝐺_0 ← 𝐺.

Steps in each phase 𝑟 = 0, 1, … (until 𝐺_𝑟 becomes empty):
1. Each regular machine marks each local edge independently with probability 𝑝 = 𝑛^(1+𝜖)/(2𝑚_𝑟) and sends the marked edges to the leader ℓ.
2. The leader ℓ recomputes the maximal matching with the edges it received but without losing any edge from the previous matching. (How?)
3. The leader ℓ broadcasts the matching so computed (≤ 𝑛/2 edges) to all machines.
4. Each regular machine removes edges that have at least one common vertex with the received matching. Isolated vertices are also removed.
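The Python sketch below (illustrative) ties the two slides together: greedy_maximal_matching is the sequential algorithm above, and filtering_matching simulates the phases of the filtering idea on a single machine, reusing the greedy routine as the leader's "recompute without losing any previous matching edge" step. The sampling probability p = S/(2·m_r) is an assumption consistent with the analysis on the following slides, not a value stated verbatim here.

```python
import random

def greedy_maximal_matching(edges, matching=None):
    """Sequential algorithm: scan edges, add an edge iff neither endpoint is
    already matched. Starting from a previous matching keeps all its edges."""
    matching = list(matching or [])
    matched = {v for e in matching for v in e}
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

def filtering_matching(edges, S, rng=random.Random(0)):
    """Filtering idea: repeatedly sample edges to a 'leader', extend its matching
    greedily, then drop edges touching matched vertices, until no edges remain."""
    leader_matching = []
    remaining = list(edges)
    while remaining:
        p = min(1.0, S / (2 * len(remaining)))          # assumed sampling rate
        sampled = [e for e in remaining if rng.random() < p]
        leader_matching = greedy_maximal_matching(sampled, leader_matching)
        matched = {v for e in leader_matching for v in e}
        # Regular machines remove edges with at least one matched endpoint.
        remaining = [(u, v) for u, v in remaining
                     if u not in matched and v not in matched]
    return leader_matching
```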

Outline of the Analysis

• Correctness is obvious (similar to the sequential algorithm) if the bandwidth limitation is not violated.
• Claims:
  • The leader ℓ receives at most 𝑛^(1+𝜖) edges (whp) in step 1. (Homework)
  • If a phase 𝑟 starts with 𝑚_𝑟 edges, then the number of edges at the end of round 𝑟 is 𝑂(𝑚_𝑟/𝑛^𝜖) with high probability.
  • The total number of rounds is log_(𝑛^𝜖) 𝑚 ∈ 𝑂(1/𝜖). Why?

Claim: At most 2𝑛/𝑝 edges remain whp at the end of round 𝑟

• Recall 𝐺_𝑟 = (𝑉_𝑟, 𝐸_𝑟) is the leftover graph at the end of round 𝑟 − 1 (i.e., the graph at the start of round 𝑟).
• For a pair of vertices 𝑢, 𝑣 ∈ 𝑉_(𝑟+1), can 𝑒 = (𝑢, 𝑣) have been sent to the leader in round 𝑟? No! (Why? If sent, at least one of 𝑢 or 𝑣 would have been matched, and therefore discarded.)
• Consider any set of vertices 𝐽 with more than 2𝑛/𝑝 edges of 𝐺_𝑟 with both endpoints in 𝐽.
• What is the chance that 𝑉_(𝑟+1) = 𝐽?
  Pr[all induced edges not sent] ≤ (1 − 𝑝)^(2𝑛/𝑝) ≤ 𝑒^(−2𝑛).
• There are at most 2^𝑛 subsets of 𝑉, so by the union bound, the result holds.
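To spell out the probability bound in the last two bullets, assume (as in the reconstruction above) that each edge of 𝐺_𝑟 is sent to the leader independently with probability 𝑝, and write 𝑒(𝐽) for the number of edges of 𝐺_𝑟 with both endpoints in 𝐽. For any fixed 𝐽 with 𝑒(𝐽) > 2𝑛/𝑝:

```latex
\Pr[\text{no edge inside } J \text{ is sent}] \;=\; (1-p)^{e(J)} \;\le\; (1-p)^{2n/p} \;\le\; e^{-2n}.
```

Taking a union bound over the at most 2^𝑛 candidate sets 𝐽 bounds the failure probability by 2^𝑛 · 𝑒^(−2𝑛) = 𝑒^(−(2 − ln 2)𝑛) ≤ 𝑒^(−𝑛), which is the "whp" statement in the claim.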

The 𝑘-machine Model

• Input data size 𝑁 words; each word = 𝑂(log 𝑁) bits.
• The number of machines 𝑘. (Machines identified by {1, 2, …, 𝑘}.)
• Each pair of machines connected by a link.
• Memory size is unbounded (but usually not abused).
• Synchronous communication rounds:
  • Local computation within each machine.
  • Each machine creates one message of 𝑂(log 𝑛) bits for every other machine.
  • Send… Receive.
• Goal: Solve the problem in as few rounds as possible.

Data Distribution: The Random Vertex Partitioning (RVP)

• Typically, data is split into words (often as ⟨𝑘𝑒𝑦, 𝑣𝑎𝑙𝑢𝑒⟩ pairs).
• The words could be either randomly distributed or arbitrarily distributed.
• Typically used in processing large graphs.
• RVP: The most common approach is to randomly partition the vertices into 𝑘 parts and place each part into one of the machines. Then, a copy of each edge is placed in the (≤ 2) machines that contain either of its end points.
• Other partitionings of graph data are also conceivable (e.g., random edge partitioning, arbitrary edge partitioning, etc.).
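A minimal sketch of the random vertex partitioning just described (Python, illustrative; function and variable names are assumptions): vertices are assigned to machines uniformly at random, and a copy of each edge goes to the at most two machines holding its endpoints.

```python
import random

def random_vertex_partition(vertices, edges, k, rng=random.Random(0)):
    """Random Vertex Partitioning (RVP): map each vertex to a uniformly random
    machine in {0, ..., k-1}; a copy of each edge goes to the (<= 2) machines
    that hold one of its endpoints."""
    home = {v: rng.randrange(k) for v in vertices}
    machine_vertices = {i: set() for i in range(k)}
    machine_edges = {i: [] for i in range(k)}
    for v, i in home.items():
        machine_vertices[i].add(v)
    for u, v in edges:
        for i in {home[u], home[v]}:          # one copy per distinct machine
            machine_edges[i].append((u, v))
    return machine_vertices, machine_edges

# Example: a 4-cycle split across 2 machines.
verts = [0, 1, 2, 3]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(random_vertex_partition(verts, cycle, k=2))
```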

RVP is Load Balanced

Mapping Lemma: Under RVP of a graph 𝐺 = (𝑉, 𝐸) with 𝑛 vertices and 𝑚 edges, whp,
1. every machine has at most Õ(𝑛/𝑘) vertices, and
2. the number of edges associated with each link is at most Õ(𝑚/𝑘² + Δ/𝑘),
where Δ is the maximum degree in 𝐺.

Proof of part 1 is easy. Just use the Chernoff bound.
Proof of part 2 is more complicated and uses Bernstein's inequality, which we are not covering. So the proof is omitted.

How to design 𝑘-machine algorithms?

Answer: Think like a vertex.
You have two "think like a vertex" (point-to-point message passing) models:
(i) the Congested Clique (CC), and
(ii) the Node Capacitated Clique (NCC).

Simulation:
1. Design an algorithm in CC or NCC with good bounds.
2. Automatically simulate the CC/NCC algorithm in the 𝑘-machine model using the standard simulator.
3. Claim bounds in the 𝑘-machine model.
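For part 1, a hedged sketch of the Chernoff argument, assuming 𝑛/𝑘 = Ω(log 𝑛) (a condition the slide does not state explicitly): the number 𝑋_𝑖 of vertices placed on machine 𝑖 is a sum of 𝑛 independent indicator variables, each equal to 1 with probability 1/𝑘, so E[𝑋_𝑖] = 𝑛/𝑘 and

```latex
\Pr\!\left[ X_i > 2\,\tfrac{n}{k} \right] \;\le\; \exp\!\left( -\tfrac{1}{3}\cdot\tfrac{n}{k} \right) \;\le\; n^{-c}
```

for a constant 𝑐 > 0 whenever 𝑛/𝑘 ≥ 3𝑐 ln 𝑛; a union bound over the 𝑘 ≤ 𝑛 machines then gives 𝑂(𝑛/𝑘) vertices on every machine whp.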

The CC and NCC Models (point to point)

• We have 𝑛 nodes 𝑉 = {1, 2, …, 𝑛}.
• The input graph 𝐺 = (𝑉, 𝐸) is known locally, i.e., each node 𝑣 ∈ 𝑉 knows its incident edges.
• The nodes can communicate via synchronous message passing, but each message must be at most 𝑂(log 𝑛) bits.

CC: Each node can send 𝑛 − 1 messages (one for every other node).
NCC: Each node can send at most 𝑂(log 𝑛) messages to 𝑂(log 𝑛) carefully chosen nodes.

Simulating CC/NCC in the 𝑘-machine model

• Assume there is a hash function ℎ: 𝑉 → {1, 2, …, 𝑘} that is a simple uniform hash function. (Claims will hold under 𝑂(log 𝑛)-universal families.)
• Assume that each node 𝑣 ∈ 𝑉 is placed in machine ℎ(𝑣).
• Each machine 𝑖 now contains (and therefore simulates) the nodes 𝑉_𝑖 = {𝑣 | ℎ(𝑣) = 𝑖}. We know that |𝑉_𝑖| ∈ Õ(𝑛/𝑘) whp.

Simulation of one CC/NCC round (at each machine 𝑖):
1. Machine 𝑖 performs the local computation for all nodes in 𝑉_𝑖 as per the CC/NCC algorithm.
2. The messages to be sent are then individually sent to the machine that holds their respective recipient nodes.
3. Incoming messages are received and handed over to the recipient nodes.
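A sketch of one simulated round (Python, illustrative): nodes are hashed to machines, each machine runs the node-level logic for the nodes it hosts, and node-to-node messages are routed to the machine of the recipient. The callback signature node_step and the message format are assumptions for illustration; the sketch shows the routing structure only and does not account for per-link bandwidth.

```python
def simulate_round(nodes, node_step, h, k, node_state, inbox):
    """Simulate one CC/NCC round on k machines.

    nodes:       iterable of node ids
    node_step:   function(node, state, received) -> (new_state, out) where
                 out is a dict {destination_node: message}
    h:           hash function node -> machine in {0, ..., k-1}
    node_state:  dict node -> state
    inbox:       dict node -> list of messages received in the previous round
    """
    machine_nodes = {i: [] for i in range(k)}
    for v in nodes:
        machine_nodes[h(v)].append(v)

    # Step 1: local computation for every simulated node on its machine.
    outgoing = []                       # (src_node, dst_node, message)
    for i in range(k):
        for v in machine_nodes[i]:
            node_state[v], out = node_step(v, node_state[v], inbox.get(v, []))
            outgoing.extend((v, dst, msg) for dst, msg in out.items())

    # Steps 2-3: route each message to the machine h(dst), which hands it to dst.
    new_inbox = {v: [] for v in nodes}
    for src, dst, msg in outgoing:
        new_inbox[dst].append(msg)      # delivered via machine h(dst)
    return node_state, new_inbox
```

A concrete algorithm supplies node_step, which returns the node's new state and a dict mapping destination nodes to messages.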

Conversion Theorem (CC → 𝑘-machine)

Theorem: Consider a CC algorithm 𝐴 in which each node sends and receives at most Δ′ messages per round. Suppose further that 𝐴 takes 𝑇 communication rounds to complete under CC with a total message complexity of 𝑀. Then, 𝐴 can be simulated in the 𝑘-machine model in
  Õ(𝑀/𝑘² + Δ′𝑇/𝑘 + 𝑇)
rounds.

Corollary 1: Consider an NCC algorithm 𝐴 that takes 𝑇 communication rounds to complete under NCC. Then, 𝐴 can be simulated in the 𝑘-machine model in Õ((𝑛/𝑘² + 1)𝑇) rounds. (Proof is obvious, hence omitted.)

Proof of the Conversion Theorem

Consider round 𝑖 of 𝐴 under CC.
Let 𝐺_𝑖 = (𝑉, 𝐸_𝑖) be the graph on the input node set 𝑉 with edges 𝑒 = (𝑢, 𝑣) ∈ 𝐸_𝑖 iff 𝑢 and 𝑣 communicated during round 𝑖.
Note: the max degree of 𝐺_𝑖 (denoted Δ_𝑖) is at most Δ′.
By the Mapping Lemma, at most Õ(|𝐸_𝑖|/𝑘² + Δ_𝑖/𝑘) messages are communicated across each link while simulating round 𝑖; summing over all 𝑇 rounds gives the claimed bound. QED.

Conversion Theorem (Broadcast CC → NCC)

Theorem: Suppose 𝐴 is an algorithm under CC that performs only broadcast-based communication (i.e., a node sending the same message to every other node) and suppose further that 𝐴 requires 𝑇 communication rounds and 𝑅 broadcasts in total. Then, 𝐴 can be simulated in NCC (point-to-point communication) in 𝑂(𝑅 + 𝑇 log 𝑛) rounds comprising 𝑂(𝑅) broadcast calls.

Corollary: 𝐴 can be simulated in the 𝑘-machine model in
  Õ(𝑅/𝑘 + 𝑇)
rounds.

Proof of Theorem

• Arrange the nodes in NCC in the form of a binary tree wherein each node 𝑖, 𝑖 > 1, has parent ⌊𝑖/2⌋.
• Consider any round 𝑟 of the broadcast CC algorithm with 𝑅_𝑟 ≥ 1 broadcasts.
• Each node that has to broadcast a message upcasts the message to the root in a pipelined fashion. This takes 𝑂(log 𝑛 + 𝑅_𝑟) rounds in NCC. See next slide.
• The messages are then broadcast down to all nodes in the tree in 𝑂(log 𝑛 + 𝑅_𝑟) rounds. (HW)
• Thus, each round 𝑟 takes 𝑂(log 𝑛 + 𝑅_𝑟) rounds. QED
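The theorem's bound is the per-round cost from this proof summed over the 𝑇 rounds: with 𝑅_𝑟 broadcasts in round 𝑟 and ∑_𝑟 𝑅_𝑟 = 𝑅,

```latex
\sum_{r=1}^{T} O(\log n + R_r) \;=\; O\!\Big( T \log n + \sum_{r=1}^{T} R_r \Big) \;=\; O(T \log n + R).
```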

Upcasting messages in a tree (under NCC)

Input: some 𝑅 nodes in the tree have a message each.
Output: those messages must reach the root.

Strawman attempt: Those 𝑅 nodes send their messages directly to the root. This violates the NCC bound on how many messages a node may receive per round.

Correct attempt:
• At the start of a round 𝑟, each node 𝑣 ∈ 𝑉 has a set of messages 𝑈_𝑣. Initially, 𝑈_𝑣 contains the message that 𝑣 wishes to broadcast (and is empty otherwise).
• Then in round 𝑟, each 𝑣 picks one message 𝑥 from 𝑈_𝑣 and sends it to its parent. In turn, it receives up to two messages (say, 𝑦 and 𝑧) from its children. Thus, at the end of the round, 𝑈_𝑣 ← (𝑈_𝑣 ∖ {𝑥}) ∪ {𝑦, 𝑧}.

Homework: How can we adapt this algorithm to ensure each node knows when to terminate the algorithm?

Claim: Upcasting takes 𝑂(log 𝑛 + 𝑅) rounds

• A "clump" is a maximal connected collection of vertices whose 𝑈_𝑣's are non-empty.
• There is at most one clump containing the root. Call it the root clump.
• Each clump has a node closest to the root. Call it the root of the clump.
• Claim: two messages are "friends" if they are part of the same clump. Once two messages become friends, they will remain friends.
  • Consequence: clumps can coalesce, but not break apart.
• Claim: The root of any non-root clump moves closer to the root in each round. This can happen in two ways:
  1. The root moves up and does not coalesce with another clump whose root is higher.
  2. The root moves up and coalesces with another clump whose root is higher.
  • Consequence: Every message will be part of the root clump in 𝑂(log 𝑛) rounds.
• Claim: When a clump becomes the root clump, it will reduce to just the root in at most 𝑅 rounds. Homework: articulate why this is true.
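A sketch of the upcast procedure above (Python, illustrative; the function name and message encoding are assumptions): each node keeps its set 𝑈_𝑣, forwards one pending message per round to its parent ⌊𝑣/2⌋, and absorbs whatever its children forward. The simulation counts the rounds until all messages reach the root.

```python
def upcast_rounds(n, sources):
    """Nodes are 1..n arranged as a binary tree with parent(v) = v // 2.
    'sources' is the set of nodes that each start with one message.
    Every round, each non-root node forwards one pending message to its parent;
    returns the number of rounds until all messages have reached the root."""
    U = {v: set() for v in range(1, n + 1)}
    for v in sources:
        U[v].add(("msg", v))
    at_root = set(U[1]); U[1].clear()              # the root's own message needs no upcast
    rounds = 0
    while len(at_root) < len(sources):
        rounds += 1
        arriving = {}                              # parent -> messages arriving this round
        for v in range(2, n + 1):
            if U[v]:
                x = next(iter(U[v])); U[v].remove(x)
                arriving.setdefault(v // 2, []).append(x)
        for parent, msgs in arriving.items():
            (at_root if parent == 1 else U[parent]).update(msgs)
    return rounds

# Example: 3 messages in a 15-node complete binary tree reach the root in 4 rounds,
# within the O(log n + R) bound claimed above.
print(upcast_rounds(15, sources={9, 12, 15}))
```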

Breadth First Search (Take 1)

• Input under CC: each node is aware of its neighbors. Exactly one node is designated as the root node.
• Output under CC: each node must know its level in the BFS tree, its parent, and its children.

• Algorithm:
1. In the very first round, the root sends a "hello" message to all its neighbors.
2. Every other node waits until it receives a "hello" message; when it receives one for the first time, it sends out a "hello" message to the neighbors from which it did not hear any "hello" message.
   (How would you establish the parent-child relationship? Each non-root node 𝑣 picks an arbitrary node 𝑢 among those that first sent "hello" messages to 𝑣. Node 𝑣 then sends 𝑢 a message saying "hi, I am 𝑣 and I am a child of yours.")
• BFS in CC (without broadcasts) takes 𝑂(𝐷) rounds and 𝑂(𝑚) messages, where 𝐷 is the graph diameter. Here Δ′ = Δ.
• Thus, in the 𝑘-machine model, the round complexity is Õ(𝑚/𝑘² + Δ𝐷/𝑘 + 𝐷).

Can we do better?
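A minimal Python sketch of the "hello"-flooding BFS above (a synchronous round-by-round simulation; function and variable names are illustrative): each node records its level and picks its parent among the first senders of "hello".

```python
from collections import defaultdict

def bfs_hello(adj, root):
    """Synchronous simulation of the 'hello'-flooding BFS described above.
    adj: dict node -> set of neighbours. Returns (level, parent) per node."""
    level, parent = {root: 0}, {root: None}
    frontier = {root}
    rounds = 0
    while frontier:
        rounds += 1
        hellos = defaultdict(list)                 # receiver -> senders this round
        for v in frontier:                         # v sends "hello" to neighbours
            for u in adj[v]:
                if u not in level:                 # u has not been reached yet
                    hellos[u].append(v)
        frontier = set()
        for u, senders in hellos.items():
            level[u] = rounds                      # first time u hears "hello"
            parent[u] = senders[0]                 # pick an arbitrary first sender
            frontier.add(u)
    return level, parent

# Example: a path 0-1-2-3 plus a chord 0-2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(bfs_hello(adj, root=0))   # levels: {0: 0, 1: 1, 2: 1, 3: 2}
```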

Breadth First Search (Take 2)

• Algorithm: the same "hello"-flooding algorithm as in Take 1, but each node's "hello" is sent as a broadcast (the same message to every node); nodes that are not neighbors of the sender simply ignore it.
• BFS in CC now takes 𝑂(𝐷) rounds and 𝑂(𝑛) broadcasts (instead of 𝑂(𝑚) point-to-point messages), where 𝐷 is the graph diameter.
• Thus, in the 𝑘-machine model, the round complexity improves from Õ(𝑚/𝑘² + Δ𝐷/𝑘 + 𝐷) to Õ(𝑛/𝑘 + 𝐷), by the broadcast conversion corollary with 𝑅 = 𝑂(𝑛) and 𝑇 = 𝑂(𝐷).
