Breadth-First Graph Search Using An Iterative Map-Reduce Algorithm
import java.util.*;

public class Graph {

    private Map<Integer, Node> nodes;

    public Graph() {
        this.nodes = new HashMap<Integer, Node>();
    }

    // standard sequential breadth-first search from the given source node id
    public void breadthFirstSearch(int source) {
        // mark the source node as the starting point of the search
        Node snode = nodes.get(source);
        snode.setColor(Node.Color.GRAY);
        snode.setDistance(0);

        Queue<Integer> q = new LinkedList<Integer>();
        q.add(source);
        while (!q.isEmpty()) {
            Node unode = nodes.get(q.poll());
            // examine every node adjacent to the one we just dequeued
            for (int v : unode.getEdges()) {
                Node vnode = nodes.get(v);
                if (vnode.getColor() == Node.Color.WHITE) {
                    vnode.setColor(Node.Color.GRAY);
                    vnode.setDistance(unode.getDistance() + 1);
                    vnode.setParent(unode.getId());
                    q.add(v);
                }
            }
            // we're done with this node
            unode.setColor(Node.Color.BLACK);
        }
    }
}

// ... and from main():
graph.breadthFirstSearch(1);
graph.print();
parallel breadth-first search using hadoop
okay, that's nifty and all, but what if your graph is really, really big? this algorithm marches down the tree one level at a time, and once you're past the first level there will be many, many nodes whose edges need to be examined - and in the code above this happens sequentially. how can we modify this to work for a huge graph and run the algorithm in parallel? enter hadoop and map-reduce!
start here for a decent introduction to graph algorithms on map-reduce. once again though, this resource gives some tips, but no actual code.
let's talk through how we go about this, and actually write a little code! basically, the idea is this - every Map iteration "makes a mess" and every Reduce iteration "cleans up the mess". let's say we start by representing a node in the following text format

ID    EDGES|DISTANCE_FROM_SOURCE|COLOR|
where EDGES is a comma delimited list
of the ids of the nodes that are
connected to this node. in the beginning,
we do not know the distance and will use
Integer.MAX_VALUE for marking
"unknown". the color tells us whether or
not we've seen the node before, so this
starts off as white.
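just to make that format concrete, here's a rough, self-contained sketch of how one of these lines could be parsed. (the NodeLine class, its field names, and the handling of the literal string "Integer.MAX_VALUE" are mine - the real string handling lives in the Node.java you can download below.)

import java.util.*;

class NodeLine {
    int id;                                            // the node's unique id
    List<Integer> edges = new ArrayList<Integer>();    // ids of adjacent nodes
    int distance = Integer.MAX_VALUE;                  // distance from the source
    String color = "WHITE";                            // WHITE, GRAY or BLACK

    // parses a line like "2\t1,3,4,5|Integer.MAX_VALUE|WHITE|"
    static NodeLine parse(String line) {
        String[] keyAndValue = line.split("\t");
        String[] fields = keyAndValue[1].split("\\|");
        NodeLine n = new NodeLine();
        n.id = Integer.parseInt(keyAndValue[0]);
        if (fields[0].length() > 0 && !fields[0].equals("NULL")) {
            for (String e : fields[0].split(",")) {
                n.edges.add(Integer.parseInt(e));
            }
        }
        n.distance = fields[1].equals("Integer.MAX_VALUE")
                ? Integer.MAX_VALUE : Integer.parseInt(fields[1]);
        n.color = fields[2];
        return n;
    }
}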
suppose we start with the following input
graph, in which we've stated that node
#1 is the source (starting point) for the
search, and as such have marked this
one special node with distance 0 and
color GRAY.
1    2,5|0|GRAY|
2    1,3,4,5|Integer.MAX_VALUE|WHITE|
3    2,4|Integer.MAX_VALUE|WHITE|
4    2,3,5|Integer.MAX_VALUE|WHITE|
5    1,2,4|Integer.MAX_VALUE|WHITE|
the mappers are responsible for "exploding" all gray nodes - i.e. all nodes that live at our current depth in the tree. for each gray node, the mappers emit a new gray node for each of its edges, with distance = distance + 1. they also then emit the input gray node, but colored black. (once a node has been exploded, we're done with it.) mappers also emit all non-gray nodes, with no change. so, the output of the first map iteration would be
1    2,5|0|BLACK|
2    NULL|1|GRAY|
5    NULL|1|GRAY|
2    1,3,4,5|Integer.MAX_VALUE|WHITE|
3    2,4|Integer.MAX_VALUE|WHITE|
4    2,3,5|Integer.MAX_VALUE|WHITE|
5    1,2,4|Integer.MAX_VALUE|WHITE|
note that when the mappers "explode"
the gray nodes and create a new node
for each edge, they do not know what to
write for the edges of this new node - so
they leave it blank.
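stripped of all the hadoop machinery, the "explode" step for a single gray node is just this (a plain java sketch - emit() and join() are hypothetical stand-ins for the output collector and string formatting in the real mapper further down):

// explode one gray node: emit a gray child for every edge (with unknown
// edges), then emit the node itself, recolored black
void explode(int id, List<Integer> edges, int distance) {
    for (int v : edges) {
        emit(v, "NULL|" + (distance + 1) + "|GRAY|");    // edges unknown, so NULL
    }
    emit(id, join(edges) + "|" + distance + "|BLACK|");  // done with this node
}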
the reducers, of course, receive all data
for a given key - in this case it means
that they receive the data for all "copies"
of each node. for example, the reducer
that receives the data for key = 2 gets
the following list of values :
2    NULL|1|GRAY|
2    1,3,4,5|Integer.MAX_VALUE|WHITE|
the reducer's job is to take all this data and construct a new node using
• the non-null list of edges
• the minimum distance
• the darkest color
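in plain java, ignoring the hadoop plumbing for a moment, that merge rule looks roughly like the sketch below. (Copy and darker() are stand-ins of mine, not part of the real code - darker() just treats BLACK > GRAY > WHITE.)

// merge all "copies" of a single node id back into one node
Copy merge(List<Copy> copies) {
    Copy result = new Copy();
    result.distance = Integer.MAX_VALUE;
    result.color = "WHITE";
    for (Copy c : copies) {
        if (c.edges != null) {
            result.edges = c.edges;              // the non-null list of edges
        }
        if (c.distance < result.distance) {
            result.distance = c.distance;        // the minimum distance
        }
        if (darker(c.color, result.color)) {
            result.color = c.color;              // the darkest color
        }
    }
    return result;
}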
using this logic the output from our first
iteration will be :
1 2,5,|0|BLACK
2 1,3,4,5,|1|GRAY
3 2,4,|Integer.MAX_VALUE|WHITE
4 2,3,5,|Integer.MAX_VALUE|WHITE
5 1,2,4,|1|GRAY
the second iteration uses this as the
input and outputs :
1 2,5,|0|BLACK
2 1,3,4,5,|1|BLACK
3 2,4,|2|GRAY
4 2,3,5,|2|GRAY
5 1,2,4,|1|BLACK
and the third iteration outputs:
1 2,5,|0|BLACK
2 1,3,4,5,|1|BLACK
3 2,4,|2|BLACK
4 2,3,5,|2|BLACK
5 1,2,4,|1|BLACK
subsequent iterations will continue to
print out the same output.
how do you know when you're done?
you are done when there are no output
nodes that are colored gray. note - if not
all nodes in your input are actually
connected to your source, you may have
final output nodes that are still white.
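if you wanted to check that mechanically, scanning one iteration's reducer output for any remaining gray node is enough - something along these lines (a sketch, assuming the driver can read the iteration's output lines):

// returns true if any output line still describes a GRAY node,
// i.e. another iteration is needed
boolean hasGrayNodes(Iterable<String> outputLines) {
    for (String line : outputLines) {
        if (line.contains("GRAY")) {   // GRAY only ever appears in the color field
            return true;
        }
    }
    return false;
}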
here's an actual implementation in
hadoop. once again, Node is little more
than a simple bean, plus some ugly
string processing nonsense. you can
download Node.java here and
GraphSearch.java here.
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
/**
 * This is an example Hadoop Map/Reduce application.
 *
 * It inputs a graph in adjacency list format, and performs a breadth-first search.
 * The input format is
 *     ID   EDGES|DISTANCE|COLOR
 * where
 *     ID       = the unique identifier for a node (assumed to be an int here)
 *     EDGES    = the list of edges emanating from the node (e.g. 3,8,9,12)
 *     DISTANCE = the to-be-determined distance of the node from the source
 *     COLOR    = a simple status tracking field to keep track of when we're
 *                finished with a node
 *
 * It assumes that the source node (the node from which to start the search) has
 * been marked with distance 0 and color GRAY in the original input. All other
 * nodes will have input distance Integer.MAX_VALUE and color WHITE.
 */
public class GraphSearch extends Configured implements Tool {

  /**
   * Nodes that are Color.WHITE or Color.BLACK are emitted, as is. For every
   * edge of a Color.GRAY node, we emit a new Node with distance incremented by
   * one. The Color.GRAY node is then colored black and is also emitted.
   */
  public static class MapClass extends MapReduceBase implements
      Mapper<LongWritable, Text, IntWritable, Text> {

    public void map(LongWritable key, Text value,
        OutputCollector<IntWritable, Text> output,
        Reporter reporter) throws IOException {

      // parse the input line into the Node bean described above
      Node node = new Node(value.toString());

      // for each GRAY node, emit each of its edges as a new GRAY node
      if (node.getColor() == Node.Color.GRAY) {
        for (int v : node.getEdges()) {
          Node vnode = new Node(v);
          vnode.setDistance(node.getDistance() + 1);
          vnode.setColor(Node.Color.GRAY);
          output.collect(new IntWritable(vnode.getId()), vnode.getLine());
        }
        // We're done with this node now, color it BLACK
        node.setColor(Node.Color.BLACK);
      }

      // no matter what, we emit the input node; if it came in GRAY it goes out BLACK
      output.collect(new IntWritable(node.getId()), node.getLine());
    }
  }
  /**
   * A reducer class that combines all of the "copies" of a node into a single
   * output node.
   */
  public static class Reduce extends MapReduceBase implements
      Reducer<IntWritable, Text, IntWritable, Text> {

    /**
     * Make a new node which combines all information for this single node id.
     * The new node should have
     * - The full list of edges
     * - The minimum distance
     * - The darkest Color
     */
    public void reduce(IntWritable key, Iterator<Text> values,
        OutputCollector<IntWritable, Text> output, Reporter reporter)
        throws IOException {

      List<Integer> edges = null;
      int distance = Integer.MAX_VALUE;
      // assumes Node.Color is declared in the order WHITE, GRAY, BLACK
      Node.Color color = Node.Color.WHITE;

      while (values.hasNext()) {
        Text value = values.next();
        Node u = new Node(key.get() + "\t" + value.toString());

        // one (and only one) copy of the node carries the full list of edges
        if (u.getEdges() != null && u.getEdges().size() > 0) {
          edges = u.getEdges();
        }
        // keep the minimum distance seen so far
        if (u.getDistance() < distance) {
          distance = u.getDistance();
        }
        // keep the darkest color seen so far
        if (u.getColor().ordinal() > color.ordinal()) {
          color = u.getColor();
        }
      }

      Node n = new Node(key.get());
      n.setEdges(edges);
      n.setDistance(distance);
      n.setColor(color);
      output.collect(key, n.getLine());
    }
  }
  static int printUsage() {
    System.out.println("graphsearch [-m <num mappers>] [-r <num reducers>]");
    ToolRunner.printGenericCommandUsage(System.out);
    return -1;
  }

  private JobConf getJobConf(String[] args) {
    JobConf conf = new JobConf(getConf(), GraphSearch.class);
    conf.setJobName("graphsearch");

    // the keys are the node ids
    conf.setOutputKeyClass(IntWritable.class);
    // the values are the string representation of a Node
    conf.setOutputValueClass(Text.class);

    conf.setMapperClass(MapClass.class);
    conf.setReducerClass(Reduce.class);

    for (int i = 0; i < args.length; ++i) {
      if ("-m".equals(args[i])) {
        conf.setNumMapTasks(Integer.parseInt(args[++i]));
      } else if ("-r".equals(args[i])) {
        conf.setNumReduceTasks(Integer.parseInt(args[++i]));
      }
    }

    return conf;
  }
  /**
   * The main driver for the graph search map/reduce program. Invoke this
   * method to submit the map/reduce job.
   *
   * @throws IOException
   *           When there are communication problems with the job tracker.
   */
  public int run(String[] args) throws Exception {

    int iterationCount = 0;

    while (keepGoing(iterationCount)) {

      // the input of each iteration is the output of the previous one
      String input;
      if (iterationCount == 0)
        input = "input-graph";
      else
        input = "output-graph-" + iterationCount;

      String output = "output-graph-" + (iterationCount + 1);

      JobConf conf = getJobConf(args);
      FileInputFormat.setInputPaths(conf, new Path(input));
      FileOutputFormat.setOutputPath(conf, new Path(output));

      RunningJob job = JobClient.runJob(conf);
      iterationCount++;
    }

    return 0;
  }

  private boolean keepGoing(int iterationCount) {
    // see the note below - rather than checking for remaining gray nodes,
    // we simply stop after the 4th iteration
    if (iterationCount >= 4) {
      return false;
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new GraphSearch(), args);
    System.exit(res);
  }

}
the remotely astute reader may notice that the keepGoing method is not actually checking to see if there are any remaining gray nodes, but rather just stops after the 4th iteration. this is because there is no easy way to communicate this information in hadoop.
what we want to do is the following :
1. at the beginning of each iteration, set
a global flag keepGoing = false
2. as the reducer outputs each node, if it is outputting a gray node, set keepGoing = true
unfortunately, hadoop provides no
framework for setting such a global
variable. this would need to be managed
using an external semaphore server of
some sort. this is left as an exercise for
the lucky reader. ; )