Job
Learning Objective
From our class discussions you are familiar with the power of the MapReduce paradigm. It started off as part of the
secret sauce that differentiated Google from other search engines of that time. In this assignment you will express
the solutions to three problems using the MapReduce paradigm and the MRJob library. As we discussed in class, a
MapReduce programmer needs to write a:
1. Mapper, which will transform a (key, value) pair into a different (key, value) pair and a
2. Reducer, which will collect all values emitted by a mapper with the same key and transform the incoming
(key, [values]) to yet another (key, value) pair.
As we’ve discussed in class, the MapReduce infrastructure will align the emitted keys of a mapper with a reducer.
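To make the mapper, reducer, and key alignment concrete, here is a minimal sketch in plain Python. It does not use MRJob; the helper run_map_reduce is a hypothetical stand-in for the infrastructure, and word counting is only an illustrative task.

```python
from collections import defaultdict


def mapper(_, line):
    """Turn one input (key, value) pair into new (key, value) pairs."""
    for word in line.split():
        yield word.lower(), 1


def reducer(key, values):
    """Collapse all values that share a key into one (key, value) pair."""
    yield key, sum(values)


def run_map_reduce(lines, mapper, reducer):
    """Hypothetical stand-in for the infrastructure: map, group, reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(None, line):
            groups[key].append(value)  # align equal keys with one reducer call
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results


if __name__ == "__main__":
    for key, value in run_map_reduce(["the cat", "the hat"], mapper, reducer):
        print(key, value)
```

The mapper and reducer have the same (key, value) signatures you would use as methods of an MRJob class; only the grouping step is simulated here.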
This is an individual assignment. You are welcome to discuss high level conceptual ideas but the final code needs to be
yours. Submissions will be checked for authenticity.
Assignment
The assignment is in a similar vein to the exercises we did in class. You will want to start with the template code
we’ve used in class.
1. Solving Jumble
An example of a jumble word puzzle is given below. Given
jumbled words such as VELGA, PLUIT, SICCUR, IMPAGE, your
program, jumble.py, will print a list of ‘unjumbled’ words. To
generate the anagrams you will use the official Scrabble dictionary
of words “sowpods”1. If interested, check the Wikipedia entry for
more details on sowpods. Be sure to delete the first two lines of
the file you download so that only words are in the list. You will
execute your program as below:
% python jumble.py jumble.txt sowpods.txt -q
In jumble.txt scrambled words will be given one per line and tagged
with a question mark, e.g.,
velga ?
pluit ?
1 https://www.wordgamedictionary.com/sowpods/download/sowpods.txt
A sample jumble.txt has been provided with several jumbled words. Your output should give the solution (the
unscrambled word) in some manner. For example, my output is along the lines of:
2 ["?velga", "gavel"]
3 ["?mursee", "emures", "resume"]
I’m using question marks in the output too (why?). The numbers at the beginning of each line are keys
representing the length of the values emitted by the reducer (1 more than the number of anagrams). They don’t
play a role in the solution. You are welcome to use alternate approaches as long as the unscrambled words are
clear.
Hint: Note that words that are anagrams of each other all have the same alphabetized letter ordering. For example,
spot, pots, tops are anagrams and they all have the same letter ordering of opst. So in your jumble.py program you
will write a mapper that yields the alphabetized version of a word and the original word. A reducer will then
receive all the words that are anagrams of each other with the alphabetized ordering as the key. Note that in
the file jumble.txt we have marked each jumbled word with a ‘?’ to differentiate it from words coming from the word
list (sowpods.txt).
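The hint can be sketched in plain Python as follows. This is one possible approach under stated assumptions, not the reference solution: it assumes each jumble.txt line looks like "velga ?", it simulates the grouping step in a hypothetical helper called solve rather than using MRJob, and it relies on '?' sorting before lowercase letters so the puzzle word comes first.

```python
from collections import defaultdict


def mapper(_, line):
    """Key every word by its alphabetized letters."""
    parts = line.split()
    if not parts:
        return
    word = parts[0].lower()
    key = "".join(sorted(word))
    # Assumption: a trailing '?' marks a puzzle word from jumble.txt.
    if len(parts) > 1 and parts[1] == "?":
        yield key, "?" + word
    else:
        yield key, word


def reducer(key, words):
    """Emit only the anagram groups that contain a puzzle word."""
    words = sorted(words)  # '?' sorts before letters, so puzzle words lead
    if words[0].startswith("?"):
        yield len(words), words


def solve(lines):
    """Hypothetical driver that simulates the grouping between the two."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(None, line):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


if __name__ == "__main__":
    for length, words in solve(["velga ?", "gavel", "spot", "pots"]):
        print(length, words)
```

Dictionary groups with no puzzle word (spot, pots above) are dropped in the reducer, which keeps the output limited to the jumbled words and their candidate solutions.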
2. Eulerian Paths
One of the earliest, and still well-known, problems in graph theory is determining an Euler path. The initial
incarnation of this problem was studied by the renowned mathematician Leonhard Euler with the famous Seven
Bridges of Königsberg problem. Following is a diagram from the Wikipedia article on this problem2. On the left
you have a map of the seven bridges in the city of Königsberg, in the middle an abstract sketch, and at the right
a graph. The people of Königsberg pondered the question of whether they could walk across all seven bridges
without crossing a bridge twice.
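[Figure: the seven bridges of Königsberg as a map (left), an abstract sketch (middle), and a graph with numbered vertices (right); vertex labels 0, 1, and 2 are legible.]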
Euler proved that for a connected graph to have an Eulerian path that ends where it starts (an Eulerian circuit), every
vertex needs to have an even number of edges incident on it (i.e., the degree of every vertex is even). Write an MRJob
program which, when given a graph, will output just True or False based on whether it has such a path or not. Note that
you will need two reducers for your solution. Take a look at the documentation for mrjob3 to determine how to use two
reducers (using MRStep). I’ve provided sample output of my program in the assignment folder. As I’ve shown, do produce
a trace of your program’s execution that prints the degree of each vertex.
Create a graph called konigsberg.txt using the vertex numbering given above and run your program on it. My output
is also included in the assignment folder.
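The two-reducer structure can be sketched in plain Python as follows. This is only an illustration under assumptions: the input format (one undirected edge "u v" per line) is assumed, the helper names shuffle and all_degrees_even are hypothetical, the grouping between steps is simulated instead of using MRStep, and connectivity of the graph is not checked. The sample graphs are a triangle and a three-vertex path, not the Königsberg graph.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group values by key, as the framework does between steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return list(groups.items())


def mapper_edge(_, line):
    # Assumed input format: one undirected edge "u v" per line.
    u, v = line.split()
    yield u, 1
    yield v, 1


def reducer_degree(vertex, ones):
    """First reducer: the degree of one vertex, reported as even or odd."""
    degree = sum(ones)
    print(vertex, degree)  # trace: degree of each vertex
    yield None, degree % 2 == 0  # a shared key sends every flag to one reducer


def reducer_verdict(_, flags):
    """Second reducer: True only if every vertex has even degree."""
    yield None, all(flags)


def all_degrees_even(lines):
    """Hypothetical driver chaining the mapper and both reducers."""
    edges = [kv for line in lines for kv in mapper_edge(None, line)]
    flags = [kv for v, ones in shuffle(edges) for kv in reducer_degree(v, ones)]
    verdicts = [kv for k, fl in shuffle(flags) for kv in reducer_verdict(k, fl)]
    return verdicts[0][1]


if __name__ == "__main__":
    print(all_degrees_even(["a b", "b c", "c a"]))  # triangle
    print(all_degrees_even(["a b", "b c"]))         # three-vertex path
```

In an MRJob program the same three functions would be wired together in steps(), with one MRStep holding the mapper and first reducer and a second MRStep holding only the final reducer.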
2 https://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg
3 https://pythonhosted.org/mrjob/guides/writing-mrjobs.html#defining-steps
3. Finding mutual friends
MapReduce is a useful paradigm for solving problems using graphs. Finding people who have
mutual connections is a feature supported in social network systems. Suppose we have five
people A, B, C, D, and E. The friends of each person are listed in the given ‘:’ separated format (A
is friends with B, C, and D; D is friends with A, B, C, and E; etc.).
A : B C D
B : A C D E
C : A B D E
D : A B C E
E : B C D
Output of the mapper has been color coded to illustrate the transfer of data from the mappers to the reducer. For
example the mapper output tagged in red (with the key ['A', 'B']) goes to the same reducer. The reducer then
takes the set intersection of ['B', 'C', 'D'] and ['A', 'C', 'D', 'E'] to produce the final answer of ["C", "D"].
Note: you do not need to produce trace output from the mapper. Just the final result from the reducer will suffice.
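The data flow described above can be sketched in plain Python as follows. This is a sketch under assumptions, not the reference solution: the grouping step is simulated in a hypothetical helper called mutual_friends instead of using MRJob, and each pair key is sorted so that both people in a friendship emit the same key.

```python
from collections import defaultdict


def mapper(_, line):
    """For each friend, emit the sorted (person, friend) pair and the full list."""
    person, sep, rest = line.partition(":")
    person = person.strip()
    friends = rest.split()
    for friend in friends:
        yield tuple(sorted((person, friend))), friends


def reducer(pair, friend_lists):
    """Intersect the friend lists that arrived under the same pair key."""
    common = set.intersection(*(set(friends) for friends in friend_lists))
    yield pair, sorted(common)


def mutual_friends(lines):
    """Hypothetical driver that simulates the grouping between the two."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(None, line):
            groups[key].append(value)
    return dict(out for key in sorted(groups) for out in reducer(key, groups[key]))


if __name__ == "__main__":
    network = ["A : B C D", "B : A C D E", "C : A B D E", "D : A B C E", "E : B C D"]
    for pair, common in mutual_friends(network).items():
        print(list(pair), common)
```

Sorting the pair is the design choice that matters: A emits ('A', 'B') while processing friend B, and B emits the same key while processing friend A, so both friend lists meet in one reducer call.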
What to Submit
A zipped file, e.g., a5-mrjob.zip, with three MRJob Python files: jumble.py, euler_path.py, common_friends.py, and one
text file konigsberg.txt. Do not submit the graphs or the sowpods.txt file.
Grading Rubric
+/- grades will be assigned for overall good development practices (well organized code; comments etc.)