CS491/591 - MapReduce Graph Problem
You are to write a MapReduce program on AWS using Hadoop that finds the node with the most outlinks in a given graph. This will require you to launch an Ubuntu instance, set up Hadoop, and store the input and output in HDFS, as you did for HW#3 WordCount.
Input: Each line in the input contains the following: the first string is the name of a node, and each remaining string on the line represents an outlink to another node in the graph. The end of each line is indicated by a dollar sign ($). For example:
n1 n2 n0 $
n5 n4 n1 $
n0 n2 n4 n5 $
n2 n0 n4 $
n4 n5 $
Output: The name of the node with the most outlinks, followed by its outlink count.
In the example above: n0 3
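To make the input and output formats concrete, here is a minimal local sketch in plain Python. It is not a Hadoop job and not the required solution; it only mimics the map and reduce steps in memory. The function names (parse_line, map_phase, reduce_phase) are hypothetical, and the sketch assumes each node appears as the first string on exactly one line.

```python
def parse_line(line):
    """Split one input line into (node, outlinks); a trailing '$' marks end-of-line."""
    tokens = line.split()
    if tokens and tokens[-1] == "$":
        tokens = tokens[:-1]
    if not tokens:
        return None  # blank line, or a line holding only the '$' marker
    return tokens[0], tokens[1:]


def map_phase(lines):
    """Emit (node, outlink_count) pairs, roughly what a Mapper might emit."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            node, outlinks = parsed
            yield node, len(outlinks)


def reduce_phase(pairs):
    """Keep the pair with the largest count, roughly what a single Reducer might do."""
    best = None
    for node, count in pairs:
        if best is None or count > best[1]:
            best = (node, count)
    return best


# The sample graph from the assignment text.
SAMPLE = """n1 n2 n0 $
n5 n4 n1 $
n0 n2 n4 n5 $
n2 n0 n4 $
n4 n5 $"""

result = reduce_phase(map_phase(SAMPLE.splitlines()))
print(result[0], result[1])  # prints: n0 3
```

Because the assignment guarantees a single maximum, the sketch does not need a tie-breaking rule.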
This is NOT an iterative MapReduce program.
You can assume there is only one maximum.
You are required to figure out how to do this yourself - that means without asking the TA how to do this. Use the WordCount example, find other examples on the Web, etc.
CS491 requirements: You can process all of the input in one Mapper and send the results to the Reducers.
CS591 requirements: Process the input before setting up the MapReduce job so that each node will be processed by one Mapper - this is more challenging.
CS591 students can complete the CS491 requirements for a maximum of 85/100 points.
CS491 students can complete the CS591 requirements for 115/100 points.
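One possible reading of the CS591 preprocessing step is to split the input into one file per node before the job is configured, since Hadoop's default text input handling generally creates at least one input split (and so one map task) per input file. The sketch below shows only that local splitting step in plain Python. The function name split_by_node and the file layout <out_dir>/<node>.txt are hypothetical, and the sketch assumes each node appears as the first string on exactly one line. Uploading the files to HDFS and configuring the job are not shown.

```python
import os
import tempfile


def split_by_node(lines, out_dir):
    """Write each node's line to its own file (hypothetical layout: <out_dir>/<node>.txt)."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] == "$":
            continue  # skip blank lines and lines holding only the '$' marker
        path = os.path.join(out_dir, tokens[0] + ".txt")
        with open(path, "w") as f:
            f.write(line.strip() + "\n")
        paths.append(path)
    return paths


# The sample graph from the assignment text.
SAMPLE = [
    "n1 n2 n0 $",
    "n5 n4 n1 $",
    "n0 n2 n4 n5 $",
    "n2 n0 n4 $",
    "n4 n5 $",
]

with tempfile.TemporaryDirectory() as tmp:
    paths = split_by_node(SAMPLE, tmp)
    names = sorted(os.path.basename(p) for p in paths)
    print(names)  # prints: ['n0.txt', 'n1.txt', 'n2.txt', 'n4.txt', 'n5.txt']
```

Other approaches may satisfy the requirement equally well; this is only one way to arrange the input so that each node's line can reach a separate Mapper.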
We will provide a more complex input graph shortly.