CS491/591 - MapReduce Graph Problem
Assignment #4

Spring 2017

 

Individual
Due Date - Wed., Mar. 29. Between 12 noon and 11:59 pm, send your IP address to 
psheinidashtegol@crimson.ua.edu and attach your code.

You are to write a MapReduce program on AWS using Hadoop that finds, for a given graph, the node with the most outlinks.  This will require you to launch an Ubuntu instance, set up Hadoop, and store the input and output in HDFS, as you did for HW#3 WordCount.

Input:  Each line in the input will contain the following:
    The first string is the name of the node; each remaining string on the line represents an outlink to another node in the graph.  The end of the line is indicated by a dollar sign ($).

Sample input:
n1 n2 n0 $
n5 n4 n1 $ 
n0 n2 n4 n5 $
n2 n0 n4 $
n4 n5 $
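
As an illustration of the input format only, here is a minimal Python sketch that splits one input line into a node name and its list of outlinks. The function name parse_line is hypothetical and not part of the assignment; the same logic would have to be adapted to whatever Mapper you write.

```python
def parse_line(line):
    """Return (node, outlinks) for one input line, or None for a blank line."""
    tokens = line.split()
    # Drop the trailing "$" end-of-line marker if it is present.
    if tokens and tokens[-1] == "$":
        tokens = tokens[:-1]
    if not tokens:
        return None
    # The first token is the node name; the rest are its outlinks.
    return tokens[0], tokens[1:]
```

For the sample line "n0 n2 n4 n5 $", this returns the node n0 with the outlinks n2, n4, and n5.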

Output: The name of the node with the most outlinks, followed by its outlink count.
    In the example above:  n0 3
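
To check the expected answer by hand, the following Python sketch simulates the map and reduce steps locally on the sample input. It does not use Hadoop at all, and the names map_line, reduce_max, and max_outlinks are hypothetical; treat it as one possible way to think about the problem, not as the required design.

```python
SAMPLE = """n1 n2 n0 $
n5 n4 n1 $
n0 n2 n4 n5 $
n2 n0 n4 $
n4 n5 $"""


def map_line(line):
    """Map step: emit one (node, outlink_count) pair per non-blank line."""
    tokens = [t for t in line.split() if t != "$"]
    if tokens:
        yield tokens[0], len(tokens) - 1


def reduce_max(pairs):
    """Reduce step: keep the pair with the largest outlink count."""
    best = None
    for node, count in pairs:
        if best is None or count > best[1]:
            best = (node, count)
    return best


def max_outlinks(text):
    """Run the map step on every line, then reduce all pairs to one maximum."""
    pairs = (pair for line in text.splitlines() for pair in map_line(line))
    return reduce_max(pairs)


if __name__ == "__main__":
    node, count = max_outlinks(SAMPLE)
    print(node, count)  # prints "n0 3" for the sample input
```

In a real Hadoop job, all of the emitted pairs would have to reach a single reduce call for the comparison to be global (for example, by emitting them under one shared key). That is an assumption of this sketch, not a stated requirement.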

NOTE: This is NOT an iterative MapReduce program.  
You can assume there is only one maximum.
You are required to figure out how to do this yourself - that means without asking the TA how to do this.  Use the WordCount example, find other examples on the Web, etc.  

CS491 requirements: You can process all of the input in one Mapper and send the results to the Reducers.

CS591 requirements: Process the input before setting up the MapReduce job so that each node will be processed by one Mapper - this is more challenging.
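
One possible reading of this requirement, sketched below in Python, is to split the input into one file per node before the job is configured, since Hadoop's default file-based input formats create at least one map task per non-empty input file. This interpretation, the function name split_by_node, and the ".txt" file naming are all assumptions made for illustration; the sketch also assumes node names are unique and are valid file names.

```python
import os


def split_by_node(input_path, out_dir):
    """Write each non-blank input line to its own file; return the new paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with open(input_path) as src:
        for line in src:
            tokens = line.split()
            if not tokens:
                continue  # skip blank lines
            # Name the file after the node (the first token on the line).
            path = os.path.join(out_dir, tokens[0] + ".txt")
            with open(path, "w") as dst:
                dst.write(line.rstrip() + "\n")
            paths.append(path)
    return paths
```

The resulting directory would then be copied to HDFS and used as the job's input path.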

CS591 students can complete the CS491 requirements for a maximum of 85/100 points.
CS491 students can complete the CS591 requirements for a maximum of 115/100 points.

We will provide a more complex input graph shortly.