ECC Assignment11
ECC Assignment11
Each line is a record of visit, which consists of IP Address, Time, Type of HTTP Request,
Requested File, HTTP Version and Status, etc.
Example Programs
We have provided two examples that related to this lab.
• logstat: It counts the number of visits for each IP address in the log file.
• logstat2: It counts the number of visits for each IP address in the same hour.
As discussed in the lectures, MapReduce programming framework seperates the data and op-
eration (two stages). It uses Hadoop Stream, which represents by sys.stdin in Python and
Writable, Text in Java.
In Map phase, we have to process the raw files and extract the related information, line, IP.
In the Reduce phase, we start counting the records based on the same IP addresses. After that,
we can sort the result and print it out. As Fig. 5 and 6 present, the Map Phase for logstat2
is different than the previous version since we need to consider the time. Since we have pro-
cessed data at Map Phase, the intermediate data of Map is already at the granularity of a hour.
Therefore, the Reduce Phase is the same as logstat.
1
ENGR 516, Fall 2022 Engineering Cloud Computing
2
ENGR 516, Fall 2022 Engineering Cloud Computing
3
ENGR 516, Fall 2022 Engineering Cloud Computing
Grading Rubric
(80%) Part 1;
(15%) Part 2;
(5%) Report about the your design and experiments, please include screenshots for run-
ning your code on the cloud;
Submission
You upload your submission in Canvas by the end of Nov. 4.