AT3 202110 FinalVersion
Coursework Assessment
202010
Instructions to Students
• Students are required to refrain from all forms of academic dishonesty as defined and explained in HCT procedures
and directions from HCT personnel.
• A student found guilty of having committed acts of academic dishonesty may be subject to one or more of the
disciplinary measures as outlined in Article 33 of the Student and Academic Regulations.
Academic Honesty Statement
• Students are required to refrain from all forms of academic dishonesty, as set out and explained in the policies and procedures of the Higher Colleges of Technology and in the directions issued by college staff.
• A student who commits any form of academic dishonesty will be subject to one or more of the disciplinary measures set out in Article 33 of the Academic Regulations.
Question (CLO) No. Group Work Report Presentation Oral Defense Total %
CIS202010CXXYYYGroupProjectV1
P a g e 1|8
Mark obtained
Project Objectives
The group project gives students exposure to working on a real big data application. The main objectives
of this project are:
• To analyze real business cases, explain how to implement a big data system, and apply different big data tools.
• To give students knowledge of working in the Hadoop file system.
• To gain experience in handling big data and using the functions available in Hive.
• To gain experience in using MongoDB to handle big data and work with NoSQL.
• To gain experience in performing DCL and DML operations in Hive on the chosen big data application.
• To gain experience in writing Pig Latin and performing data analysis with Pig tools on the chosen application.
• To gain experience in handling complex data using Pig.
• To give students an understanding of data processing using Pig.
Project Description
This is a group project in which you work on a chosen business case. You are required to work in a team
of at most two (2) members. The project carries 25% of your coursework marks.
This assessment has three parts:
• Report
• Presentation
• Follow-up questions and discussion
The team has to complete the tasks and submit the report, which will be marked as a group. After
submitting the report, the team will give a group presentation using suitable PowerPoint slides or other
presentation tools, followed by an individual question and answer session. The presentation and the
follow-up questions and discussion will be marked individually.
Students are expected to choose one of the applications listed below. You may also propose your own
business case; however, you must obtain approval from your instructor before starting the project.
Big Data Applications
• Entertainment: Netflix and Amazon use big data to recommend shows and movies to their users.
• Insurance: Insurers use big data to predict illnesses and accidents and to price their products accordingly.
• Driverless cars: Google's driverless cars collect about one gigabyte of data per second. These experiments require more and more data to run successfully.
• Education: Big-data-powered technology is adopted as a learning tool instead of traditional lecture methods, which enhances students' learning and helps teachers track their performance better.
• Automobile: Rolls-Royce has embraced big data by fitting hundreds of sensors into its engines and propulsion systems, which record every tiny detail of their operation. Changes in the data are reported in real time to engineers, who decide the best course of action, such as scheduling maintenance or dispatching engineering teams should the problem require it.
• Government: Big data applications are used in government department activities.
• Fraud detection: Big data helps in risk analysis and management, fraud detection, and abnormal trading analysis.
• Advertising and marketing: Big data helps advertising agencies understand patterns of user behavior and gather information about consumers' motivations.
Project Questions
Complete the tasks listed below. Capture a screenshot of both the code and the output for each task
and add it to your report. Write detailed notes on each task scenario and on each command used to get
the result.
Your report should have a proper title, cover page, page numbers, and screenshots. It should explain
every task as it relates to your application, along with the commands used to complete the tasks.
I. CLO4 [Deliverable 4]
1. Write a detailed business case for the application you have selected, explaining the
application's functionality in detail. Identify the type of data that you will be handling in the
application. (3 Marks)
b. Demonstrate loading data from a text file in Pig. Create at least five relations. (3 Marks)
IV. CLO3 [Deliverable 3]
1. Investigate, analyze, and report the limitations the organization will face while adopting the
following technologies:
a. Hadoop File System
b. MongoDB
c. Hive Tool
d. Pig Tool
Your answer should relate to the business case that you have selected. (4 Marks)
V. CLO4 [Deliverable 4]
2. Write clear Hive data analysis questions and answer them by displaying data from the different
tables created for the selected business case, using appropriate SELECT commands. Write short
notes on your findings. At least six different data analysis questions need to be answered. (3 Marks)
3. Demonstrate at least six different Hive functions using the Hive tables that have been created.
Write short notes on your answers. (3 Marks)
4. Write a clear data analysis requirement for a Pig relation. Using appropriate Pig commands on
the data loaded under the CLO2 section, find the answer and display it. (3 Marks)
5. JOIN commands: (6 Marks)
a) Write a business requirement involving two tables and find the answer using the
appropriate HiveQL JOIN command.
b) Write a business requirement involving two relations and find the answer using the
appropriate Pig JOIN command.
6. Complete a word count problem using Pig commands on data from the topic selected by the
group. (5 Marks)
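For orientation only, the logic behind the word count task in question 6 can be sketched in Python. This is an illustrative sketch, not a model answer: the graded task must be completed with Pig commands. The function name `word_count` and the sample sentences are hypothetical, and the comments only loosely relate each step to Pig's TOKENIZE, FLATTEN, GROUP, and COUNT.

```python
import re
from collections import Counter


def word_count(lines):
    """Count how often each word occurs across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Split the line into lowercase words (loosely, TOKENIZE + FLATTEN in Pig).
        words = re.findall(r"[a-z0-9']+", line.lower())
        # Tally each distinct word (loosely, GROUP followed by COUNT in Pig).
        counts.update(words)
    return counts


# Hypothetical sample data standing in for the group's chosen topic.
sample = [
    "Netflix recommends shows",
    "Netflix recommends movies",
]

for word, n in word_count(sample).most_common():
    print(word, n)
```

In the report, the equivalent Pig script and its output should be shown with screenshots and notes, as required for every task.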
Project Deliverables
1. Deliverable 1: A complete report for question 1 under section II (CLO1), with detailed answers
and appropriate screenshots where needed. [CLO1] [3 Marks]
2. Deliverable 2: A complete report for questions 1 and 2 under section III (CLO2), with detailed
answers and appropriate screenshots where needed. [CLO2] [15 Marks]
3. Deliverable 3: A complete report for question 1 under section IV (CLO3), with detailed answers
and appropriate screenshots where needed. [CLO3] [4 Marks]
4. Deliverable 4: A complete report for questions 1, 2, 3, 4, 5, and 6 under sections I and V (CLO4),
with detailed answers and appropriate screenshots where needed. [CLO4] [23 Marks]
5. Report [All CLOs] [5 Marks]
6. Oral Communication: Each student will be assessed on presentation skills, using a PowerPoint
presentation. [All CLOs] [5 Marks]
7. Follow-up Questions and Discussion [All CLOs] [45 Marks]
a. CLO1: 1 question (3 marks); CLO2: 2 questions (2 * 9 = 18 marks); CLO3: 1 question
(2 marks); CLO4: 2 questions (2 * 11 = 22 marks)
Note: In the CAP document, CLO1 carries 2% and CLO2 carries 9%. To accommodate the project requirements,
CLO1 is allotted 1% and CLO2 is allotted 10% in the ASD document. This change will be updated in the
following semester.
Rubric
Please note that the Project rubric should reflect the project description and be CAP-compliant. Please feel free to customize the descriptors as per the
project requirements and course level.
Group Component
Performance levels: Absent (F); Insufficient (1-59.49%) (F); Emerging (60-69.49%) (D/D+/C-); Satisfactory (70-76.49%) (C/C+); Competent (77-86.49%) (B-/B/B+); Mastering (87-100%) (A-/A)
CLO1 Deliverable [5%]
Absent: Not done
Insufficient: HDFS folder structure created that is not related to the chosen big data system.
Emerging: HDFS folder structure created with many errors and does not fit well with the chosen big data system.
Satisfactory: HDFS folder structure created with error and does not fit well with the chosen big data system.
Competent: HDFS folder structure created, few errors and well fit with the chosen big data system.
Mastering: HDFS folder structure created, very well fit with the chosen big data system.
CLO2 Deliverable [15%]
Absent: Not done
Insufficient: MongoDB, Hive, and Pig tasks are done, and most are wrong or unrelated to the business case.
Emerging: MongoDB, Hive, and Pig tasks were completed, but most of the work does not fit with the selected business case.
Satisfactory: MongoDB, Hive, and Pig tasks were completed with more missing parts and errors in the tasks. Few are not related to the business case.
Competent: MongoDB, Hive, and Pig tasks were completed with few errors or missing parts.
Mastering: MongoDB, Hive task and Pig task completed fulfilling all the requirements.
CLO3 Deliverable [5%]
Absent: Not done
Insufficient: The critical investigation, analysis is not done/partially correct/just attempted on one or two tools.
Emerging: The critical investigation, analysis done and reported the limitations missing one tool that the ...
Satisfactory: The critical investigation, analysis, and ...
Competent: The critical investigation, analysis done and reported the limitations done with ...
Mastering: Critical investigation, analysis done and reported the limitations of all the tools that the ...
Deliverable [20%]
Insufficient: ... done and poor document with many errors: Hive data analysis. Six different hive functions were demonstrated well and documented well. Pig data analysis question. Join commands of both Hive and Pig. Map reduce question.
Emerging: ... document with many errors: Hive data analysis. Six different hive functions were demonstrated well and documented well. Pig data analysis question. Join commands of both Hive and Pig. Map reduce question.
Satisfactory: ... document with few errors: Hive data analysis. Six different hive functions were demonstrated well and documented well. Pig data analysis question. Join commands of both Hive and Pig. Map reduce question.
Competent: ... documented well the following task with few errors: Hive data analysis. Six different hive functions were demonstrated well and documented well. Pig data analysis question. Join commands of both Hive and Pig. Map reduce question.
Mastering: ... documented well the following task with no errors: Hive data analysis. Six different hive functions demonstrated well and documented well. Pig data analysis question. Join commands of both Hive and Pig. Map reduce question.
[ALL CLOs] Report [5%]
Group Component
Absent: Not done
Insufficient: Incomplete report with missing most of the deliverable components. Too many typographical errors.
Emerging: Some but not all of the following: Complete report with required deliverables. Clear table of contents showing all required sections. Free from formatting and typographical errors.
Satisfactory: Most but not all of the following: Complete report with all required format and deliverables. Some minor formatting and/or typographical errors. Table of contents presented with some missing information.
Competent: All of the following: Complete report with required format and deliverables. Clear table of contents showing all required sections. Some minor formatting and/or typographical errors.
Mastering: All of the following: Complete report with required format and deliverables. Clear table of contents showing all required sections. Free from formatting and typographical errors.
[ALL CLOs] Communication [5%]
Insufficient: ... audience and purpose (No eye contact, no body language, and no poise) Communicates ...
Emerging: Some but not all of the ... Communicates with a clear sense of audience and purpose (Eye contact, body ...
Satisfactory: Most but not all ... Communicates with a clear sense of audience and purpose (Eye ...
Competent: All of the following: Communicates with a clear sense of audience and purpose (Eye contact, body language, and poise)
Mastering: All of the following: Communicates with a strong sense of audience and purpose (holds attention with the use of direct eye ...
Individual Component
Not done