AT3 202110 FinalVersion

This document provides instructions and assessment criteria for a group project on big data technologies. Students are assigned to a group to work on a real-world big data application. They must submit a report, do a presentation, and participate in a question/answer session. The report involves analyzing the chosen application, setting up related data structures in HDFS, MongoDB, Hive, and Pig. Students must also investigate limitations of these technologies and perform data analysis queries in Hive. The project aims to provide hands-on experience with key big data tools and addressing a real business problem through data.

Uploaded by

Hussain

Computer and Information System

Coursework Assessment
202010

Course: CIB 3123 Big Data Technology
Assessment Method: Group Project
Date of Assessment: Week 10
Duration / Deadline(s): Last day of Week 14, 2021
Maximum Mark: 100
Percentage of Final Grade: 25%

Instructions to Students

1. This assessment has 10 pages including the cover page.
2. Students are allowed to use Cloudera VM.
3. Each student is required to submit the project deliverables individually onto Blackboard.

Academic Honesty Statement

In accordance with HCT policy LP201- Academic Honesty

• Students are required to refrain from all forms of academic dishonesty as defined and explained in HCT procedures
and directions from HCT personnel.

• A student found guilty of having committed acts of academic dishonesty may be subject to one or more of the
disciplinary measures as outlined in Article 33 of the Student and Academic Regulations.

Academic Honesty Statement (translated from the Arabic)

In accordance with HCT policy LP201 - Academic Honesty

• Students are required to refrain from all forms of academic dishonesty, as set out and explained in HCT policies and procedures and in directions issued by college personnel.

• A student who commits any form of academic dishonesty will be subject to one or more of the disciplinary measures set out in Article 33 of the Academic Regulations.

Student HCT ID:                    Student Name:

Question (CLO) No.   Group Work   Report   Presentation   Oral Defense   Total   %
Marks Allocated      45           5        5              45             100     25
Mark obtained

CIS202010CXXYYYGroupProjectV1

Project Objectives
The group project gives students exposure to working on a real big data application. The main objectives
of this project are:

• To analyze real business cases, explain how to implement a big data system, and apply different big data tools.
• To give students knowledge of working in the Hadoop file system.
• To gain experience in handling big data using the functions available in Hive.
• To gain experience with the Mongo tool for handling big data and working with NoSQL.
• To gain experience in performing DCL and DML operations in Hive on the chosen big data application.
• To gain experience in writing Pig Latin and performing data analysis using Pig tools on the chosen application.
• To gain experience in handling complex data using Pig.
• To give students an understanding of data processing using Pig.

Project Description
This is a group project in which you work on the chosen business case in a team of at most two (2) members.
The project carries 25% of your coursework marks.
This assessment has three parts:
• Report
• Presentation
• Follow-up questions and discussion

The team completes the tasks and submits the report, which is marked as a group. After submitting the
report, the team gives a group presentation using suitable PowerPoint slides or other presentation tools,
followed by an individual question and answer session. The presentation and the follow-up questions and
discussion are marked individually.

Topics Covered in this assessment

• Hadoop/HDFS
• Mongo Tool
• Data Analysis with Hive
• Managing Data with Hive
• Pig Latin
• Pig Tools
• Pig for Analysis

Students are expected to choose one of the applications mentioned below. You could also propose your
own business case. However, prior approval from your instructor should be requested before project
commencement.

Big Data Applications
• Entertainment: Netflix and Amazon use big data to make show and movie recommendations to their users.
• Insurance: Insurers use big data to predict illness and accidents and to price their products accordingly.
• Driverless cars: Google's driverless cars collect about one gigabyte of data per second. These experiments require more and more data for their successful execution.
• Education: Big data powered technology is used as a learning tool instead of traditional lecture methods, which enhances students' learning and helps teachers track their performance better.
• Automobile: Rolls Royce has embraced big data by fitting hundreds of sensors into its engines and propulsion systems, which record every tiny detail about their operation. Changes in the data are reported in real time to engineers, who decide the best course of action, such as scheduling maintenance or dispatching engineering teams should the problem require it.
• Government: Use of big data applications in government department activities.
• Fraud detection: Big data helps in risk analysis and management, fraud detection, and abnormal trading analysis.
• Advertising and marketing: Big data helps advertising agencies understand patterns of user behavior and gather information about consumers' motivation.

Project Questions
Complete the tasks listed below. Capture a screenshot of both the code and the answer for each task
and add it to your report. Write detailed notes on each task scenario and on each command used to get
the result.

Your report should have a proper title, cover page, page numbers, and screenshots, together with an
explanation of every task related to your application and the commands used to complete it.

The task to be completed

I. CLO4 [Deliverable 4]
1. Write a detailed business case for the application you have selected, explaining the application's functionality in detail. Identify the type of data that you will be handling in the application. (3 Marks)

II. CLO1 [Deliverable 1]
1. Create an HDFS folder structure to save the data in the HDFS file system. The folder structure should cover all the functions in detail. (3 Marks)
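As an illustration of the folder structure task above, the following is a minimal Python sketch that prints the `hdfs dfs -mkdir -p` commands for one possible layout. The root path, the area names (`raw`, `staging`, `output`) and the folder names are hypothetical and assume a movie recommendation case; your own structure should follow the functions of your chosen application.

```python
from posixpath import join

# Hypothetical layout for an illustrative movie recommendation case.
# The root path and all folder names are assumptions, not part of the brief.
ROOT = "/user/cloudera/bigdata_project"
LAYOUT = {
    "raw": ["users", "movies", "ratings"],
    "staging": ["hive", "pig"],
    "output": ["reports"],
}


def mkdir_commands(root, layout):
    """Return one `hdfs dfs -mkdir -p` command per leaf folder."""
    commands = []
    for area, folders in layout.items():
        for folder in folders:
            # -p creates any missing parent folders along the way.
            commands.append("hdfs dfs -mkdir -p " + join(root, area, folder))
    return commands


if __name__ == "__main__":
    for command in mkdir_commands(ROOT, LAYOUT):
        print(command)
```

The printed commands can be run in a terminal on the Cloudera VM; `hdfs dfs -ls -R` on the root folder then shows the resulting tree for the screenshot.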

III. CLO2 [Deliverable 2]
1. Create a Mongo database for the selected application. Add three relevant collections and add at least five documents to each collection. Demonstrate five find commands with complex conditions and two update commands. (9 Marks)
2. Complete the following two questions for the selected business case:
a. Demonstrate creating a Hive database and Hive tables from text data. You are expected to use both comma-separated and tab-separated values in the text files. Create at least five different tables with fields that link the tables. (3 Marks)
b. Demonstrate loading data from a text file in Pig. Create at least five relations. (3 Marks)
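For question 2a, the difference between a comma-separated and a tab-separated table lies in the `FIELDS TERMINATED BY` clause of the table definition. The sketch below is a small Python helper that builds such HiveQL statements. The helper name `create_table_ddl`, the table names and the columns are hypothetical; only the HiveQL clauses themselves are standard.

```python
def create_table_ddl(table, columns, delimiter):
    """Build a HiveQL CREATE TABLE statement for a delimited text file.

    columns is a list of (name, hive_type) pairs; delimiter is a comma
    or a tab character.
    """
    # In HiveQL the tab delimiter is written as a backslash followed by t.
    escaped = "\\t" if delimiter == "\t" else delimiter
    column_list = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({column_list}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{escaped}' "
        "STORED AS TEXTFILE;"
    )


# Hypothetical tables linked by the user_id field.
users_ddl = create_table_ddl(
    "users", [("user_id", "INT"), ("name", "STRING")], ","
)
ratings_ddl = create_table_ddl(
    "ratings", [("user_id", "INT"), ("movie_id", "INT"), ("rating", "INT")], "\t"
)

if __name__ == "__main__":
    print(users_ddl)
    print(ratings_ddl)
```

The shared `user_id` column is what links the two tables here; the same idea extends to the five tables the task asks for.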
IV. CLO3 [Deliverable 3]
1. Investigate, analyze, and report the limitations the organization will face while adopting the following technologies:
a. Hadoop File System
b. MongoDB
c. Hive Tool
d. Pig Tool
Your answer should relate to the business case that you have selected. (4 Marks)
V. CLO4 [Deliverable 4]
2. Write clear Hive data analysis questions and display the data from the different tables created for the selected business case using appropriate select commands. Write short notes on your findings. At least six different data analysis questions need to be done. (3 Marks)
3. Demonstrate at least six different Hive functions using the Hive tables that you created. Write short notes on your answers. (3 Marks)
4. Write a clear data analysis requirement for a Pig relation. Using appropriate Pig commands on the data loaded under the CLO2 section, find the answer and display it. (3 Marks)
5. JOIN commands: (6 Marks)
a) Write a business requirement involving two tables and find the answer using the appropriate HiveQL JOIN command.
b) Write a business requirement involving two relations and find the answer using the appropriate Pig JOIN command.
6. Complete a word count problem using Pig commands on the data of the topic selected by the group. (5 Marks)
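To clarify what the word count problem in question 6 computes, here is a minimal Python sketch of the same logic. It is not Pig code; the comments only indicate which Pig operators usually play each role. The sample lines are invented, splitting on whitespace and lowercasing are simplifying assumptions, and the function name `word_count` is hypothetical.

```python
from collections import Counter


def word_count(lines):
    """Count how often each word occurs across all lines."""
    # Role of TOKENIZE + FLATTEN in Pig: split every line into words and
    # flatten the result into a single stream of words.
    words = (word.lower() for line in lines for word in line.split())
    # Role of GROUP BY word + COUNT in Pig: tally the occurrences per word.
    return Counter(words)


# Invented sample data standing in for the group's topic text.
sample = ["big data needs big tools", "pig and hive process big data"]

if __name__ == "__main__":
    for word, count in word_count(sample).most_common(3):
        print(word, count)
```

Comparing the output of the Pig script against a small hand-checked sample like this is one way to confirm that the result in the report is correct.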

Project Deliverables

1. Deliverable 1: A complete report for question 1 under section II (CLO1), with detailed answers and appropriate screenshots (if needed). [CLO1] [3 Marks]
2. Deliverable 2: A complete report for questions 1 and 2 under section III (CLO2), with detailed answers and appropriate screenshots (if needed). [CLO2] [15 Marks]
3. Deliverable 3: A complete report for question 1 under section IV (CLO3), with detailed answers and appropriate screenshots (if needed). [CLO3] [4 Marks]
4. Deliverable 4: A complete report for questions 1, 2, 3, 4, 5 and 6 under sections I and V (CLO4), with detailed answers and appropriate screenshots (if needed). [CLO4] [23 Marks]
5. Report [All CLOs] [5 Marks]
6. Oral Communication: Each student will be assessed on their PowerPoint presentation skills. [All CLOs] [5 Marks]
7. Follow-up Questions and Discussion [All CLOs] [45 Marks]
a. CLO1: 1 question (3 Marks); CLO2: 2 questions (2 * 9 = 18 Marks); CLO3: 1 question (2 Marks); CLO4: 2 questions (2 * 11 = 22 Marks)

Note: In the CAP document CLO1 carries 2% and CLO2 carries 9%. To accommodate the project requirements, CLO1 is allotted 1% and CLO2 is allotted 10% in the ASD document. This change will be updated in the following semester.

Rubric
Please note that the Project rubric should reflect the project description and be CAP-compliant. Please feel free to customize the descriptors as per the
project requirements and course level.
Group Component

Grade bands: Absent (F); Insufficient (1-59.49%) (F); Emerging (60-69.49%) (D/D+/C-); Satisfactory (70-76.49%) (C/C+); Competent (77-86.49%) (B-/B/B+); Mastering (87-100%) (A-/A).

CLO1 Deliverable [5%]
• Absent: Not done.
• Insufficient: HDFS folder structure created that is not related to the chosen big data system.
• Emerging: HDFS folder structure created with many errors and does not fit well with the chosen big data system.
• Satisfactory: HDFS folder structure created with error and does not fit well with the chosen big data system.
• Competent: HDFS folder structure created, with few errors and a good fit with the chosen big data system.
• Mastering: HDFS folder structure created, with a very good fit with the chosen big data system.

CLO2 Deliverable [15%]
• Absent: Not done.
• Insufficient: MongoDB, Hive, and Pig tasks are done, and most are wrong or unrelated to the business case.
• Emerging: MongoDB, Hive, and Pig tasks were completed, but most of the work does not fit with the selected business case.
• Satisfactory: MongoDB, Hive, and Pig tasks were completed with more missing parts and errors in the tasks. Few are not related to the business case.
• Competent: MongoDB, Hive, and Pig tasks were completed with few errors or missing parts.
• Mastering: MongoDB, Hive task and Pig task completed fulfilling all the requirements.

CLO3 Deliverable [5%]
• Absent: Not done.
• Insufficient: The critical investigation, analysis is not done/partially correct/just attempted on one or two tools.
• Emerging: The critical investigation, analysis done and reported the limitations done with missing of three tools that the organization will be facing while adopting the following technologies.
• Satisfactory: The critical investigation, analysis, and reported the limitations done with missing two tools that the organization will face while adopting the following technologies.
• Competent: The critical investigation, analysis done and reported the limitations done with missing one tool that the organization will be facing while adopting the following technologies.
• Mastering: Critical investigation, analysis done and reported the limitations of all the tools that the organization will be facing while adopting the following technologies.

CLO4 Deliverable [20%]
• Absent: Not done.
• Insufficient: Incomplete, some tasks not done and poor document with many errors.
• Emerging: Completed and poor document with many errors.
• Satisfactory: Completed and poor document with few errors.
• Competent: Completed and documented well the following task with few errors.
• Mastering: Completed and documented well the following task with no errors.
Tasks listed at every level:
• Hive data analysis.
• Six different hive functions were demonstrated well and documented well.
• Pig data analysis question.
• Join commands of both Hive and Pig.
• Map reduce question.

[ALL CLOs] Report [5%]
• Absent: Not done.
• Insufficient: Incomplete report with missing most of the deliverable components. Too many typographical errors.
• Emerging: Some but not all of the following: Complete report with required deliverables. Clear table of contents showing all required sections. Free from formatting and typographical errors.
• Satisfactory: Most but not all of the following: Complete report with all required format and deliverables. Some minor formatting and/or typographical errors. Table of contents presented with some missing information.
• Competent: All of the following: Complete report with required format and deliverables. Clear table of contents showing all required sections. Some minor formatting and/or typographical errors.
• Mastering: All of the following: Complete report with required format and deliverables. Clear table of contents showing all required sections. Free from formatting and typographical errors.

[ALL CLOs] Oral Communication [5%]
• Absent: Not done.
• Insufficient: Communicates with a limited sense of audience and purpose (No eye contact, no body language, and no poise). Communicates …
• Emerging: Some but not all of the following: Communicates with a clear sense of audience and purpose (Eye contact, body …
• Satisfactory: Most but not all of the following: Communicates with a clear sense of audience and purpose (Eye …
• Competent: All of the following: Communicates with a clear sense of audience and purpose (Eye contact, body language, and poise) …
• Mastering: All of the following: Communicates with a strong sense of audience and purpose (holds attention with the use of direct eye …

Individual Component

[ALL CLOs] Follow-up questions and discussion [45%]
• Absent: Not done.
• Insufficient: Unable to answer questions from the examining board.
• Emerging: Able to answer some but not all questions from the examining board.
• Satisfactory: Able to answer most but not all questions from the examining board.
• Competent: Able to answer all questions from the examining board, not able to demonstrate a complete understanding of the study.
• Mastering: Able to answer all questions and demonstrate a complete understanding.
