ITECH2302 Big Data Management

ITECH2302: Assessment Task 2:

Big Data Management Report (40%)
Semester 1, 2022
Due date: Week 10

Purpose:
This assignment helps you grasp the fundamental concepts of big data management, together with
the related knowledge, techniques, and practical software tools required to develop big data
projects.
Requirements: You are required to identify a suitable dataset, provide an analysis of the data,
and recommend suitable Big Data Management strategies. This will be written up as a
professional report.

Learning outcomes assessed:

K1, K2, K3, S1, S2, A1, A2

Details
You will use the analytical tools taught in this course (including Jupyter notebooks, PySpark and
Tableau) to explore, analyse and visualise a dataset of your choosing. An important part of
this work is preparing a good-quality report, written in an appropriate professional style, that
details your choices, analysis, and recommendations/conclusions.
The dataset should be chosen from the following repository:

UC Irvine Machine Learning Repository

https://archive.ics.uci.edu/ml/index.php
The aim is to use your chosen dataset to provide interesting insights, trends and patterns in
the data. Your intended audience is the CEO and middle management of the company that
employs you, who have tasked you with this analysis.

Tasks

Data choice. Choose any dataset from the repository that has at least five attributes and whose
default task is classification. Transform this dataset into a format that can be loaded into
your chosen analytics software.
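As a minimal sketch of the transformation step, assume the chosen dataset is a header-less, comma-separated UCI ".data" file that marks missing values with "?". This is common, but check your dataset's description page. Under that assumption, pandas can attach attribute names and write a CSV that both pandas and Tableau can open. The raw rows, the column names and the output filename `dataset.csv` are hypothetical placeholders, not taken from any particular dataset.

```python
import io

import pandas as pd

# Hypothetical raw rows in the header-less, comma-separated style of many
# UCI ".data" files; "?" marks a missing value in this made-up sample.
RAW = """39,13.0,2174,40,no
50,?,0,13,yes
38,9.0,0,40,no
"""

# Hypothetical attribute names; take the real ones from the dataset's description page.
COLUMNS = ["attr_1", "attr_2", "attr_3", "attr_4", "class"]


def load_uci_data(text: str) -> pd.DataFrame:
    """Parse header-less UCI-style text into a DataFrame with named columns."""
    return pd.read_csv(io.StringIO(text), header=None, names=COLUMNS, na_values="?")


df = load_uci_data(RAW)
# A CSV with a header row loads directly into pandas or Tableau.
df.to_csv("dataset.csv", index=False)
print(df.dtypes)
```

In practice you would pass the path of the downloaded file to `pd.read_csv` instead of wrapping a string in `io.StringIO`.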

Background information. Write a description of the dataset and the project. Provide an overview
of what the dataset is about, including where and how it was gathered, and for what purpose.

Data description. Describe how many instances the dataset contains, how many attributes it
has, their names, and which one is the class attribute. Include details of any missing values
and any other relevant characteristics. Use appropriate pandas functions for the initial
analysis, for instance descriptive statistics of each attribute, including the range of possible
values of each attribute, and visualise these in graphical form.
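The pandas calls below sketch one way to gather the figures this section asks for: instance and attribute counts, attribute names, missing values per attribute, descriptive statistics and the class distribution. The small DataFrame and its column names are hypothetical stand-ins for your own dataset.

```python
import pandas as pd

# Hypothetical stand-in for the chosen dataset ("class" is the class attribute).
df = pd.DataFrame({
    "age": [25, 32, None, 51, 46],
    "income": [32000, 45000, 51000, None, 61000],
    "class": ["no", "yes", "no", "yes", "yes"],
})

n_instances, n_attributes = df.shape      # number of rows and columns
missing = df.isna().sum()                 # missing values per attribute
summary = df.describe()                   # count, mean, std, min, quartiles, max (numeric attributes)
class_counts = df["class"].value_counts() # distribution of the class attribute

print(f"{n_instances} instances, {n_attributes} attributes: {list(df.columns)}")
print("Missing values per attribute:\n", missing)
print(summary)
print(class_counts)
```

The `min` and `max` rows of `describe()` give the observed range of each numeric attribute. For nominal attributes, `value_counts()` lists the values that occur.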

Initial analysis. You will need to decide which features to include in your dataframe and how
to deal with missing values (if there are any). You might need to preprocess the dataset
attributes. Useful techniques include removing certain attributes, exploring different ways of
discretizing continuous attributes, and replacing missing values. Discretizing is the conversion
of numeric attributes into "nominal" ones by binning numeric values into intervals. If you
replace missing values, explain the strategy you used to select the replacement values.
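The three techniques named above can be sketched in pandas as follows. The columns, the median-imputation strategy and the bin edges are illustrative assumptions. Choose and justify your own for your dataset. For example, `pd.qcut` would give equal-frequency bins instead of the hand-picked edges used here.

```python
import pandas as pd

# Hypothetical data: an identifier column, a continuous attribute with a gap, and a class.
df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5, 6],
    "age": [23, 35, None, 58, 41, 67],
    "class": ["no", "yes", "no", "yes", "no", "yes"],
})

# 1. Remove an attribute with no predictive value (a hypothetical identifier column).
df = df.drop(columns=["record_id"])

# 2. Replace missing values with the median, which is less sensitive to outliers than the mean.
df["age"] = df["age"].fillna(df["age"].median())

# 3. Discretize the continuous attribute into labelled intervals using hand-picked
#    bin edges: (0, 30], (30, 50], (50, 120].
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 50, 120],
                         labels=["young", "middle", "senior"])

print(df)
```

Here the missing age is replaced by 41, the median of the five known ages. That value then falls into the "middle" interval.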

GroupBy analysis. Use the pandas groupby function (DataFrame.groupby) with various aggregate
functions to provide interesting insights into the data.
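A short sketch of the kind of groupby analysis this task describes follows. It applies several aggregate functions per group in one call and computes the share of one class within each group. The data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical preprocessed data with a discretized attribute and a class label.
df = pd.DataFrame({
    "age_group": ["young", "middle", "middle", "senior", "young", "senior"],
    "income": [30000, 52000, 48000, 61000, 28000, 70000],
    "class": ["no", "yes", "no", "yes", "no", "yes"],
})

# Several aggregate functions per group in one pass.
income_by_group = df.groupby("age_group")["income"].agg(["count", "mean", "min", "max"])

# Proportion of "yes" instances per group: the mean of a boolean column is a rate.
yes_rate = df.assign(is_yes=df["class"].eq("yes")).groupby("age_group")["is_yes"].mean()

print(income_by_group)
print(yes_rate)
```

Comparing `yes_rate` across groups is a simple way to spot which attribute values are associated with the class. Such patterns can then be tested more formally in the data mining step.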

Data visualisation. Choose any data visualisation techniques that will provide helpful insights
into the data. These could include plotting chosen variables against each other and displaying
them in a line chart, or binning them and using a (stacked) histogram. Use whichever you
prefer of matplotlib (matplotlib.pyplot.hist), pandas (pandas.DataFrame.plot), seaborn
(seaborn.histplot) and/or Tableau.
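As one example of the stacked-histogram option, the sketch below bins a numeric attribute and stacks the counts by class with `matplotlib.pyplot.hist`. The data, the bin edges and the output filename `age_by_class.png` are hypothetical. The non-interactive "Agg" backend is selected only so the script also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one numeric attribute and a two-valued class attribute.
df = pd.DataFrame({
    "age": [23, 35, 41, 58, 44, 67, 29, 52],
    "class": ["no", "yes", "no", "yes", "no", "yes", "no", "yes"],
})

classes = ["no", "yes"]
groups = [df.loc[df["class"] == c, "age"] for c in classes]

fig, ax = plt.subplots()
# One bar per bin, split into a segment per class.
counts, edges, _ = ax.hist(groups, bins=[20, 30, 40, 50, 60, 70],
                           stacked=True, label=classes)
ax.set_xlabel("age")
ax.set_ylabel("number of instances")
ax.legend(title="class")
fig.savefig("age_by_class.png")
```

The same chart could be produced with `seaborn.histplot(data=df, x="age", hue="class", multiple="stack")`. Choose whichever library reads most clearly in your report.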

Data mining. Compare and contrast at least two different data mining algorithms on your
data, for instance: SVM (support vector machines), neural networks, k-nearest neighbour,
Apriori association rules, decision tree induction, etc. For each experiment you run, describe
the data you used, that is, whether you used the entire dataset or just a subset of it. You must
include screenshots and results from the techniques you employ.
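The following is a minimal sketch of comparing two of the listed algorithms, k-nearest neighbour and decision tree induction. It assumes scikit-learn is available; the brief itself does not name a library. It uses scikit-learn's bundled copy of the UCI Wine dataset (13 attributes, classification task), so it runs without a download. The 70/30 split, k = 5 and the tree depth of 4 are arbitrary illustrative choices, not recommended settings. Attributes are standardised before k-NN because distance-based methods are sensitive to attribute ranges, whereas decision trees are not.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Hold out 30% of the instances for testing; stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

models = {
    "k-nearest neighbour": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)                                 # train on the training subset only
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {results[name]:.3f}")
```

Accuracy alone is a narrow basis for comparison. A confusion matrix (`sklearn.metrics.confusion_matrix`) or cross-validation (`sklearn.model_selection.cross_val_score`) would support a fuller discussion of the results.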

Discussion of findings. Explain your results and discuss how useful each approach is for the
purpose of the analysis. State any assumptions you made about the analysis. In this discussion
you should explain what each algorithm contributes to the overall analysis task. Summarize your
main findings.

Big Data Management. The dataset you have used will be very small compared with what this
course considers "big data". In this section, draw conclusions about how the acquisition,
storage, and subsequent analysis of the data would differ if this were truly a "big data"
dataset. Refer to the concepts covered in the course: the "V's" of big data (volume, velocity,
etc.), data warehouses, OLAP, business intelligence, Hadoop/Spark and so on. Explain how this
dataset might be linked to data that would be too difficult or complex to handle with a
traditional SQL database and traditional statistical analysis, and that would therefore require
Big Data storage and Big Data analytics.
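One concrete difference at big data scale is that data no longer fits in one machine's memory, so aggregations must be computed in pieces and then combined. This is the map/reduce pattern that Hadoop and Spark distribute across a cluster. The sketch below is illustrative only and is not a substitute for those frameworks. It imitates the pattern on a single machine by reading a CSV in chunks with pandas. The data, the column names and the helper `mean_by_group_in_chunks` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV; in a real big data setting this would be far larger than memory.
CSV = """region,sales
north,10
south,20
north,30
east,5
south,40
"""


def mean_by_group_in_chunks(source, key, value, chunksize):
    """Compute per-group means without loading the whole file at once.

    Each chunk yields partial sums and counts (the "map" step). These are
    combined and divided only at the end (the "reduce" step), because
    averaging per-chunk averages would give wrong answers for uneven chunks.
    """
    partials = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        partials.append(chunk.groupby(key)[value].agg(["sum", "count"]))
    totals = pd.concat(partials).groupby(level=0).sum()
    return totals["sum"] / totals["count"]


means = mean_by_group_in_chunks(io.StringIO(CSV), "region", "sales", chunksize=2)
print(means)
```

In Spark the equivalent grouped mean is a single `groupBy(...).avg(...)` call. The engine performs the same partial-aggregate-then-combine steps across partitions.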

Report writing. Present your work in the form of a big data management report.
Submission
Submit the assignment via the Assignment submission box in Moodle, which can be found in the
Assessments section of the course Moodle shell. Submit your report as either a Microsoft Word
file or a PDF. If you are using macOS, please submit a PDF.

Your report will include the following in the order provided below:

A cover page with your name and student ID

Table of Contents
Table of Figures / Tables
Data choice
Background information
Data description
Data preprocessing
Data analytics and discussion of results
Big Data Management considerations
References
Appendices

Separately, upload your analytics files (e.g. Jupyter notebooks [.ipynb], Python files [.py],
etc.).

Your references should use the APA referencing style; information is available here:
https://federation.edu.au/library/student-resources/help-with-referencing
https://federation.edu.au/library/student-resources/fedcite

Identify all sources of information used. You are reminded to read the “Plagiarism” section of
the course description.

A passing grade will be awarded to assignments that adequately address all assessment
criteria. Higher grades require better quality and more effort. For example, a minimum amount
of wider reading is required. A student who reads well beyond this minimum will be better
prepared to discuss the issues in depth, and their report is therefore likely to be of higher
quality. Before submitting, please read through the assessment criteria very carefully.
Feedback

Feedback and marks will be provided in Moodle. Marks will also be available in FDL Marks.

Plagiarism
Plagiarism is the presentation of the expressed thought or work of another person as though it is one's own, without
properly acknowledging that person. You must not allow other students to copy your work, and you must take care to
safeguard against this happening. More information about the university's plagiarism policy and procedure can
be found at http://federation.edu.au/students/learning-and-study/online-help-with/plagiarism
Please refer to the Course Description for information regarding late assignments, extensions, and special
consideration. As a reminder, all academic regulations can be accessed via the university’s website:
http://federation.edu.au/staff/governance/legal/feduni-legislation

Marking Criteria/Rubric
Tasks Marks Awarded Comments

1 - Data choice: 5
i. Data correctly transformed into a format that can be loaded into analytics software.

2 - Background information: 2 x 5 = 10
i. Description of the dataset, including what the dataset is about, where and how it was
gathered, and for what purpose.
ii. Description of the project and its importance for the organization.

3 - Data description: 5 + 10 = 15
i. General details of the dataset.
ii. Detailed description of five attributes.

4 - Data preprocessing: 2 x 5 = 10
• Implementation of preprocessing and data transformation techniques.

5 - GroupBy analysis: 10
• Use of pandas to analyse the data.

6 - Data visualisation: 10
• Use of visualisation techniques to investigate the data.

7 - Data mining: 2 x 5 = 10
• Two different data mining algorithms used.
• Description of techniques with screenshots and discussion of results.

8 - Considerations of Big Data Management: 25
• Explanation of the big data characteristics of this data.
• Management considerations, including both storage and analytics.
• At least two references from peer-reviewed sources.

9 - Presentation of report: 5
• Report is well written and presented professionally, containing all required sections.

Total marks: 100
Total marks out of 40: 40%
