School of IT & Business Technologies

Graduate Diploma in Data Analytics (Level 7)


Course Code & Title: GDDA612 – Data Transformation and Management
Assignment Title: Assessment 1
Assessment Type: Project 1
Level: 6
Credits: 15
Term & Cohort:
Due date:
Overall Weighting: 60%
Total marks available: 100
Tutor:

Assessment Information
• This is an individual open-book assessment. It is worth 60% of the total assessment weighting for the
course.
• You must complete the Student Declaration and attestation on Canvas LMS.
• You must submit your completed assessment to Canvas.
• You must ensure you are familiar with NZSE’s academic policies regarding assessments, and the relevant
resubmission regulations that apply to this course. These policies and regulations can be found on your
Course Outline on Canvas LMS.
• Your assessment will be marked on Canvas, and your assessment result will be provided on Canvas too.
• A maximum of 15% of the content may be quoted or paraphrased from other sources provided you
acknowledge and cite the original source of material you use.
• Use APA 7th edition referencing for all quoted or paraphrased material.
• All cases of plagiarism and/or cheating will be investigated and dealt with according to A08: Misconduct in
Assessment Policy.
• If you have a disability that affects your writing, please discuss alternative submission methods with
your tutor.

Submission Instructions
You are required to ensure you have carried out the following before submitting your assessment:
• The naming convention for submitted file/s must be adhered to:
CourseCode_AssessmentNumber_AssessmentName_StudentNumber_DocumentNumber
For example, “GDDA612_A1_Project_1_7647XXXX_1”
• All written answers are in your own words.
• Proofread and spell check the assessment work carefully.
• The submission should include a word-processed document or PDF containing screenshots of all
programming scripts and their results.
• Compressed files of the program scripts and data store must also be uploaded on Canvas.
• DO NOT email your document/s to your tutor; it must be uploaded to the Canvas LMS.
• Check that all evidence required has been uploaded to the link provided on the NZSE LMS (Canvas).

© NZSE - GDDA612 Data Transformation and Management - Assessment 1 [Project 1] v1.0 Page 1 of 10
Resits and Resubmissions of Assessments
Students enrolled in this programme have the opportunity for resubmission with the following
conditions:
a. One resubmission in Semester One only. No resubmissions will be permitted during Semester
Two.
A resubmission can only be offered where the original grade was at least 45%. No further teaching or
specific feedback should occur between the final submission date and the resubmission.
The maximum grade allowable for any resubmitted assessment event is 50%.
The student shall make a request for a resubmission within three working days of the return of the graded
assessment and negotiate a resubmission date. The agreed resubmission date must be adhered to, and
no further extensions will be given.
In all cases, the grade achieved on the resubmission will be the grade used to calculate the final course
grade.

Course aim
The course provides students with the skills to work with various data sources: to collect and prepare data
for processing by cleaning and transforming it into an appropriate format, import the pre-processed data
into data stores, query and manipulate the data sets and data structures, and export data in an appropriate
format for subsequent use.

Purpose
The purpose of this assessment is to prepare data stores by collecting data from various sources, applying
techniques of web scraping and data preparation, using a programming language to enhance business
decision making.

Learning Outcomes (LOs)


This assessment is mapped to the following learning outcomes for this course:
LO1 Apply web scraping, data preparation and cleansing techniques to prepare data for analysis.
LO2 Import data to data stores using a programming language.

Graduate profile outcome


The above stated learning outcomes are mapped to the graduate profile outcomes shown below for this
programme:
DA GPO 05 Critically analyse data using data collection and transformation methods, analysis
methods, data and decision modelling concepts and visualisation tools to address
complex business situations ensuring ethical professional considerations are met.
DA GPO 06 Apply problem solving and strategic thinking skills to establish and create data
warehouses, apply data mining techniques, and find data patterns to enhance
business decision making.
DA GPO 07 Apply machine learning, AI and big data analytical technologies to data analytic
systems that support and improve business decision-making through critical thinking,
programming and research skills.

Assessment Tasks - Overview

This is an individual assessment designed to evaluate your knowledge and skills in data transformation
and management. It encompasses tasks such as web scraping, data preparation, cleaning, and the
importation of data into a centralized data store. This assessment is the first project of the data
transformation and management course where you need to collect data from various sources, apply
web scraping and data preparation techniques, and utilise a programming language to improve
business decision-making. This is the scenario:

Scenario
You work as a data analyst at a company called TechTrends Inc. Your company aims to make informed
business decisions based on data collected from various online sources. Your primary task is to
establish a data transformation and management pipeline that involves web scraping, data
preparation, cleaning, and importing data into a central data store.

You are encouraged to focus on data analysis in diverse fields, including sports, retail, marketing,
finance, stock markets, banking, education, politics, travel, military affairs, fisheries, vehicles, housing,
social networks, movies, games, food, dairy, wine, beverages, books, flowers, cosmetics, clothing,
climate, environment, earthquakes, floods, and more. To accomplish this, you will use a programming
language such as Python and relevant libraries. Your ultimate objective is to enhance business
decision-making by delivering a clean and well-structured dataset for further analysis.

Assessment Guidelines
You may select the scenario described in the preceding section. Or, if you have any real projects from
industry clients, you can choose any of them. This, however, MUST be approved by your tutor.
Therefore, you are required to discuss the project details with your tutor and obtain approval before
beginning the assignment.

PART A - (Data Collection) [20 Marks]


LO1: Apply web scraping, data preparation, and cleansing techniques to prepare data for analysis.

Task 1: Identify Data Sources [5 marks]


Identify and describe three (3) different online data sources that are publicly accessible and relevant
to your chosen business problem. These sources could include websites, APIs, social media platforms,
etc.

Task 2: Web Scraping [15 marks]


a) Apply web scraping techniques to develop scripts (e.g., Python, R, etc.) that extract data from
the three specified web sources and import them into three different datasets. [5 Marks]
b) Explain how your scraping process adheres to ethical standards and complies with relevant
data privacy regulations. [5 Marks]
c) Reflect on the importance of responsible data collection and handling, considering principles
of respect and privacy as valued in the cultural contexts of Tikanga, Whakapapa, and Pepeha.
[5 Marks]
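As an illustration only, the sketch below shows one way a scraping script might check a site's robots.txt rules before extracting tabular data from a page. It uses only the Python standard library and works on inline text so that it runs offline. The robots.txt content, the HTML fragment, the example.com URLs, and the helper names `is_allowed` and `CellParser` are all hypothetical; a real script would download both the rules and the page from the chosen source.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice it is fetched from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page fragment standing in for a downloaded HTML page.
SAMPLE_HTML = """
<table>
  <tr><td class="name">Widget A</td><td class="price">19.99</td></tr>
  <tr><td class="name">Widget B</td><td class="price">24.50</td></tr>
</table>
"""


def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if the robots.txt rules permit fetching the URL."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())
    return rules.can_fetch(user_agent, url)


class CellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Keep text only while inside a cell of a started row.
        if self._in_cell and self.rows:
            self.rows[-1].append(data.strip())


parser = CellParser()
parser.feed(SAMPLE_HTML)
records = [{"name": name, "price": float(price)} for name, price in parser.rows]

print(records)
print(is_allowed("https://example.com/products"))
print(is_allowed("https://example.com/private/report"))
```

Checking robots.txt is only one part of ethical scraping; a site's terms of use, request rates, and any personal data involved still need separate consideration.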

PART B - (Data Preparation and Cleansing) [35 Marks]
LO1: Apply web scraping, data preparation, and cleansing techniques to prepare data for analysis.

Task 3: Data Preparation and Cleansing [25 marks]


Implement data cleansing methods using scripts (e.g., Python, R, etc.) to prepare all three (3) datasets
for further analysis. These methods should perform five (5) data-cleaning tasks that are suitable for
your prepared datasets, such as:

- Correcting possible typos
- Removing unwanted observations (erroneous, duplicate, or missing values)
- Removing irrelevant data
- Removing outliers
- Data type conversion
- Encoding/decoding and re-coding variables
- Converting units

Note: You should not perform manual data cleaning. All data cleaning tasks should be automated
using the data cleansing methods.
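To make the idea of automated cleansing concrete, here is a minimal sketch assuming Python with the pandas library. The dataset, its column names, and the particular five steps chosen are hypothetical; the steps suitable for a real dataset depend on its contents.

```python
import pandas as pd

# Hypothetical raw data with typical quality problems.
raw = pd.DataFrame({
    "product": ["Widget A", "widget a ", "Widget B", "Widget C", None],
    "price": ["19.99", "19.99", "24.50", "n/a", "5.00"],
    "weight_g": [500, 500, 1200, 800, 300],
    "notes": ["x", "x", "y", "z", "w"],
})

clean = raw.copy()

# 1. Correct inconsistent spelling: trim whitespace and normalise case.
clean["product"] = clean["product"].str.strip().str.title()

# 2. Data type conversion: text prices become numbers, unparseable ones NaN.
clean["price"] = pd.to_numeric(clean["price"], errors="coerce")

# 3. Remove unwanted observations: rows with missing values, then duplicates.
clean = clean.dropna(subset=["product", "price"]).drop_duplicates()

# 4. Remove irrelevant data.
clean = clean.drop(columns=["notes"])

# 5. Convert units from grams to kilograms.
clean["weight_kg"] = clean["weight_g"] / 1000
clean = clean.drop(columns=["weight_g"]).reset_index(drop=True)

print(clean)
```

Every step is applied by code rather than by hand, so the same script can be re-run whenever the source data is refreshed.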

Task 4: Documentation [10 marks]


a) Document the data preparation and cleansing processes. Explain the rationale behind each
step, summarise the cleaned dataset's structure, feature types, and size. Additionally, discuss
how this cleaned dataset can enhance the client's marketing strategies. [5 Marks]
b) Discuss any five (5) challenges you faced during the data preparation and cleansing process in
Task 3 and explain how you resolved them. [5 Marks]

PART C - (Data Importation) [45 Marks]


LO2: Import data to data stores using a programming language.

Task 5: Store Datasets [5 marks]


Write a script in Python, R, or another programming language to save cleaned and transformed
datasets into a specific format such as CSV, XML, JSON, etc.
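A minimal sketch of saving a cleaned dataset, assuming Python with pandas. The DataFrame contents and file names are hypothetical, and a temporary directory is used here only so the example leaves no files behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical cleaned dataset.
cleaned = pd.DataFrame({
    "product": ["Widget A", "Widget B"],
    "price": [19.99, 24.50],
})

# Write to a temporary directory; a real script would use a project folder.
out_dir = Path(tempfile.mkdtemp())
csv_path = out_dir / "products_clean.csv"
json_path = out_dir / "products_clean.json"

# index=False keeps the integer row index out of the CSV file.
cleaned.to_csv(csv_path, index=False)

# orient="records" writes one JSON object per row.
cleaned.to_json(json_path, orient="records", indent=2)

# Read the CSV back to confirm the round trip preserved the data.
restored = pd.read_csv(csv_path)
print(restored)
```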

Task 6: Merge Data [10 marks]


Write a script (e.g., Python, R, etc.) to extract three (3) datasets from a specified format (e.g., CSV,
XML, JSON, etc.), import them into separate DataFrames, merge them into a single DataFrame using
common columns as merge keys, and specify the type of merge operation (e.g., inner, outer, left, right)
while providing a rationale for the chosen merge operation.
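The difference between merge types is easiest to see on a small example. The sketch below assumes Python with pandas; the three datasets, the `product_id` key, and the inline CSV text (standing in for files on disk) are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for three saved files.
products_csv = "product_id,product\n1,Widget A\n2,Widget B\n3,Widget C\n"
prices_csv = "product_id,price\n1,19.99\n2,24.50\n"
reviews_csv = "product_id,rating\n1,4.5\n3,3.8\n"

# Import each dataset into its own DataFrame.
products = pd.read_csv(io.StringIO(products_csv))
prices = pd.read_csv(io.StringIO(prices_csv))
reviews = pd.read_csv(io.StringIO(reviews_csv))

# Left merge: keep every product, even when a price or rating is missing.
merged = (
    products
    .merge(prices, on="product_id", how="left")
    .merge(reviews, on="product_id", how="left")
)

# Inner merge for comparison: keep only products present in all three.
inner = (
    products
    .merge(prices, on="product_id", how="inner")
    .merge(reviews, on="product_id", how="inner")
)

print(merged)
print(inner)
```

In this hypothetical case the left merge keeps all three products and leaves gaps as missing values, while the inner merge keeps only the one product found in every dataset. Which behaviour is appropriate depends on the analysis the merged data is meant to support.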

Task 7: Indexing [10 marks]


Write a script in Python, R, or another programming language to perform indexing on a merged
DataFrame. Your script should address the following tasks and display the results:

a. Set a specific column as the index.


b. Reset the index to the default integer-based index.
c. Create a new DataFrame by selecting rows based on a conditional index.
d. Perform multi-level indexing by setting multiple columns as the index.
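The four indexing operations above can be sketched as follows, assuming Python with pandas. The DataFrame and its column names are hypothetical, and "conditional index" is interpreted here as a Boolean condition on the index values, which is only one possible reading.

```python
import pandas as pd

# Hypothetical merged dataset.
merged = pd.DataFrame({
    "product_id": [1, 2, 3, 4],
    "category": ["tools", "tools", "toys", "toys"],
    "product": ["Widget A", "Widget B", "Widget C", "Widget D"],
    "price": [19.99, 24.50, 9.99, 14.00],
})

# a. Set a specific column as the index.
by_id = merged.set_index("product_id")

# b. Reset to the default integer-based index.
restored = by_id.reset_index()

# c. Create a new DataFrame from rows whose index meets a condition.
high_ids = by_id[by_id.index > 2]

# d. Multi-level indexing: two columns form a hierarchical index.
multi = merged.set_index(["category", "product_id"])
toys = multi.loc["toys"]  # all rows under the first-level label "toys"

print(by_id, restored, high_ids, multi, toys, sep="\n\n")
```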

Task 8: Sorting [3 marks]
Write a script in Python, R, or another programming language to sort the merged DataFrame based
on the values in a specific column in ascending and descending order. Please also explain your
approach.
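A minimal sorting sketch, assuming Python with pandas; the DataFrame and the choice of `price` as the sort column are hypothetical.

```python
import pandas as pd

# Hypothetical merged dataset.
merged = pd.DataFrame({
    "product": ["Widget A", "Widget B", "Widget C"],
    "price": [19.99, 9.99, 24.50],
})

# sort_values returns a new DataFrame and leaves the original unchanged.
ascending = merged.sort_values(by="price", ascending=True)
descending = merged.sort_values(by="price", ascending=False)

print(ascending)
print(descending)
```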

Task 9: Summary Statistics [7 marks]


Write a script (e.g., Python, R, etc.) to compute summary statistics for the numeric columns of the
merged DataFrame using a built-in method. Also interpret and discuss the summary statistics and
identify key findings in your data.
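As a sketch of what a built-in summary method returns, the example below assumes Python with pandas and a hypothetical DataFrame. On a DataFrame with mixed column types, `describe()` reports on the numeric columns only by default.

```python
import pandas as pd

# Hypothetical merged dataset with one text and two numeric columns.
merged = pd.DataFrame({
    "product": ["Widget A", "Widget B", "Widget C"],
    "price": [10.0, 20.0, 30.0],
    "rating": [4.0, 3.0, 5.0],
})

# count, mean, std, min, quartiles, and max for each numeric column.
summary = merged.describe()
print(summary)

# Individual statistics can be read from the result by row and column label.
print("Mean price:", summary.loc["mean", "price"])
```

The numbers alone are not the finding; the interpretation (for example, how spread out prices are relative to their mean) still has to be written up.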

Task 10: Slicing [5 marks]


Write a script (e.g., Python, R, etc.) to slice and display a specific portion of the DataFrame based on
row and column selection. Please also explain your approach.
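Row and column slicing can be sketched as below, assuming Python with pandas and a hypothetical DataFrame. The example contrasts position-based slicing (`iloc`, end excluded) with label-based slicing (`loc`, end included).

```python
import pandas as pd

# Hypothetical merged dataset with the default integer index 0..3.
merged = pd.DataFrame({
    "product": ["Widget A", "Widget B", "Widget C", "Widget D"],
    "category": ["tools", "tools", "toys", "toys"],
    "price": [19.99, 24.50, 9.99, 14.00],
    "rating": [4.5, 4.1, 3.8, 4.9],
})

# Position-based: rows at positions 1 and 2, first two columns.
by_position = merged.iloc[1:3, 0:2]

# Label-based: rows labelled 1 through 3 (inclusive), two named columns.
by_label = merged.loc[1:3, ["product", "price"]]

print(by_position)
print(by_label)
```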

Task 11: Data Import [5 marks]


Write a script (e.g., Python, R, etc.) to import the merged DataFrame into a NoSQL data store (e.g.,
MongoDB, Cassandra, etc.).
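A minimal sketch of a MongoDB import, assuming Python with pandas and the separately installed `pymongo` driver. The connection URI, the database name `techtrends`, the collection name `products`, and the helper `import_to_mongodb` are hypothetical. The insert call is left commented out because it needs a running MongoDB server.

```python
import pandas as pd

# Hypothetical merged dataset.
merged = pd.DataFrame({
    "product_id": [1, 2],
    "product": ["Widget A", "Widget B"],
    "price": [19.99, 24.50],
})

# MongoDB stores documents, so each row becomes one dictionary.
documents = merged.to_dict(orient="records")


def import_to_mongodb(docs, uri="mongodb://localhost:27017",
                      db_name="techtrends", collection_name="products"):
    """Insert the documents into MongoDB and return how many were inserted."""
    from pymongo import MongoClient  # imported here so the rest runs without it

    client = MongoClient(uri)
    try:
        result = client[db_name][collection_name].insert_many(docs)
        return len(result.inserted_ids)
    finally:
        client.close()


print(documents)

# Uncomment when a MongoDB server is reachable at the URI above:
# print(import_to_mongodb(documents))
```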

Note: Check the assessment marking rubric below for the marks allocation and marking
criteria/guidelines to attain optimum marks.

Assessment Marking Rubric - GDDA612 Data Transformation and Management
Assessment 1 – Project 1 (Total max. 100 marks)
CRITERIA AND EVIDENCE

Part A – Data Collection (20 Marks)

Task 1: Identify Data Sources (5 Marks)
• 5 Marks: The learner has identified three (3) relevant publicly accessible online data sources for the
chosen business problem and provided a comprehensive description of each source, including name, type
(e.g., website, API, social media), and their significance to the problem.
• 3 - 4 Marks: The learner has identified only two (2) relevant publicly accessible online data sources for
the chosen business problem and provided a comprehensive description of each source, including name,
type (e.g., website, API, social media), and their significance to the problem.
• 1 - 2 Marks: The learner has identified only one (1) relevant publicly accessible online data source for
the chosen business problem and provided a comprehensive description of that source, including name,
type (e.g., website, API, social media), and its significance to the problem.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

Task 2: Web Scraping – a) Application of Web Scraping Techniques (5 Marks)
• 5 Marks: The learner has successfully developed the scripts (e.g., Python, R, etc.) that effectively
extract data from the three specified web sources and import them into three different datasets. The code
is well-structured, efficient, and free from significant errors.
• 3 - 4 Marks: The learner has developed the scripts that partially extract data from the specified web
sources and import them into datasets, but there may be minor errors or inefficiencies in the code.
• 1 - 2 Marks: The learner has attempted to develop scripts but has faced significant challenges in
extracting data from web sources or importing it into datasets, resulting in incomplete or flawed scripts.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

Task 2: Web Scraping – b) Adherence to Ethical Standards and Data Privacy (5 Marks)
• 5 Marks: The learner has provided a comprehensive and well-explained description of how their scraping
process adheres to ethical standards and complies with relevant data privacy regulations.
• 3 - 4 Marks: The learner has provided an explanation of ethical considerations and data privacy
regulations in the context of web scraping, but some aspects are less detailed or clear.
• 1 - 2 Marks: The learner has mentioned ethical considerations and data privacy regulations, but their
explanation lacks depth and clarity.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

Task 2: Web Scraping – c) Reflection on Responsible Data Collection and Handling (5 Marks)
• 5 Marks: The learner has demonstrated a thoughtful and insightful reflection on the importance of
responsible data collection and handling. The learner has connected this reflection to principles of
respect and privacy and explained how these principles apply to the cultural contexts of Tikanga,
Whakapapa, and Pepeha.
• 3 - 4 Marks: The learner has provided a reflection on responsible data collection and handling but may
lack depth or specificity in their explanation of cultural contexts.
• 1 - 2 Marks: The learner has briefly mentioned the importance of responsible data handling but does not
provide a substantial reflection or connection to cultural contexts.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

Part B – Data Preparation and Cleansing (35 Marks)

Task 3: Data Preparation and Cleansing (25 Marks)
• 21 - 25 Marks: The learner has correctly implemented data cleansing methods for all three (3) datasets
and addressed five (5) tasks from the list below:
- Correcting possible typos
- Removing unwanted observations (erroneous, duplicate, or missing values)
- Removing irrelevant data
- Removing outliers
- Data type conversion
- Encoding/decoding and re-coding variables
- Converting units
• 16 - 20 Marks: The learner has correctly implemented data cleansing methods for all three (3) datasets
and addressed only four (4) tasks from the list.
• 11 - 15 Marks: The learner has correctly implemented data cleansing methods for all three (3) datasets
and addressed only three (3) tasks from the list.
• 6 - 10 Marks: The learner has correctly implemented data cleansing methods for all three (3) datasets
and addressed only two (2) tasks from the list.
• 1 - 5 Marks: The learner has correctly implemented data cleansing methods for all three (3) datasets
and addressed only one (1) task from the list.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

Task 4: Documentation – a) Document the data preparation and cleansing processes (5 Marks)
• 5 Marks: The learner has presented a well-documented data preparation and cleansing process,
demonstrated a deep understanding of the rationale behind each step, efficiently summarised the
attributes of the cleaned dataset, and offered valuable insights into its potential to improve the client's
marketing strategies.
• 3 - 4 Marks: The learner has provided mostly adequate documentation of data preparation and cleansing
processes with some detail gaps, shown a basic understanding of the rationale, provided a satisfactory
summary of the dataset, and discussed potential marketing enhancements with limited depth and
creativity.
• 1 - 2 Marks: The learner's documentation is inadequate because it provides insufficient insight into the
potential impact on the client's marketing strategies and does not provide enough detail on data
preparation and cleansing.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

Task 4: Documentation – b) Discuss any five (5) challenges you faced during the data preparation and
cleansing process in Task 3 and explain how you resolved them (5 Marks)
• 5 Marks: The learner has discussed five (5) or more relevant challenges, provided clear and effective
solutions, demonstrated creativity in problem-solving, and presented a well-structured, error-free
response.
• 3 - 4 Marks: The learner has discussed three (3) or four (4) relevant challenges, provided clear and
effective solutions, demonstrated creativity in problem-solving, and presented a well-structured,
error-free response.
• 1 - 2 Marks: The learner has discussed one (1) or two (2) relevant challenges, provided clear and
effective solutions, demonstrated creativity in problem-solving, and presented a well-structured,
error-free response.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

Part C – Data Importation (45 Marks)

Task 5: Store Datasets (5 Marks)
• 5 Marks: The script provided by the learner is error-free, saving the cleaned dataset in the specified
format, and the code is well-structured, well-commented, and includes explanations of key sections or
steps.
• 3 - 4 Marks: The script provided by the learner is functional but has minor errors or issues that affect
its performance, but the code is well-structured, well-commented, and includes explanations of key
sections or steps.
• 1 - 2 Marks: The script provided by the learner is partially functional but has major errors or issues
that affect its performance. It is also somewhat organised but lacks clarity and has some coding issues.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

Task 6: Merge Data (10 Marks)
• 8 - 10 Marks: The learner's script has correctly extracted three (3) datasets, imported them into
separate DataFrames, merged them into one DataFrame using common columns, and applied the specified
type of merge operation without errors. Moreover, the learner has also provided a clear and
well-explained rationale for the choice of the specific type of merge operation (e.g., inner, outer, left,
right).
• 4 - 7 Marks: The learner's script has achieved most of the required functionality with minor issues or
errors. Moreover, the learner has also provided a clear and good rationale for the choice of the specific
type of merge operation (e.g., inner, outer, left, right).
• 1 - 3 Marks: The learner's script has significant issues or errors that impact functionality. Moreover,
the rationale is provided but lacks depth or clarity.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

Task 7: Indexing (10 Marks)
• 9 - 10 Marks: The learner has correctly implemented indexing on the merged DataFrame, performed the
following four (4) tasks, and displayed the results:
- Setting a Specific Column as Index
- Resetting the Index
- Conditional Indexing
- Multi-level Indexing
• 6 - 8 Marks: The learner has correctly implemented indexing on the merged DataFrame, performed only
three (3) tasks from the list above, and displayed the results.
• 3 - 5 Marks: The learner has correctly implemented indexing on the merged DataFrame, performed only
two (2) tasks from the list above, and displayed the results.
• 1 - 2 Marks: The learner has correctly implemented indexing on the merged DataFrame, performed only
one (1) task from the list above, and displayed the results.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

Task 8: Sorting (3 Marks)
• 3 Marks: The learner’s script has successfully sorted the merged DataFrame both in ascending and
descending order based on the specified column, and it runs without errors. Moreover, the learner has
provided a clear and well-explained rationale for their approach to sorting the DataFrame, demonstrating
an understanding of the process and the chosen column's significance.
• 2 Marks: The learner’s script has successfully sorted the merged DataFrame both in ascending and
descending order based on the specified column, and it runs without errors. But the explanation of the
approach is missing, unclear or lacks depth.
• 1 Mark: The learner's script has sorted the merged DataFrame in either ascending or descending order
based on the specified column, but the explanation of the approach is missing, unclear, or lacks depth.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

Task 9: Summary Statistics (7 Marks)
• 6 - 7 Marks: The learner’s script has successfully calculated summary statistics for all numeric data
columns in the merged DataFrame using appropriate built-in methods (e.g., mean, min, max, count,
standard deviation, etc.). Moreover, the learner has effectively interpreted and discussed the summary
statistics, and identified key findings in the data.
• 3 - 5 Marks: The learner’s script has successfully calculated summary statistics for all numeric data
columns in the merged DataFrame using appropriate built-in methods (e.g., mean, min, max, count,
standard deviation, etc.). Moreover, the learner has interpreted and discussed the summary statistics,
and identified key findings in the data, but the interpretation lacks depth or clarity.
• 1 - 2 Marks: The learner’s script has partially calculated summary statistics, contains minor errors, or
lacks some important statistics, and the interpretation is missing or inadequately addressed.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

Task 10: Slicing (5 Marks)
• 5 Marks: The learner’s script has successfully sliced and displayed the specified portion of the
DataFrame based on row and column selection, operated without errors, and delivered the expected output.
Moreover, the learner has provided a clear and comprehensive explanation of their approach to slicing and
displaying the DataFrame.
• 3 - 4 Marks: The learner’s script has successfully sliced and displayed the specified portion of the
DataFrame based on row and column selection, operated without errors, and delivered the expected output.
But the explanation provided by the learner is unclear, incomplete, or lacks detail.
• 1 - 2 Marks: The learner's script has successfully sliced and displayed the specified portion of the
DataFrame, either based on row or column selection (not both). However, the provided explanation by the
learner is unclear, incomplete, and lacks detail.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

Task 11: Data Import (5 Marks)
• 5 Marks: The learner’s script is fully functional, successfully importing the merged DataFrame into the
specified NoSQL data store (e.g., MongoDB, Cassandra) without errors, and it demonstrates good
performance and efficiency.
• 3 - 4 Marks: The learner’s script is partially functional but has minor errors or issues that affect its
performance or doesn't exhibit optimal performance.
• 1 - 2 Marks: The script is minimally functional, with major errors that prevent successful execution.
• 0 Marks: The learner has not completed any requirements correctly or has not attempted this task at all.

School of IT & Business Technologies
Graduate Diploma in Data Analytics (Level 7)
Cover Sheet and Student Declaration
This sheet must be signed by the student and attached to the submitted assessment.

Course Title: Data Transformation and Management
Course Code: GDDA612
Student Name:
Student ID:
Assessment No & Type: Assessment 1 - Project 1
Cohort:
Due Date:
Date Submitted:
Tutor’s Name:
Assessment Weighting: 60%
Total Marks: 100

Student Declaration:
I declare that:
• I have read the New Zealand School of Education Ltd policies and regulations on assessments and
understand what plagiarism is.
• I am aware of the penalties for cheating and plagiarism as laid down by the New Zealand School of
Education Ltd.
• This is an original assessment and is entirely my own work.
• Where I have quoted or made use of the ideas of other writers, I have acknowledged the source.
• This assessment has been prepared exclusively for this course and has not been, and will not be,
submitted as assessed work in any other course.
• It has been explained to me that this assessment may be used by NZSE Ltd, for internal and/or external
moderation.

Student signature:

Date:

Tutor only to complete


Part A (max. 20 marks):
Part B (max. 35 marks):
Part C (max. 45 marks):

Assessment result: Marks: /100 Grade:
