158.739-2024 Semester 1 Massey University

Project 1

Deadline: Hand in by midnight April 5 2024


Evaluation: 20% of your final course grade.
Work: This assignment is expected to be completed individually. See Group Work below.
Purpose: Gain experience in performing data wrangling, data visualisation and introductory data
analysis using Python with suitable libraries. Begin developing skills in formulating a
problem from data in a given domain, asking questions of the data, and extracting insights
from a real-world dataset. This covers learning outcomes 1, 2 and 4 from the course outline.

Project outline:

This project requires that you perform data cleaning and exploratory data analysis (EDA), and uncover insights from a
real-world dataset. You are required to present your work in a Jupyter Notebook. The notebook is expected to have the
general structure of a report, with all the Python scripts embedded in it, together with descriptions of the steps you took
in your analysis and of the data cleaning processes.

After you have cleaned the data and prepared it for analysis, your task is to gain an understanding of the problem domain,
which will enable you to formulate some assumptions as well as key questions that will drive your research. The research
objectives are open-ended. It is your task to find correlations, interesting trends and innovative ideas on how to best use
the data in the dataset.
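
One way to start the search for correlations is to rank every pair of numeric columns by the strength of their relationship. The sketch below is illustrative only: the column names and values are invented, and the helper `ranked_correlations` is hypothetical rather than part of the project materials. It computes a Spearman rank correlation (as the Pearson correlation of the ranks) on the rows where both columns are present.

```python
from itertools import combinations

import pandas as pd

# Invented toy data; the real dataset's column names will differ.
df = pd.DataFrame({
    "unemployment_rate": [6.1, 5.4, 5.2, 4.7, 4.0, 3.8, 3.9, 3.6],
    "child_poverty_rate": [28.0, 27.1, 26.5, 25.0, 23.2, 22.8, 22.9, 22.0],
    "house_price_index": [100, 104, 112, 131, 152, 173, 190, 210],
})


def ranked_correlations(frame: pd.DataFrame, min_obs: int = 5) -> pd.DataFrame:
    """Rank column pairs by absolute Spearman correlation (hypothetical helper)."""
    rows = []
    for a, b in combinations(frame.columns, 2):
        # Use only the rows where both columns have a value.
        pair = frame[[a, b]].dropna()
        if len(pair) < min_obs:
            continue  # too few overlapping observations to say anything
        # Spearman correlation is the Pearson correlation of the ranks.
        ranks = pair.rank()
        rho = ranks[a].corr(ranks[b])
        rows.append({"a": a, "b": b, "n": len(pair), "rho": rho})
    result = pd.DataFrame(rows)
    # Strongest relationships first, regardless of sign.
    return result.sort_values(
        "rho", key=lambda s: s.abs(), ascending=False
    ).reset_index(drop=True)


result = ranked_correlations(df)
print(result)
```

Two series that both trend over time will often correlate strongly for that reason alone, so a high value here is a prompt for further investigation rather than a finding.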

You will need to transform data into different formats where necessary. Be creative and generate new columns as
derivatives from others where useful. Make justifiable decisions on how to handle missing values depending on your
research goals. Look for erroneous values and restore the integrity of the data where needed. Be critical.
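
The cleaning steps described above might look something like the following sketch. It assumes one row per year and uses invented column names and values; the rules chosen here (coercing unparseable text to missing, treating non-positive incomes as errors, interpolating only single-year interior gaps) are examples of decisions you would need to justify for your own research goals, not prescribed methods.

```python
import numpy as np
import pandas as pd

# Invented toy data; the real dataset's columns and problems will differ.
raw = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003, 2004],
    "median_income": ["41000", "42,500", "n/a", "44000", "-1"],
    "population": [3.86e6, 3.88e6, 3.95e6, 4.03e6, 4.09e6],
    "unemployed": [110_000, 102_000, np.nan, 93_000, 82_000],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.sort_values("year").copy()

    # Strip thousands separators, then coerce anything non-numeric to NaN.
    income = out["median_income"].astype(str).str.replace(",", "", regex=False)
    out["median_income"] = pd.to_numeric(income, errors="coerce")

    # Treat impossible values (here: non-positive incomes) as missing.
    out.loc[out["median_income"] <= 0, "median_income"] = np.nan

    # Record what was missing before filling, so the decision stays auditable.
    out["income_was_missing"] = out["median_income"].isna()

    # Fill only single-year gaps that lie between two known values.
    for col in ["median_income", "unemployed"]:
        out[col] = out[col].interpolate(
            method="linear", limit=1, limit_area="inside"
        )

    # Derived column: unemployed people per 1,000 residents.
    out["unemployed_per_1000"] = out["unemployed"] / out["population"] * 1000
    return out


cleaned = clean(raw)
print(cleaned)
```

Because `limit_area="inside"` is set, the erroneous value in the final year stays missing instead of being extrapolated from earlier years.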

Utilise a variety of exploratory data analysis techniques to make sense of the data, which will then guide you to dig deeper
and drive new avenues of investigation. Use visualisations to communicate your insights and messages to the reader. Be
effective with how you construct your graphs and preserve accuracy and integrity.
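
As an illustration of preserving accuracy in a graph, the sketch below plots an invented annual series with matplotlib (an assumption; any plotting library would do). It labels the axes with units, starts the y-axis at zero so changes are not visually exaggerated, and leaves a missing year as a visible gap. The function name and data are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented values for illustration only.
df = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003, 2004, 2005],
    "unemployment_rate": [6.1, 5.4, np.nan, 4.7, 4.0, 3.8],
})


def plot_rate(frame: pd.DataFrame):
    fig, ax = plt.subplots(figsize=(7, 4))
    # NaN values break the line, so the missing year shows as a gap.
    ax.plot(frame["year"], frame["unemployment_rate"], marker="o")
    # A zero baseline avoids overstating the size of the change.
    ax.set_ylim(bottom=0)
    ax.set_xlabel("Year")
    ax.set_ylabel("Unemployment rate (%)")
    ax.set_title("Unemployment rate by year (illustrative data)")
    return fig, ax


fig, ax = plot_rate(df)
fig.savefig("unemployment_rate.png", dpi=150)
```

Whether a zero baseline is appropriate depends on the measure; for an index or a temperature-like scale a different baseline may be more honest, and the choice is worth stating in the notebook.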

Finally, you may install and use any additional Python packages you wish that will help you with this project.

Dataset Domain:

The dataset covers socio-economic data on New Zealand, stretching back to the early 1980s. The data covers a range of
topics: income and wealth distribution, poverty and deprivation levels, health measures, education outcomes, safety and
security, housing as well as employment. The data is captured by various government agencies as well as some private
sector entities.

There are approximately 100 columns in the dataset. The columns range widely in their completeness and coverage. A
document is provided which explains briefly what each column means and where it originated.
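
Since completeness varies so much between columns, a coverage profile can be a useful first step. The following is a minimal sketch, assuming the dataset has one row per year in a column called `year`; that name, the other column names and the helper `coverage_profile` are all hypothetical.

```python
import numpy as np
import pandas as pd

# Invented toy data standing in for the real dataset.
df = pd.DataFrame({
    "year": [1981, 1982, 1983, 1984, 1985],
    "gini": [np.nan, 0.27, 0.28, np.nan, 0.30],
    "prison_population": [np.nan, np.nan, np.nan, 2800, 2900],
})


def coverage_profile(frame: pd.DataFrame, year_col: str = "year") -> pd.DataFrame:
    """Summarise how complete each column is and which years it spans."""
    indexed = frame.set_index(year_col).sort_index()
    present = indexed.notna()
    profile = pd.DataFrame({
        "non_null": present.sum(),
        "share_complete": present.mean().round(2),
        # First and last year with a recorded value (None if the column is empty).
        "first_year": indexed.apply(lambda s: s.first_valid_index()),
        "last_year": indexed.apply(lambda s: s.last_valid_index()),
    })
    # Least complete columns first, since they need the most attention.
    return profile.sort_values("share_complete")


profile = coverage_profile(df)
print(profile)
```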

The dataset has been intentionally tampered with in order to give you sufficient practice in data
wrangling and cleaning. Cleaning the dataset accounts for a significant share of the marks in the assignment.

Once the dataset is ready for analysis, consider how to create a data product from your insights that helps inform public
discourse on these socio-economic matters.

Dataset Usage Conditions:

The dataset was collated by a group of researchers belonging to the Knowledge Exchange Hub at Massey University. The
dataset values are obtained from a mixture of publicly available sources as well as confidential private sources. It also
contains a number of derived values. The dataset has not been updated since the original analysis was conducted, so it is a
good opportunity for students to track down the data sources where possible and to update both the raw values and the
analysis. The website describing this project, as well as a publication regarding the dataset and its analysis, can be found
here: https://sharedprosperity.co.nz Your analysis is expected to consider the data from a perspective different from that
found on the website.


Bonus Marks:

Additional marks are offered to students who are prepared to go beyond the specified requirements. Bonus marks will be
granted for the meaningful integration of additional data into the main dataset. The additional data files comprise
the NZ General Social Survey data from 2008, 2010, 2012, 2014 and 2016. These data files are provided. You are
welcome to integrate the latest releases of these data too for additional marks.
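
A sketch of one possible integration approach follows. It assumes the main dataset has one row per year and that a survey file holds respondent-level records; the layout of the real survey files is not described here, so every column name and value below is invented. The survey is aggregated to one row per year before a left join, so that years without a survey stay in the result with missing values.

```python
import pandas as pd

# Invented stand-ins for the main dataset and a survey extract.
main = pd.DataFrame({
    "year": [2008, 2009, 2010, 2011, 2012],
    "unemployment_rate": [4.2, 6.1, 6.5, 6.5, 6.9],
})
survey = pd.DataFrame({
    "survey_year": [2008, 2008, 2010, 2010, 2012],
    "life_satisfaction": [4, 5, 3, 4, 4],
})

# Aggregate respondents to one row per survey year.
annual = (
    survey.groupby("survey_year", as_index=False)
    .agg(
        mean_life_satisfaction=("life_satisfaction", "mean"),
        respondents=("life_satisfaction", "size"),
    )
    .rename(columns={"survey_year": "year"})
)

# Left join keeps every year of the main dataset. validate= raises an error
# if either side has duplicate years, and indicator= adds a "_merge" column
# showing which rows found a match.
merged = main.merge(
    annual, on="year", how="left", validate="one_to_one", indicator=True
)
print(merged)
```

Checking the `_merge` column after the join is a quick way to confirm that the unmatched years are the ones you expect (here, the years in which no survey was run).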

Some of the variables can also be updated with more recent values. You will be awarded additional marks if you make the
effort to acquire these data points.

Marking criteria:

Marks will be awarded for different components of the project using the following rubric:

Component                           Marks  Requirements and expectations

Data Wrangling                      30     Thoroughness of the data cleaning using Python.
EDA/Visualisation                   30     Quality of investigation into potential erroneous values, decision making
                                           process on how to handle missing data and potential interpolation options.
                                           Stating assumptions and justifying them.
                                           Variety of exploratory research and inquiry into different aspects of the
                                           dataset, use of broad and appropriate range of visualisations and their
                                           effective communication.
Data Analysis                       30     Depth, sophistication and difficulty of analysis being performed.
                                           Diversity of techniques used to answer the research questions and
                                           communicate the findings to the reader.
Report Presentation                 10     Structure of the report and use of headers and formatting.
                                           Clear sections and logical flow.
                                           Well-articulated research questions and goals.
                                           Suitable introduction and conclusion.
                                           Tidy code sections and their explanations where needed.
                                           Not cluttering the notebooks with too many dataframe data dumps.

BONUS MARKS
Integration of Additional Datasets  5      Meaningful integration and augmentation of insights with the NZ General
                                           Social Survey data.
Updating of variables               5      Updating of variables with more recent values where possible.

Jupyter Notebook Template

A notebook template has been created for you, and you are invited to use it. Make sure that all parts of the introduction
section that are relevant to your project are filled out. The template file is called ‘Jupyter Project Report
Template.ipynb’.

Group Work:

This assignment is expected to be completed individually. However, students who strongly wish to complete this
assignment in pairs may be given permission, on the condition that their final mark will be capped at 80%.
Completing the bonus component would raise their maximum score to 90%.

Hand-in:

Submit ONLY ONE Jupyter notebook file via the Stream assignment submission link. In addition, please export an HTML
page from your notebook and submit this too, in case there are errors in your notebook and we cannot open it. Please do
not email your submission to the teaching staff.
****************
*** Plagiarism ***
****************

It is mandatory that any assessment items that you submit during your University study are your own work. Massey
University takes a firm stance on academic misconduct, such as plagiarism and any form of cheating.

Plagiarism is the copying or paraphrasing of another person’s work, whether published or unpublished, without
clearly acknowledging it. It includes copying the work of other students and reusing work previously submitted by
yourself for another course. It also includes the copying of code from unacknowledged sources.

Academic integrity breaches affect all students: they disadvantage honest students and undermine the credibility of your
qualification. Plagiarism, and cheating in tests and exams, will be penalised; it is likely to lead to loss of marks for that
item of assessment and may lead to an automatic failing grade for the course and/or exclusion from re-enrolment at the
University.

Please see the Academic Integrity Guide for Students on the University website for more information. The Guide steps
you through the University Academic Integrity Policy and Procedures. For example, you will find definitions of academic
misconduct, such as plagiarism; how misconduct is determined and managed; and where to find resources and
assistance to help develop the skills of academic writing, exam preparation and time management. These skills will help
you approach university study with academic integrity.
