Project 1
Project outline:
This project requires you to perform data cleaning and exploratory data analysis (EDA), and to uncover insights from a
real-world dataset. You are required to present your work in a Jupyter Notebook. The notebook is expected to have the
general structure of a report, with all the Python scripts embedded in it, together with descriptions of the steps you took in
your analysis and the data cleaning processes.
After you have cleaned the data and prepared it for analysis, your task is to gain an understanding of the problem domain,
which will enable you to formulate some assumptions as well as key questions that will drive your research. The research
objectives are open-ended. It is your task to find correlations, interesting trends and innovative ideas on how to best use
the data in the dataset.
You will need to transform data into different formats where necessary. Be creative and generate new columns as
derivatives from others where useful. Make justifiable decisions on how to handle missing values depending on your
research goals. Look for erroneous values and restore the integrity of the data where needed. Be critical.
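The cleaning steps described above (type conversion, erroneous values, missing values, derived columns) can be sketched on toy data as follows. This is only an illustration: the column names, the sentinel value -999 and the choice of linear interpolation are assumptions made for the example, not properties of the project dataset, and a different treatment of missing values may suit your research goals better.

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical column names and made-up values.
raw = pd.DataFrame({
    "year": ["1984", "1985", "1986", "1987", "1988"],
    "median_income": ["21,000", "21500", "n/a", "-999", "23000"],
    "unemployment_rate": [5.2, 4.9, np.nan, 5.6, 540.0],
})

clean = raw.copy()

# Convert text columns to numbers; anything unparseable becomes NaN.
clean["year"] = pd.to_numeric(clean["year"], errors="coerce").astype("Int64")
clean["median_income"] = pd.to_numeric(
    clean["median_income"].str.replace(",", "", regex=False), errors="coerce"
)

# Treat implausible values as missing instead of silently keeping them.
clean.loc[clean["median_income"] < 0, "median_income"] = np.nan
clean.loc[~clean["unemployment_rate"].between(0, 100), "unemployment_rate"] = np.nan

# Record which values were missing before filling, so the decision stays visible.
clean["income_was_missing"] = clean["median_income"].isna()

# One possible (assumed) strategy for a yearly series: linear interpolation.
clean["median_income"] = clean["median_income"].interpolate(method="linear")

# Derived column: year-on-year income growth in percent.
clean["income_growth_pct"] = clean["median_income"].pct_change() * 100

print(clean)
```

Keeping a flag column such as `income_was_missing` makes it easy to justify, and later revisit, each imputation decision in the report.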
Utilise a variety of exploratory data analysis techniques to make sense of the data, which will then guide you to dig deeper
and drive new avenues of investigation. Use visualisations to communicate your insights and messages to the reader. Be
effective with how you construct your graphs and preserve accuracy and integrity.
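As a starting point for exploration, a minimal sketch of two common EDA steps is shown here: a pairwise correlation matrix and a summary by decade. The column names and values are invented for illustration and do not come from the project dataset.

```python
import pandas as pd

# Toy data with hypothetical column names and made-up values.
df = pd.DataFrame({
    "year": [1985, 1990, 1995, 2000, 2005, 2010],
    "median_income": [21000, 24000, 26000, 30000, 35000, 41000],
    "unemployment_rate": [4.2, 7.8, 6.3, 6.0, 3.8, 6.5],
    "home_ownership_pct": [73.0, 73.5, 71.0, 68.0, 67.0, 65.0],
})

# Pairwise Pearson correlations between the numeric indicators.
corr = df.drop(columns="year").corr()

# Group years into decades and average each indicator to expose slow trends.
df["decade"] = (df["year"] // 10) * 10
by_decade = df.groupby("decade")[["median_income", "unemployment_rate"]].mean()

print(corr.round(2))
print(by_decade)
```

With matplotlib installed, `by_decade.plot()` would turn the summary into a chart. A correlation found this way only suggests where to look next; it is not evidence of a causal link.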
Finally, you may install and use any additional Python packages you wish that will help you with this project.
Dataset Domain:
The dataset covers socio-economic data on New Zealand, stretching back to the early 1980s. The data covers a range of
topics: income and wealth distribution, poverty and deprivation levels, health measures, education outcomes, safety and
security, housing as well as employment. The data is captured by various government agencies as well as some private
sector entities.
There are approximately 100 columns in the dataset. The columns range widely in their completeness and coverage. A
document is provided which explains briefly what each column means and where it originated.
The dataset has been intentionally tampered with in order to provide you with a sufficient amount of practice in data
wrangling and cleaning. Cleaning the dataset represents a significant amount of marks in the assignment.
Once the dataset is ready for analysis, consider how to create a data product from your insights that helps inform public
discourse on these socio-economic matters.
The dataset was collated by a group of researchers belonging to the Knowledge Exchange Hub at Massey University. The
dataset values are obtained from a mixture of publicly available sources as well as confidential private sources. It also
contains a number of derived values. The dataset has not been updated since the original analysis was conducted, which
gives students a good opportunity to track down the data sources where possible and to update both the raw values and the
analysis. The website describing this project, as well as a publication regarding the dataset and its analysis, can be found
here: https://sharedprosperity.co.nz. Your analysis is expected to consider the data from a perspective different from that
found on the website.
158.739-2024 Semester 1 Massey University
Bonus Marks:
Additional marks are offered to students who are prepared to go beyond the specified requirements. Bonus marks will be
granted for the meaningful integration of additional data into the main dataset. The additional data files comprise
the NZ General Social Survey Data from 2008, 2010, 2012, 2014 and 2016. These data files are provided. You are
welcome to integrate the latest releases of these data too for additional marks.
Some of the variables can also be updated with more recent values. You will be awarded additional marks if you make the
effort to acquire these data points.
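One way to integrate survey data into the main dataset is a keyed merge, sketched here on toy frames. The join key `year`, the column names and all values are assumptions made for illustration; the real files may need reshaping or aggregation before they can be joined.

```python
import pandas as pd

# Toy stand-in for the main dataset (hypothetical columns, made-up values).
main = pd.DataFrame({
    "year": [2008, 2009, 2010, 2011, 2012],
    "median_income": [100, 101, 102, 103, 104],
})

# Toy stand-in for a survey extract that is only available every second year.
gss = pd.DataFrame({
    "year": [2008, 2010, 2012],
    "life_satisfaction_pct": [80.0, 81.0, 82.0],
})

# A left join keeps every row of the main dataset.
# validate= raises an error if the key is unexpectedly duplicated on either side.
# indicator= adds a _merge column showing which rows found a match.
merged = main.merge(
    gss, on="year", how="left", validate="one_to_one", indicator=True
)

print(merged)
print(merged["_merge"].value_counts())
```

Checking the `_merge` column after the join shows how many years actually gained survey values, which helps when arguing that the integration is meaningful rather than cosmetic.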
Marking criteria:
Marks will be awarded for different components of the project using the following rubric:
A notebook template has been created for you that you are invited to use. Make sure that the introduction section has all
the necessary parts filled out that are relevant to your project. The template file is called ‘Jupyter Project Report
Template.ipynb’.
Group Work:
This assignment is expected to be completed individually. However, students who strongly wish to complete this
assignment in pairs may be given permission, on the condition that their final mark will be capped at 80%.
Completing the bonus component would raise their maximum mark to 90%.
Hand-in:
Submit ONLY ONE Jupyter notebook file via the Stream assignment submission link. In addition, please export an HTML
page from your notebook and submit this too, in case there are errors in your notebook and we cannot open it. Please do
not email your submission to the teaching staff.
****************
*** Plagiarism ***
****************
It is mandatory that any assessment items that you submit during your University study are your own work. Massey
University takes a firm stance on academic misconduct, such as plagiarism and any form of cheating.
Plagiarism is the copying or paraphrasing of another person’s work, whether published or unpublished, without
clearly acknowledging it. It includes copying the work of other students and reusing work previously submitted by
yourself for another course. It also includes the copying of code from unacknowledged sources.
Academic integrity breaches affect all students, as they disadvantage honest students and undermine the credibility of your
qualification. Plagiarism and cheating in tests and exams will be penalised; this is likely to lead to loss of marks for that
item of assessment and may lead to an automatic failing grade for the course and/or exclusion from re-enrolment at the
University.
Please see the Academic Integrity Guide for Students on the University website for more information. The Guide steps
you through the University Academic Integrity Policy and Procedures. For example, you will find definitions of academic
integrity misconduct, such as plagiarism; how misconduct is determined and managed; and where to find resources and
assistance to help develop the skills of academic writing, exam preparation and time management. These skills will help
you approach university study with academic integrity.