Assignment 3-PDS Python-24S3
Introduction
This assignment covers core steps in the data science process. You will need to develop and
implement appropriate steps, in IPython (Jupyter Notebook), to complete the corresponding
tasks. It is intended to give you practical experience with the typical steps of the data
science process.
The “Practical Data Science with Python” Canvas contains further announcements and a
discussion board for this assignment. Please be sure to check these regularly; it is your
responsibility to stay informed about any announcements or changes.
This is a team assignment, with at most 3 students per team. It is up to you to form a
team. Once you have formed your team, you should register it on Canvas.
Important: you must register your team on Canvas. Anyone without a team by 31st
December 2024 will be randomly assigned to one. If you have strong reasons for needing
to complete the assignment with fewer than 3 members, you may apply to do so by sending an
email to the lecturer, explaining your reasons. However, bear in mind that the requirements and
available marks will be the same as for a team of 3. In addition, please state what percentage
each member contributed to the assignment and include this in your report. The contributions
of all team members should add up to 100%. Members who contribute too little (e.g. less than 15%)
will have their marks reduced. You may need a team leader to manage the
teamwork.
Plagiarism
RMIT University takes plagiarism very seriously. All assignments will be checked
with plagiarism-detection software; any student found to have plagiarised will be subject
to disciplinary action as described in the course guide. Plagiarism includes submitting code
that is not your own or submitting text that is not your own. Allowing others to copy your work
is also plagiarism. All plagiarism will be penalised; there are no exceptions and no excuses.
More information on Academic Integrity is available at
https://www.rmit.edu.vn/students/my-studies/assessment-and-results/academic-integrity
1.1. Online News Popularity Data Set. More details can be found on the following UCI
webpage about this dataset: https://archive.ics.uci.edu/dataset/332/online+news+popularity
1.2. Secondary Mushroom Data Set. More details can be found on the following UCI webpage
about this dataset:
https://archive.ics.uci.edu/dataset/848/secondary+mushroom+dataset
1.3. Online Shoppers Purchasing Intention Data Set. More details can be found on the
following UCI webpage about this dataset:
https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset#
3. Option
You can propose another data set to work on for the tasks of Problem type 1 OR type 2. However, the
data set must be at least as complex (in terms of size and data types) as the data sets
given above, and it must support the same tasks. You need to send an email with a detailed description of
the data set and the tasks that you will work on for the project. You need to get written permission from
the teaching staff before working on your proposed project.
After you have built two clustering models and two classification (or regression) models on
your data, the next step is to compare the performance of the selected models. You need to include
the results of this comparison, including a recommendation of which model should be used, in your
report (see Task 4).
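The comparison step for two classification models can be sketched as follows. This is only a minimal illustration using the Python standard library: the labels, the predictions, and the names model_a, model_b, and compare_models are hypothetical and not part of the assignment, and macro F1 is just one possible criterion for the recommendation. In practice, scikit-learn's accuracy_score and f1_score in sklearn.metrics compute the same quantities.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def compare_models(y_true, predictions):
    """Return {model_name: {"accuracy": ..., "macro_f1": ...}}."""
    return {
        name: {
            "accuracy": accuracy(y_true, y_pred),
            "macro_f1": macro_f1(y_true, y_pred),
        }
        for name, y_pred in predictions.items()
    }


# Hypothetical held-out labels and predictions from two fitted models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 1],
    "model_b": [1, 0, 1, 1, 0, 1, 1, 0],
}

results = compare_models(y_true, predictions)
# Recommend the model with the higher macro F1 (an assumed criterion).
best = max(results, key=lambda name: results[name]["macro_f1"])

for name, metrics in results.items():
    print(name, metrics)
print("recommended:", best)
```

The same table-of-metrics approach extends to regression (with an error measure in place of F1) and to clustering (with an internal measure such as the silhouette score); whichever metric is used, the report should say why it suits the task.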
All the files should be zipped together and submitted as ONE single zip file,
named after your team ID (for example, 1.zip if your team ID is 1). The zip file must be
submitted on Canvas: Assignments/Assignment 2. Please do NOT submit other unnecessary
files.
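One way to build the single zip file is with Python's standard zipfile module. The sketch below is an assumption-laden example, not a required procedure: the helper build_submission and the file names report.pdf and notebook.ipynb are hypothetical placeholders, so substitute the files your team actually needs to submit.

```python
import tempfile
import zipfile
from pathlib import Path


def build_submission(team_id, files, out_dir="."):
    """Write the given files into <team_id>.zip and return the archive path."""
    archive = Path(out_dir) / f"{team_id}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            path = Path(f)
            # Store each file at the top level of the archive, without folders.
            zf.write(path, arcname=path.name)
    return archive


# Demonstration in a temporary folder with placeholder (hypothetical) files.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    report = tmp_path / "report.pdf"
    notebook = tmp_path / "notebook.ipynb"
    report.write_bytes(b"placeholder report")
    notebook.write_text("{}")

    archive = build_submission(1, [report, notebook], out_dir=tmp_path)
    archive_name = archive.name
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())

print(archive_name, names)
```

Passing an explicit list of files, rather than zipping a whole folder, helps keep unnecessary files out of the submission.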
Important information
Academic Dishonesty: This is an advanced course, so we expect full professionalism and ethical
conduct. Plagiarism is a serious offense. Sophisticated plagiarism detection may be used to check
against other submissions in the class as well as resources available on the web. We will pursue
the strongest consequences available according to the University Academic Integrity policy. In
a nutshell, never look at solutions done by others (e.g., classmates, websites or AI tools).
Silent Policy: A silent policy will take effect 24 hours before this assignment is due. This means
that no question about this assignment will be answered, whether it is asked on the newsgroup,
by email, or in person.