Assignment 3-PDS Python-24S3

RMIT Classification: Trusted

RMIT Vietnam University

School of Science, Engineering and Technology

COSC2999 - Practical Data Science with Python

Assignment 3: Group project
Due: 08:00 AM, January 18th, 2025 (week 12)
This assignment is worth 30% of your overall mark.

Introduction
This assignment covers the core steps of the data science process. You will need to develop and
implement appropriate steps, in IPython (Jupyter Notebook), to complete the corresponding
tasks. The assignment is intended to give you practical experience with the typical steps of the
data science process.
The “Practical Data Science with Python” Canvas contains further announcements and a
discussion board for this assignment. Please check these regularly; it is your
responsibility to stay informed about any announcements or changes.
This is a team assignment, with at most 3 students per team. It is up to you to form a
team. Once you have formed your team, you must register it on Canvas.
Important: anyone without a team by 31st December 2024 will be randomly assigned to
one. If you have strong reasons for needing to complete the assignment with fewer than 3
members, you may apply to do so by emailing the lecturer and explaining your reasons.
However, bear in mind that the requirements and available marks will be the same as for a
team of 3. In addition, please state in your report the percentage that each member
contributed to the assignment. The contributions within your group should add up to 100%.
Members who contribute too little (e.g. less than 15%) will have their marks reduced. You
may need a team leader to manage the teamwork.

Where to Develop Your Code

You are encouraged to develop and test your code in Jupyter Notebook (or JupyterLab),
either on the lab PCs or on your own laptop.

Plagiarism
RMIT University takes plagiarism very seriously. All assignments will be checked
with plagiarism-detection software; any student found to have plagiarised will be subject
to disciplinary action as described in the course guide. Plagiarism includes submitting code
that is not your own or submitting text that is not your own. Allowing others to copy your work
is also plagiarism. All plagiarism will be penalised; there are no exceptions and no excuses.
More information on Academic Integrity is available at
https://www.rmit.edu.vn/students/my-studies/assessment-and-results/academic-integrity

Task 0: Choosing your project topic (1%)

This assignment covers the core steps of the data science process. You need to identify the data
science problem that you want your project to solve. The problem must be solvable using
Classification, Regression or Clustering techniques. Please choose carefully, as you must list
measurable project goals and tangible deliverables, and work on the project with a full data
pipeline and model deployment to solve that problem.
Examples of the two types of problems you may select are given below. You need to work
on ONE problem for this assignment.

1. Problem type 1: Focusing on Data Modelling.

For this problem, you will model the data by treating it as classification, regression and/or
clustering tasks, depending on your choice. You need to select at least two tasks, one of which must
be a clustering task. For example, you may choose classification and clustering.
You need to select one of the following datasets and work on it:

1.1. Online News Popularity Data Set. More details can be found on the following UCI
webpage about this dataset: https://archive.ics.uci.edu/dataset/332/online+news+popularity

1.2. Secondary Mushroom Data Set. More details can be found on the following UCI webpage
about this dataset:
https://archive.ics.uci.edu/dataset/848/secondary+mushroom+dataset

1.3. Online Shoppers Purchasing Intention Data Set. More details can be found on the
following UCI webpage about this dataset:
https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset#

2. Problem type 2: Building a recommender system.

For this problem, you will work on the Anonymous Microsoft Web Data dataset.
Details can be found on the following UCI webpage:
https://archive.ics.uci.edu/ml/datasets/Anonymous+Microsoft+Web+Data
You need to implement at least two approaches for building a recommender system, such as
content-based recommendation and collaborative filtering-based recommendation. For each
approach, you need to use some of the data modelling techniques such as classification, regression
and/or clustering.
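
As an illustration of the collaborative filtering idea mentioned above, here is a minimal sketch of
item-based filtering with cosine similarity. It assumes the data has already been arranged as a binary
user-by-item matrix; the tiny `visits` matrix and the helper names `item_similarity` and `recommend`
are hypothetical and not part of the assignment or the dataset.

```python
import numpy as np

# Hypothetical binary user-by-item matrix (rows: users, columns: items).
# A 1 means the user visited that item; the values are invented for illustration.
visits = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
], dtype=float)


def item_similarity(matrix):
    """Cosine similarity between item columns, with self-similarity set to 0."""
    norms = np.linalg.norm(matrix, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody visited
    normalised = matrix / norms
    sim = normalised.T @ normalised
    np.fill_diagonal(sim, 0.0)
    return sim


def recommend(user_idx, matrix, sim, top_n=1):
    """Rank unseen items by summed similarity to the items the user has seen."""
    scores = matrix[user_idx] @ sim
    scores[matrix[user_idx] > 0] = -np.inf  # never re-recommend seen items
    return np.argsort(scores)[::-1][:top_n].tolist()


sim = item_similarity(visits)
print(recommend(0, visits, sim))
```

This is only one possible starting point; the assignment still requires a second approach and the use
of classification, regression and/or clustering within each approach.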

3. Option
You can propose another data set to work on for Problem type 1 OR type 2. However, the
data set must be at least as complex (in terms of size and data types) as the data sets
given above and must support the same tasks. You need to send an email with a detailed description of
the data set and the tasks that you will work on for the project. You need to get written permission from
the teaching staff before working on your proposed project.

Task 1: Retrieving and Preparing the Data (5%)

As a careful data scientist, you know that it is vital to set the goal for the project and then
thoroughly pre-process the available data (each attribute) before starting to analyse and model
it. In this step, you need to deal with potential issues in the data (such as impossible values,
missing values and duplication) and explore it.
In your report in Task 4, you need to clearly state the goal of your project and the design/steps of
your data pre-processing. Please ensure you understand the data you selected.
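
The kinds of checks listed above (duplication, impossible values, missing values) can be sketched
with pandas as follows. The toy DataFrame, its column names and the choice of median imputation
are assumptions made for illustration only; suitable handling depends on the dataset you selected.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are invented for illustration.
raw = pd.DataFrame({
    "age": [25, 25, -3, 41, np.nan],
    "pages_viewed": [10, 10, 4, np.nan, 7],
    "purchased": [1, 1, 0, 0, 1],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicate rows, mask impossible values and impute missing ones."""
    out = df.drop_duplicates().copy()
    # A negative age is impossible, so mark it as missing.
    out.loc[out["age"] < 0, "age"] = np.nan
    # Impute numeric gaps with the column median (one simple choice among many).
    for col in ["age", "pages_viewed"]:
        out[col] = out[col].fillna(out[col].median())
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

In a real project, each decision (dropping, masking, imputing) should be justified in the report.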

Task 2: Feature Engineering (5%)

Use suitable Python functions to extract potential features for model input. Conduct appropriate
analysis to evaluate feature importance (e.g. correlation analysis), then use suitable method(s) to select
the final features for the model. The feature choices must be explained via analysis.
Note: These steps must be performed consistently for the training, validation, and test sets.
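
One way to keep feature selection consistent across splits is to fit the selection steps on the
training set only and then reuse the fitted steps on the other sets. The sketch below shows this with
a scikit-learn Pipeline on synthetic data; the column names, the use of `SelectKBest` with `f_classif`
and the value `k=1` are illustrative assumptions, not requirements of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300
# Synthetic stand-in data: "signal" drives the label, the other columns are noise.
X = pd.DataFrame({
    "signal": rng.normal(size=n),
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})
y = (X["signal"] + 0.1 * rng.normal(size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Correlation analysis on the training split only, to avoid peeking at test data.
corr = X_train.corrwith(y_train).abs().sort_values(ascending=False)
print(corr)

prep = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=1)),
])
X_train_sel = prep.fit_transform(X_train, y_train)  # fit on training data only
X_test_sel = prep.transform(X_test)                 # reuse the fitted steps

selected = X.columns[prep.named_steps["select"].get_support()].tolist()
print(selected)
```

Calling `transform` (not `fit_transform`) on the validation and test sets is what keeps the steps
consistent and avoids leaking information from those sets into the selection.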

Task 3: Data Modelling (10%)

Model the data by treating it as a clustering, classification and/or regression task,
depending on your choice.
For Problem type 1, you must use at least two different models for each approach (e.g. two
classification models and two clustering models).
For Problem type 2, the two data modelling tasks (in the two recommender systems) can
use any of the three data modelling approaches (classification, clustering, or regression).
When building each model, you must include the following steps:
• Select appropriate features.
• Select the appropriate model (e.g. DecisionTree for classification) from sklearn.
• Train and evaluate the model appropriately.
• Train and evaluate the model by selecting the appropriate values for each parameter in the
model. You need to show how you choose these values and justify why you choose them.
• Discuss any problems you observe or discover, such as data leakage or data bias.
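
The steps above can be sketched as follows for a decision tree classifier. This is a minimal
example on synthetic data from `make_classification`; the parameter grid, the 5-fold
cross-validation and the accuracy metric are illustrative assumptions, and your own choices
should be justified for your dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a prepared feature matrix and label vector.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Candidate parameter values (illustrative); cross-validation on the training
# split picks the best combination without touching the test split.
param_grid = {"max_depth": [2, 4, 6, None], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(best_params, round(test_accuracy, 3))
```

Holding the test split out of the search is one way to reduce the risk of the data leakage
mentioned in the last step.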

After you have built two clustering models and two classification (or regression) models on
your data, the next step is to compare the performance of the selected models. You need to include
the results of this comparison, including a recommendation of which model should be used, in your
report (see Task 4).
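
For the clustering side of such a comparison, one possible approach is to score each model with the
same internal metric. The sketch below compares two scikit-learn clustering models by silhouette
score on synthetic blobs; the choice of models, the number of clusters and the metric are assumptions
for illustration, not prescribed by the assignment.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data with three groups.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

models = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=3),
}

# Score every model with the same metric so the results are comparable.
scores = {name: silhouette_score(X, model.fit_predict(X)) for name, model in models.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

A single metric rarely tells the whole story, so the recommendation in the report should also
consider interpretability and how well the clusters serve the project goal.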

Other Evaluation Criteria: Innovative Model (bonus 2%)

Out of the four selected models, there should be at least one innovative model (the other three models
can be simple models). A simple model that uses only one algorithm for model training with some parameter
tuning is not considered an innovative model. For example, using a K-NN classifier from scikit-learn
without any modification will be considered a simple model and will not earn any points.
If you use a model from any research work, you must cite the reference correctly. Examples of
innovative models, and the points awarded, are as follows:
+ 1 point: a linear stacking of multiple algorithms or an ensemble model.
+ 2 points: a complex ensemble model or a complex combination of multiple algorithms. You can
propose a new model (algorithm) here.
Give a short explanation of the classification results obtained from the innovative model.
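
As a hedged illustration of what stacking multiple algorithms can look like in scikit-learn, the sketch
below combines a K-NN classifier and a random forest with a logistic regression meta-learner. The
base models, their settings and the synthetic data are assumptions for illustration; whether a given
design earns the bonus is for the teaching staff to decide.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a prepared feature matrix and label vector.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Base learners produce out-of-fold predictions (cv=5); a linear meta-learner
# (logistic regression) is then trained on those predictions.
stack = StackingClassifier(
    estimators=[
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
stack_accuracy = stack.score(X_test, y_test)
print(round(stack_accuracy, 3))
```

Comparing the stacked model against each base model on the same test split helps when explaining
the results it produces.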

Task 4: Report (4%)

Write your report and save it in a file called report.pdf. It must be in PDF format and
must be at most 16 pages (in single-column format) for everything, including figures and references,
with a font size of 12. Penalties will apply if the report does not satisfy these requirements. Remember
to clearly cite any sources (including books, research papers, course notes, source code, etc.) that
you referred to while designing aspects of your programs.

Your report must have the following structure:

• A cover page, including:
– Statement of the solution representing your own work as required.
– Title
– Author information
– Affiliations
– Contact details
– Date of report
• Table of Contents
• An abstract/executive summary
• Introduction
• Methodology
• Results
• Discussion
• Conclusion
• References

Task 5: Presentation (5%)

You will be required to give a presentation in the last session of the course. The presentation
should include, but is not limited to:
– a brief description of your chosen problem and dataset(s).
– a description of the data preparation steps.
– the hypotheses/questions that you were investigating.
– an explanation of the modelling steps and the results.
– a demo of the model deployment.
– the conclusion and recommendation.
You need to prepare 10-12 slides for the in-class presentation and demonstration.
The presentation should take at most 20 minutes per group, including 3-5 minutes for the demo
and 3-5 minutes for Q&A. Each group member must present at least 2 slides in the presentation.
Your presentation slides must be included in the submission before the presentation date.
5.1. Slide and presentation (2 points)
• The slides must follow the RMIT University template.
• The slides and presentation must clearly present the research question(s), the methods used
for solving the problem(s), the findings (results), and recommendations.
• The presentation is scheduled for Saturday, January 18th, 2025 (week 12), during our
regular class time.
5.2. Demo (1 point): The code runs without error, showing the exact results as presented in the
report.
5.3. Q&A (2 points): Students answer the questions from the lecturer and other students clearly and
convincingly.

What to Submit, When, and How

Each team needs to make one submission on Canvas.
The assignment is due at 8:00 AM on Saturday, January 18th, 2025 (week 12). Assignments
submitted after this time will be subject to standard late submission penalties.
You need to submit the following files:
• A notebook file containing your Python commands, ‘Assignment3.ipynb’. Please make
sure to clean the notebook and remove any unnecessary lines of
code (cells).
• Your report.pdf file of at most 16 pages in single-column format (including figures and
references) with a font size between 10 and 12 points.
• A “readme.txt” file (if needed) that includes your name and student ID, and instructions for
how to execute your submitted script files.
• A presentation file (slides, in pptx or PDF format) for your presentation.

All the files must be zipped together and submitted as ONE single zip file,
named with your team number (for example, 1.zip if your team ID is 1). The zip file must be
submitted in Canvas: Assignments/Assignment 2. Please do NOT submit other unnecessary
files.

Important information
Academic Dishonesty: This is an advanced course, so we expect full professionalism and ethical
conduct. Plagiarism is a serious offense. Sophisticated plagiarism detection may be used to check
against other submissions in the class as well as resources available on the web. We will pursue
the strongest consequences available according to the University Academic Integrity policy. In
a nutshell, never look at solutions done by others (e.g., classmates, websites or AI tools).

Silent Policy: A silent policy will take effect 24 hours before this assignment is due. This means
that no question about this assignment will be answered, whether it is asked on the newsgroup,
by email, or in person.

--- The End ---
