Final Project
(DAT – 100)
Spring 2024/2025
Supervised By:
Course Final Project
Requirements
Students enrolled in DAT100 must complete the final project.
Due Date: Your final project must be submitted before Sunday 04/05/2025.
Project Report:
Your project submission should be a single notebook written in technical report format. It
should include a title, a list of team members, a summary, an introduction, a description of
the data, a description of the methods, a summary of the results, and a discussion. The
notebook should also include all code and visualizations. Number all figures and tables and
give them informative captions.
Groups: Each team must have 3 or 4 members.
Scoring:
• Your project will be scored based on the submitted report (the narrative within the
notebook).
• You will be graded on the code (Jupyter/Colab notebooks) and on the accuracy of your
final model on the test set.
• If you present your project, the person scoring it will also attend your presentation
for additional context.
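Because part of the grade depends on your final model's accuracy on a test set, the following is a minimal sketch of a holdout evaluation in plain Python, assuming a classification task whose labels are stored in lists. The helper names `train_test_split_indices` and `accuracy` are illustrative only, not a required template; in a notebook you would more likely use scikit-learn's `train_test_split` and `accuracy_score`, which serve the same purpose. A regression project would report an error metric instead of accuracy.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=42):
    """Shuffle row indices 0..n-1 and split them into (train, test) index lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # fixed seed gives a reproducible split
    n_test = max(1, round(n * test_fraction))
    return indices[n_test:], indices[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of test rows whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    train_idx, test_idx = train_test_split_indices(100, test_fraction=0.2)
    print(len(train_idx), len(test_idx))  # 80 20
    print(accuracy(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"]))  # 0.75
```

The test rows should be set aside before any model tuning and used only once, for the final reported score.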
Project Choices
Pick your own question and dataset, or follow the recommendations provided
below.
Recommendations for datasets and problems
• Kaggle (www.kaggle.com)
• UCI Machine Learning Repository (archive.ics.uci.edu)
• Teradata University (www.teradata.com/University/Academics)
• Analytics Vidhya (www.analyticsvidhya.com)
• Google Datasets (datasetsearch.research.google.com)
Design a Project
The aim of this project is to carry out a complete data analytics workflow and to put into
practice what you have learned in this course, in a more open-ended setting than the
assignments. Specifically, the project should involve the following steps:
1. Frame a question of your choice that can be addressed by identifying, collecting,
and analyzing relevant data.
2. Describe and obtain the data.
3. Perform exploratory data analysis (EDA) and include at least two (but probably
many more) data visualizations in your report.
4. Describe any data cleaning or transformations you perform and explain how your
EDA motivated them.
5. Apply relevant inference or prediction methods (e.g., linear regression or
classification), including, if appropriate, feature engineering. Use cross-validation
or test data as appropriate for model selection and evaluation. Make sure to
carefully describe the methods you are using and why they are suitable for
answering the question.
6. Summarize and interpret your results (including visualization). Provide an
evaluation of your approach and discuss any limitations of your methods.
7. Describe any surprising discoveries you made and directions for future work.
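Step 5 asks for cross-validation or a test set for model selection. As a rough illustration, the sketch below runs k-fold cross-validation in plain Python to compare two candidate models, a mean-only baseline and a one-feature least-squares line, by their average held-out mean squared error. The function names, the choice of k = 5, and the synthetic data are assumptions made for the example; scikit-learn's `cross_val_score` provides the same procedure for its estimators.

```python
import random
from statistics import fmean


def k_fold_indices(n, k=5, seed=0):
    """Shuffle row indices 0..n-1 once and deal them into k nearly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    m = fmean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """One-feature ordinary least squares: y = a + b * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # fall back to a flat line if x is constant
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def cross_val_mse(fit, xs, ys, k=5, seed=0):
    """Held-out mean squared error of `fit`, averaged over k folds."""
    folds = k_fold_indices(len(xs), k, seed)
    fold_scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_scores.append(fmean([(ys[i] - model(xs[i])) ** 2 for i in test]))
    return fmean(fold_scores)


if __name__ == "__main__":
    # Synthetic data for illustration only: y = 3 + 2x plus Gaussian noise.
    rng = random.Random(1)
    xs = [float(i) for i in range(60)]
    ys = [3.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]
    for name, fit in [("mean baseline", fit_mean), ("linear", fit_line)]:
        print(f"{name}: CV MSE = {cross_val_mse(fit, xs, ys):.3f}")
```

The model with the lower cross-validated error would be preferred; its final performance should still be measured once on data that played no part in the selection.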
To ensure that you apply the course material with sufficient scope, we impose the
following two additional requirements:
• The analysis should involve at least one of the inferential, descriptive, or predictive
methods presented in this course.
• The dataset should have at least six distinct variables (i.e., columns) and a sample
size (i.e., rows) of 50 or more. Much larger datasets are encouraged.
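A quick way to confirm that a candidate dataset meets the minimum size is to count its distinct columns and its non-blank rows. The sketch below assumes the data is a CSV file with a single header row; `check_dataset` is a hypothetical helper, and the two thresholds simply encode the numbers stated above. With pandas, `df.shape` gives the same (rows, columns) pair.

```python
import csv
import io

MIN_COLUMNS = 6  # "at least six distinct variables"
MIN_ROWS = 50    # "a sample size of 50 or more"


def check_dataset(csv_file):
    """Return (n_rows, n_cols, meets_minimum) for a CSV whose first line is a header."""
    reader = csv.reader(csv_file)
    header = next(reader, [])
    # Count distinct, non-empty column names only.
    n_cols = len({name.strip() for name in header if name.strip()})
    # Count data rows, skipping lines that are blank or contain only empty cells.
    n_rows = sum(1 for row in reader if any(cell.strip() for cell in row))
    return n_rows, n_cols, n_rows >= MIN_ROWS and n_cols >= MIN_COLUMNS


if __name__ == "__main__":
    # In-memory sample: 6 columns x 50 rows.
    header = ",".join(f"col{i}" for i in range(6))
    body = "\n".join(",".join(str(r * 6 + c) for c in range(6)) for r in range(50))
    print(check_dataset(io.StringIO(header + "\n" + body + "\n")))  # (50, 6, True)
```

For a file on disk, pass `open("your_file.csv", newline="")` in place of the in-memory sample.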
How to Submit
• Name your folder using your name and student ID (SID), e.g., YourNameYYYYxxxx.
• Zip all files (the notebook and the dataset files) together into a single archive.
• Upload the zip file through the e-learning system (Moodle).
• The team leader is responsible for the submission.
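Packaging can be scripted so that the archive always contains the whole named folder. This is a minimal sketch using the standard library's `shutil.make_archive`; it assumes the notebook and dataset files are already inside a folder named according to the rule above, and `zip_submission` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def zip_submission(folder, dest_dir=None):
    """Zip `folder` (notebook + dataset files) into <folder name>.zip and return its path.

    The folder name itself should already follow the naming rule, e.g. YourNameYYYYxxxx.
    """
    folder = Path(folder).resolve()
    if not folder.is_dir():
        raise NotADirectoryError(folder)
    dest_dir = Path(dest_dir).resolve() if dest_dir else folder.parent
    archive = shutil.make_archive(
        base_name=str(dest_dir / folder.name),  # archive path without the ".zip" suffix
        format="zip",
        root_dir=folder.parent,
        base_dir=folder.name,  # keep the folder itself as the top-level entry
    )
    return Path(archive)
```

For example, `zip_submission("YourNameYYYYxxxx")` would write `YourNameYYYYxxxx.zip` next to that folder, ready to upload.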
Evaluation Criteria