CSC 603 - Final Project
Overview
Machine learning projects mainly focus on machine learning algorithms and their evaluation,
implementing or even modifying them to handle real-world problems. However, these
algorithms are usually the final stage of a set of processes that starts with data collection
and runs through data preparation (including data wrangling), data visualization,
and prediction or forecasting, where the project's results can be seen. Our goal in this project
is to delve deeper into these steps, especially the machine learning algorithms, and practice them
with a well-known dataset.
• The dataset is one of the key aspects of a machine learning project. The group can therefore
choose one of the following datasets:
1- Network intrusion detection dataset
https://drive.google.com/drive/folders/1sVPshUvHkOBwm0gvLQJgFB4XxoyswwVq?usp=sharing
3- Inconsistent and consistent Amazon reviews: for detecting a mismatch between a review's
text and its star rating
https://www.kaggle.com/datasets/yeshmesh/inconsistent-and-consistent-amazon-reviews
4- CUSTOMER_BANKANALYSIS_CLASSIFICATION:
Bank_Termdeposit_Customer_Analysis_Classificationmodel
https://www.kaggle.com/datasets/saikrishjalakam/customer-bankanalysis-classification
6- Document classification
https://www.kaggle.com/datasets/achrafbribiche/document-classification
7- Document classification2
https://www.kaggle.com/competitions/doc-class/data?select=sol_all1.csv
8- Document classification3
https://www.kaggle.com/datasets/haytemcharraj/document-classification
• Projects can be done individually, or in teams of two students. For a two-person group,
group members are responsible for dividing up the work equally and making sure that each
member contributes.
• Make 3 useful graphs that show different features of the data. Write a paragraph in your
report for each plot describing the interesting qualities that your visualization shows.
These must include the following:
o one line chart
o one scatter plot
o one bar chart or histogram
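The three required plot types can be sketched as follows. This is only a minimal illustration using matplotlib and NumPy; the data is synthetic and the variable names (days, daily_count, feature_x, feature_y) are placeholders, not columns from any of the listed datasets.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (an assumption); replace with columns from the chosen dataset.
rng = np.random.default_rng(0)
days = np.arange(30)
daily_count = 100 + np.cumsum(rng.normal(0, 5, size=30))
feature_x = rng.normal(0, 1, size=200)
feature_y = 2 * feature_x + rng.normal(0, 0.5, size=200)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Line chart: how one quantity changes along an ordered axis such as time.
axes[0].plot(days, daily_count)
axes[0].set_title("Line chart")
axes[0].set_xlabel("day")
axes[0].set_ylabel("daily count")

# Scatter plot: the relationship between two numeric features.
axes[1].scatter(feature_x, feature_y, s=10)
axes[1].set_title("Scatter plot")
axes[1].set_xlabel("feature_x")
axes[1].set_ylabel("feature_y")

# Histogram: the distribution of a single numeric feature.
axes[2].hist(feature_x, bins=20)
axes[2].set_title("Histogram")
axes[2].set_xlabel("feature_x")
axes[2].set_ylabel("frequency")

fig.tight_layout()
```

In a notebook the figure is displayed when the cell finishes; in a plain script, call plt.show() or fig.savefig(...) at the end.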
2. Building Machine Learning Models (10 marks)
1) Selecting and Training Machine Learning Models: you have to choose and implement
at least 5 machine learning algorithms.
2) Evaluating the Model: Evaluation metrics: for classification, Accuracy, Precision,
Recall, F1, Confusion matrix, etc. For regression, Mean Squared Error (MSE), R-squared,
etc.
3) Hyperparameter Tuning: Once you have created and evaluated your model, see if its
accuracy can be improved in any way. This is done by tuning the hyperparameters of
your model.
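The three steps above (training, evaluating, tuning) could look roughly like the sketch below. It assumes scikit-learn and a binary classification task; the synthetic data, the particular five models, and the parameter grid are illustrative choices only, not requirements of the project. For a regression task, mean_squared_error and r2_score from sklearn.metrics would take the place of the classification metrics.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (an assumption); swap in the chosen dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 1) Five candidate models; this selection is only an example.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
}

# 2) Fit each model and compute the metrics on the held-out test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.3f}, "
          f"f1={results[name]['f1']:.3f}")

# 3) Tune one model with a small cross-validated grid search.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)
tuned_accuracy = search.score(X_test, y_test)  # accuracy of the best estimator
print("best parameters:", search.best_params_)
print(f"tuned test accuracy: {tuned_accuracy:.3f}")
```

Comparing tuned_accuracy with the untuned "Random Forest" entry in results shows whether tuning helped; on some datasets the default settings are already close to the best in the grid.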
Deliverables: (5 marks)
1. Code (‘.ipynb’ and ‘html’ files):
1.1. Try your best to present the data clearly by showing it and its data
types using what we have learned, such as head(), tail(), dtypes, etc.
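As a brief illustration of this kind of data presentation, the sketch below uses pandas on a small made-up DataFrame; the column names and values are placeholders, and in the project the frame would instead be loaded from the chosen dataset (for example with pd.read_csv). Note that dtypes is an attribute in pandas, not a method.

```python
import pandas as pd

# Tiny made-up table (an assumption); in practice load the chosen dataset instead.
df = pd.DataFrame(
    {
        "age": [25, 41, 37, 52, 29, 60],
        "balance": [1200.5, 300.0, 87.25, 15000.0, 0.0, 432.1],
        "subscribed": ["no", "yes", "no", "yes", "no", "no"],
    }
)

print(df.head())    # first five rows
print(df.tail(3))   # last three rows
print(df.dtypes)    # data type of each column
df.info()           # column names, non-null counts, dtypes, and memory usage
```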