CS5805 Proposal 1

Uploaded by

musab

Machine Learning I

CS 5805
Final Term Project - Proposal

We are now launching the first phase of the final term project: data selection and proposal.
The selected dataset must satisfy the following criteria:

• Pick an interesting, applied, real-world dataset from industry.
• It must be a multivariate dataset with at least 50K observations. If you have an interesting
dataset with fewer than 50K samples, please come forward and talk to me.
• It must contain both numerical and categorical features, with at least two of each type. Keep
in mind that you can also perform feature engineering to develop new features.
• It must come from a non-classified (public) database.
• Dataset selection/allocation is on a first-come, first-served basis.

Write a one-paragraph proposal (max one A4 page) that justifies how the selected dataset meets the above
criteria. You will perform regression, classification, clustering, and association rule mining on the
selected dataset. Include the answers to the following questions in the proposal:

• (For regression analysis) Which feature is selected as the dependent variable and which features
are selected as independent variables? Is there any need for feature engineering? Is there any
need for encoding? Explain your answer. [10 pts]

• (For clustering & classification) Which variable is selected as the dependent variable and which
features are selected as independent variables? Is there any need for feature engineering? Is
there any need for encoding? Is this a binary classification or a multi-label classification? Explain
your answer. [10 pts]

• (For association rule mining) Which feature is selected as the dependent variable and which
features are selected as independent variables? Why will association rule mining be
important for the selected dataset? Explain your answer. [10 pts]
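The questions above on encoding, classification type, and association rules can be explored on a small sample before the proposal is written. The sketch below is illustrative only: the column names (`segment`, `channel`, `churn`) and all function names are hypothetical, one-hot encoding is just one possible encoding, and the sketch assumes that "multi-label" in the question means a target with more than two classes, which it reports as "multi-class".

```python
from typing import Dict, List, Set

Row = Dict[str, str]

# Hypothetical sample rows; a real dataset would have far more observations.
SAMPLE_ROWS: List[Row] = [
    {"segment": "retail", "channel": "web", "churn": "yes"},
    {"segment": "retail", "channel": "web", "churn": "no"},
    {"segment": "retail", "channel": "store", "churn": "yes"},
    {"segment": "corporate", "channel": "web", "churn": "no"},
]


def one_hot(rows: List[Row], column: str) -> List[Dict[str, int]]:
    """One-hot encode a categorical column into 0/1 indicator features."""
    categories = sorted({r[column] for r in rows})
    return [{f"{column}={c}": int(r[column] == c) for c in categories} for r in rows]


def classification_type(rows: List[Row], target: str) -> str:
    """Report whether the target has exactly two classes or more than two."""
    n_classes = len({r[target] for r in rows})
    if n_classes < 2:
        raise ValueError("target needs at least two distinct classes")
    return "binary" if n_classes == 2 else "multi-class"


def to_transactions(rows: List[Row], columns: List[str]) -> List[Set[str]]:
    """Turn each row into a set of 'column=value' items for rule mining."""
    return [{f"{c}={r[c]}" for c in columns} for r in rows]


def support(transactions: List[Set[str]], items: Set[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions: List[Set[str]], antecedent: Set[str],
               consequent: Set[str]) -> float:
    """Support of antecedent and consequent together, divided by antecedent support."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(transactions, antecedent | consequent) / base
```

For instance, `confidence(to_transactions(SAMPLE_ROWS, ["segment", "channel", "churn"]), {"segment=retail"}, {"churn=yes"})` measures how often retail rows in the sample also show churn. Numerical features would need to be binned into categories before they can serve as items in such rules.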

Several resources are available for acquiring datasets, e.g.:

• https://www.kaggle.com
• https://archive.ics.uci.edu/ml/index.php
• https://datasetsearch.research.google.com
• https://analyticsindiamag.com/top-10-popular-publicly-available-datasets-deep-learning-research/

Submission Guidelines

1. The deadline to submit the term project proposal and selected dataset is 10/1/2024.
2. Submit the PDF of the proposal before the deadline.
3. Upload the selected dataset (Excel, CSV, JSON…) through Canvas under Term Project -
Proposal.
4. Fill out the following shared Excel sheet with the selected dataset before the deadline.
https://virginiatech-my.sharepoint.com/:x:/g/personal/nikhilsa_vt_edu/EWttJnhJYzBLkWlEu0uD3DYBIgGv2_bMpjd6o0vyRi-8qQ?e=1UF1vw&CID=6F7FD3B3-8169-4FA0-8637-D6488570E48F&wdLOR=c9A1D6D14-FC23-411D-AB18-0FC950A6B4CE