Assignment Instructions

We will have a group project (up to 4 students per group) to help you better
understand the concepts of data science and apply what we learned in this
class to analyze a real problem.

All work must be your own, including the Python code and the
written report. You may reuse code from the homework or my class
examples in the final project, but you may not directly copy code
from online projects.

Final Project Requirements:

Please see the attached Word template for the final written report, which
includes all sections discussed below.

The final project should include the following items:

1. Find a business problem and define an analytical question that can be
tested and analyzed. Ideally, the analytical question should be
predictive or diagnostic.

2. Find a dataset that can be used to answer your question. For
supervised learning, the dataset should have at least 100 rows, one
target variable, and at least 3 features. For clustering analysis, it
should have at least 100 rows and at least 3 features.

3. Determine the type of model based on the question (regression,
classification, or clustering), and pick the best set of variables for
your analysis.

4. Perform the analysis and draw conclusions.
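
The dataset size requirements above can be checked before any modelling starts. The following is a minimal sketch using only the Python standard library; the function name `check_dataset`, the column names, and the generated rows are hypothetical and serve only as an illustration. With pandas, the same check would typically be based on `df.shape`.

```python
import csv
import io

MIN_ROWS = 100      # minimum number of rows stated in the assignment
MIN_FEATURES = 3    # minimum number of feature columns stated in the assignment


def check_dataset(rows, target=None):
    """Return a list of problems; an empty list means the size rules are met.

    rows   -- list of dicts, one per data row (e.g. from csv.DictReader)
    target -- name of the target column for supervised learning,
              or None for clustering (no target needed)
    """
    problems = []
    if len(rows) < MIN_ROWS:
        problems.append(f"only {len(rows)} rows, need at least {MIN_ROWS}")

    columns = list(rows[0].keys()) if rows else []
    if target is not None and target not in columns:
        problems.append(f"target column '{target}' not found")

    # Every column other than the target counts as a candidate feature.
    features = [c for c in columns if c != target]
    if len(features) < MIN_FEATURES:
        problems.append(
            f"only {len(features)} features, need at least {MIN_FEATURES}"
        )
    return problems


# Hypothetical in-memory CSV standing in for a real downloaded file.
header = "price,size,rooms,age\n"
body = "".join(f"{100 + i},{50 + i},{i % 5},{i % 30}\n" for i in range(120))
rows = list(csv.DictReader(io.StringIO(header + body)))

print(check_dataset(rows, target="price"))       # supervised learning case
print(check_dataset(rows, target=None))          # clustering case
print(check_dataset(rows[:10], target="price"))  # deliberately too small
```

A check like this only confirms the minimum size; whether the columns are actually useful for the question still has to be judged by the group.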

In the written report and presentation, each group should include the
following items:

1. Describe the research question, background information, and the
motivation for the study.

2. Describe your dataset, including the data source, data collection
method, column definitions, and descriptive statistics.

3. Describe the model(s), including the name of the model, why you
picked it, and how you want to use it (what the target is and what the
features are).

4. Perform the data analysis, then describe and discuss the results,
including how you ran the model, what results you got, and what
findings you drew from the results.

   • You need to use at least one model learned in class (K-means
     clustering, hierarchical clustering, linear regression, logistic
     regression, decision tree, or SVM).

5. Evaluate the model to see whether it is good.

6. Discuss potential problems and improvements.
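
For the model evaluation step, one common way to judge whether a model "is good" is to hold out part of the data and compare the model's error on it with a naive baseline. The sketch below assumes a regression question with a single feature and uses synthetic data. The helper names (`fit_line`, `rmse`), the 80/20 split, and the choice of RMSE as the metric are illustrative assumptions, not requirements of the assignment. In practice, a library such as scikit-learn would usually handle the split and the metrics.

```python
import math
import random

random.seed(0)  # fixed seed so the split and the noise are reproducible

# Synthetic data (hypothetical): y depends linearly on x plus random noise.
xs = [random.uniform(0, 10) for _ in range(200)]
data = [(x, 3.0 * x + 5.0 + random.gauss(0, 1)) for x in xs]

# Hold out 20% of the rows for testing; fit only on the remaining 80%.
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]


def fit_line(points):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


slope, intercept = fit_line(train)
test_y = [y for _, y in test]

# Model predictions on unseen rows, compared with a baseline that always
# predicts the mean of the training target.
model_pred = [slope * x + intercept for x, _ in test]
train_mean = sum(y for _, y in train) / len(train)
baseline_pred = [train_mean] * len(test)

model_rmse = rmse(test_y, model_pred)
baseline_rmse = rmse(test_y, baseline_pred)
print(f"model RMSE: {model_rmse:.2f}, baseline RMSE: {baseline_rmse:.2f}")
```

A model RMSE clearly below the baseline suggests the feature carries predictive signal. For a classification question, accuracy compared with the majority-class rate plays a similar role.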

Submission Requirements:

1. Written report (5-8 pages, or longer) - you can use the attached Word
template to create your report. It should be in a formal essay format.

2. Python code as appendix - please attach the Python code you used
for the analysis as a separate file. Please do not paste your Python
code inside the written report.

3. Recorded/Live presentation - You can choose to record your
presentation and share the recording with me, or schedule a time with
me to do a live presentation.

Only one submission per group is needed, and all group members will
receive the same grade (unless you do not participate in the
report/presentation). You can overwrite your submission before the deadline,
and only the last one will be graded.

The final due date is Mar 18 - No extension will be given.

-----------------------------------------------------------------------------------------------------------

Here are the websites from week 2. They can help you find a dataset
relevant to your research question. You can also use the Kaggle website (the
last one) to find datasets and ideas for research questions.

The data sources listed below can help you find suitable data for a data
analysis project:

• Google Dataset Search

• U.S. Government's open data

• Jersey City open data
Here is another website widely used in the data science field. It provides
datasets for practice purposes. Please go to "dataset" at the top of the Kaggle
home page to access its data repository.

• Kaggle website
