Final Project Guidelines - Option 2

The BUAN 651 project requires students to individually analyze a dataset using Python, accounting for 30% of their grade, with a due date of December 3rd. Students must present their findings in a 3-5 minute presentation and submit a zip file containing all source files and documentation. The project must include original work, well-documented code, and thorough data analysis, following specific guidelines for continuous and categorical variables.

Uploaded by

randomaxe1105
Copyright
© All Rights Reserved

BUAN 651 Project Guidelines

Option 2 - Data Analysis with Python

One of the requirements for BUAN 651 is the final project (30% of the grade). The project
must be done individually. This is an opportunity for you to be creative in solving a
problem that is of interest to you and demonstrate your proficiency with Python. The
project should be challenging enough so that you could discuss it at future interviews with
potential employers.

The project is due by December 3rd and includes an in-class presentation of your work. Your
presentation (3-5 minutes) is an opportunity to walk through your code succinctly and give a
general idea of your problem and solution.

In addition to the presentation, you are required to submit the following through
Blackboard in one zip file called <emailprefix>_final_project.zip:
• All source files: programs, classes and data
• Instructions on how to run your code and install any third-party modules
o Note that third-party libraries must be pre-approved by the instructor
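
As a rough illustration of the packaging step, the sketch below zips a project folder under the required file name using only the standard library. The helper name build_submission, the email prefix "jdoe", and the placeholder files are assumptions made up for this example; they are not part of the course materials.

```python
import tempfile
import zipfile
from pathlib import Path


def build_submission(project_dir, email_prefix, out_dir):
    """Zip every file under project_dir into <email_prefix>_final_project.zip."""
    project_dir = Path(project_dir)
    zip_path = Path(out_dir) / f"{email_prefix}_final_project.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(project_dir.rglob("*")):
            if path.is_file():
                # Store each file under its path relative to the project folder.
                archive.write(path, path.relative_to(project_dir).as_posix())
    return zip_path


# Demo with throwaway placeholder files; "jdoe" is a hypothetical email prefix.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    project.mkdir()
    (project / "analysis.ipynb").write_text("{}")
    (project / "data.csv").write_text("x,y\n1,2\n")
    (project / "README.txt").write_text("Run analysis.ipynb from top to bottom.\n")

    zip_path = build_submission(project, "jdoe", tmp)
    with zipfile.ZipFile(zip_path) as archive:
        names = sorted(archive.namelist())

print(zip_path.name)
print(names)
```

The archive is written outside the project folder so that it does not end up containing itself.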

Here are some of the key criteria to consider when thinking about the functionality of your
project and the elements it must contain.

1. It must be original work and not something that might be proprietary to your company,
etc.

2. The presentation and well-documented code should be at the level that other students
and lay people can understand what your project is all about. Do not use advanced
math or industry terms that would require a lot of explanation.

Imagine that you are in an interview and are asked to describe, in a few words, a Python
project of your choice.

3. Your code is to be done in a Jupyter Notebook (.ipynb) file. You should include all code and
documentation within the file.

4. Look into the following sites as an example and select a data set that interests you. The dataset
should have a variety of numerical variables (continuous, discrete, etc.) and categorical variables.
Your dataset must have at least two continuous numerical variables and one categorical
variable.

1. https://www.kaggle.com/datasets
2. https://archive.ics.uci.edu/ml/index.php
3. Any other source of your choice
5. Preparing the Data: Import the dataset into a Pandas dataframe and document the steps of the
import process and any preprocessing (wrangling) that has to be done before or after the import.
Any Python code used in the process should be included.
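
The import and wrangling step might look something like the sketch below. The inline CSV text, the column names (height_cm, weight_kg, group), and the helper load_and_clean are made-up placeholders for illustration; a real project would read its own file and choose cleaning steps that suit its data.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded CSV file. In a real project this
# would be a path on disk, e.g. pd.read_csv("my_dataset.csv").
RAW_CSV = """height_cm,weight_kg,group
170.0,65.2,A
182.5,,B
165.1,58.9,A
165.1,58.9,A
177.3,80.4, b
"""


def load_and_clean(source):
    """Read a CSV into a DataFrame and apply a few common wrangling steps."""
    df = pd.read_csv(source)
    # Normalize the categorical labels: strip stray spaces and unify case.
    df["group"] = df["group"].str.strip().str.upper().astype("category")
    # Drop exact duplicate rows, then rows with a missing measurement.
    df = df.drop_duplicates()
    df = df.dropna(subset=["height_cm", "weight_kg"])
    return df.reset_index(drop=True)


df = load_and_clean(io.StringIO(RAW_CSV))
print(df.dtypes)
print(df)
```

Each step carries a comment explaining why it is there, which is one way to document the preprocessing inside the notebook itself.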

6. Analyzing the Data: Provide appropriate plots and interpretations for the variables of the
dataset. Analysis should cover the standalone variables as well as relationships among
the variables. At a minimum, your analysis must include each of the following:

• With one continuous variable:


1. Show distribution statistics for the variable (min, max, quantiles, etc.).
2. Generate two appropriate graphs to display the distribution of the variable.
3. Identify whether there are any outliers in the variable (use a combination of graphs and
other calculations to do so).
4. In 1-3 sentences, write up any interesting trends based on the graphs and statistics.
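
A minimal sketch of steps 1-3 for a single continuous variable follows. The data are synthetic and the column name "score" is a placeholder; the 1.5 × IQR rule used for outliers is one common convention, not the only acceptable one. In a notebook, the figure renders when the cell finishes.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic placeholder data (not from any real dataset): 200 values near 50,
# plus two extreme values planted so the outlier check has something to find.
rng = np.random.default_rng(seed=0)
values = np.append(rng.normal(loc=50, scale=5, size=200), [95.0, 3.0])
df = pd.DataFrame({"score": values})

# Step 1: distribution statistics (count, mean, std, min, quartiles, max).
summary = df["score"].describe()
print(summary)

# Step 3: flag outliers with the 1.5 * IQR rule.
q1, q3 = df["score"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["score"] < lower) | (df["score"] > upper)]
print(f"{len(outliers)} outlier(s) outside [{lower:.1f}, {upper:.1f}]")

# Step 2: two graphs of the same variable, a histogram and a box plot.
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))
ax_hist.hist(df["score"].to_numpy(), bins=20)
ax_hist.set_title("Histogram of score")
ax_hist.set_xlabel("score")
ax_hist.set_ylabel("Frequency")
ax_box.boxplot(df["score"].to_numpy())
ax_box.set_title("Box plot of score")
ax_box.set_ylabel("score")
fig.tight_layout()
```

The box plot and the IQR calculation rest on the same quartiles, so points drawn beyond the whiskers should match the rows in outliers.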

• With one categorical variable:


1. Show distribution statistics for the variable (counts, min, max, etc.).
2. Generate two appropriate graphs to display the distribution of the variable.
3. In 1-3 sentences, write up any interesting trends based on the graphs and statistics.
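
For a categorical variable, the statistics and graphs could be sketched as below. The "region" column and its values are invented for illustration, and the bar and pie charts are just two of several reasonable choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder categories, for illustration only.
df = pd.DataFrame({
    "region": ["East", "West", "East", "North", "East",
               "West", "South", "East", "North", "West"],
})

# Distribution statistics: overall summary, raw counts, and proportions.
overview = df["region"].describe()  # count, unique, top (mode), freq
counts = df["region"].value_counts()
shares = df["region"].value_counts(normalize=True)
print(overview)
print(counts)

# Two graphs of the same variable: a bar chart of counts and a pie of shares.
fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))
ax_bar.bar(list(counts.index), counts.to_numpy())
ax_bar.set_title("Observations per region")
ax_bar.set_xlabel("region")
ax_bar.set_ylabel("Count")
ax_pie.pie(shares.to_numpy(), labels=list(shares.index), autopct="%1.0f%%")
ax_pie.set_title("Share of observations by region")
fig.tight_layout()
```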

• Using two or more continuous variables:


1. Select at least two continuous variables you believe, based on the context/subject area
of the dataset, have a strong relationship with one another.
2. Explain in 1-3 sentences why you believe these variables should have a strong
relationship. Note: This is based on the subject of the data, not from any analysis.
3. Determine the correlation(s) between the variables.
4. Are they statistically significant? Why?
5. Generate scatterplot(s) to display the correlation(s).
6. In 1-3 sentences, write up any interesting trends based on the graphs and statistics.
Did they match what you expected from part 2?
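
One possible way to handle steps 3-5 is sketched below, using Pearson correlation with its p-value from scipy.stats.pearsonr. Note the assumptions: SciPy is a third-party library and would need instructor approval, the data and column names are synthetic, and the 0.05 significance threshold is a conventional choice rather than a course requirement.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic placeholder data with a built-in linear relationship plus noise.
rng = np.random.default_rng(seed=1)
hours = rng.uniform(0, 10, size=100)
score = 50 + 4 * hours + rng.normal(0, 5, size=100)
df = pd.DataFrame({"hours_studied": hours, "exam_score": score})

# Step 3: correlation matrix straight from the dataframe.
corr_matrix = df[["hours_studied", "exam_score"]].corr()
print(corr_matrix)

# Step 4: Pearson r with a p-value for the null hypothesis of no correlation.
r, p_value = stats.pearsonr(df["hours_studied"], df["exam_score"])
alpha = 0.05  # assumed significance level
significant = p_value < alpha
print(f"Pearson r = {r:.3f}, p-value = {p_value:.3g}, significant: {significant}")

# Step 5: scatterplot of the two variables.
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(df["hours_studied"], df["exam_score"], alpha=0.7)
ax.set_title(f"Exam score vs. hours studied (r = {r:.2f})")
ax.set_xlabel("hours_studied")
ax.set_ylabel("exam_score")
fig.tight_layout()
```

A p-value below the chosen threshold suggests the observed correlation is unlikely to be due to chance alone; it does not show that one variable causes the other.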

• Using a numerical variable and a categorical variable:


1. Select a numerical variable and a categorical variable you believe, based on the
context/subject area of the dataset, have some sort of relationship with one another.
2. Explain in 1-3 sentences why you believe these variables should have a strong
relationship. Note: This is based on the subject of the data, not from any analysis.
3. Generate at least two appropriate graphs, each including both variables, that
help show whether the continuous values differ across categories.
4. In 1-3 sentences, write up any interesting trends based on the graphs and statistics.
Did they match what you expected from part 2?
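
The comparison of a numerical variable across categories could be sketched as follows. The "plan" and "monthly_spend" columns and their values are synthetic placeholders; grouped box plots and overlaid histograms are two options among several.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic placeholder data: three categories with different typical values.
rng = np.random.default_rng(seed=2)
df = pd.DataFrame({
    "plan": np.repeat(["Basic", "Plus", "Premium"], 50),
    "monthly_spend": np.concatenate([
        rng.normal(20, 4, 50),
        rng.normal(35, 5, 50),
        rng.normal(60, 8, 50),
    ]),
})

# Per-category statistics for the numerical variable.
group_stats = df.groupby("plan")["monthly_spend"].describe()
print(group_stats)

plans = list(group_stats.index)
samples = [df.loc[df["plan"] == p, "monthly_spend"].to_numpy() for p in plans]

fig, (ax_box, ax_hist) = plt.subplots(1, 2, figsize=(11, 4))

# Graph 1: one box per category on a shared axis.
ax_box.boxplot(samples)
ax_box.set_xticks(range(1, len(plans) + 1))
ax_box.set_xticklabels(plans)
ax_box.set_title("Monthly spend by plan")
ax_box.set_xlabel("plan")
ax_box.set_ylabel("monthly_spend")

# Graph 2: overlaid histograms, one per category, with a legend.
for plan, sample in zip(plans, samples):
    ax_hist.hist(sample, bins=15, alpha=0.5, label=plan)
ax_hist.set_title("Distribution of monthly spend by plan")
ax_hist.set_xlabel("monthly_spend")
ax_hist.set_ylabel("Frequency")
ax_hist.legend(title="plan")
fig.tight_layout()
```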

Notes:
1. Your graphs can be generated using any graphing module discussed in the course.
2. You must keep your dataset in a Pandas dataframe during preprocessing, graphing,
analysis, etc.
3. Your graphs must be appropriately labeled (i.e., have a title, x- and y-axis labels, a legend, etc.).

7. Write up a concluding summary (1-2 paragraphs) regarding your process of preparing and
analyzing the data. What did you learn from your analysis, and was it in line with
what you expected before exploring the data? What other analysis would you be
interested in doing?

Before beginning your project, submit via Blackboard a few sentences describing your project
proposal. This is so your instructor can assess the appropriateness of your idea.

When submitting your project, include instructions for running the code and installing any
approved third-party modules. We cannot grade your project if it uses proprietary modules
or if we are unable to run it.

Finally, we want to emphasize that your project MUST NOT contain any proprietary,
non-public or confidential algorithms or data from your employer or other sources.