Data 110 Project 2
For this assignment, you will incorporate the techniques we have learned so far in class. Think of this final project
like your final exam, where you are encapsulating what you have learned over the course of the semester. I would
like you to push yourself to try some new technique or add an element that I did not teach – either you find it
yourself or use a classmate’s idea. The point is to try something new.
You will select a dataset and email it to me for approval. You might be allowed to use a dataset you already used for
project 1, but only if you can justify to me how you will use the information differently for this final project. The
dataset should have a minimum of 6 variables, including at least two quantitative and two categorical variables. Dates
and/or mapping locations are other helpful variables to have in your dataset.
You will submit your dataset choice in a separate assignment dropbox by Friday, June 28th.
Steps
1. Open a new quarto document. Add an image that reflects something about your topic. Please credit the source
for your image.
Then write your intro paragraph: describe your topic, define your variables, define the source for your dataset.
Include an additional paragraph with some background research on your topic. Be sure to cite your source for the
background information and put it in the bibliography at the end. Explain why you chose this topic and dataset
– what meaning does it have for you?
2. Load the libraries and dataset. Load your dataset using the readr::read_csv() command (do NOT use read.csv()).
3. Clean and explore the data variables and keep track of all of your cleaning and explorations in an R-Markdown
document. Be sure to comment your actions in each chunk.
4. Your R code must have at least two dplyr commands such as select, filter, summarize, mutate, group_by, arrange,
etc. Your visualizations must have non-default ggplot themes and non-default colors (you must intentionally
change the color palette).
5. Perform at least one of the following statistical analyses: linear, multiple linear, or logistic regression. Write the
equation for your model and report the p-values, diagnostic plots, and adjusted R² value. Then ANALYZE what these
values suggest about your model. This statistical analysis will be separate from your other final two
visualizations.
6. Explore both quantitative and categorical variables with simple plots to determine what you want to focus on for
your final visualization.
7. Plot at least TWO distinct types of visualizations in addition to the statistical analysis. Plot types could
include maps in R or Tableau. If you create a visualization in Tableau, be sure to include the link to your Tableau
visualization directly in your Markdown file. During your exploration, keep a running commentary in the
Markdown text area of what you are doing and why you are doing it. Your final submission for each of the two
visualizations must include all of the following aesthetics:
a. Intro (at the beginning of the markdown document): The topic of the data, any variables included, what kind
of variables they are, where the data came from and how you cleaned it up (be detailed and specific, using
proper terminology where appropriate). Attempt to discover HOW the data was collected – describe the
methodology, or state clearly that there is no ReadMe file with that information. Be sure to explain why you
chose this topic and dataset – what meaning does it have for you?
b. Incorporate background research about this topic. This background information will include
information you find in an article, website, or book. Please source this background information within
the essay or if you have multiple sources, include a bibliography. I am not particular about the format of
this bibliography. If you need help finding articles, I am happy to help you and/or show you how to search the
MC Library Database.
c. What the visualization represents, any interesting patterns or surprises that arise within the visualization, and
anything that could have been shown that you could not get to work or that you wished you could have
included.
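As a rough illustration of steps 2, 4, and 5, here is a minimal sketch in R. It is not a template for your project: it assumes the built-in mtcars data written to a temporary CSV as a stand-in for your approved dataset, and the names csv_path, cars, cars_clean, mpg_summary, p, and fit are made up for this example. Your own variables, palette, theme, and model will differ.

```r
library(readr)
library(dplyr)
library(ggplot2)

# Assumption: mtcars stands in for your own dataset. It is written to a
# temporary CSV only so that this example can run on any machine.
csv_path <- tempfile(fileext = ".csv")
write_csv(mtcars, csv_path)

# Step 2: load the data with readr::read_csv() (not read.csv())
cars <- read_csv(csv_path)

# Step 4: dplyr verbs (mutate and filter here, group_by and summarize below)
cars_clean <- cars %>%
  mutate(transmission = if_else(am == 1, "manual", "automatic")) %>%
  filter(!is.na(mpg))

# Mean miles per gallon and row count for each transmission type
mpg_summary <- cars_clean %>%
  group_by(transmission) %>%
  summarize(mean_mpg = mean(mpg), n = n())

# Step 4: a non-default theme and an intentionally chosen color palette
p <- ggplot(cars_clean, aes(x = wt, y = mpg, color = transmission)) +
  geom_point(size = 2) +
  scale_color_manual(values = c(automatic = "#1b9e77", manual = "#d95f02")) +
  theme_minimal() +
  labs(
    title = "Fuel economy by weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Transmission"
  )
print(p)

# Step 5: simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = cars_clean)

# Coefficients (for writing the model equation), p-values, adjusted R-squared
print(summary(fit))

# The four standard diagnostic plots for a linear model
par(mfrow = c(2, 2))
plot(fit)
```

The model equation comes from the coefficients in the summary output, in the form mpg = intercept + slope × wt. The written analysis in step 5 should go beyond printing this output and explain what the p-values, adjusted R², and diagnostic plots suggest about the model.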
Render your document and publish it on RPubs. Submit the link in the Assignment Dropbox by 11:59 pm on Sunday, July
7th.
Prepare a 2-minute presentation of your project to give to the class on Tuesday, July 2nd. Highlight the following:
1. Your topic
2. The variables in your dataset
3. What your visualization represents
4. Any interesting findings or corroborations with your background research - be sure to mention sources for any
background information you mention.
Rubric for Evaluation of Final Project
Acceptable (80%): One or two of the 10 steps above are omitted; or two or three of the 10 steps are underdeveloped.
Developing Competence (60%): Three of the 10 steps are omitted; or 4 or 5 of the 10 steps are underdeveloped.
Inadequate (40% or lower): The project has at least one serious weakness. Less than half of the requirements have
been completed.