BANA 3010 Assignment 2
Question 1 - 25pts
Using the mtcars data set in R, please answer the following questions.
• Report the number of variables and observations in the data set using the print function.
• Print the summary statistics of the data set and report how many discrete and continuous
variables are in the data set using the print function.
• Calculate the mean, variance, and standard deviation for the variable mpg and assign
them into variable names m, v, and s. Report the results in the print statement.
• Create two tables to summarize 1) average mpg for each cylinder class and 2) the standard
deviation of mpg for each gear class.
• Create a crosstab that shows the number of observations that belong to each cylinder and
gear class combination. The table should show how many observations there are for cars
with 4 cylinders and 3 gears, 4 cylinders and 4 gears, etc. Report which combination is
recorded in this data set and how many observations there are for this type of car.
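As a starting point for Question 1, the following is a minimal sketch in base R, not a model answer. The names m, v, and s come from the question; n_obs, n_vars, mean_by_cyl, sd_by_gear, and cyl_gear are illustrative names chosen here. Classifying the variables as discrete or continuous, and interpreting the crosstab, is left to the reader.

```r
# mtcars ships with base R (datasets package), so nothing needs to be loaded.
n_obs  <- nrow(mtcars)   # observations (rows)
n_vars <- ncol(mtcars)   # variables (columns)
print(paste("Observations:", n_obs, "Variables:", n_vars))

# Summary statistics for every column.
print(summary(mtcars))

# Mean, variance, and standard deviation of mpg.
m <- mean(mtcars$mpg)
v <- var(mtcars$mpg)
s <- sd(mtcars$mpg)
print(c(mean = m, variance = v, sd = s))

# Group summaries with tapply(values, groups, function).
mean_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)   # average mpg per cylinder class
sd_by_gear  <- tapply(mtcars$mpg, mtcars$gear, sd)    # sd of mpg per gear class
print(mean_by_cyl)
print(sd_by_gear)

# Crosstab of cylinder class (rows) by gear class (columns).
cyl_gear <- table(cyl = mtcars$cyl, gear = mtcars$gear)
print(cyl_gear)
```

tapply is used here for brevity; aggregate(mpg ~ cyl, data = mtcars, FUN = mean) would return the same group means as a data frame instead of a named vector.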
Question 2 - 25pts
Use different visualization tools to summarize the data sets in this question.
• Using the PlantGrowth data set, visualize and compare the weight of the plants across the three
separate groups. Give labels to the title, x-axis, and y-axis on the graph. Write a paragraph
to summarize your findings. (Write your paragraph as comment lines (#) in your submission.)
• Using the mtcars data set, plot the histogram for the column mpg with 10 breaks. Give
labels to the title, x-axis, and y-axis on the graph. Report the most observed mpg class
from the data set using the print function.
• Using the USArrests data set, create a pairs plot to display the correlations between the
variables in the data set. Plot the scatter plot with Murder and Assault. Give labels to the
title, x-axis, and y-axis on the graph. Write a paragraph to summarize your results from
both plots.
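The plotting steps in Question 2 could be sketched with base R graphics as shown below. This is one possible approach under stated assumptions: a box plot is chosen here for the group comparison (the question leaves the plot type open), and the titles, axis labels, and the names h and top are illustrative.

```r
# Box plot of plant weight by treatment group
# (PlantGrowth has the columns weight and group).
boxplot(weight ~ group, data = PlantGrowth,
        main = "Plant weight by group", xlab = "Group", ylab = "Weight")

# Histogram of mpg. breaks = 10 is only a suggestion to hist();
# R may adjust it to "pretty" cut points.
h <- hist(mtcars$mpg, breaks = 10,
          main = "Distribution of mpg",
          xlab = "Miles per gallon", ylab = "Frequency")

# The bin with the highest count, reported by its lower and upper bounds.
top <- which.max(h$counts)
print(paste("Most observed mpg class:", h$breaks[top], "to", h$breaks[top + 1]))

# Pairs plot of all USArrests variables, then a single scatter plot.
pairs(USArrests, main = "USArrests pairs plot")
plot(USArrests$Murder, USArrests$Assault,
     main = "Assault vs. Murder", xlab = "Murder", ylab = "Assault")
```

Keeping the object returned by hist() avoids reading the most frequent class off the picture by eye, since its counts and breaks can be inspected directly.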
Question 3 - 25pts
Let’s find out what explains the housing prices in New York City via the data set in housing.csv.
Note: Check your working directory to make sure that you can download the data into the data
folder.
• Create your own descriptive statistics and aggregation tables to summarize the data set
and find any meaningful results between different variables in the data set.
• Create multiple plots to demonstrate the correlations between different variables.
Remember to label all axes and give a title to each graph.
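Because the columns of housing.csv are not listed here, the sketch below is written against a stand-in data frame so that it runs as written. The helper describe_numeric is hypothetical, the path in the commented read.csv call is an assumption, and the column names mpg and cyl belong to the stand-in data rather than to the housing data.

```r
# Hypothetical helper: descriptive statistics for every numeric column.
describe_numeric <- function(df) {
  num <- df[sapply(df, is.numeric)]   # keep numeric columns only
  out <- data.frame(
    variable = names(num),
    mean     = sapply(num, mean, na.rm = TRUE),
    sd       = sapply(num, sd,   na.rm = TRUE),
    min      = sapply(num, min,  na.rm = TRUE),
    max      = sapply(num, max,  na.rm = TRUE)
  )
  rownames(out) <- NULL
  out
}

# Stand-in data so the sketch is runnable. For the assignment, replace with e.g.
# housing <- read.csv("data/housing.csv")   # path is an assumption
housing <- mtcars

stats <- describe_numeric(housing)
print(stats)

# Aggregation table: mean of one column by the classes of another.
# mpg and cyl are columns of the stand-in data only.
print(aggregate(mpg ~ cyl, data = housing, FUN = mean))

# Correlation matrix of the numeric columns, rounded for readability.
num_cols <- housing[sapply(housing, is.numeric)]
cors <- round(cor(num_cols, use = "pairwise.complete.obs"), 2)
print(cors)
```

The correlation matrix is a quick way to decide which pairs of variables are worth plotting; each chosen pair still needs its own labeled plot.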
Question 4 - 25pts
In this problem, we will explore another car data set. Read in the data in the file data.csv as an
R data frame. Print out the dimensions of the data set. You will see that it is of pretty decent size.
The str function is a useful function that tells you the type of data in each column. However,
don’t fully trust the result.
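One possible reason not to take str at face value is that a single non-numeric entry makes read.csv store an entire column as text. The sketch below illustrates this with a made-up three-row CSV inlined as a string; the column names, the values, and the "N/A" marker are assumptions for illustration and may not match data.csv.

```r
# Made-up CSV, inlined so the sketch runs without any file.
csv_text <- "Year,MSRP\n1990,2000\n1991,N/A\n1992,35000"

# Read naively: the "N/A" entry forces the whole MSRP column to character.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(raw)

# Read again, declaring "N/A" as a missing-value marker: MSRP becomes numeric.
clean <- read.csv(text = csv_text, na.strings = c("NA", "N/A"))
str(clean)

print(dim(clean))              # number of rows, number of columns
print(colSums(is.na(clean)))   # missing values per column
```

Checking summary() and the count of missing values alongside str gives a second opinion on whether each column really holds the type it claims.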
• Make a plot of the mean and standard deviation of MSRP (Manufacturer’s Suggested
Retail Price) and mpg of the cars by year from 1990 to 2017. What do you observe?
• Create a bar chart of the number of cars by year (from 1990 to 2017). What is your
observation?
• Create a pie chart of the car makes in the data set. You can use this documentation:
https://r-graph-gallery.com/piechart-ggplot2.html.
• Compare Highway mpg and City mpg across car makes. Make a plot of your choice and
state your observation.
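The four plots above could be approached along the following lines. This is a sketch on made-up rows, not on data.csv: the column names Year, Make, MSRP, highway_mpg, and city_mpg and every value in the toy data frame are assumptions, and base R graphics are used here even though the linked documentation covers ggplot2. Only MSRP is summarized by year; the same pattern would apply to an mpg column.

```r
# Made-up rows for illustration only; names and values are assumptions.
cars <- data.frame(
  Year        = c(1990, 1990, 2000, 2000, 2017, 2017),
  Make        = c("A", "B", "A", "B", "A", "A"),
  MSRP        = c(2000, 2500, 21000, 26000, 30000, 34000),
  highway_mpg = c(24, 22, 28, 25, 33, 31),
  city_mpg    = c(17, 16, 20, 18, 24, 23)
)

# Keep only model years 1990 to 2017.
in_range <- cars[cars$Year >= 1990 & cars$Year <= 2017, ]

# Mean and standard deviation of MSRP per year, drawn as a line with a band.
msrp_mean <- tapply(in_range$MSRP, in_range$Year, mean)
msrp_sd   <- tapply(in_range$MSRP, in_range$Year, sd)
years     <- as.numeric(names(msrp_mean))
plot(years, msrp_mean, type = "b",
     ylim = range(c(msrp_mean - msrp_sd, msrp_mean + msrp_sd), na.rm = TRUE),
     main = "MSRP by year (mean, dashed = +/- 1 sd)",
     xlab = "Year", ylab = "MSRP")
lines(years, msrp_mean + msrp_sd, lty = 2)
lines(years, msrp_mean - msrp_sd, lty = 2)

# Bar chart of the number of cars per year.
per_year <- table(in_range$Year)
barplot(per_year, main = "Cars per year", xlab = "Year", ylab = "Count")

# Pie chart of car makes.
pie(table(in_range$Make), main = "Share of cars by make")

# Average highway and city mpg per make, as grouped bars.
mpg_by_make <- aggregate(cbind(highway_mpg, city_mpg) ~ Make,
                         data = in_range, FUN = mean)
barplot(t(as.matrix(mpg_by_make[, c("highway_mpg", "city_mpg")])),
        beside = TRUE, names.arg = mpg_by_make$Make,
        legend.text = c("Highway", "City"),
        main = "Average mpg by make", xlab = "Make", ylab = "Miles per gallon")
```

With the real data, a year that contains a single car has an undefined standard deviation, which is why the axis range is computed with na.rm = TRUE.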