BANA 3010 Assignment 2

Uploaded by

quandominh1507
BANA3010 Data-Driven Analytics, Fall 2024 – Assignment 2

Assignment Submission Instructions:

• Due date: Oct 25th, 2024


• Your program should work correctly on all inputs. If there are any specifications about
how the program should be written (or how the output should appear), those specifications
should be followed.
• Your code and functions should be appropriately commented. However, avoid making
your code overly busy (e.g., by commenting every line).
• Variables and functions should have meaningful names, and code should be organized
into functions/methods where appropriate.
• Academic honesty is required in all work you submit to be graded. You MUST NOT copy
or share your code with other students to avoid plagiarism issues.
• Upload your work to Canvas as a .R file.
• Submit a separate .R file for each problem, using the following naming format: Assignment1_Q1.R.
Failure to submit a .R file for a lab or assignment will result in a 0.
• Late submission of an assignment without an approved extension is NOT allowed.

Question 1 - 25pts

Using the mtcars data set in R, please answer the following questions.

# Loading the data
data(mtcars)
# Head of the data set
head(mtcars)

• Report the number of variables and observations in the data set using the print function.

• Print the summary statistics of the data set and report how many discrete and continuous
variables are in the data set using the print function.

• Calculate the mean, variance, and standard deviation for the variable mpg and assign
them into variable names m, v, and s. Report the results in the print statement.

• Create two tables to summarize 1) average mpg for each cylinder class and 2) the standard
deviation of mpg for each gear class.

• Create a crosstab that shows the number of observations belonging to each combination of
cylinder and gear class. The table should show how many observations there are for cars
with 4 cylinders and 3 gears, 4 cylinders and 4 gears, etc. Report which combination is
recorded most often in this data set and how many observations there are for this type of car.
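
The tasks above map onto a handful of base R functions. The following is a minimal sketch of one possible approach, not a prescribed solution: it assumes `tapply` for the grouped summaries and `table` for the crosstab (`aggregate` or other tools would work as well), and every variable name other than `m`, `v`, and `s` is chosen here for illustration.

```r
data(mtcars)

# Rows are observations, columns are variables
n_obs <- nrow(mtcars)
n_vars <- ncol(mtcars)
print(paste("Observations:", n_obs, "Variables:", n_vars))

# Mean, variance, and standard deviation of mpg
m <- mean(mtcars$mpg)
v <- var(mtcars$mpg)
s <- sd(mtcars$mpg)
print(c(mean = m, variance = v, sd = s))

# Grouped summaries: one value of mpg per level of the grouping variable
mean_mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
sd_mpg_by_gear <- tapply(mtcars$mpg, mtcars$gear, sd)
print(mean_mpg_by_cyl)
print(sd_mpg_by_gear)

# Crosstab of cylinder class (rows) against gear class (columns)
cyl_gear <- table(cyl = mtcars$cyl, gear = mtcars$gear)
print(cyl_gear)

# Locate the cell with the largest count (first one if there is a tie)
top <- which(cyl_gear == max(cyl_gear), arr.ind = TRUE)
top_cyl <- rownames(cyl_gear)[top[1, 1]]
top_gear <- colnames(cyl_gear)[top[1, 2]]
print(paste("Most frequent:", top_cyl, "cylinders with", top_gear,
            "gears,", max(cyl_gear), "cars"))
```

Reporting whether each variable is discrete or continuous still requires inspecting the output of `summary(mtcars)` by hand, since every column of `mtcars` is stored as numeric.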

Question 2 - 25pts

Use different visualization tools to summarize the data sets in this question.

# Load the data set
data("PlantGrowth")
# Head of the data set
head(PlantGrowth)

• Using the PlantGrowth data set, visualize and compare the weights of the plants in the three
separate groups. Give labels to the title, x-axis, and y-axis on the graph. Write a paragraph
to summarize your findings. (Write your paragraph as comment lines (#) in your submission.)

• Using the mtcars data set, plot the histogram for the column mpg with 10 breaks. Give
labels to the title, x-axis, and y-axis on the graph. Report the most observed mpg class
from the data set using the print function.

• Using the USArrests data set, create a pairs plot to display the correlations between the
variables in the data set. Then create a scatter plot of Murder and Assault. Give labels to the
title, x-axis, and y-axis on the graph. Write a paragraph to summarize your results from
both plots.

# Load the data set
data("USArrests")
# Head of the data set
head(USArrests)
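
As a rough illustration of the plotting calls this question relies on, the sketch below uses base R graphics only. The choice of a box plot for the group comparison, the axis-label wording, and the variable names are all assumptions made here; other plot types would satisfy the task equally well.

```r
data("PlantGrowth")
data(mtcars)
data("USArrests")

# One box per treatment group, with title and axis labels
boxplot(weight ~ group, data = PlantGrowth,
        main = "Plant weight by group",
        xlab = "Group", ylab = "Weight")

# Histogram of mpg; hist() also returns the bin edges and counts
mpg_hist <- hist(mtcars$mpg, breaks = 10,
                 main = "Distribution of mpg",
                 xlab = "Miles per gallon", ylab = "Number of cars")

# The bin with the highest count is the most observed mpg class
busiest <- which.max(mpg_hist$counts)
print(paste("Most observed mpg class:", mpg_hist$breaks[busiest],
            "to", mpg_hist$breaks[busiest + 1]))

# Pairs plot of all variables, then a single labeled scatter plot
pairs(USArrests, main = "USArrests: pairwise scatter plots")
plot(USArrests$Murder, USArrests$Assault,
     main = "Assault vs. Murder arrests",
     xlab = "Murder", ylab = "Assault")
```

Note that `hist` treats a single number passed as `breaks` as a suggestion only, so the number of bins actually drawn can differ from 10. Pass an explicit vector of break points if an exact bin count is required.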

Question 3 - 25pts

Let’s find out what explains the housing prices in New York City via the data set in housing.csv.
Note: Check your working directory to make sure that you can download the data into the data
folder.

• Create your own descriptive statistics and aggregation tables to summarize the data set
and find any meaningful results between different variables in the data set.

• Create multiple plots to demonstrate the correlations between different variables.
Remember to label all axes and give a title to each graph.

• Write a summary about your findings from this exercise.
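
Because the columns of housing.csv are not described here, the following sketch shows only the general pattern for descriptive statistics, correlations, and an aggregation table. It is demonstrated on the built-in `iris` data set purely as a stand-in, and the helper name `describe_data` is hypothetical.

```r
# Summarize any data frame: dimensions, per-column summary,
# and correlations among its numeric columns.
describe_data <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  list(
    dims = dim(df),
    summary = summary(df),
    correlations = cor(df[, is_num, drop = FALSE],
                       use = "pairwise.complete.obs")
  )
}

# Stand-in data; for the assignment this would be the housing data frame
result <- describe_data(iris)
print(result$dims)
print(result$summary)
print(round(result$correlations, 2))

# Aggregation table: mean of every numeric column per group
group_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(group_means)
```

With the housing data, the grouping column in the `aggregate` formula would be replaced by whichever categorical variable that file actually contains.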

Question 4 - 25pts

In this problem, we will explore another car data set. Read in the data in the file data.csv as an
R data frame. Print out the dimensions of the data set. You will see that it is of pretty decent size.
The str function is a useful function that tells you the type of data in each column. However,
don’t fully trust the result.
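
One common reason to distrust the types reported by `str` is that a numeric column containing stray text is read in as character. The sketch below illustrates this with a made-up three-row CSV supplied inline. The column names (`Make`, `Year`, `MSRP`) and all values are assumptions for illustration and are not taken from data.csv.

```r
# Made-up inline CSV; one price carries a thousands separator,
# another is recorded as "N/A".
csv_text <- 'Make,Year,MSRP
A,2015,"34,500"
B,2016,28000
C,2017,N/A'

cars <- read.csv(text = csv_text, na.strings = c("NA", "N/A"),
                 stringsAsFactors = FALSE)
print(dim(cars))

# MSRP is reported as chr because "34,500" is not a valid number
str(cars)

# Strip the separator, then convert; missing values stay NA
cars$MSRP <- as.numeric(gsub(",", "", cars$MSRP))
str(cars)
```

Whether data.csv has this particular problem is not stated in the assignment; the point is only that each column's reported type should be checked against its actual contents.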

• Make a plot of the mean and standard deviation of MSRP (Manufacturer’s Suggested
Retail Price) and mpg of the cars by year from 1990 to 2017. What do you observe?

• Create a bar chart of the number of cars by year (from 1990 to 2017). What is your
observation?

• Create a pie chart of the car makes in the data set. You can use this documentation:
https://r-graph-gallery.com/piechart-ggplot2.html.

• Compare Highway mpg and City mpg across car makes. Make a plot of your choice and
state your observation.
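
The per-year summaries and the bar chart can be sketched as follows. The data frame is a made-up toy stand-in for data.csv, and the column names `Year` and `MSRP` are assumptions; the same pattern would apply to an mpg column.

```r
# Toy stand-in for data.csv (values are invented for illustration)
cars <- data.frame(
  Year = c(1989, 1990, 1990, 2016, 2016, 2017, 2018),
  MSRP = c(2000, 2100, 2300, 30000, 34000, 41000, 50000)
)

# Keep only the years the question asks about
in_range <- subset(cars, Year >= 1990 & Year <= 2017)

# Mean and standard deviation of MSRP for each year
msrp_mean <- tapply(in_range$MSRP, in_range$Year, mean)
msrp_sd <- tapply(in_range$MSRP, in_range$Year, sd)
by_year <- data.frame(
  Year = as.integer(names(msrp_mean)),
  mean = as.vector(msrp_mean),
  sd = as.vector(msrp_sd)  # NA for a year with a single car
)
print(by_year)

# Plot the yearly mean, labeled as required
plot(by_year$Year, by_year$mean, type = "b",
     main = "Mean MSRP by year",
     xlab = "Year", ylab = "Mean MSRP")

# Number of cars per year as a bar chart
cars_per_year <- table(in_range$Year)
barplot(cars_per_year,
        main = "Number of cars by year",
        xlab = "Year", ylab = "Number of cars")
```

A year with only one car has an undefined standard deviation, which `sd` reports as `NA`, so such years are worth checking before plotting the spread.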
