STATS 10 Assignment 1

The document outlines the requirements for submitting an assignment for a statistics course, including formatting instructions and submission guidelines. It consists of two parts: Part I focuses on R programming tasks related to data analysis, while Part II includes questions about statistical interpretation and analysis of given datasets. Students must provide clear and legible answers, including outputs and explanations for their calculations and analyses.

Uploaded by

vc431365


Please submit both parts of the assignment in one single PDF file. You can use any PDF editor
software to merge the two parts into one file. Please make sure that the questions are in the correct
order and clearly labeled, and that the answers are legible and easy to read.
To submit your assignment, upload the PDF file under the designated assignment page on the
course website before the deadline specified. Email or hard copy submissions are not accepted.

Part I
Include both the R commands and their corresponding outputs, results, or answers for all
exercise questions in Part I.

1. Vectors:
a. Create a vector named heights that contains the heights, in inches, of yourself and two
students near you. Print the contents of this vector.
b. Create a vector named names that contains the names of these people. Print the contents
of this vector.
c. Try typing cbind(heights, names). What did this command do? What class is this new
object?
Hint: Try the class() function.
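
A minimal sketch of question 1, assuming made-up heights and names (the values below are placeholders, not part of the assignment). It shows why the result of cbind() is worth inspecting: combining a numeric vector with a character vector forces every entry to become text.

```r
# Hypothetical values: substitute your own heights (inches) and names.
heights <- c(64, 70, 67)
names <- c("Ana", "Ben", "Chris")

print(heights)
print(names)

# cbind() binds the two vectors as columns of a single matrix.
combined <- cbind(heights, names)
print(combined)

# class() reports a matrix; typeof() shows the entries were coerced to character.
class(combined)
typeof(combined)
```

Because a matrix can hold only one type, the heights are stored as strings such as "64" in the combined object.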

2. Downloading data:
a. Download the data set births.csv from the course site and upload it into RStudio. Name
the data frame NCbirths.
b. Demonstrate that you have been successful by typing head(NCbirths) and copying and
pasting the output into your word processing document.

3. Package loading
a. Install the maps package. Verify its installation by typing find.package("maps") and
include the output in your answer.
b. Type library(maps) to load up the package. Type map("state") and include the plot output
in your answer.
Use the births data set for questions 4-11
4. Perform vector operations
a. Extract the weight variable as a vector from the data frame
b. What units do you think the weights are in?
c. Create a new vector named weights_in_pounds which are the weights of the babies in
pounds. You can look up conversion factors on the internet.
d. Demonstrate your success by typing weights_in_pounds[1:20] and including the output in
your word processing document.
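
The conversion in question 4 might be sketched as follows. This assumes the weights are recorded in ounces (16 ounces per pound), which is a guess that should be checked against the data, and it uses a tiny stand-in data frame with invented values so the snippet runs without births.csv.

```r
# Stand-in for the real data frame; these three weights are invented.
NCbirths <- data.frame(weight = c(112, 120, 96))

# Extract the weight column as a plain vector.
weights <- NCbirths$weight

# Assumption: weights are in ounces, so divide by 16 to get pounds.
weights_in_pounds <- weights / 16

weights_in_pounds
mean(weights_in_pounds)
```

With the real data, weights_in_pounds[1:20] prints the first twenty converted values.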

5. What is the mean weight of the babies in pounds?

a. What percentage of the mothers in the sample smoke? Hint: use the tally function with
the format argument. Use the help screen for guidance.
b. According to the Centers for Disease Control, approximately 21% of adult Americans are
smokers. How far off is the percentage you found in 2 from the CDC’s report?

6. Produce three different histograms of the weights in pounds. Use 3 bins, 20 bins, and 100
bins. Which histogram seems to give the best visualization, and why?

7. We can use the syntax boxplot(vector1, vector2) to make a side by side box plot. Create a
side by side boxplot of the mother’s ages and the father’s ages. Which gender tends to be
older?

8. Try typing histogram(~ weight | Habit, data = NCbirths, layout = c(1, 2)). Describe what this
code does. Based on the graph, do you see any major differences between baby weights from
smoking moms vs. non-smoking moms?

9. Produce a dot plot of the weights in pounds.

10. Consider the other categorical variables in this data. Of those that record the health of the
baby, which do you think will be associated with the mother’s smoking and why? Make a
two-way Summary Table to check your hypothesis. Do you have evidence that this variable
is associated with smoking? Why?
Part II
You may choose to type or write your answers electronically or scan your handwritten
solutions. Please ensure that you show all steps and explanations to receive full credit,
unless otherwise instructed.

1. A data set on Shark Attacks Worldwide posted on StatCrunch records data on all shark
attacks in recorded history including attacks before 1800. The data set can be viewed here:
https://www.statcrunch.com/app/index.html?dataid=2188687

a. How many variables are contained in the data?

i. 15

b. Which of the following questions could not be answered using this data set? Briefly
explain.
i. In what month do most shark attacks occur?
ii. Are shark attacks more likely to occur in warm temperature or cooler
temperatures?
1. This question cannot be answered because the data set has no temperature variable.
The other questions can be answered from the recorded variables.
iii. Attacks by which species of shark are more likely to result in a fatality?
iv. What country has the most shark attacks per year?

c. A researcher wants to understand the age of the people in the data set and proposed some
questions of interest: Are the reported cases mostly younger people or older people?
How is the age distributed? How would you help the researcher answer these questions?
What statistical tools (e.g., graphs, measures) will you use? (You only need to describe
your approach)
i. First, I would advise the researcher to create a histogram of all the ages. The
bins can cover five-year ranges such as 15 to 20, 20 to 25, 25 to 30, and so on.
ii. From the histogram, the researcher can see which ages account for most of the
reported cases by looking at the overall shape of the distribution, taking note
of the skewness and the modality.
2. The scores of a quiz are displayed in the graph below.

a. Describe the shape of distribution

i. The distribution of the graph is unimodal and skewed left.

b. Would the mean score be greater than, less than, or about the same as the median score?
Explain.
i. In a graph that is skewed left, the mean score would be less than the median
score.

c. What measures would you use to report the center and spread? Explain.
i. To measure the center, the median is a good measure of a typical value for
skewed distributions. To measure the spread, the interquartile range is best
because it is built around the median and describes the range of the middle
50% of the scores.

3. The distribution of test scores in a class is unimodal and symmetric with a mean of 80 pts
and a standard deviation of 7 pts. Based on the information, Adam estimated that his score is
higher than approximately 97.5% of the students in class. What score did Adam receive?
Explain.
i. Adam’s score is about 94 points. By the Empirical Rule, about 95% of the scores
fall within two standard deviations of the mean (66 to 94 points), which leaves
about 2.5% of the scores above 94. A score of 80 + 2(7) = 94 is therefore higher
than approximately 97.5% of the students in the class.
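
The arithmetic for question 3 can be checked with a short sketch. The variable names are illustrative. The second calculation additionally assumes the scores follow a normal model, which the question does not state outright, so it is shown only for comparison with the Empirical Rule.

```r
mean_score <- 80
sd_score <- 7

# Empirical Rule: about 97.5% of scores lie below mean + 2 SD.
empirical_rule_score <- mean_score + 2 * sd_score
empirical_rule_score

# Assuming a normal model, the exact 97.5th percentile is slightly lower.
normal_model_score <- qnorm(0.975, mean = mean_score, sd = sd_score)
normal_model_score
```

The two answers differ by less than a third of a point, so 94 is a reasonable estimate either way.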

4. Assume that both men and women’s heights have symmetric and unimodal distributions.
Women’s distribution has a mean of 64 inches and a standard deviation of 2.5 inches. Men’s
distribution has a mean of 69 inches and a standard deviation of 3 inches.
a. What women’s height corresponds with a z-score of -1.50?
i. A z-score of -1.50 corresponds to a height of 64 + (-1.50)(2.5) = 60.25 inches
for women.
b. Professional basketball player Evelyn Akhator is 75 inches tall and plays in the
WNBA (women’s league). Professional basketball player Draymond Green is 79 inches
tall and plays in the NBA (men’s league). Compared to their own peers, who is taller?
i. Evelyn Akhator has a z-score of (75 - 64) / 2.5 = 4.4, and Draymond Green has a
z-score of (79 - 69) / 3, or about 3.3. Z-scores have no units. Since 4.4 is
greater than 3.3, Evelyn Akhator is taller relative to her peers than Draymond
Green is relative to his.
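
A brief sketch of the z-score calculations in question 4, using the means and standard deviations given in the question. The helper name z_score is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: standardize a value against a mean and SD.
z_score <- function(x, mean, sd) (x - mean) / sd

# Part a: convert a z-score of -1.50 back to a women's height in inches.
height_at_z <- 64 + (-1.50) * 2.5
height_at_z

# Part b: compare each player with the distribution for their own group.
z_akhator <- z_score(75, mean = 64, sd = 2.5)
z_green <- z_score(79, mean = 69, sd = 3)
c(akhator = z_akhator, green = z_green)
```

The larger z-score identifies the player who stands further above the mean of their own group.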

5. The top ten movies based on Marvel comic book characters for the U.S. box office as of fall
2017 are shown in the following table, with domestic gross rounded to the nearest hundred
million. (Source: ultimatemovieranking.com)

a. Report the five-number summary of the domestic gross income.

Min: 363, Q1: 389, Median: 428.5, Q3: 520, Max: 677
b. Interpret the five-number summary in context, i.e., what information can you obtain about
the distribution of the domestic gross income?
i. The five-number summary shows that the middle 50% of the movies have domestic
gross incomes between 389 million (quartile one) and 520 million (quartile
three). The median (quartile two) shows that a typical gross income is about
428.5 million. The IQR is 520 - 389 = 131, so the minimum (363) and the maximum
(677) both lie within 1.5 * IQR of the quartiles and neither is flagged as an
outlier.

6. The data set below shows the number of central public libraries in 32 states.
The five-number summary is given as:
Minimum   Q1   Median   Q3    Maximum
1         62   91       218   756

Sketch a boxplot using the five-number summary above and the data below.
Mark the values of the quartiles, the lower whisker, the upper whisker, and any potential
outliers in the boxplot. Explain how you determined the length of the whiskers. (The
scale of the plot does not need to be accurate)
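Q1 - 1.5 * IQR = lower bound (first whisker), Q3 + 1.5 * IQR = upper bound (other whisker)
The IQR is 218 - 62 = 156. I found the lower bound by subtracting 1.5 * IQR from Q1, and
I found the upper bound by adding 1.5 * IQR to Q3. For the lower bound I got the value
-172. Since no value can be that low, the lower whisker ends at the minimum, 1. For the
upper bound I got the value 452, so the upper whisker extends to the largest data value
that does not exceed 452, and any value above 452 (such as the maximum, 756) is marked
as a potential outlier.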
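
The whisker bounds for question 6 can be reproduced from the five-number summary alone. This sketch uses only the summary values given in the question, and the variable names are illustrative.

```r
# Five-number summary from the question.
minimum <- 1
q1 <- 62
q3 <- 218
maximum <- 756

iqr <- q3 - q1

# Fences: values beyond these are potential outliers.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
c(iqr = iqr, lower = lower_fence, upper = upper_fence)

# The minimum is inside the lower fence; the maximum is beyond the upper fence.
minimum >= lower_fence
maximum > upper_fence
```

The fences only mark where outliers begin. Each whisker is drawn to the most extreme data value that still falls inside its fence.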
