HOMEWORK 1

(Due date: Sept 11 in class or Sept 14 online)

1. Download and install R and RStudio on your personal computer. (No need to take a
screenshot.)
https://posit.co/download/rstudio-desktop/

2. Download the dataset “Global YouTube Statistics 2023” from Kaggle.


https://www.kaggle.com/datasets/nelgiriyewithana/global-youtube-statistics-2023

3. Import the above dataset to R. Take a snapshot of what you did. What is your working
directory?
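
As a minimal sketch of question 3, the snippet below prints the working directory and reads a CSV file from it. The file name and the data frame name yt are assumptions for illustration; replace the file name with that of the file you actually downloaded.

```r
# Print the current working directory (the folder R reads from by default).
getwd()

# Assumed file name: change it to match the CSV file you downloaded.
csv_file <- "Global YouTube Statistics.csv"

if (file.exists(csv_file)) {
  # Read the CSV into a data frame (yt is a hypothetical name).
  yt <- read.csv(csv_file)
  # Show the variables and their types.
  str(yt)
} else {
  message("File not found in ", getwd(),
          ". Move the CSV there or change the directory with setwd().")
}
```

If the file is stored elsewhere, either pass its full path to read.csv() or change the working directory with setwd() first.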

4. Find an example of a qualitative variable, and an example of a quantitative variable in this
dataset.

5. Produce a frequency table using R for the variable uploads. The number of classes is fixed
to be 4. Also, as a separate exercise, use code to find the maximum and the minimum number
of uploads. Please also attach your code.

Note: in class we discussed how to cut the range of values of a variable into intervals of equal
length. In fact, you can also choose to cut the range into unequal intervals. Here is how you do
it:
upfrq <- table(cut(up, breaks = c(0, a, b, c, d), right = FALSE))
This code tells R to cut the range of values of the vector up into four intervals [0, a), [a, b),
[b, c) and [c, d), and to produce the corresponding frequency distribution. (Without
right = FALSE, cut() uses the intervals (0, a], (a, b], (b, c] and (c, d] instead.) You should
choose your preferred values of a, b, c, d, with d larger than the maximum number of uploads so
that no value is left out.
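
The following is a small sketch of the cut() approach on a made-up vector, not on the YouTube data. The values in up and the break points are invented for illustration. Note that cut() builds right-closed intervals such as (0, a] by default, so right = FALSE is passed here to obtain left-closed intervals such as [0, a).

```r
# Made-up values standing in for the uploads column (not real data).
up <- c(5, 120, 980, 1500, 42, 7300, 310, 2600, 15, 640)

# Four unequal classes: [0, 100), [100, 1000), [1000, 5000), [5000, 10000).
# right = FALSE makes each interval closed on the left and open on the right;
# dig.lab = 5 keeps the interval labels out of scientific notation.
upfrq <- table(cut(up, breaks = c(0, 100, 1000, 5000, 10000),
                   right = FALSE, dig.lab = 5))
print(upfrq)

# Minimum and maximum of the vector.
min(up)
max(up)
```

The last break is chosen larger than max(up); any value outside the breaks becomes NA and is dropped from the table.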

6. Compute the cumulative frequency distribution based on your answer in 5.

7. Compute the relative frequency distribution based on your answer in 5.
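
Questions 6 and 7 can be sketched with cumsum() and prop.table() applied to a frequency table. The vector and break points below are made up for illustration and are not taken from the dataset.

```r
# Made-up values and breaks, used only to have a frequency table to work with.
up <- c(5, 120, 980, 1500, 42, 7300, 310, 2600, 15, 640)
upfrq <- table(cut(up, breaks = c(0, 100, 1000, 5000, 10000),
                   right = FALSE, dig.lab = 5))

# Cumulative frequencies: running total of the class counts.
cumfrq <- cumsum(upfrq)
print(cumfrq)

# Relative frequencies: each class count divided by the total count.
relfrq <- prop.table(upfrq)
print(relfrq)
```

The last cumulative frequency equals the number of observations, and the relative frequencies add up to 1, which gives a quick sanity check.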

8. Produce a pie chart and a bar chart based on your answer in 5. Either save the pictures or
take screenshots.
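
One possible way to produce and save both charts is sketched below with base R graphics. The data, break points, chart titles and output file name are assumptions for illustration; the charts are written to a PDF in a temporary folder.

```r
# Made-up values and breaks, used only to have a frequency table to plot.
up <- c(5, 120, 980, 1500, 42, 7300, 310, 2600, 15, 640)
upfrq <- table(cut(up, breaks = c(0, 100, 1000, 5000, 10000),
                   right = FALSE, dig.lab = 5))

# Hypothetical output file; each plot goes on its own page of the PDF.
out_file <- file.path(tempdir(), "uploads_charts.pdf")
pdf(out_file)

# Pie chart of the class frequencies.
pie(upfrq, main = "Uploads by class (toy data)")

# Bar chart of the same frequencies.
barplot(upfrq, main = "Uploads by class (toy data)",
        xlab = "Number of uploads", ylab = "Frequency")

# Close the graphics device so the file is written to disk.
dev.off()
```

In RStudio, calling pie() and barplot() without pdf() and dev.off() shows the charts in the Plots pane, from which they can be exported.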

9. Produce the stem-and-leaf presentation of the following data consisting of 20 integers:


32 14 47 31 18 41 24 16 37 46 36 27 29 12 10 31 38 25 42 27

(You do not have to use R, but if you explore a bit and find the appropriate R command, feel
free to take a snapshot of what you did.)
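
For those who want to try R here, base R provides stem(). The sketch below calls it on the data from question 9 and also groups the units digits by tens digit by hand, so the leaves can be checked against whatever stem() prints. The variable names are chosen for illustration.

```r
# The 20 integers from question 9.
x <- c(32, 14, 47, 31, 18, 41, 24, 16, 37, 46,
       36, 27, 29, 12, 10, 31, 38, 25, 42, 27)

# Built-in stem-and-leaf display.
stem(x)

# Manual check: sort the data, then split the units digits (leaves)
# according to the tens digit (stems).
xs <- sort(x)
leaves <- split(xs %% 10, xs %/% 10)
print(leaves)
```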
10. Classify the following as either cross-sectional data or time series data. (No justification
required)
(1) Number of supermarkets in each of the ten provinces and three territories in Canada as of
January 1st, 2023.
(2) Your monthly electricity bills for the last couple of years.
(3) Average life expectancy by country.
(4) 200 responses to the same political poll, where the individuals are interviewed on either
Saturday or Sunday of the same week.

11. The following is part of a dataset originally consisting of 6704 elements. The dataset includes
six variables: age, gender, experience, job title, education level and salary.

Figure 1. A sample dataset

Classify the variables as either qualitative variables or quantitative variables. Are the
quantitative variables continuous or discrete?

Challenge question. (A correct solution contributes 0.1 percent of extra credit towards
your overall grade.)
We define a Fibonacci-type sequence to be a sequence (a_n)_{n ≥ 0} of integers such that
a_0 = a,
a_1 = b,
a_n = a_{n-1} + a_{n-2} for n ≥ 2.
Here, a, b are pre-chosen integers. We call them the initial values of the sequence.
Write a function in R that takes the initial values a, b as extra arguments and computes the
terms of a Fibonacci-type sequence. In particular, your function should have three arguments. This
function allows us to change the initial values without modifying the code each time.
Afterwards, use the smallest digit in your student ID as a and the largest digit as b, and compute
the first twelve terms of the corresponding sequence using your own function.
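
One way such a function could look is sketched below. The function name fib_type and the choice of n (the number of terms) as the third argument are assumptions, since the question does not name them. The example call uses a = 0 and b = 1, not digits from a student ID.

```r
# Sketch of a Fibonacci-type sequence generator.
# n: number of terms to compute (assumed meaning of the third argument)
# a, b: initial values a_0 and a_1
fib_type <- function(n, a, b) {
  stopifnot(n >= 1)
  terms <- numeric(n)
  # R vectors are indexed from 1, so terms[1] holds a_0 and terms[2] holds a_1.
  terms[1] <- a
  if (n >= 2) {
    terms[2] <- b
  }
  if (n >= 3) {
    for (i in 3:n) {
      # Each term is the sum of the two preceding terms.
      terms[i] <- terms[i - 1] + terms[i - 2]
    }
  }
  terms
}

# First twelve terms with initial values a = 0 and b = 1.
fib_type(12, 0, 1)
# → 0 1 1 2 3 5 8 13 21 34 55 89
```

The guards on n keep the loop from running when fewer than three terms are requested, because 3:n would otherwise count downwards in R.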
