Exercises For Days 1 and 2
Use the material taught in the first two days to accomplish the following in R. The following exercises
DO NOT REQUIRE any knowledge in addition to what has been taught. Some creative thinking might
be required for some of the questions. Some of these might feel a bit difficult at first, especially if you
do not have much programming experience (I would be happy if this turns out to be false). The goal is NOT
to discourage you, but to challenge you to develop your skills, and convince you that programming is
not a mechanical activity. Feel free to take your time, as this ability only builds with practice and a
sound theoretical understanding.
1. Save your name as a variable. Then make a new variable that says “My name is x and the
number of letters in it is n”. Replace x and n by your name and its length, respectively.
2. Make these variables: x = sample(20), y = sample(35), z = sample(10), a = sample(3:7), b =
sample(c(2, 5, 6, 7, 10)), m = c('apple', 'banana', 'cat', 'dog'), n = sample(m). Re-run the code
thrice: notice the values of x, y, z, a, b, m and n each time. What is the function of sample()?
3. Make a vector of all the numbers between 1 and 1000, inclusive, that are multiples of 3, 7 or 13.
The output should be in ascending order. The format would be: x = c(3, 6, 7, 9, 12, 13, 14, 15,
18, 21, ….). Now, make these: y=sample(length(x)), z=y[x %% 3 > 0]. What are these?
4. Make a vector of names, n, and a vector of heights (in metres), h, with the same length. Make
another vector with the same length, of weights (in kg), w. You can use the sample function, if
you want, for this. Now, make a vector BMI of BMI scores (BMI is weight / height^2). Finally, make a
vector, v, of all the names that have BMIs greater than 20. (Hint: try to use the lesson from Q3).
5. Import any Excel file into R. Try to guess the type of the variable in each column. If the column
has numbers, find out the maximum, minimum, mean and median values.
6. Make a data frame with 99 entries, with the following columns: section (A or B) and exam
score (between 0 and 100). You can put whatever sections you want. You can use the sample
function for the scores, if you want. Find out the highest score, and which section the topper
belongs to. Also find out the mean scores of the two sections separately.
7. Make a vector of Booleans, x. Use the function y = as.numeric(x). What do you observe? What
does TRUE become? What does FALSE become?
8. In the data frame from Q6, find out which rows have scores that are multiples of 5, and save
the row numbers as a vector, v. For example, if your score column is c(1, 6, 5, 3, 10, 4, ….)
then v should be c(3, 5, ….) as the third, fifth, …. rows have multiples of 5 as scores.
9. Make a vector of 200 numbers, v, using the sample function. Make another vector, w, such that
if the i-th element of v is a multiple of 3, then the i-th element of w should be 10, and it should
be 5 otherwise. For example, if v = c(1, 4, 3, 6, 2, 7, 9, ….) then we should have w = c(5, 5, 10,
10, 5, 5, 10, ….). (Hint: use what you learned from Q7).
10. Make any data frame you want, with 1000 rows and 5 columns. Make a data frame that covers
the rows 501 to 700 of the previous data frame, and columns 2 to 4. Make another data frame
that only has the odd-numbered rows of the first data frame.
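
If you get stuck, the short sketch below may help as a refresher. It is NOT a solution to any of the questions: it uses made-up toy data, and the names fruit, price, cheap, label and df are purely illustrative. It also assumes that paste(), nchar(), seq() and data.frame() were covered in the first two days; if they were not, treat those lines as optional reading.

```r
# Toy data, invented for illustration only
fruit <- c("apple", "banana", "cherry", "date")
price <- c(30, 12, 45, 8)

# A comparison on one vector gives a logical vector of the same length ...
cheap <- price < 20                  # FALSE TRUE FALSE TRUE
# ... which can be used to index a DIFFERENT vector of the same length
cheap_fruit <- fruit[cheap]          # "banana" "date"

# paste() joins text and numbers; nchar() counts the characters in a string
label <- paste(fruit[1], "has", nchar(fruit[1]), "letters")

# A small data frame built from the same two vectors
df <- data.frame(fruit = fruit, price = price, stringsAsFactors = FALSE)

# Rows and columns are selected as df[rows, columns]
top_two   <- df[1:2, ]                            # first two rows, all columns
even_rows <- df[seq(2, nrow(df), by = 2), ]       # rows 2 and 4
max_fruit <- df$fruit[df$price == max(df$price)]  # fruit with the highest price
```

The pattern to notice is that a condition computed on one vector (or column) can pick out elements of another, as long as both have the same length. Try changing the condition and predicting the result before you run it.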