Week 1-B. Data in R
Data in R
Matrix
It is often useful to work in 2-dimensions when working with data. In R, a matrix is a 2-dimensional object consisting of
values of a single data type.
matrix(1:9, 3) # 3 rows; the values fill the matrix column by column
matrix(LETTERS, 2) # 2 rows of character values
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] "A"  "C"  "E"  "G"  "I"  "K"  "M"  "O"  "Q"  "S"   "U"   "W"   "Y"
## [2,] "B"  "D"  "F"  "H"  "J"  "L"  "N"  "P"  "R"  "T"   "V"   "X"   "Z"
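As a small illustration of the single-type rule, the sketch below builds one matrix from integers and one from mixed inputs. The names `m` and `mixed` are made up for this example and do not come from the course page.

```r
# Values fill the matrix column by column unless byrow = TRUE is given.
m <- matrix(1:9, nrow = 3)
m
dim(m)          # number of rows, then number of columns

# Mixed inputs are coerced to one common type (here, character).
mixed <- matrix(c(1, "a", TRUE, 2), nrow = 2)
typeof(mixed)
```

Checking `typeof()` is a quick way to see which type R settled on when the inputs were not all alike.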
You can access individual elements with [ as you do with vectors. If you supply a single value or a single vector, R treats the matrix
as one vector formed by concatenating its columns in order.
## [1] 5
## [1] 2 11
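The two results above are consistent with `mat_ex` (the matrix used later on this page) being a 3-by-4 matrix holding the integers 1 to 12. Its definition is not shown here, so the first line below is an assumption. Under that assumption, calls like the following would give those results:

```r
mat_ex <- matrix(1:12, nrow = 3)  # assumed definition, not shown on the page

mat_ex[5]         # fifth element, counting down the columns
mat_ex[c(2, 11)]  # second and eleventh elements
```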
More intuitively, you can extract elements using 2-dimensional coordinates separated by a comma. The first value
corresponds to the row and the second value to the column. Indexing starts at 1, as with a single-value index.
## [1] 8
## [1] 7 9
mat_ex[c(2, 3), c(1, 4)] # second and third row, first and fourth column
##      [,1] [,2]
## [1,]    2   11
## [2,]    3   12
If you leave one dimension blank, you extract entire rows or columns.
## [1] 1 4 7 10
## [1] 10 11 12
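The calls behind the outputs in this part of the page are not all shown. The sketch below assumes `mat_ex` is `matrix(1:12, nrow = 3)`, which agrees with every result printed above, and lists calls that would reproduce them. The `drop = FALSE` line is an addition for illustration.

```r
mat_ex <- matrix(1:12, nrow = 3)  # assumed definition, not shown on the page

mat_ex[2, 3]        # second row, third column
mat_ex[c(1, 3), 3]  # first and third rows of the third column
mat_ex[1, ]         # entire first row
mat_ex[, 4]         # entire fourth column

# By default a single row or column is returned as a plain vector.
# drop = FALSE keeps the result as a matrix.
mat_ex[, 4, drop = FALSE]
```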
https://rconnect.utstat.utoronto.ca/content/a57e9d9b-336e-4f49-9347-afee2f53a192/#section-data-in-r 1/5
25/01/2024, 12:53 Week 1: Review and Introduction to Data Analysis
There are a few convenient functions provided for working with multidimensional objects. For example, rowMeans()
computes mean values for each row. See documentation for details and other similar functions.
?rowMeans
Description:
Form row and column sums and means for numeric arrays (or data
frames).
Usage:
Arguments:
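To make the help page concrete, here is a minimal sketch of the four functions it documents, applied to a small hypothetical matrix. The name `scores` and its values are made up for this example.

```r
scores <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)  # hypothetical 2-by-3 matrix

rowMeans(scores)  # one mean per row
rowSums(scores)   # one sum per row
colMeans(scores)  # one mean per column
colSums(scores)   # one sum per column
```

Each function returns a plain vector with one entry per row or per column, which is why the results can be compared against a threshold directly.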
Exercise 4
mat is a 6 by 4 numeric matrix in the following code chunk. Use R code to
1. find the row numbers where the row mean is smaller than 15
2. find the column numbers where the column sums are larger than 75
3. check whether all values in the second row are larger than values in the fourth row AND value 5 is in the last
column
mat
(1:6)[rowMeans(mat) < 15] # row numbers whose row mean is below 15
(1:4)[colSums(mat) > 75] # column numbers whose column sum is above 75
all(mat[2, ] > mat[4, ]) && (5 %in% mat[, 4]) # both conditions must hold
[1] 2 3 4 5 6
[1] 2 3
[1] FALSE
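The contents of `mat` are not shown on this page, so the sketch below repeats the same three steps on a hypothetical 6-by-4 stand-in called `mat_demo`. It uses `which()`, which returns the positions of the TRUE values, as an alternative to indexing `1:6` or `1:4` with a logical vector.

```r
mat_demo <- matrix(1:24, nrow = 6)  # hypothetical stand-in, not the course's `mat`

which(rowMeans(mat_demo) < 15)  # row numbers whose row mean is below 15
which(colSums(mat_demo) > 75)   # column numbers whose column sum is above 75

# Both conditions must hold for the result to be TRUE.
all(mat_demo[2, ] > mat_demo[4, ]) && (5 %in% mat_demo[, 4])
```

One reason to prefer `which()` here is that it does not require typing the number of rows or columns by hand.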
Data frames
Vectors and matrices can only store values of a single data type. We can create tables whose columns have different
data types via data.frame() in R.
For example, the data frame below has a column of numbers and a column of characters.
set.seed(238)
numbers  alphabets  l      y      w
<int>    <chr>      <lgl>  <int>  <int>
1        a          TRUE   106    14
2        b          TRUE   100    98
3        c          FALSE  89     1
5        e          TRUE   92     282
5 rows
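The code that builds the table above is only partly shown. As a rough sketch of how such a table can be put together with data.frame(), the example below uses the same column names but a hypothetical name, `example_table`, and made-up values for `y` and `w`.

```r
# Hypothetical stand-in for the course's table; y and w values are made up.
example_table <- data.frame(
  numbers = 1:5,
  alphabets = letters[1:5],
  l = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  y = c(10L, 20L, 30L, 40L, 50L),
  w = c(5L, 15L, 25L, 35L, 45L),
  stringsAsFactors = FALSE  # keep text columns as character in older R versions
)

str(example_table)            # one line per column with its type
sapply(example_table, class)  # the class of each column
```

Each column is a vector of a single type, but different columns may have different types.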
Many functions in R are designed to work with data frames. For example, ggplot() expects a data frame as its first
input argument. You can then define the mappings between columns and aesthetics (e.g., x axis).
library(ggplot2)
ggplot(some_table, # provide the table
aes(x = numbers)) + # map the column `numbers` to x-axis
theme_minimal() + # use minimal theme
geom_point(aes(y = ifelse(l, y, w), # draw points using `y` or `w` based on the value of `l`
color = l)) # change color based on the value of `l`
Specifying aesthetics such as color, size, linetype, shape, etc. inside aes() changes the specified aesthetics
based on the mapped values.
Most multivariate data come in a table format and R makes it easy to work with them.
You can access elements of a data frame using 2D indexes in the same way you use 2D indexes with matrices.
   numbers  alphabets  l      y      w
   <int>    <chr>      <lgl>  <int>  <int>
1  1        a          TRUE   106    14
1 row
   numbers  alphabets  l      y      w
   <int>    <chr>      <lgl>  <int>  <int>
1  1        a          TRUE   106    14
2  2        b          TRUE   100    98
3  3        c          FALSE  89     1
3 rows
some_table[1, 3:4] # extracting the elements in the first row, third and fourth columns
   l      y
   <lgl>  <int>
1  TRUE   106
1 row
When you extract multiple columns, the result is still a data frame.
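Some of the calls behind the outputs above are not shown. The sketch below uses a hypothetical table, `demo_table`, with made-up values to show the kind of 2D indexing that produces a single row, several rows, or a block of rows and columns.

```r
# Hypothetical table for illustration; not the course's some_table.
demo_table <- data.frame(
  numbers = 1:5,
  alphabets = letters[1:5],
  l = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

demo_table[1, ]     # first row, all columns
demo_table[1:3, ]   # first three rows, all columns
demo_table[1, 2:3]  # first row, second and third columns

is.data.frame(demo_table[1, 2:3])  # the multi-column result is a data frame
```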
On the other hand, using a single positional index behaves differently from a matrix and returns a column.
some_table[3]
l
<lgl>
TRUE
TRUE
FALSE
FALSE
TRUE
5 rows
You can also extract a column using $ followed by the column name.
some_table$l
Extracting a column with a 2D index (e.g., some_table[ , 3]) or with the $ extractor returns a vector, whereas a 1D index
(e.g., some_table[3]) returns a single-column data frame. The difference rarely matters, but it is a good
place to check when you hit an unexpected error.
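A quick way to see the difference is to check the type of each result. The sketch below assumes a plain data.frame (not a tibble) and uses a hypothetical table, `demo_table`, with made-up values.

```r
# Hypothetical table for illustration; not the course's some_table.
demo_table <- data.frame(numbers = 1:3, l = c(TRUE, FALSE, TRUE))

is.data.frame(demo_table[2])    # 1D index: single-column data frame
is.data.frame(demo_table[, 2])  # 2D index: not a data frame
is.logical(demo_table[, 2])     # it is a plain logical vector instead
identical(demo_table[, 2], demo_table$l)  # same vector as the $ extractor

# drop = FALSE keeps the 2D index result as a data frame.
is.data.frame(demo_table[, 2, drop = FALSE])
```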
Exercise 5
In the code chunk below, some_table is available for you.
i. Extract values of y from rows where l is TRUE.
ii. Extract values of w from rows where l is FALSE.
iii. Add up the five extracted values.
some_table
sum(some_table$y[some_table$l]) + sum(some_table$w[!some_table$l])
numbers  alphabets  l      y      w
<int>    <chr>      <lgl>  <int>  <int>
1        a          TRUE   106    14
2        b          TRUE   100    98
3        c          FALSE  89     1
5        e          TRUE   92     282
5 rows
[1] 440
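The same total can also be computed row by row with ifelse(), mirroring the expression used in the plotting code earlier on this page. The sketch below compares the two approaches on a hypothetical table, `demo_table`, with made-up values.

```r
# Hypothetical table for illustration; not the course's some_table.
demo_table <- data.frame(
  l = c(TRUE, FALSE, TRUE),
  y = c(10, 20, 30),
  w = c(1, 2, 3)
)

# Approach used in the exercise: subset each column, then add the two sums.
by_subset <- sum(demo_table$y[demo_table$l]) + sum(demo_table$w[!demo_table$l])

# Alternative: pick y or w for each row, then sum once.
by_ifelse <- sum(ifelse(demo_table$l, demo_table$y, demo_table$w))

by_subset == by_ifelse
```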