Unit 2 Matrices
In R, a matrix is a rectangular collection of data elements that must all be
of the same basic type.
One way to create a matrix is by using the matrix() function:
Syntax: matrix(vector elements, nrow=, ncol=,byrow=)
Example: A <- matrix(c(2,4,3,1,5,7),nrow=2,ncol=3,byrow=TRUE)
A # print the matrix
[,1] [,2] [,3]
[1,] 2 4 3
[2,] 1 5 7
• The internal storage of a matrix is in column-major order, meaning that first
all of column 1 is stored, then all of column 2, and so on.
• An element at the mth row, nth column of A can be accessed by the expression
Syntax: A[m, n].
Example: A[2, 3]
Output: 7
• The entire mth row of A can be extracted as A[m, ].
Example: A[2, ] # the 2nd row
[1] 1 5 7
• The entire nth column of A can be extracted as A[ ,n].
Example: A[ ,3] # the 3rd column
[1] 3 7
• We can also extract more than one row or column at a time.
Example: A[ ,c(1,3)] # the 1st and 3rd columns
[,1] [,2]
[1,] 2 3
[2,] 1 7
• If we assign names to the rows and columns of the matrix, then we can access
the elements by name.
Example: dimnames(A) = list(
c("row1", "row2"), # row names
c("col1", "col2", "col3")) # column names
Input: A # print A
col1 col2 col3
row1 2 4 3
row2 1 5 7
Example: A["row2", "col3"]
Output: [1] 7
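The indexing rules above can be checked in one short session. This is only a minimal sketch: it assumes A is filled row by row (byrow=TRUE) so that it matches the printed matrix, and it reuses the row and column names from the example.

```r
# Rebuild the example matrix, filling row by row so it matches the printout
A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)
dimnames(A) <- list(c("row1", "row2"), c("col1", "col2", "col3"))

elem <- A[2, 3]          # single element: row 2, column 3
row2 <- A["row2", ]      # whole row, selected by name
cols <- A[, c(1, 3)]     # columns 1 and 3, returned as a 2 x 2 matrix
flat <- as.vector(A)     # storage order: all of column 1, then column 2, ...

print(elem)
print(row2)
print(cols)
print(flat)
```

The last line prints the elements in column-major order, which is why it differs from the order in which the values were typed.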
Matrix Construction
Construct a matrix
Transpose
Combining Matrices
Deconstruction
Construct a Matrix
There are various ways to construct a matrix. When we construct a matrix
directly with data elements, the matrix content is filled along the column
orientation by default.
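As a small illustration of the default fill order, the sketch below builds two matrices from the same vector, once with the default and once with byrow=TRUE. The names v, by_col and by_row are made up for this example.

```r
v <- c(1, 2, 3, 4, 5, 6)

by_col <- matrix(v, nrow = 2)                # default: fill down the columns
by_row <- matrix(v, nrow = 2, byrow = TRUE)  # fill across the rows instead

print(by_col)
print(by_row)
```

Both results are 2-by-3 matrices holding the same six values; only the placement of the values differs.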
• When you call R's locator() function, it waits for the user to click a point
within a graph and returns the exact coordinates of that point.
• Roosevelt’s portion of the picture is in rows 84 through 163 and columns 135
through 177.
• Row numbers in pixmap objects increase from the top of the picture to the
bottom. We set all the pixels in that range to 1.0.
mtrush2 <- mtrush1
mtrush2@grey[84:163,135:177] <- 1
plot(mtrush2)
To blur a region rather than blank it out, we add random noise to the picture.
• We generate random noise and then take a weighted average of the target
pixels and the noise.
• The parameter q controls the weight of the noise, with larger q values
producing more blurring.
• The random noise itself is a sample from U(0,1), the uniform distribution on
the interval (0,1).
newimg@grey <- (1-q) * img@grey + q * randomnoise
mtrush3 <- blurpart(mtrush1,84:163,135:177,0.65)
plot(mtrush3)
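The definition of blurpart() does not appear in this section, so the following is only a sketch of the weighted-average step it relies on. It works on a plain numeric matrix standing in for img@grey and does not use the pixmap package; the function name blur_region and the test image are hypothetical.

```r
# Hypothetical helper: blur a rectangular region of a greyscale matrix
# (values in [0, 1]) by mixing it with uniform random noise.
blur_region <- function(grey, rows, cols, q) {
  region <- grey[rows, cols, drop = FALSE]
  # One U(0,1) draw per pixel in the region
  noise <- matrix(runif(length(region)), nrow = nrow(region))
  # Weighted average: q is the weight given to the noise
  grey[rows, cols] <- (1 - q) * region + q * noise
  grey
}

set.seed(1)                              # reproducible noise for the demo
img <- matrix(0.5, nrow = 6, ncol = 6)   # stand-in for img@grey
out <- blur_region(img, 2:4, 3:5, 0.65)
print(round(out, 2))
```

Pixels outside the selected rows and columns are left unchanged, and every blurred pixel stays within [0, 1] because it is an average of two values in that range.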
Filtering on Matrices
Filtering can be done with matrices, just as with vectors.
Example: x<- matrix(c(2,4,3,1,5,7),nrow=3,ncol=2)
x
[,1] [,2]
[1,] 2 1
[2,] 4 5
[3,] 3 7
Input: x[x[,2] >= 3,]
Output: [,1] [,2]
[1,] 4 5
[2,] 3 7
Input: j <- x[,2] >= 3
j
Output: [1] FALSE TRUE TRUE
• We look at the vector x[,2], which is the second column of x, and determine
which of its elements are greater than or equal to 3. The result, assigned to j,
is a Boolean vector.
• Now use j in x
Input: x[j,]
Output: [,1] [,2]
[1,] 4 5
[2,] 3 7
• For performance purposes, it’s worth noting again that the computation of j
here is a completely vectorized operation, since all of the following are true:
The object x[,2] is a vector.
The operator >= compares two vectors.
The number 3 was recycled to a vector of 3s.
• The filtering criterion can be based on a variable separate from the one to
which the filtering will be applied.
• You can also apply vector operations to a matrix.
Input : which(x > 2)
Output: [1] 2 3 5 6
• From a vector-indexing point of view, elements 2, 3, 5, and 6 of x are larger
than 2. This means element 5 is the element in row 2, column 2 of x, which
we see has the value 5, which is indeed greater than 2.
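Two of the points above can be sketched briefly: filtering rows of x with a criterion computed from a separate variable, and asking which() for row and column positions instead of vector indices. The vector z here is a made-up example; arr.ind is a standard argument of which().

```r
x <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)
z <- c(5, 12, 13)                 # hypothetical separate variable

# Keep the rows of x whose matching element of z is odd
odd_rows <- x[z %% 2 == 1, ]

# Report matches as (row, col) pairs rather than linear indices
pos <- which(x > 2, arr.ind = TRUE)

print(odd_rows)
print(pos)
```

Each row of pos gives the row and column of one element of x that is larger than 2, in the same column-major order as the linear indices 2, 3, 5 and 6.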
Applying Functions to Matrix Rows and Columns
• One of the best-known and most used features of R is the apply() family of
functions.
• The family includes apply(), lapply(), sapply(), and tapply().
• Here we look at apply(), which instructs R to call a user-specified function
on each of the rows or each of the columns of a matrix.
Using the apply() Function:
• This is the general form of apply for matrices:
Syntax: apply(m, dimcode, f, fargs)
where the arguments are as follows:
m is the matrix.
dimcode is the dimension, equal to 1 if the function applies to rows or 2
for columns.
f is the function to be applied.
fargs is an optional set of arguments to be supplied to f.
Example: z<- matrix(c(1,4,2,5,3,6), nrow=3,ncol=2)
z
Output: [,1] [,2]
[1,] 1 5
[2,] 4 3
[3,] 2 6
Example1: Input: apply(z,2,mean)
Output: [1] 2.333333 4.666667
(1+4+2)/3 = 2.33 (5+3+6)/3 = 4.67
Example2: Input: apply(z,1,mean)
Output: [1] 3.0 3.5 4.0
(1+5)/2 = 3.0 (4+3)/2 = 3.5 (2+6)/2 = 4.0
Example3: Input: apply(z, 2, sort)
Output: [,1] [,2]
[1,] 1 3
[2,] 2 5
[3,] 4 6
Example4: Input: apply(z, 1, sort)
Output: [,1] [,2] [,3]
[1,] 1 3 2
[2,] 5 4 6
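As a cross-check on the examples above, the sketch below compares apply() with the built-in colMeans() and rowMeans() functions, and shows an extra argument being passed through to the applied function (the fargs slot in the syntax). The variable names are chosen for this illustration only.

```r
z <- matrix(c(1, 4, 2, 5, 3, 6), nrow = 3, ncol = 2)

col_means <- apply(z, 2, mean)   # one mean per column
row_means <- apply(z, 1, mean)   # one mean per row

# Arguments after the function are passed on to it (here, to sort())
desc_cols <- apply(z, 2, sort, decreasing = TRUE)

print(all.equal(col_means, colMeans(z)))
print(all.equal(row_means, rowMeans(z)))
print(desc_cols)
```

For plain means and sums, colMeans(), rowMeans(), colSums() and rowSums() are usually the more direct choice; apply() is the general tool when the function is arbitrary.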
Example: z<- matrix(c(1,4,2,5,3,6), nrow=3,ncol=2)
z
[,1] [,2]
[1,] 1 5
[2,] 4 3
[3,] 2 6
f <- function(x) x/c(2,8)
y <- apply(z,1,f)
y
[,1] [,2] [,3]
[1,] 0.500 2.000 1.00
[2,] 0.625 0.375 0.75
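• Note that the result has 2 rows and 3 columns, not 3 rows and 2 columns:
apply() stores the result of each call to f as a column. You can use the matrix
transpose function t() to change this if necessary, as follows: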
Input: t(apply(z,1,f))
[,1] [,2]
[1,] 0.5 0.625
[2,] 2.0 0.375
[3,] 1.0 0.750
• Suppose we have a matrix of 1s and 0s and want to create a vector with one
element per row: each element will be either 1 or 0, depending on whether the
majority of the first d elements in that row are 1s or 0s.
• Here, d will be a parameter that we may wish to vary.
copymaj<- function(rw,d) {
maj <- sum(rw[1:d]) / d
return(if(maj > 0.5) 1 else 0)
}
x
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 0
[2,] 1 1 1 1 0
[3,] 1 0 0 1 1
[4,] 0 1 1 1 0
Input1: apply(x,1,copymaj,3)
Output: [1] 1 1 0 1 # row 1 is 1 0 1 1 0
Here (1+0+1)/3 = 2/3 = 0.67, which is greater than 0.5, so it prints 1.
Input2: apply(x,1,copymaj,2)
Output: [1] 0 1 0 0 # row 1 is 1 0 1 1 0
Here (1+0)/2 = 1/2 = 0.5, which is not greater than 0.5, so it prints 0.
• Consider what happened in the case of row 1 of x when d = 3.
• Row 1 of x consisted of (1,0,1,1,0), the first d elements of which were (1,0,1).
• A majority of those three elements were 1s, so copymaj() returned a 1, and
thus the first element of the output of apply() was a 1.
• As R moves closer and closer to parallel processing, functions like apply()
will become more and more important.
• The clusterApply() function in the snow package gives R some parallel-
processing capability by distributing the submatrix data to various network
nodes, with each node basically applying the given function on its submatrix.
Extended Example: Finding Outliers
• In statistics, outliers are data points that differ greatly from most of the other
observations.
• As such, they are treated either as suspect (erroneous) or unrepresentative.
Many methods have been devised to identify outliers.
• We have retail sales data in a matrix rs. Each row of data is for a different
store, and observations within a row are daily sales figures. Our function
findols() is called on the matrix as follows:
Input: findols(rs)
• Since the function passed to apply() will be applied to each row of our sales
matrix, it must report the index of the most deviant observation in a given
row. Our function findol() does that, in lines 4 and 5. (Note that we've
defined one function within another here.)
• In the expression xrow-mdn, we are subtracting a number that is a one-
element vector from a vector that generally will have a length greater than 1.
• Thus, recycling is used to extend mdn to conform with xrow before the
subtraction.
• In line 5, we use the R function which.max(). Instead of finding the
maximum value in a vector, which the max() function does, which.max()
tells us where that maximum value occurs, i.e., the index where it occurs.
This is just what we need.
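The listing for findols() is not reproduced in this section, so the following is a reconstruction sketched from the description alone. It assumes that mdn is the row median and that "deviation" means absolute distance from it; the sales matrix rs is made-up data. The function is laid out so that its fourth and fifth lines are the subtraction and the which.max() call discussed above.

```r
findols <- function(x) {
  findol <- function(xrow) {
    mdn <- median(xrow)           # assumed: typical value is the row median
    devs <- abs(xrow - mdn)       # mdn is recycled to the length of xrow
    return(which.max(devs))       # index of the largest deviation
  }
  return(apply(x, 1, findol))     # run findol() on every row (store)
}

# Made-up sales data: 2 stores, 5 days each
rs <- matrix(c(10, 12, 11, 98,  9,
               50, 48,  2, 51, 49), nrow = 2, byrow = TRUE)
print(findols(rs))
```

For this data the result points at day 4 for the first store (the value 98) and day 3 for the second store (the value 2).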
Adding and Deleting Matrix Rows and Columns
• Matrices are of fixed length and dimensions, so we cannot add or delete rows
or columns.
• However, matrices can be reassigned, and thus we can achieve the same
effect as if we had directly done additions or deletions.
Changing the Size of a Matrix
• Recall how we reassign vectors to change their size:
x
[1] 12 5 13 16 8 # length is 5
Input: x <- c(x,20) # append 20
x
Output: [1] 12 5 13 16 8 20 # length is 6
Input: x <- c(x[1:3],20,x[4:6]) # insert 20
x
Output: [1] 12 5 13 20 16 8 20 # length is 7
Input: x <- x[-2:-4] # delete elements 2 through 4
x
Output: [1] 12 16 8 20 # length is 4
• We didn't literally change the length of x; instead we created a new vector
from x and then assigned that new vector to x.
• Even the assignment x[2] <- 12 is actually a reassignment.
• Analogous operations can be used to change the size of a matrix. These are
the rbind() (row bind) and cbind() (column bind) functions which help you
to add rows or columns to a matrix.
Input: one <- c(1,1,1,1)
one
Output: [1] 1 1 1 1
z
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 1 0
[3,] 3 0 1
[4,] 4 0 0
Input: cbind(one, z)
Output: one
[1,] 1 1 1 1
[2,] 1 2 1 0
[3,] 1 3 0 1
[4,] 1 4 0 0
Input: q <- cbind(c(1,2),c(3,4))
q
Output: [,1] [,2]
[1,] 1 3
[2,] 2 4
• We can delete rows or columns by reassignment, too. Here we keep only rows 1
and 3 of m:
Input: m <- matrix(1:6,nrow=3)
m
Output: [,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
Input: m <- m[c(1,3),]
m
Output: [,1] [,2]
[1,] 1 4
[2,] 3 6
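The same reassignment idea covers both growing and shrinking a matrix. The sketch below uses rbind() to add a row and negative indices to remove a row or a column; the variable names are chosen for this illustration.

```r
m <- matrix(1:6, nrow = 3)

grown <- rbind(m, c(7, 8))        # new 4 x 2 matrix with an extra last row
no_r2 <- m[-2, ]                  # new 2 x 2 matrix without row 2
no_c1 <- m[, -1, drop = FALSE]    # remove column 1, keep a 3 x 1 matrix

print(grown)
print(no_r2)
print(no_c1)
print(dim(m))                     # m itself is unchanged: still 3 x 2
```

Each call returns a new matrix. To make the change stick, assign the result back to the original name, as in m <- m[-2, ].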
Basic Commands/ functions in R
Example: Input: z <- matrix(1:8,nrow=4)
z
Output: [,1] [,2]
[1,] 1 5
[2,] 2 6
[3,] 3 7
[4,] 4 8
1. length()
Input: length(z)
Output: [1] 8
2. class()
Input: class(z)
Output: [1] "matrix" (R 4.0.0 and later print [1] "matrix" "array")
3. attributes()
Input: attributes(z)
Output: $dim
[1] 4 2
4. dim()
Input: dim(z)
Output: [1] 4 2
5. nrow()
Input: nrow(z)
Output: [1] 4
6. ncol()
Input: ncol(z)
Output: [1] 2
Avoiding Unintended Dimension Reduction
• In statistics, dimension reduction is usually a good thing, and many
statistical procedures aim to do it well.
• For example, if we are working with a dataset of 10 variables and can reduce
that number to 3, we have simplified the problem.
• But R has another kind of dimension reduction, one that we may sometimes wish
to avoid. Say we have a four-row matrix and extract a row from it:
Example: z
[,1] [,2]
[1,] 1 5
[2,] 2 6
[3,] 3 7
[4,] 4 8
Input: r <- z[2,]
r
Output: [1] 2 6
• R has displayed r in vector format, not matrix format. r is a vector of
length 2, rather than a 1-by-2 matrix.
Input: attributes(z)
Output: $dim
[1] 4 2
Input: attributes(r)
Output: NULL
Input: str(z)
Output: int [1:4, 1:2] 1 2 3 4 5 6 7 8
Input: str(r)
Output: int [1:2] 2 6
• Here, z has row and column numbers, while r does not.
• str() tells us that z has indices ranging in 1:4 and 1:2, for rows and columns,
while r’s indices range in 1:2.
• By these basic functions, we confirm that r is a vector, not a matrix.
• Suppose that your code extracts a submatrix from a given matrix and then
does some matrix operations on the submatrix.
• If the submatrix has only one row, R will make it a vector, which could ruin
your computation.
R has a way to suppress this dimension reduction: the drop argument.
Here’s an example, using the matrix z from above:
Example: r <- z[2,, drop=FALSE]
r
Output: [,1] [,2]
[1,] 2 6
Input: dim(r)
Output: [1] 1 2
Now r is a 1-by-2 matrix, not a two-element vector.
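To see why this matters in practice, the sketch below extracts the same row both ways and then asks each result for its number of rows, as later matrix code might. It assumes z is the 4-by-2 matrix built with matrix(1:8, nrow=4); the names as_vector and as_matrix are made up for this example.

```r
z <- matrix(1:8, nrow = 4)

as_vector <- z[2, ]                 # dimensions dropped: a plain vector
as_matrix <- z[2, , drop = FALSE]   # dimensions kept: a 1 x 2 matrix

print(is.matrix(as_vector))
print(dim(as_matrix))
print(nrow(as_vector))              # NULL, because a vector has no dim
print(nrow(as_matrix))
```

Code that loops over seq_len(nrow(...)) or multiplies with %*% behaves as intended with as_matrix, but can fail or silently change meaning with as_vector.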
• If you have a vector that you wish to be treated as a matrix, you
can use the as.matrix() function, as follows:
Example: u
[1] 1 2 3
Input: v <- as.matrix(u)
attributes(u)
Output: NULL
Input: attributes(v)
Output: $dim
[1] 3 1
Naming Matrix Rows and Columns
• The natural way to refer to rows and columns in a matrix is via the row and
column numbers. However, you can also give names to these entities. Here's an
example:
Example: z
[,1] [,2]
[1,] 1 3
[2,] 2 4
Column names, with colnames():
Input: colnames(z)
Output: NULL
Input: colnames(z) <- c("a","b")
z
Output: a b
[1,] 1 3
[2,] 2 4
Input: colnames(z)
Output: [1] "a" "b"
Input: z[,"a"]
Output: [1] 1 2
Row names, with rownames() (starting again from the unnamed z):
Input: rownames(z)
Output: NULL
Input: rownames(z) <- c("a","b")
z
Output: [,1] [,2]
a 1 3
b 2 4
Input: rownames(z)
Output: [1] "a" "b"
Input: z["a",]
Output: [1] 1 3
Higher-Dimensional Arrays
• In a statistical context, a typical matrix in R has rows corresponding to
observations and columns corresponding to variables. The matrix is then a
two-dimensional data structure.
• But suppose we also have data taken at different times, one data point per
person per variable per time.
• Time then becomes the third dimension, in addition to rows and columns. In
R, such data sets are called arrays.
• Consider students and test scores. Each test consists of two parts, so we
record two scores for a student for each test.
• Now assume we have only three students.
Input: firsttest
Output: [,1] [,2]
[1,] 46 30 # Student1
[2,] 21 25 # Student2
[3,] 50 50 # Student3
Input: secondtest
Output: [,1] [,2]
[1,] 46 43 # Student1
[2,] 41 35 # Student2
[3,] 50 50 # Student3
• Combine both tests into one data structure, and store them in tests.
• Now we’ll arrange it to have two “layers”—one layer per test—with three
rows and two columns within each layer.
• We’ll store firsttest in the first layer and secondtest in the second layer. We
use R’s array function to create the data structure:
tests <- array(data=c(firsttest,secondtest),dim=c(3,2,2))
• In the argument dim=c(3,2,2), we are specifying two layers, each consisting
of three rows and two columns. This then becomes an attribute of the data
structure:
Input: attributes(tests)
Output: $dim
[1] 3 2 2
R’s print function for arrays displays the data layer by layer:
tests
, , 1
[,1] [,2]
[1,] 46 30
[2,] 21 25
[3,] 50 50
, , 2
[,1] [,2]
[1,] 46 43
[2,] 41 35
[3,] 50 50
• Just as we built our three-dimensional array by combining two matrices, we can
build four-dimensional arrays by combining two or more three-dimensional arrays.
• One of the most common uses of arrays is in calculating tables.
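Elements of a three-dimensional array are addressed with three subscripts: row, column, layer. The sketch below rebuilds tests from the two matrices as they are defined above and pulls out a few slices; the variable names s2_t1, p2_s1 and layer_means are made up for this example.

```r
firsttest  <- matrix(c(46, 21, 50, 30, 25, 50), nrow = 3)
secondtest <- matrix(c(46, 41, 50, 43, 35, 50), nrow = 3)
tests <- array(data = c(firsttest, secondtest), dim = c(3, 2, 2))

s2_t1 <- tests[2, , 1]                 # student 2, both parts, first test
p2_s1 <- tests[1, 2, ]                 # student 1, part 2, across both tests
layer_means <- apply(tests, 3, mean)   # mean score per test (third dimension)

print(s2_t1)
print(p2_s1)
print(layer_means)
```

apply() works on arrays as well as matrices: a dimcode of 3 calls the function once per layer.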
Matrix Concatenation
• Matrix concatenation refers to the merging of rows or columns of an existing matrix.
Concatenation of a row:
• The concatenation of a row to a matrix is done using rbind().
Example: # Create a 3x3 matrix
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
cat("The 3x3 matrix:\n")
print(A)
The 3x3 matrix:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
# Creating another 1x3 matrix
B = matrix( c(10, 11, 12), nrow = 1, ncol = 3 )
cat("The 1x3 matrix:\n")
print(B)
The 1x3 matrix:
[,1] [,2] [,3]
[1,] 10 11 12
# Add a new row using rbind()
C = rbind(A, B)
cat("After concatenation of a row:\n")
print(C)
After concatenation of a row:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[4,] 10 11 12
Concatenation of a column:
The concatenation of a column to a matrix is done using cbind().
# Create a 3x3 matrix
A = matrix( c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE )
cat("The 3x3 matrix:\n")
print(A)
The 3x3 matrix:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
# Creating another 3x1 matrix
B = matrix( c(10, 11, 12), nrow = 3, ncol = 1, byrow = TRUE )
cat("The 3x1 matrix:\n")
print(B)
The 3x1 matrix:
[,1]
[1,] 10
[2,] 11
[3,] 12
# Add a new column using cbind()
C = cbind(A, B)
cat("After concatenation of a column:\n")
print(C)
After concatenation of a column:
[,1] [,2] [,3] [,4]
[1,] 1 2 3 10
[2,] 4 5 6 11
[3,] 7 8 9 12
R can create several special types of matrices through the arguments passed to
the matrix() and diag() functions.
Matrix where all rows and columns are filled by a single constant ‘k’:
To create such a matrix the syntax is given below:
Syntax: matrix(k, m, n)
Parameters:
k: the constant
m: number of rows
n: number of columns
Example: # Matrix having 3 rows and 3 columns
# filled by a single constant 5
print(matrix(5, 3, 3))
Output: [,1] [,2] [,3]
[1,] 5 5 5
[2,] 5 5 5
[3,] 5 5 5
Diagonal matrix:
A diagonal matrix is a matrix in which the entries outside the main diagonal are all zero.
To create such a matrix the syntax is given below:
Syntax: diag(k, m, n)
Parameters:
k: the constant or vector of diagonal values
m: number of rows
n: number of columns
Example: # Diagonal matrix having 3 rows and 3 columns
# filled by array of elements (5, 3, 3)
print(diag(c(5, 3, 3), 3, 3))
Output: [,1] [,2] [,3]
[1,] 5 0 0
[2,] 0 3 0
[3,] 0 0 3
Identity matrix:
An identity matrix is a square matrix in which all the elements of the principal
diagonal are ones and all other elements are zeros. To create such a matrix the
syntax is given below:
Syntax: diag(k, m, n)
Parameters:
k: 1
m: number of rows
n: number of columns
Example:# Identity matrix having
# 3 rows and 3 columns
print(diag(1, 3, 3))
Output: [,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
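diag() also has shorter calling forms that are handy in practice. The sketch below shows three of them from base R; the variable names I3, D and d are made up for this example.

```r
I3 <- diag(3)            # a single integer n gives the n x n identity matrix
D  <- diag(c(5, 3, 3))   # a vector gives a diagonal matrix with those entries
d  <- diag(D)            # a matrix argument returns its main diagonal

print(I3)
print(D)
print(d)
print(all(D %*% I3 == D))   # multiplying by the identity leaves D unchanged
```

Because the meaning of diag() depends on the type of its argument, a length-one numeric vector is treated as a size and not as a single diagonal entry.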