03 Matrices
R Programming
Matrices
Often R can work out the number of rows in a matrix from the
number of columns and the elements, or the number of columns
from the number of rows and the elements. In such cases there
is no need to specify both the number of rows and the number
of columns.
> x = matrix(1:6, ncol = 2)
> nrow(x)
[1] 3
> ncol(x)
[1] 2
> dim(x)
[1] 3 2
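A matrix is filled column by column unless byrow = TRUE is given. The short sketch below, using illustrative values, shows both fill orders and checks that supplying either nrow or ncol gives the same matrix.

```r
# Six elements and two columns: R works out that there are three rows
x = matrix(1:6, ncol = 2)
print(x)            # columns are 1:3 and 4:6 (filled column by column)

# The same elements filled row by row instead
y = matrix(1:6, ncol = 2, byrow = TRUE)
print(y)            # rows are 1:2, 3:4 and 5:6

# Giving nrow instead of ncol produces the same matrix
z = matrix(1:6, nrow = 3)
identical(x, z)     # TRUE
```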
Creating Matrices from Rows and Columns
> dimnames(x)
[[1]]
[1] "First" "Second"
[[2]]
[1] "A" "B" "C"
> rownames(x)
[1] "First" "Second"
> colnames(x)
[1] "A" "B" "C"
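The matrix behind the dimnames output above is not shown, so the sketch below assumes the hypothetical values 1:6. It builds a matrix from rows with rbind (argument names become row names), adds column names, and selects an element by name; cbind works the same way with columns.

```r
# Bind two rows together; the argument names become the row names
x = rbind(First = 1:3, Second = 4:6)
colnames(x) = c("A", "B", "C")

dimnames(x)          # list(c("First", "Second"), c("A", "B", "C"))
x["Second", "B"]     # elements can be selected by name: 5

# cbind binds columns side by side, naming them the same way
y = cbind(A = 1:2, B = 3:4)
colnames(y)          # "A" "B"
```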
Extracting Matrix Elements
> # Sum the elements of x with explicit loops
> s = 0
> for(i in 1:nrow(x))
      for(j in 1:ncol(x))
          s = s + x[i, j]
> # The same result with a single vectorised call
> s = sum(x)
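As a sketch with an illustrative 2 x 3 matrix, the usual forms of matrix subscripting are shown below: a single element, a whole row or column, and drop = FALSE to keep a column as a one-column matrix. The final lines confirm that the double loop and sum agree.

```r
x = matrix(1:6, nrow = 2)   # illustrative 2 x 3 matrix

x[2, 3]                     # a single element: row 2, column 3
x[1, ]                      # the whole first row, returned as a vector
x[, 2]                      # the whole second column, as a vector
x[, 2, drop = FALSE]        # the second column kept as a 2 x 1 matrix

# The explicit double loop and sum() give the same total
s = 0
for (i in 1:nrow(x))
    for (j in 1:ncol(x))
        s = s + x[i, j]
s == sum(x)                 # TRUE
```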
Matrix Subsets
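The functions row and col return matrices with the same shape as
their argument, giving the row index and the column index of each
element. Logical conditions built from them can be used to extract
or change parts of a matrix, such as its diagonal or upper triangle.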
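A minimal sketch with an illustrative 3 x 3 matrix: comparing row(x) with col(x) selects the diagonal, the part above it, or the part below it, and the same conditions can appear on the left of an assignment.

```r
x = matrix(1:9, nrow = 3)    # illustrative 3 x 3 matrix

row(x)                       # row index of each element
col(x)                       # column index of each element

x[row(x) == col(x)]          # the diagonal: 1 5 9
x[row(x) < col(x)]           # above the diagonal: 4 7 8

# Set every element below the diagonal to zero
x[row(x) > col(x)] = 0
x
```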
Arithmetic operators act on matrices element by element.
> x
[,1] [,2]
[1,] 1 3
[2,] 2 4
> x + x^2
[,1] [,2]
[1,] 2 12
[2,] 6 20
Combining Vectors and Matrices
> x + 1
[,1] [,2]
[1,] 2 4
[2,] 3 5
> x + 1:2
[,1] [,2]
[1,] 2 4
[2,] 4 6
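When a vector is combined with a matrix, the vector is recycled down the columns, which is why x + 1:2 adds 1 to the first row and 2 to the second. To add one value per column instead, the vector can be expanded with rep or applied with sweep. The sketch below uses illustrative values and checks that the two approaches agree.

```r
x = matrix(1:4, nrow = 2)

# Recycling runs down the columns, so 1:2 supplies one value per row
x + 1:2

# One value per column: repeat each value nrow(x) times (column-major order)
v = c(10, 20)
a = x + rep(v, each = nrow(x))

# sweep() does the same, applying "+" along margin 2 (the columns)
b = sweep(x, 2, v, "+")
all.equal(a, b)             # TRUE
```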
R checks that such operations make sense. When the length of the
vector does not divide the number of elements in the matrix,
recycling still takes place, but a warning is issued.
> x + 1:3
[,1] [,2]
[1,] 2 6
[2,] 4 5
Warning message:
In x + 1:3 :
longer object length is not a multiple of shorter
object length
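The sketch below captures the warning with tryCatch and also shows the stricter case: a vector longer than the matrix is rejected with an error rather than a warning. The values are illustrative.

```r
x = matrix(1:4, nrow = 2)

# Length 3 does not divide 4: recycling happens, but a warning is raised
msg = tryCatch(x + 1:3, warning = function(w) conditionMessage(w))
msg   # "longer object length is not a multiple of shorter object length"

# A vector longer than the matrix is refused with an error
res = tryCatch(x + 1:5, error = function(e) "error")
res   # "error"

# A length that divides the number of elements is accepted silently
y = x + 1:2
```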
> # Row means computed with an explicit loop
> rmeans = numeric(nrow(x))
> for(i in 1:nrow(x))
      rmeans[i] = mean(x[i,])
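The function apply carries out such loops in a single call. Used
with summary, it summarises each row (margin 1) or each column
(margin 2) of a matrix:
> apply(x, 1, summary)
> apply(x, 2, summary)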
In each case the result is a matrix. Its first subscript ranges
over the values returned by summary, and its second indicates
which row or column is being summarised.
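As a check on the shape of the result, here is a sketch with an illustrative 2 x 3 matrix. summary returns six values, so summarising the two rows gives a 6 x 2 matrix and summarising the three columns gives a 6 x 3 matrix. For a function with a single value, such as mean, apply returns a plain vector that matches rowMeans and colMeans.

```r
x = matrix(1:6, nrow = 2)        # illustrative 2 x 3 matrix

rows = apply(x, 1, summary)      # one column per row of x
cols = apply(x, 2, summary)      # one column per column of x
dim(rows)                        # 6 2
dim(cols)                        # 6 3
rownames(rows)                   # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."

# A single-valued function gives a vector
all.equal(apply(x, 1, mean), rowMeans(x))   # TRUE
all.equal(apply(x, 2, mean), colMeans(x))   # TRUE
```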
> mortality
Father's Education (Years)
Region <=8 9-11 12 13-15 >=16
Northeast 25.3 25.3 18.2 18.3 16.3
North Central 32.1 29.0 18.8 24.3 19.0
South 38.8 31.0 19.3 15.7 16.8
West 25.4 21.1 20.3 24.0 17.5
Creating The Matrix
> mortality =
matrix(c(25.3, 25.3, 18.2, 18.3, 16.3,
32.1, 29.0, 18.8, 24.3, 19.0,
38.8, 31.0, 19.3, 15.7, 16.8,
25.4, 21.1, 20.3, 24.0, 17.5),
nrow = 4, byrow = TRUE,
dimnames = list(
Region =
c("Northeast", "North Central",
"South", "West"),
"Father's Education (Years)" =
c("<=8", "9-11", "12",
"13-15", ">=16")))
An “Overall + Rows + Columns” Model
$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
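For a complete two-way table with the usual side conditions $\sum_i \alpha_i = 0$ and $\sum_j \beta_j = 0$, the least-squares estimates reduce to simple means, and these are the quantities that the sweeping steps below compute. The notation $\bar y_{i\cdot}$, $\bar y_{\cdot j}$ and $\bar y_{\cdot\cdot}$ for the row, column and overall means is introduced here for this summary.

```latex
\hat\mu = \bar y_{\cdot\cdot}, \qquad
\hat\alpha_i = \bar y_{i\cdot} - \bar y_{\cdot\cdot}, \qquad
\hat\beta_j = \bar y_{\cdot j} - \bar y_{\cdot\cdot}, \qquad
\hat\varepsilon_{ij} = y_{ij} - \hat\mu - \hat\alpha_i - \hat\beta_j
```

For example, the Northeast row mean is 20.68, so $\hat\alpha_1 = 20.68 - 22.825 = -2.145$.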
> r = mortality
> mu = mean(r)
> r = r - mu
> mu
[1] 22.825
Sweeping Out Row Effects
> alpha = apply(r, 1, mean)
> r = sweep(r, 1, alpha)
> alpha
Northeast North Central South
-2.145 1.815 1.495
West
-1.165
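sweep(y, MARGIN, STATS) subtracts STATS[i] from every element of row i when MARGIN = 1, or STATS[j] from every element of column j when MARGIN = 2; subtraction is the default operation. A small sketch with illustrative values:

```r
m = matrix(1:6, nrow = 2)          # illustrative 2 x 3 matrix

# Margin 1: subtract 1 from row 1 and 2 from row 2
sweep(m, 1, c(1, 2))               # both rows become 0 2 4

# Margin 2: subtract 1, 3 and 5 from columns 1, 2 and 3
sweep(m, 2, c(1, 3, 5))            # every column becomes 0 1

# Sweeping out the row means leaves rows that average to zero
r = sweep(m, 1, apply(m, 1, mean))
apply(r, 1, mean)                  # 0 0
```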
Sweeping Out Column Effects
> beta = apply(r, 2, mean)
> r = sweep(r, 2, beta)
> beta
<=8 9-11 12 13-15 >=16
7.575 3.775 -3.675 -2.250 -5.425
The Residuals
> r
Father's Education (Years)
Region <=8 9-11 12 13-15 >=16
Northeast -2.955 0.845 1.195 -0.13 1.045
North Central -0.115 0.585 -2.165 1.91 -0.215
South 6.905 2.905 -1.345 -6.37 -2.095
West -3.835 -4.335 2.315 4.59 1.265
Remaining Effects?
To check that all the effects have been swept out of the residuals,
we can compute the row and column means of the residuals. They
should be zero apart from rounding error.
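A self-contained sketch of that check, rebuilding the mortality values without their labels and repeating the three sweeps. The residual for South, <=8 is compared with the printed value 6.905 as a further check.

```r
mortality = matrix(c(25.3, 25.3, 18.2, 18.3, 16.3,
                     32.1, 29.0, 18.8, 24.3, 19.0,
                     38.8, 31.0, 19.3, 15.7, 16.8,
                     25.4, 21.1, 20.3, 24.0, 17.5),
                   nrow = 4, byrow = TRUE)

# Remove the overall mean, then the row means, then the column means
r = mortality - mean(mortality)
alpha = apply(r, 1, mean)
r = sweep(r, 1, alpha)
beta = apply(r, 2, mean)
r = sweep(r, 2, beta)

# Row and column means of the residuals: zero up to rounding error
apply(r, 1, mean)
apply(r, 2, mean)
```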
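> twoway =
  function(y)
  {
      # Remove the overall mean
      mu = mean(y)
      y = y - mu
      # Remove the row effects
      alpha = apply(y, 1, mean)
      y = sweep(y, 1, alpha)
      # Remove the column effects
      beta = apply(y, 2, mean)
      y = sweep(y, 2, beta)
      # Return the fitted effects and the residuals
      list(overall = mu, rows = alpha,
           cols = beta, residuals = y)
  }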
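A sketch of twoway in use, on a small made-up table that is exactly additive (overall 10, row effects -1 and 1, column effects -2, 0 and 2), so the residuals should vanish. The fitted pieces are added back together, using outer to form the table of row plus column effects, to confirm that they reproduce the data.

```r
twoway = function(y) {
    mu = mean(y)
    y = y - mu
    alpha = apply(y, 1, mean)
    y = sweep(y, 1, alpha)
    beta = apply(y, 2, mean)
    y = sweep(y, 2, beta)
    list(overall = mu, rows = alpha, cols = beta, residuals = y)
}

# Illustrative additive table: y[i, j] = 10 + row effect + column effect
y = matrix(c(7, 9, 9, 11, 11, 13), nrow = 2)
fit = twoway(y)

fit$overall        # 10
fit$rows           # -1 1
fit$cols           # -2 0 2

# overall + row effect + column effect + residual rebuilds each entry
rebuilt = fit$overall + outer(fit$rows, fit$cols, "+") + fit$residuals
all.equal(rebuilt, y)   # TRUE
```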
Commenting
(Generalized) Outer Products
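The function outer(x, y, f) returns a matrix whose (i, j) element is
$f(x_i, y_j)$. When f is omitted it defaults to multiplication,
giving the ordinary outer product.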
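A few small sketches of outer with illustrative arguments. The function is called once with expanded vectors of all (x, y) pairs, so a user-supplied f must be vectorised (hence pmax rather than max below).

```r
outer(1:3, 1:4)             # multiplication table: (i, j) element is i * j
outer(1:2, 1:3, "+")        # (i, j) element is i + j
outer(1:3, 0:2, "^")        # (i, j) element is x[i]^y[j]

# A user-supplied function must work elementwise on whole vectors
outer(1:3, 1:3, function(a, b) pmax(a, b))
```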
> x = c(-1, 0, 1)
> which(x >= 0)
[1] 2 3
> which(x == 0)
[1] 2
> which.min(x)
[1] 1
> which.max(x)
[1] 3
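Applied to a matrix, which returns positions in column-major order. The argument arr.ind = TRUE converts them into (row, column) pairs, and arrayInd does the same for a single position such as the one returned by which.max. The values below are illustrative.

```r
m = matrix(c(3, 8, 1, 9, 2, 7), nrow = 2)   # illustrative 2 x 3 matrix

which(m > 5)                    # column-major positions: 2 4 6
which(m > 5, arr.ind = TRUE)    # the same elements as (row, col) pairs
which.max(m)                    # 4: the position of the largest value, 9
arrayInd(which.max(m), dim(m))  # converted to row 2, column 2
```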
Nearest Neighbour Matches
> x = 0:10/10
> y = round(runif(4), 4)
> y
[1] 0.1485 0.9679 0.6533 0.9685
> nearest(x, y)
[1] 0.1 1.0 0.7 1.0
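nearest is not a base R function, and its definition is not shown here. The following is a minimal hypothetical version built from outer and which.min: it forms the matrix of distances between each y[i] and each x[j], then picks the closest x for each row. Ties go to the first match.

```r
# Hypothetical helper: for each y[i], return the element of x closest to it
nearest = function(x, y) {
    d = abs(outer(y, x, "-"))          # d[i, j] = |y[i] - x[j]|
    x[apply(d, 1, which.min)]          # closest x for each y (first on ties)
}

x = 0:10/10
nearest(x, c(0.1485, 0.9679, 0.6533, 0.9685))   # 0.1 1.0 0.7 1.0
```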
Matrix Transposes
The function t returns the transpose of a matrix, and the operator
%*% computes matrix products:
A %*% B
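A short sketch with illustrative matrices, contrasting %*% with the elementwise *, and checking that crossprod(A) equals t(A) %*% A.

```r
A = matrix(1:6, nrow = 2)     # 2 x 3, illustrative
B = matrix(1:3, ncol = 1)     # 3 x 1

t(A)                          # the 3 x 2 transpose
A %*% B                       # 2 x 1 matrix product: 22 and 28
A * A                         # elementwise product, not a matrix product

# crossprod(A) computes t(A) %*% A in one call
all.equal(crossprod(A), t(A) %*% A)   # TRUE
```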
Systems of Linear Equations
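The basic tool for linear systems is solve. With two arguments, solve(A, b) returns the x that satisfies A %*% x == b. With one argument, it returns the inverse of A. A sketch with an illustrative 2 x 2 system:

```r
A = matrix(c(2, 1, 1, 3), nrow = 2)   # illustrative coefficients
b = c(3, 5)

x = solve(A, b)       # solves 2x + y = 3 and x + 3y = 5
x                     # 0.8 1.4
A %*% x               # reproduces b

solve(A)              # one argument: the inverse of A
```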
$1 + 2 + \cdots + n = \frac{n(n+1)}{2}$
Let’s examine the more general problem of finding a formula
for the sum of the kth powers of the first n positive integers.
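$S_{nk} = 1^k + 2^k + \cdots + n^k$
Each of the n terms is at most $n^k$, so
$S_{nk} \le \underbrace{n^k + n^k + \cdots + n^k}_{n \text{ terms}} = n \times n^k = n^{k+1}.$
This suggests that $S_{nk}$ can be written as a polynomial of degree
k + 1 in n, $S_{nk} = c_0 + c_1 n + c_2 n^2 + \cdots + c_{k+1} n^{k+1}$.
Evaluating the polynomial at n = 1, 2, ..., k + 2 gives k + 2 linear
equations for the k + 2 unknown coefficients:
$$
\begin{pmatrix}
1 & 1 & 1^2 & \cdots & 1^{k+1} \\
1 & 2 & 2^2 & \cdots & 2^{k+1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & k+2 & (k+2)^2 & \cdots & (k+2)^{k+1}
\end{pmatrix}
\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{k+1} \end{pmatrix}
=
\begin{pmatrix} S_{1k} \\ S_{2k} \\ \vdots \\ S_{k+2,k} \end{pmatrix}
$$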
> sumpow =
function(k)
solve(outer(1:(k+2), 0:(k+1), "^"),
cumsum(seq(k+2)^k))
> sumpow(1)
[1] 0.0 0.5 0.5
which corresponds to
$S_{n1} = \frac{n}{2} + \frac{n^2}{2} = \frac{n(n+1)}{2}$.
The Formula for k = 2
For k = 2,
> sumpow(2)
[1] 1.471046e-15 1.666667e-01 5.000000e-01
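[4] 3.333333e-01
The first coefficient is zero apart from rounding error, so
$S_{n2} = \frac{n}{6} + \frac{n^2}{2} + \frac{n^3}{3} = \frac{n(n+1)(2n+1)}{6}$.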
For k = 3,
> sumpow(3)
[1] 0.00 0.00 0.25 0.50 0.25
The formula is
$S_{n3} = \frac{n^2}{4} + \frac{n^3}{2} + \frac{n^4}{4} = \frac{n^2(1 + 2n + n^2)}{4} = \frac{n^2(n+1)^2}{4}$.
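To check the coefficients produced by sumpow, the sketch below evaluates the fitted polynomial and compares it with direct sums and with the closed forms for k = 2 and k = 3. The helper evalpoly is hypothetical, introduced only for this check. Because the coefficient matrix is a Vandermonde matrix, it becomes ill-conditioned as k grows, so the comparison is kept to small k.

```r
sumpow = function(k)
    solve(outer(1:(k+2), 0:(k+1), "^"),
          cumsum(seq(k+2)^k))

# Hypothetical helper: evaluate c0 + c1 n + ... + c_{k+1} n^{k+1}
evalpoly = function(coef, n) sum(coef * n^(seq_along(coef) - 1))

n = 1:20
for (k in 1:4) {
    cf = sumpow(k)
    direct = sapply(n, function(m) sum((1:m)^k))
    fitted = sapply(n, function(m) evalpoly(cf, m))
    print(all.equal(fitted, direct))     # TRUE for each k
}

# Agreement with the closed forms for k = 2 and k = 3
all.equal(sapply(n, function(m) evalpoly(sumpow(2), m)),
          n * (n + 1) * (2 * n + 1) / 6)
all.equal(sapply(n, function(m) evalpoly(sumpow(3), m)),
          n^2 * (n + 1)^2 / 4)
```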
Regression Analysis
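$y = X\beta + \varepsilon$.
The least-squares estimate of $\beta$ is
$\hat{\beta} = (X'X)^{-1}X'y$.
With n and p the numbers of rows and columns of X, the error
variance is estimated by $\hat\sigma^2 = \sum_i \hat\varepsilon_i^2/(n - p)$,
and the estimated covariance matrix of $\hat\beta$ is $\hat\sigma^2 (X'X)^{-1}$.
In R, with x holding X and y holding the response: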
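> n = nrow(x)
> p = ncol(x)
> # Solve the normal equations X'X b = X'y
> betahat = solve(t(x) %*% x, t(x) %*% y)
> epsilonhat = y - x %*% betahat            # residuals
> sigmahat2 = sum(epsilonhat^2) / (n - p)   # error variance estimate
> D = sigmahat2 * solve(t(x) %*% x)         # estimated covariance of betahat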
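As a sanity check, the sketch below runs the same computations on simulated data (made up for illustration, not from the text) and compares the results with R's lm. The design matrix already contains an intercept column, so the model formula uses - 1 to avoid adding a second one.

```r
set.seed(1)                      # simulated data, for illustration only
n = 50
z = runif(n)
y = 2 + 3 * z + rnorm(n, sd = 0.5)
x = cbind(1, z)                  # design matrix: intercept column and z

p = ncol(x)
betahat = solve(t(x) %*% x, t(x) %*% y)
epsilonhat = y - x %*% betahat
sigmahat2 = sum(epsilonhat^2) / (n - p)
D = sigmahat2 * solve(t(x) %*% x)

# Compare with lm; "- 1" because x already has an intercept column
fit = lm(y ~ x - 1)
all.equal(as.vector(betahat), unname(coef(fit)))   # TRUE
all.equal(unname(D), unname(vcov(fit)))            # TRUE
sqrt(diag(D))                                      # standard errors
```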