Statistical Computing 1

Dense Matrices


Some examples of dense matrix operations

A matrix is a two-dimensional data structure in R. By default the entries of a matrix are filled in column by column; setting byrow=TRUE fills them in row by row. Additionally, the columns and rows can be given names, and these names can be changed without altering the values of the matrix.

x <- matrix(1:6, 2, 3)
class(x)
## [1] "matrix" "array"
attributes(x)
## $dim
## [1] 2 3
dim(x)
## [1] 2 3
x2 <- matrix(1:6, 2, 3, dimnames = list(c('A', 'B'), c('K', 'L', 'M')))
x2
##   K L M
## A 1 3 5
## B 2 4 6
colnames(x2)
## [1] "K" "L" "M"
rownames(x2)
## [1] "A" "B"
colnames(x2) <- c('C1', 'C2', 'C3')
rownames(x2) <- c('R1', 'R2')
x2
##    C1 C2 C3
## R1  1  3  5
## R2  2  4  6

If one extracts a single row of a matrix, the result is returned as a vector, not a matrix. This behaviour can be avoided by adding drop=FALSE when indexing.

x[1,]
## [1] 1 3 5
x[1, ,drop=F]
##      [,1] [,2] [,3]
## [1,]    1    3    5

R also has higher-dimensional data structures, known as arrays.

y <- array(1:6, c(2,4,2))
dim(y)
## [1] 2 4 2
y[,,1]
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    1
## [2,]    2    4    6    2
y[,,2]
##      [,1] [,2] [,3] [,4]
## [1,]    3    5    1    3
## [2,]    4    6    2    4

Solving linear systems

If one would like to solve a linear system \(Ax = b\), there are two ways to write the code in R. One is to invert the matrix \(A\) first and then multiply by \(b\), i.e. solve(A) %*% b. The other is to use solve directly to solve for \(x\), i.e. solve(A, b). We run an experiment with a Hilbert matrix of size 9; Hilbert matrices are known to be severely ill-conditioned, i.e. close to singular. We use the package Matrix here only to access the Hilbert matrix function; we will return to the Matrix package in more detail when we cover sparse matrices.

n <- 9
library(Matrix); A <- as.matrix(Hilbert(n))
c(rankMatrix(A), rankMatrix(A, tol=1e-9), det(A))
## [1] 9.000000e+00 7.000000e+00 9.720256e-43
x <- matrix(rnorm(n), nrow=n, ncol=1)
b <- A %*% x
x1 <- solve(A) %*% b
x2 <- solve(A, b)
c(norm(x-x1, type='1'), norm(x-x2, type='1')) # compare 1-norm of error
## [1] 7.971969e-05 2.535504e-05
c(norm(b - A %*% x1, type='1'), norm(b - A %*% x2, type='1')) # compare residuals
## [1] 6.159001e-07 4.996004e-16

As can be seen from this example, solving directly is more accurate, not to mention faster. Unless a linear system needs to be solved repeatedly for many different right-hand sides, the matrix involved should never be inverted in R.
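
To illustrate the speed claim, the following rough sketch compares the two approaches on a larger random system using system.time. The objects A_big and b_big are new names introduced here (so as not to overwrite the Hilbert example above), and the exact timings are machine-dependent.

# Rough timing comparison: invert-then-multiply vs. direct solve (illustrative only)
set.seed(1)
m <- 2000
A_big <- matrix(rnorm(m*m), m, m)
b_big <- rnorm(m)
system.time(x_inv <- solve(A_big) %*% b_big)  # invert A_big, then multiply
system.time(x_dir <- solve(A_big, b_big))     # solve the system directly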

Numerical stability, finite precision arithmetic

In R and many other programming languages, a floating point number is stored as a “double” precision number. According to the IEEE 754 standard, a double consists of 64 binary bits: 1 sign bit, 11 exponent bits, and 52 significand (fraction) bits.

This means that there is a largest finite number and a smallest positive number representable in R; going beyond them gives Inf or 0 respectively:

c(2^(-1075), 2^(-1074), 2^1023, 2^1024, 2^1024 / 2^1024)
## [1]  0.000000e+00 4.940656e-324 8.988466e+307           Inf           NaN
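
These limits are also recorded in the built-in .Machine list (the values quoted in the comments below assume a standard IEEE 754 platform). Note that \(2^{-1074}\) above is the smallest positive subnormal double, which is smaller than the smallest normalised double stored in .Machine$double.xmin.

.Machine$double.xmax  # largest finite double, about 1.797693e+308
.Machine$double.xmin  # smallest positive normalised double, about 2.225074e-308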

Since the significand has 52 bits, the relative rounding error of a double is at most \(2^{-53}\approx 1.11 \times 10^{-16}\), so we can expect an error of order \(10^{-16}\) in numerical representations (and therefore computations) with double numbers. For example,

1 + 1e-15
## [1] 1
print(1 + 1e-15, digits=22)
## [1] 1.000000000000001110223
print(1, digits=22)
## [1] 1
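
The threshold \(2^{-53}\) is half of the machine epsilon \(2^{-52}\), which is stored in .Machine$double.eps. The following small sketch assumes the usual round-to-nearest behaviour of IEEE 754 arithmetic:

.Machine$double.eps  # 2^-52, about 2.220446e-16
1 + 2^-52 == 1       # FALSE: 1 + eps is the next representable number above 1
1 + 2^-53 == 1       # TRUE: 1 + eps/2 rounds back down to 1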

This finite precision causes numerical errors when one performs arithmetic operations on floating point numbers:

0.1 + 0.2 - 0.3
## [1] 5.551115e-17

In addition to the floating point representation, R also has an integer data type: the suffix L, as in 1L, marks a value as being stored as an integer rather than a double.

1 == 1L
## [1] TRUE
identical(1, 1L)
## [1] FALSE
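
The reason identical(1, 1L) is FALSE is that the two values have different storage types, which can be inspected with typeof. R integers are 32-bit, so the largest representable integer is .Machine$integer.max.

typeof(1)             # "double"
typeof(1L)            # "integer"
.Machine$integer.max  # 2147483647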

The equality comparison == can be misleading, especially when comparing double precision floating point numbers; all.equal instead ignores differences up to a tolerance level, which defaults to 1.5e-8.

0.1 + 0.2 - 0.3 == 0
## [1] FALSE
all.equal(0.1+0.2-0.3, 0)
## [1] TRUE
all.equal(1L, 1)
## [1] TRUE
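
Note that all.equal returns TRUE when the two values agree within the tolerance, but a character string describing the difference otherwise; in programmatic tests it is therefore usually wrapped in isTRUE. A small sketch:

all.equal(0.1 + 0.2, 0.31)          # a descriptive string, not FALSE
isTRUE(all.equal(0.1 + 0.2, 0.31))  # FALSE, safe to use inside if()
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE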

Due to finite precision, matrix operations carry numerical errors, and matrix multiplication generally accumulates larger errors because it involves many additions and scalar multiplications.

A <- matrix(rnorm(100), 10, 10); B <- matrix(rnorm(100), 10, 10); C <- rnorm(10)

summary(c((A+B) - A - B))
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -2.220e-16  0.000e+00  0.000e+00  1.527e-18  0.000e+00  2.220e-16
summary(c(A%*%B%*%C - A%*%(B%*%C)))
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -1.776e-15 -8.327e-16 -2.220e-16 -4.663e-16  0.000e+00  4.441e-16

The R function solve returns the inverse of a matrix when called as solve(A), and solves the linear system \(Ax=b\) when called as solve(A, b). If one needs to solve \(Ax=b\), then solve(A, b) generally has lower numerical error than solve(A) %*% b, which amounts to computing \(A^{-1} b\) explicitly.

A <- rbind(c(10,7,8,7), c(7,5,6,5), c(8,6,10,9), c(7,5,9,10))
b1 <- c(32,23,33,31); b2 <- b1 + rnorm(4)*0.1
c( A %*% (solve(A, b1)) - b1 )
## [1] 0 0 0 0
c( A %*% (solve(A) %*% b1) - b1 )
## [1] 1.591616e-12 1.136868e-12 1.364242e-12 1.136868e-12
c( A %*% (solve(A, b2)) - b2 )
## [1]  0.000000e+00 -1.421085e-14  0.000000e+00 -7.105427e-15
c( A %*% (solve(A) %*% b2) - b2 )
## [1] 2.792433e-12 2.017941e-12 2.536638e-12 2.110312e-12
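
As noted earlier, the case where inverting might be tempting is when the same matrix must be solved against several right-hand sides. In fact solve also accepts a matrix of right-hand sides, so one call handles them all; a small sketch reusing A, b1 and b2 from above:

X <- solve(A, cbind(b1, b2))   # column j of X solves A x = j-th right-hand side
A %*% X - cbind(b1, b2)        # residuals, expected to be near machine precision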

The R function eigen returns a list with two components: values, the eigenvalues, and vectors, the eigenvectors of the input matrix. If one knows the input matrix is symmetric, and hence has real eigenvalues, one should say so via the symmetric argument of eigen. As the following examples show, when the eigenvalues of a matrix span a wide range of magnitudes (here, as the matrix size grows), the computed eigenvectors carry noticeable numerical error.

n <- 10
A1 <- matrix(rnorm(n*n), nrow=n, ncol=n); A2 <- A1 + t(A1)
eA <- eigen(A2, symmetric=TRUE)
summary(abs(eA$values))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5075  2.1746  4.7045  4.0863  5.7194  7.0618
c(norm(eA$vectors %*% diag(eA$values) %*% t(eA$vectors) - A2, type='1'),
  norm(eA$vectors %*% t(eA$vectors) - diag(rep(1, n)), type='1'))
## [1] 6.555867e-14 7.025640e-15
n <- 100
A1 <- matrix(rnorm(n*n), nrow=n, ncol=n); A2 <- A1 + t(A1)
eA <- eigen(A2, symmetric=TRUE)
summary(abs(eA$values))
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  0.07452  6.01975 11.96072 12.28661 18.38060 27.97392
c(norm(eA$vectors %*% diag(eA$values) %*% t(eA$vectors) - A2, type='1'),
  norm(eA$vectors %*% t(eA$vectors) - diag(rep(1, n)), type='1'))
## [1] 4.356751e-12 3.281390e-13
n <- 1000
A1 <- matrix(rnorm(n*n), nrow=n, ncol=n); A2 <- A1 + t(A1)
eA <- eigen(A2, symmetric=TRUE)
summary(abs(eA$values))
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  0.02038 17.63020 36.04885 37.94617 56.77748 89.31396
c(norm(eA$vectors %*% diag(eA$values) %*% t(eA$vectors) - A2, type='1'),
  norm(eA$vectors %*% t(eA$vectors) - diag(rep(1, n)), type='1'))
## [1] 1.245884e-10 3.743312e-12
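
A rough way to see the benefit of declaring symmetry is to compare the symmetric and general routines on the same matrix. The following sketch reuses A2 from the last example; timings are machine-dependent and only indicative.

system.time(eigen(A2, symmetric = TRUE))                      # symmetric eigensolver
system.time(eigen(A2, symmetric = FALSE))                     # general eigensolver, typically slower
system.time(eigen(A2, symmetric = TRUE, only.values = TRUE))  # faster still if only eigenvalues are needed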