Class Hierarchies and Method Organization for Matrices

Martin Maechler

ETH Zurich
Switzerland
[email protected]

DSC 2007, Auckland


Feb 15, 2007

Martin Maechler (ETH Zurich) Class Hierarchies etc. . . for Matrices DSC 2007 1 / 30
Outline

1 Preliminaries

2 Matrix Class Organization


Matrix: Goals
3D space of Matrix classes
Matrix class inheritance – “3-fold” tree

3 Matrix Methods — multiple dispatch on multiple arguments


Methods for “high-level” super classes

4 Systematic Testing of multi-argument Methods

5 Conclusions

Acknowledgements

Based on collaborative work with Douglas Bates (U.Wisconsin),


implemented in the R package Matrix

Preliminaries

S4 classes, methods and testing in the R Matrix package


not the first talk on this theme
More descriptive title is
Thoughts, Decisions and Experiences in Development and Testing of a
Large Class Hierarchy with Multiple Inheritance and Multiple Dispatch

Goals of Matrix package

1 Interface to LAPACK = state-of-the-art numerical linear algebra for
dense matrices,
making use of special structure for symmetric or triangular matrices
(e.g. when solving linear systems);
setting and keeping such properties allows more optimized code in these
cases.
2 Sparse matrices for large designs: regression, mixed models, etc
3 . . . . . . [omitted in this talk]
Hence, quite a few different classes for matrices.

many Matrix classes . . .
> length(allCl <- getClasses("package:Matrix"))
[1] 89
> ## Those called "...Matrix" :
> length(M.Cl <- grep("Matrix$",allCl, value = TRUE))
[1] 70
i.e., many . . . , each inheriting from root class "Matrix"
> str(subs <- showExtends(getClassDef("Matrix")@subclasses,
+ printTo=FALSE))
List of 2
$ what: chr [1:76] "compMatrix" "triangularMatrix" "dMatrix" "iMatr
$ how : chr [1:76] "directly" "directly" "directly" "directly" ...
> ## even more... : All those above and these in addition:
> subs$what[ ! (subs$what %in% M.Cl)]
[1] "Cholesky" "pCholesky" "BunchKaufman"
[4] "pBunchKaufman"
3-way Partitioning of “Matrix space”

Logical organization of our Matrices: three (3) main “class
classifications”, i.e.,
three “orthogonal” partitions of “Matrix space”; every Matrix object’s
class corresponds to an intersection of these three partitions.
In R’s S4 class system: we have three independent inheritance
schemes for every Matrix, and each such Matrix class is simply defined to
contain three virtual classes (one from each partitioning scheme), e.g.,
setClass("dgCMatrix",
contains = c("CsparseMatrix", "dsparseMatrix", "generalMatr
validity = function(..) .....)
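
A minimal sketch of the same construction with toy classes, using only the methods package. All class names below (prefix "toy") are hypothetical and chosen so that nothing clashes with the real Matrix classes; the slots and the validity check are illustrative assumptions, not the package's actual definitions.

```r
library(methods)

## Hypothetical toy root class (not the Matrix package's "Matrix")
setClass("toyMatrix", representation("VIRTUAL", Dim = "integer"))

## One virtual class per partitioning scheme
setClass("toyDMatrix", representation("VIRTUAL", x = "numeric"),
         contains = "toyMatrix")                      # content type
setClass("toyGeneralMatrix", representation("VIRTUAL"),
         contains = "toyMatrix")                      # structure
setClass("toyDenseMatrix", representation("VIRTUAL"),
         contains = "toyMatrix")                      # sparsity

## A factual class is the intersection of the three partitions
setClass("toyDge",
         contains = c("toyDenseMatrix", "toyDMatrix", "toyGeneralMatrix"),
         validity = function(object) {
             if (length(object@Dim) != 2L)
                 return("'Dim' must have length 2")
             if (length(object@x) != prod(object@Dim))
                 return("length(x) must equal prod(Dim)")
             TRUE
         })

m <- new("toyDge", Dim = c(2L, 2L), x = c(1, 2, 3, 4))
is(m, "toyDMatrix")        # inherits from the content-type class
is(m, "toyGeneralMatrix")  # ... and from the structure class
```

The object inherits its slots (Dim, x) from the virtual classes; the factual class itself only adds the validity check.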

3-way Partitioning of Matrix space — 2

The three partitioning schemes are

1 Content type: classes dMatrix, lMatrix, nMatrix, (iMatrix,
zMatrix) for entries of type double, logical, pattern (and, not yet
implemented, integer and complex) Matrices.
nMatrix only stores the location of non-zero matrix entries (whereas
logical Matrices can also have NA entries!)
2 Structure: general, triangular, symmetric, diagonal Matrices
3 Sparsity: denseMatrix, sparseMatrix

First two schemes: a slight generalization from LAPACK for dense matrices.

3D space of Matrix classes
three-way partitioning of Matrix classes visualized in 3D space, dropping
the final Matrix, e.g., "d" instead of "dMatrix":
> d1 <- c("d", "l", "n")
> d2 <- c("general", "symmetric", "triangular", "diagonal")
> d3 <- c("dense", c("Csparse", "Tsparse", "Rsparse"))
> clGrid <- expand.grid(dim1 = d1, dim2 = d2, dim3 = d3,
+ KEEP.OUT.ATTRS = FALSE)
> clGr <- data.matrix(clGrid)
> library(scatterplot3d)
used for visualization:
[Figure: 3D scatter plot of the class grid clGr. Axes: dim1 (d, l, n), dim2 (general, symmetric, triangular, diagonal), dim3 (dense, Csparse, Tsparse, Rsparse).]
3-fold classification — Matrix naming scheme
1 “Factual” classes: Matrix objects are instances of these; the above “points in
3D space”
2 Virtual classes: e.g. the above coordinate axes categories.
Superclasses of the factual ones;
they cannot have objects of their own but, importantly, many methods are
defined for these virtual classes.
Factual classes follow a “simple” terse naming convention:
> str(M3cl <- grep("^...Matrix$",M.Cl, value = TRUE))
chr [1:47] "corMatrix" "ddiMatrix" "dgCMatrix" ...
> substring(M3cl,1,3)
[1] "cor" "ddi" "dgC" "dge" "dgR" "dgT" "dpo" "dpp" "dsC" "dsp"
[11] "dsR" "dsT" "dsy" "dtC" "dtp" "dtr" "dtR" "dtT" "ldi" "lgC"
[21] "lge" "lgR" "lgT" "lsC" "lsp" "lsR" "lsT" "lsy" "ltC" "ltp"
[31] "ltr" "ltR" "ltT" "ngC" "nge" "ngR" "ngT" "nsC" "nsp" "nsR"
[41] "nsT" "nsy" "ntC" "ntp" "ntr" "ntR" "ntT"
> M3cl <- M3cl[M3cl != "corMatrix"] # corMatrix not desired in follo
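
A small sketch of how the terse names can be decoded back into the three coordinates. The function decodeClass() is hypothetical (it is not part of Matrix); the lookup tables follow the partition lists in this talk, and treating every third letter other than C, T, R as "dense" is an assumption. Prefixes such as "dpo" or "dpp" are not covered and give NA for the structure.

```r
## Hypothetical helper: split a "xyzMatrix" class name into the three axes.
decodeClass <- function(cl) {
    stopifnot(grepl("^...Matrix$", cl))
    code <- strsplit(substring(cl, 1, 3), "")[[1]]
    content   <- c(d = "double", l = "logical", n = "pattern")
    structure <- c(g = "general", s = "symmetric",
                   t = "triangular", d = "diagonal")
    sparsity  <- c(C = "Csparse", T = "Tsparse", R = "Rsparse")
    c(content   = unname(content[code[1]]),
      structure = unname(structure[code[2]]),
      ## assumption: anything that is not C/T/R counts as dense storage
      storage   = if (code[3] %in% names(sparsity))
                      unname(sparsity[code[3]]) else "dense")
}

decodeClass("dgCMatrix")
decodeClass("lsyMatrix")
```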
Matrix 3d space: filled

[Figure, built up over four slides: the 3D grid from above, with grid points labelled by the three-letter prefixes of the factual classes, first for the "d" classes (dge, dsy, dpo, dpp, dsp, dtr, dtp, ddi, dgC, dsC, dtC, dgT, dsT, dtT, dgR, dsR, dtR), then adding the "l" and the "n" classes. Axes: dim1 (d, l, n), dim2 (general, symmetric, triangular, diagonal), dim3 (dense, Csparse, Tsparse, Rsparse).]

Matrix classes – 3 fold inheritance tree
[Figure: full inheritance graph of the Matrix classes, "*" abbreviating "Matrix". Root "Matrix" at the top, virtual classes (dense*, sparse*, d*, l*, n*, comp*, i*, z* and their virtual subclasses) in the upper levels, factual classes (dge*, dgC* etc.) below, and pCholesky, pBunchKaufman, Cholesky, BunchKaufman, dpo*, dpp*, cor* at the bottom. 74 nodes with 151 edges.]

Subgraph of full inheritance graph

> allN <- nodes(trMatrix)
> "%w/o%" <- function(x,y) x[!x %in% y] #-- x without y
> hier1 <- paste(c("diagonal", "triangular", "symmetric", "general",
+ "Matrix", sep = '') # composites can be factorized:
> ## dropping 1st "dim" hierarchy -- many fewer edges:
> (trM1 <- subGraph(allN %w/o% hier1, trMatrix))
A graphNEL graph with directed edges
Number of Nodes = 69
Number of Edges = 101
> plotRag(mRagraph(trM1), subArgs=.optRagargs(adj = 0.5))

[Figure: plot of the subgraph trM1, "*" abbreviating "Matrix". Root "Matrix"; virtual classes dense*, d*, l*, n*, sparse*, i*, z* and their virtual subclasses; factual classes below. 69 nodes with 101 edges.]

Top 2 levels of hierarchy: distance-wise
> defMatrix <- getClassDef("Matrix")
> ## Distances of subclasses:
> table(subDist <- sapply(defMatrix@subclasses, slot, "distance"))
 1  2  3  4  5
 9 16 44  6  1
> sub12 <- defMatrix@subclasses[subDist <= 2] # but not unique!
> trM_top12 <- subGraph(c("Matrix", unique(names(sub12))), trMatrix)
> plotRag(mRagraph(trM_top12)) ## first and second level virtual cla

[Figure: the top two levels below "Matrix". First level: sparse*, d*, comp*, l*, n*, dense*, triangular*, i*, z*. Second level: Rsparse*, Tsparse*, Csparse*, dsparse*, lsparse*, nsparse*, general*, ddense*, symmetric*, ldense*, ndense*, diagonal*, p*. 23 nodes with 29 edges.]

Top 2 levels of sparse sub-hierarchy:
[Figure: 'sparseMatrix' classes, subgraph. sparse* at the top; below it dsparse*, Tsparse*, Csparse*, lsparse*, Rsparse*, nsparse*, p*; then the 27 factual sparse classes (from dgT* to nsR*). 35 nodes with 61 edges.]

Top 2 levels of hierarchy:
[Figure: the same 'sparseMatrix' subgraph drawn in "neato" layout. 35 nodes with 61 edges.]

Methods for Matrix classes

1 Many important functions apply to pairs of matrices, e.g.,
Matrix multiplication: %*%, crossprod(), tcrossprod()
Matrix solve(A, B)
Group methods Ops := {Arith, Compare, Logic},
Coercion (as(., .)), not important per se, but see below.
2 We do not want to implement n^2 (≈ 50 ∗ 25 = 1250) methods per
generic for n matrix classes, namely one for each possible pair of matrix
classes, but rather define methods for “high-level virtual super classes”:
3 the higher the better (fewer pairwise combinations)
4 the higher the worse (the less specific, the less OO)

How to find a good compromise between 3. and 4.?
How to test systematically that all pairwise combinations are covered (and
properly so)?
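
A toy sketch of the idea in item 2: one method defined for a pair of virtual superclasses serves every pair of factual subclasses. The classes (vBase, fA, fB, fC) and the generic combine() are hypothetical illustrations, not Matrix code.

```r
library(methods)

## Hypothetical mini-hierarchy: one virtual superclass, three factual classes
setClass("vBase", representation("VIRTUAL", x = "numeric"))
factual <- c("fA", "fB", "fC")
for (cl in factual) setClass(cl, contains = "vBase")

setGeneric("combine", function(x, y) standardGeneric("combine"))

## A single method on the virtual pair ...
setMethod("combine", signature("vBase", "vBase"),
          function(x, y) x@x + y@x)

## ... no method is defined directly for a factual pair,
direct <- existsMethod("combine", signature("fA", "fB"))
## but one is found by inheritance:
inherited <- hasMethod("combine", signature("fA", "fB"), inherited = TRUE)

## All 3^2 = 9 factual pairs dispatch to that one method
pairs <- expand.grid(x = factual, y = factual, stringsAsFactors = FALSE)
res <- mapply(function(cx, cy) combine(new(cx, x = 1), new(cy, x = 2)),
              pairs$x, pairs$y)
nrow(pairs)
```

With n factual classes the number of pairs grows as n^2, while the number of methods written stays at one per virtual pair.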

Methods for “high-level” super classes

Typically we want methods defined for high-level (i.e. high-up) super classes:
one “bail-out” method for the mother class "Matrix", e.g., for the
expand() generic:
expand( Cholesky(crossprod(A)) )
Error: not-yet-implemented method for expand(<dCHMsimpl>).
->> Ask the package authors to implement the missing feature.
implementation: often “via” a
method for one or a few specific classes, e.g., "dgeMatrix";
the general method then may be as simple as
function(x) { x <- as(x, "dgeMatrix"); callGeneric() }
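
The bail-out and "via coercion" patterns can be sketched with toy classes as follows. Every name here (mBase, mDense, mRle, mOther, total) is hypothetical; the sketch only mirrors the structure described above, with mDense playing the role of "dgeMatrix".

```r
library(methods)

setClass("mBase",  representation("VIRTUAL"))
setClass("mDense", representation(x = "numeric"), contains = "mBase")
setClass("mRle",   representation(values = "numeric", lengths = "integer"),
         contains = "mBase")
setClass("mOther", representation(tag = "character"), contains = "mBase")

## Coercion to the one class that has a "real" implementation
setAs("mRle", "mDense",
      function(from) new("mDense", x = rep(from@values, from@lengths)))

setGeneric("total", function(x) standardGeneric("total"))

## 1. bail-out method for the root class
setMethod("total", "mBase", function(x)
    stop("not-yet-implemented method for total(<", class(x), ">)",
         call. = FALSE))

## 2. the specific implementation
setMethod("total", "mDense", function(x) sum(x@x))

## 3. "via" method: coerce, then re-dispatch on the coerced object
setMethod("total", "mRle",
          function(x) { x <- as(x, "mDense"); callGeneric() })

r <- new("mRle", values = c(1, 2), lengths = c(2L, 3L))
total(r)   # computed via the mDense method

o <- new("mOther", tag = "a")
msg <- tryCatch(total(o), error = function(e) conditionMessage(e))
msg        # the bail-out message
```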

Methods for super classes

Consequence: need enough “as(<from>, <to>)” methods,
namely at least for all “<to>” classes.
But this (would) require(s) setAs(<from>, <to>) for (too) many pairs.

−→ Goal (only recently and partly implemented for “Matrix”):
Coercions should also only “go via” high-level super classes, e.g.
as( ., "dMatrix") or as( ., "sparseMatrix"), instead of many
specific coercions such as
as( ., "dgeMatrix"), as( ., "dgCMatrix").

Methods for super classes – dispatch ambiguities

Have multiple inheritance, every object inheriting one “branch” from the
3-fold hierarchy.
> (lmat <- 1 == crossprod(Matrix(c(0,2,0,0:1,0), 4,3)))
3 x 3 sparse Matrix of class "lsCMatrix"

[1,] . . .
[2,] . . .
[3,] . . |
Warning messages:
1: Ambiguous method selection for "==", target "numeric#dsCMatrix" (
numeric#dMatrix
numeric#sparseMatrix
in: .findInheritedMethods(classes, fdef, mtable)
2: Ambiguous method selection for "==", target "dsCMatrix#numeric" (
dMatrix#numeric
sparseMatrix#numeric
in: .findInheritedMethods(classes, fdef, mtable)

Methods for super classes – dispatch ambiguities

> dsC.gr <- class2Graph("dsCMatrix", fullNames=FALSE, bottomUp=TRU
> plot(mRagraph( dsC.gr ))

[Figure: superclass graph of dsC*. Levels from top to bottom: Matrix; sparse*, d*, comp*; Csparse*, dsparse*, symmetric*; dsC*.]


Hence often more than one path “up” the inheritance graph. JMC’s “How
S4 Methods Work” introduces (integer) distances by counting the total
number of edges needed to go backwards, and thus neatly disambiguates
method dispatch – in some cases only:
Here, sparse* (= sparseMatrix) and d* (= dMatrix)
both have distance 2 from dsC* −→ unnecessary warning.
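
The equal-distance situation can be reproduced with a toy hierarchy. The class names (aRoot, aSparse, aD, ...) and the generic describe() are hypothetical; the shape of the graph imitates the sparse*/d* part of the dsC* picture above. Which of the two methods R selects, and how it reports the ambiguity, may depend on the R version, so the sketch makes no claim about that.

```r
library(methods)

setClass("aRoot",    representation("VIRTUAL"))
setClass("aSparse",  representation("VIRTUAL"), contains = "aRoot")
setClass("aD",       representation("VIRTUAL"), contains = "aRoot")
setClass("aCsparse", representation("VIRTUAL"), contains = "aSparse")
setClass("aDsparse", representation("VIRTUAL"), contains = c("aSparse", "aD"))
setClass("aDsC", representation(x = "numeric"),
         contains = c("aCsparse", "aDsparse"))

## Edge-count distances from the factual class to each superclass
dists <- sapply(getClass("aDsC")@contains, slot, "distance")
dists[c("aSparse", "aD")]   # both at the same distance

## Methods for the two equally distant virtual classes
setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "aSparse", function(x) "sparse")
setMethod("describe", "aD",      function(x) "d")

## Dispatch has to pick one of two equally good candidates
res <- describe(new("aDsC", x = 1))
```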

Proposal: Customizable method dispatch

Currently, class (inheritance) has two functions:

1 classes contain super classes −→ slot inheritance
2 method dispatch computes distances by counting edges in the graph.
Currently this leads to (too) many ambiguities.

RFC: Instead of counting edges, i.e., adding edges with weights 1, allow
edge weights to be specified, e.g., in the setClass(*, contains=
...) statements.
Your ideas on this?
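
To make the proposal concrete, here is a sketch of distances computed from weighted edges. Nothing like this exists in the methods package: the edge table, the "weight" column and the function classDistances() are all hypothetical, and the edges are only my reading of the sparse*/d* part of the dsC* graph.

```r
## Hypothetical edge list: child class -> direct superclass, with a weight
edges <- data.frame(
    from   = c("dsC", "dsC", "Csparse", "dsparse", "dsparse"),
    to     = c("Csparse", "dsparse", "sparse", "sparse", "d"),
    weight = c(1, 1, 1, 1, 1),
    stringsAsFactors = FALSE)

## Smallest total weight from 'start' to every reachable superclass,
## found by repeatedly relaxing all edges until nothing changes.
classDistances <- function(edges, start) {
    d <- c(0)
    names(d) <- start
    repeat {
        changed <- FALSE
        for (i in seq_len(nrow(edges))) {
            child  <- edges$from[i]
            parent <- edges$to[i]
            if (child %in% names(d)) {
                cand <- d[[child]] + edges$weight[i]
                if (!(parent %in% names(d)) || cand < d[[parent]]) {
                    d[parent] <- cand
                    changed <- TRUE
                }
            }
        }
        if (!changed) break
    }
    d[names(d) != start]
}

## Unit weights = current edge counting: sparse and d tie
unit <- classDistances(edges, "dsC")

## A heavier dsparse -> d edge (arbitrary value) breaks the tie
edges2 <- edges
edges2$weight[edges2$from == "dsparse" & edges2$to == "d"] <- 1.5
weighted <- classDistances(edges2, "dsC")
```

With unit weights both sparse and d come out at distance 2; with the modified weight, d moves to 2.5 and sparse would be preferred without ambiguity.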

Systematic Testing of multi-argument Methods

Want (semi-)automated testing of “all possible” method dispatches

for (many!) single-argument methods
2 arguments: testing should exercise all n^2 pairs for many generic
functions (including group methods, i.e., "Arith"metic etc.)
For as(.,.) using the (“newish”) canCoerce( ., .) utility.
Sub-setting and sub-assignment: “x[ i, j, drop]” has four (!) arguments
on which to dispatch.
An ongoing process for “Matrix”. Current success (and frustration :-):
Every new test battery reveals bugs
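
A sketch of what such a test battery could look like for two-argument group methods, again with toy classes rather than Matrix itself. The classes (tBase, tA, tB, tC) and the chosen operators are assumptions for illustration; the real tests in the package are not shown in this talk.

```r
library(methods)

setClass("tBase", representation("VIRTUAL", x = "numeric"))
factual <- c("tA", "tB", "tC")
for (cl in factual) setClass(cl, contains = "tBase")

## One group method on the virtual pair covers "+", "-", "*", ...
setMethod("Arith", signature("tBase", "tBase"),
          function(e1, e2) new(class(e1), x = callGeneric(e1@x, e2@x)))

## Every operator crossed with every ordered pair of factual classes
ops   <- c("+", "-", "*")
cases <- expand.grid(op = ops, c1 = factual, c2 = factual,
                     stringsAsFactors = FALSE)

cases$ok <- mapply(function(op, c1, c2) {
    a <- new(c1, x = c(1, 2))
    b <- new(c2, x = c(3, 4))
    r <- tryCatch(do.call(op, list(a, b)), error = function(e) e)
    ## pass = no error, result still in the hierarchy, values as expected
    !inherits(r, "error") && is(r, "tBase") &&
        isTRUE(all.equal(r@x, do.call(op, list(a@x, b@x))))
}, cases$op, cases$c1, cases$c2)

failures <- cases[!cases$ok, ]
nrow(cases)      # number of dispatches exercised
nrow(failures)   # rows that need attention
```

Collecting the failures in a data frame rather than stopping at the first error gives an overview of which class pairs are not yet handled.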

Messages to take home

1 Multiple-inheritance class hierarchies + multiple dispatch (the S4
features) need careful method organization.
2 Methods are mainly to be defined for (virtual) super classes.
(Ambiguity warnings in method dispatch may be undesirable.)
(Disambiguation via inheritance distances may need to allow further
customization than currently possible (as documented in JMC’s report “How S4
Methods Work”).)
3 Automated testing of correct dispatch seems feasible though
non-trivial.
4 A new small R package, classGraph, will soon become available.
