Matrix (mathematics)
The m rows are horizontal and the n columns are vertical. Each element of a matrix is often
denoted by a variable with two subscripts. For example, $a_{2,1}$ represents the element at the second
row and first column of a matrix A.
The individual items in an m × n matrix A, often denoted by $a_{i,j}$, where max i = m and max j = n,
are called its elements or entries.[4] Provided that they have the same size (each matrix has the
same number of rows and the same number of columns as the other), two matrices can be added
or subtracted element by element (see Conformable matrix). The rule for matrix multiplication,
however, is that two matrices can be multiplied only when the number of columns in the first
equals the number of rows in the second (i.e., the inner dimensions are the same, n for $A_{m,n} \times B_{n,p}$).
Any matrix can be multiplied element-wise by a scalar from its associated field. A major
application of matrices is to represent linear transformations, that is, generalizations of linear
functions such as f(x) = 4x. For example, the rotation of vectors in three-dimensional space is a
linear transformation, which can be represented by a rotation matrix R: if v is a column vector (a
matrix with only one column) describing the position of a point in space, the product Rv is a
column vector describing the position of that point after a rotation. The product of two
transformation matrices is a matrix that represents the composition of two transformations.
Another application of matrices is in the solution of systems of linear equations. If the matrix is
square, it is possible to deduce some of its properties by computing its determinant. For example,
a square matrix has an inverse if and only if its determinant is not zero. Insight into the geometry
of a linear transformation is obtainable (along with other information) from the matrix's
eigenvalues and eigenvectors.
Applications of matrices are found in most scientific fields. In every branch of physics, including
classical mechanics, optics, electromagnetism, quantum mechanics, and quantum
electrodynamics, they are used to study physical phenomena, such as the motion of rigid bodies.
In computer graphics, they are used to manipulate 3D models and project them onto a 2-
dimensional screen. In probability theory and statistics, stochastic matrices are used to describe
sets of probabilities; for instance, they are used within the PageRank algorithm that ranks the
pages in a Google search.[5] Matrix calculus generalizes classical analytical notions such as
derivatives and exponentials to higher dimensions. Matrices are used in economics to describe
systems of economic relationships.
A major branch of numerical analysis is devoted to the development of efficient algorithms for
matrix computations, a subject that is centuries old and is today an expanding area of research.
Matrix decomposition methods simplify computations, both theoretically and practically.
Algorithms that are tailored to particular matrix structures, such as sparse matrices and
near-diagonal matrices, expedite computations in the finite element method and other computations.
Infinite matrices occur in planetary theory and in atomic theory. A simple example of an infinite
matrix is the matrix representing the derivative operator, which acts on the Taylor series of a
function.
Contents
1 Definition
o 1.1 Size
2 Notation
3 Basic operations
o 3.1 Addition, scalar multiplication and transposition
o 3.2 Matrix multiplication
o 3.3 Row operations
o 3.4 Submatrix
4 Linear equations
5 Linear transformations
6 Square matrix
o 6.1 Main types
6.1.1 Diagonal and triangular matrix
6.1.2 Identity matrix
6.1.3 Symmetric or skew-symmetric matrix
6.1.4 Invertible matrix and its inverse
6.1.5 Definite matrix
6.1.6 Orthogonal matrix
o 6.2 Main operations
6.2.1 Trace
6.2.2 Determinant
6.2.3 Eigenvalues and eigenvectors
7 Computational aspects
8 Decomposition
9 Abstract algebraic aspects and generalizations
o 9.1 Matrices with more general entries
o 9.2 Relationship to linear maps
o 9.3 Matrix groups
o 9.4 Infinite matrices
o 9.5 Empty matrices
10 Applications
o 10.1 Graph theory
o 10.2 Analysis and geometry
o 10.3 Probability theory and statistics
o 10.4 Symmetries and transformations in physics
o 10.5 Linear combinations of quantum states
o 10.6 Normal modes
o 10.7 Geometrical optics
o 10.8 Electronics
11 History
o 11.1 Other historical usages of the word "matrix" in mathematics
12 See also
13 Notes
14 References
o 14.1 Physics references
o 14.2 Historical references
15 External links
Definition
A matrix is a rectangular array of numbers or other mathematical objects for which operations
such as addition and multiplication are defined.[6] Most commonly, a matrix over a field F is a
rectangular array of scalars each of which is a member of F.[7][8] Most of this article focuses on
real and complex matrices, that is, matrices whose elements are real numbers or complex
numbers, respectively. More general types of entries are discussed below. For instance, this is a
real matrix:
The numbers, symbols or expressions in the matrix are called its entries or its elements. The
horizontal and vertical lines of entries in a matrix are called rows and columns, respectively.
Size
The size of a matrix is defined by the number of rows and columns that it contains. A matrix
with m rows and n columns is called an m × n matrix or m-by-n matrix, while m and n are called
its dimensions. For example, the matrix A above is a 3 × 2 matrix.
Matrices with a single row are called row vectors, and those with a single column are called
column vectors. A matrix with the same number of rows and columns is called a square matrix.
A matrix with an infinite number of rows or columns (or both) is called an infinite matrix. In
some contexts, such as computer algebra programs, it is useful to consider a matrix with no rows
or no columns, called an empty matrix.
Notation
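Matrices are commonly written in box brackets or parentheses:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = (a_{ij}) \in \mathbb{R}^{m \times n}.$$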
The specifics of symbolic matrix notation vary widely, with some prevailing trends. Matrices are
usually symbolized using upper-case letters (such as A in the examples above), while the
corresponding lower-case letters, with two subscript indices (for example, a11, or a1,1), represent
the entries. In addition to using upper-case letters to symbolize matrices, many authors use a
special typographical style, commonly boldface upright (non-italic), to further distinguish
matrices from other mathematical objects. An alternative notation involves the use of a double
underline with the variable name, with or without boldface style (for example, $\underline{\underline{A}}$).
The entry in the i-th row and j-th column of a matrix A is sometimes referred to as the i,j, (i,j), or
(i,j)th entry of the matrix, and most commonly denoted as $a_{i,j}$ or $a_{ij}$. Alternative notations for that
entry are A[i,j] or $A_{i,j}$. For example, the (1,3) entry of the following matrix A is 5 (also denoted
$a_{13}$, $a_{1,3}$, A[1,3] or $A_{1,3}$):
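Sometimes, the entries of a matrix can be defined by a formula such as $a_{i,j} = f(i, j)$. For example,
each of the entries of the following matrix A is determined by $a_{ij} = i - j$:
$$A = \begin{bmatrix} 0 & -1 & -2 & -3 \\ 1 & 0 & -1 & -2 \\ 2 & 1 & 0 & -1 \end{bmatrix}$$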
In this case, the matrix itself is sometimes defined by that formula, within square brackets or
double parentheses. For example, the matrix above is defined as A = [i-j], or A = ((i-j)). If matrix
size is m × n, the above-mentioned formula f(i, j) is valid for any i = 1, ..., m and any j = 1, ..., n.
This can be either specified separately, or using m × n as a subscript. For instance, the matrix A
above is 3 × 4 and can be defined as A = [i − j] (i = 1, 2, 3; j = 1, ..., 4), or $A = [i - j]_{3 \times 4}$.
Some programming languages utilize doubly subscripted arrays (or arrays of arrays) to represent
an m × n matrix. Some programming languages start the numbering of array indexes at zero, in
which case the entries of an m-by-n matrix are indexed by 0 ≤ i ≤ m − 1 and 0 ≤ j ≤ n − 1.[9] This
article follows the more common convention in mathematical writing where enumeration starts
from 1.
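As an illustration of the formula notation and of the two indexing conventions, the following Python sketch builds the 3 × 4 matrix A = [i − j] from its defining formula. The helper name `matrix_from_formula` is an assumption made for this example; the matrix is stored as a list of rows, so the mathematical entry in row i and column j sits at position `[i - 1][j - 1]`.

```python
def matrix_from_formula(f, m, n):
    """Return the m-by-n matrix (a list of rows) whose (i, j) entry is f(i, j).

    The formula is evaluated with the 1-based indices of mathematical
    notation; the nested list itself is indexed from 0, as usual in Python.
    """
    return [[f(i, j) for j in range(1, n + 1)] for i in range(1, m + 1)]


# The 3-by-4 matrix defined by a_ij = i - j.
A = matrix_from_formula(lambda i, j: i - j, 3, 4)

for row in A:
    print(row)

# The mathematical entry a_{2,1} is stored at A[1][0].
a_2_1 = A[2 - 1][1 - 1]
```

Keeping the shift between the two conventions in one place (here, the `range(1, n + 1)` loops) is one way to avoid off-by-one mistakes when translating formulas into code.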
An asterisk is occasionally used to refer to whole rows or columns in a matrix. For example, $a_{i,*}$
refers to the ith row of A, and $a_{*,j}$ refers to the jth column of A. The set of all m-by-n matrices is
denoted 𝕄(m, n).
Basic operations
There are a number of basic operations that can be applied to modify matrices, called matrix
addition, scalar multiplication, transposition, matrix multiplication, row operations, and
submatrix.[11]
Familiar properties of numbers extend to these operations of matrices: for example, addition is
commutative, that is, the matrix sum does not depend on the order of the summands:
A + B = B + A.[12] The transpose is compatible with addition and scalar multiplication, as
expressed by $(cA)^{\mathsf{T}} = c(A^{\mathsf{T}})$ and $(A + B)^{\mathsf{T}} = A^{\mathsf{T}} + B^{\mathsf{T}}$. Finally, $(A^{\mathsf{T}})^{\mathsf{T}} = A$.
Matrix multiplication
Multiplication of two matrices is defined if and only if the number of columns of the left matrix
is the same as the number of rows of the right matrix. If A is an m-by-n matrix and B is an n-by-p
matrix, then their matrix product AB is the m-by-p matrix whose entries are given by the dot
product of the corresponding row of A and the corresponding column of B:
$$[AB]_{i,j} = a_{i,1}b_{1,j} + a_{i,2}b_{2,j} + \cdots + a_{i,n}b_{n,j} = \sum_{r=1}^{n} a_{i,r}b_{r,j},$$
where 1 ≤ i ≤ m and 1 ≤ j ≤ p.[13] For example, the underlined entry 2340 in the product is
calculated as (2 × 1000) + (3 × 100) + (4 × 10) = 2340:
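The row-by-column rule can be sketched in a few lines of Python. The two matrices below are assumptions chosen so that the first row of A is (2, 3, 4) and the second column of B is (1000, 100, 10), which reproduces the 2340 computation; the remaining entries are arbitrary, and the function name `matmul` is illustrative.

```python
def matmul(A, B):
    """Multiply an m-by-n matrix A by an n-by-p matrix B (lists of rows).

    Entry (i, j) of the result is the dot product of row i of A
    with column j of B.
    """
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(p)]
            for i in range(m)]


A = [[2, 3, 4],
     [1, 0, 0]]          # 2-by-3
B = [[0, 1000],
     [1, 100],
     [0, 10]]            # 3-by-2

C = matmul(A, B)         # 2-by-2 product
# C[0][1] is (2 * 1000) + (3 * 100) + (4 * 10)
```

Swapping the arguments gives `matmul(B, A)`, which is a 3-by-3 matrix here, so the two products do not even have the same size.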
Matrix multiplication satisfies the rules (AB)C = A(BC) (associativity), and (A + B)C = AC +
BC as well as C(A + B) = CA+CB (left and right distributivity), whenever the size of the
matrices is such that the various products are defined.[14] The product AB may be defined without
BA being defined, namely if A and B are m-by-n and n-by-k matrices, respectively, and m ≠ k.
Even if both products are defined, they need not be equal, that is, generally
AB ≠ BA,
that is, matrix multiplication is not commutative, in marked contrast to (rational, real, or
complex) numbers whose product is independent of the order of the factors. An example of two
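matrices not commuting with each other is:
$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 3 \end{bmatrix},$$
whereas
$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 4 \\ 0 & 0 \end{bmatrix}.$$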
Besides the ordinary matrix multiplication just described, there exist other less frequently used
operations on matrices that can be considered forms of multiplication, such as the Hadamard
product and the Kronecker product.[15] They arise in solving matrix equations such as the
Sylvester equation.
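For comparison with ordinary matrix multiplication, here is a minimal Python sketch of the two other products just mentioned. The function names and the sample matrices are assumptions made for this example; the Hadamard product multiplies entries position by position, while the Kronecker product replaces each entry a of A by the block a·B.

```python
def hadamard(A, B):
    """Entry-wise (Hadamard) product of two matrices of the same size."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]


def kronecker(A, B):
    """Kronecker product: each entry a of A is replaced by the block a * B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


A = [[1, 2],
     [3, 4]]
B = [[0, 5],
     [6, 7]]

H = hadamard(A, B)    # 2-by-2, same size as A and B
K = kronecker(A, B)   # 4-by-4, since both factors are 2-by-2
```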
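Row operations
There are three types of row operations: row addition (adding a multiple of one row to another),
row multiplication (multiplying all entries of a row by a nonzero constant), and row switching
(interchanging two rows). These operations are used in a number of ways, including solving linear
equations and finding matrix inverses.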
Submatrix
A submatrix of a matrix is obtained by deleting any collection of rows and/or columns. The minors
and cofactors of a matrix are found by computing the determinant of certain submatrices.[18][19]
A principal submatrix is a square submatrix obtained by removing certain rows and columns.
The definition varies from author to author. According to some authors, a principal submatrix is
a submatrix in which the set of row indices that remain is the same as the set of column indices
that remain.[20][21] Other authors define a principal submatrix as one in which the first k rows and
columns, for some number k, are the ones that remain;[22] this type of submatrix has also been
called a leading principal submatrix.[23]
Linear equations
Main articles: Linear equation and System of linear equations
Matrices can be used to compactly write and work with multiple linear equations, that is, systems
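of linear equations. For example, if A is an m-by-n matrix, x designates a column vector (that is,
an n×1 matrix) of n variables $x_1, x_2, \ldots, x_n$, and b is an m×1 column vector, then the matrix equation
$$A\mathbf{x} = \mathbf{b}$$
is equivalent to the system of linear equations
$$\begin{aligned} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n &= b_1 \\ &\;\;\vdots \\ a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,n}x_n &= b_m \end{aligned}$$
Using matrices, this can be solved more compactly than would be possible by writing out all the
equations separately. If n = m and the equations are independent, this can be done by writing
$$\mathbf{x} = A^{-1}\mathbf{b},$$
where $A^{-1}$ is the inverse matrix of A. If A has no inverse, solutions, if any, can be found using its
generalized inverse.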
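As a rough illustration of solving a square system Ax = b, the following Python sketch applies Gauss-Jordan elimination with exact fractions instead of forming the inverse explicitly. The function name `solve` and the sample system are assumptions made for this example, and the sketch only handles the case n = m with independent equations.

```python
from fractions import Fraction


def solve(A, b):
    """Solve Ax = b for a square, invertible A by Gauss-Jordan elimination."""
    n = len(A)
    # Augmented matrix [A | b] in exact rational arithmetic.
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Row switching: bring a row with a nonzero entry in this column up.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Row multiplication: scale the pivot row so the pivot becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Row addition: clear this column in every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]


# Hypothetical system: 2x + y = 5 and x + 3y = 10.
x = solve([[2, 1], [1, 3]], [5, 10])
```

Exact fractions are used here only to keep the example free of rounding effects; numerical libraries typically work in floating point with pivoting strategies instead.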
Linear transformations
Main articles: Linear transformation and Transformation matrix
The vectors represented by a 2-by-2 matrix correspond to the sides of a unit square transformed
into a parallelogram.
Matrices and matrix multiplication reveal their essential features when related to linear
transformations, also known as linear maps. A real m-by-n matrix A gives rise to a linear
transformation $\mathbb{R}^n \to \mathbb{R}^m$ mapping each vector x in $\mathbb{R}^n$ to the (matrix) product Ax, which is a
vector in $\mathbb{R}^m$. Conversely, each linear transformation $f: \mathbb{R}^n \to \mathbb{R}^m$ arises from a unique m-by-n
matrix A: explicitly, the (i, j)-entry of A is the ith coordinate of $f(e_j)$, where $e_j = (0, \ldots, 0, 1, 0, \ldots, 0)$ is
the unit vector with 1 in the jth position and 0 elsewhere. The matrix A is said to represent the
linear map f, and A is called the transformation matrix of f.
For example, the 2×2 matrix
$$A = \begin{bmatrix} a & c \\ b & d \end{bmatrix}$$
can be viewed as the transform of the unit square into a parallelogram with vertices at (0, 0), (a,
b), (a + c, b + d), and (c, d). The parallelogram pictured at the right is obtained by multiplying A
with each of the column vectors $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ in turn. These vectors define the vertices of the unit square.
The following table shows a number of 2-by-2 matrices with the associated linear maps of R2.
The blue original is mapped to the green grid and shapes. The origin (0,0) is marked with a black
point.
Horizontal shear with m = 1.25
Reflection through the vertical axis
Squeeze mapping with r = 3/2
Scaling by a factor of 3/2
Rotation by π/6 = 30°
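The five kinds of maps named above can be tried out numerically. The Python sketch below uses the standard matrix for each kind of map (these matrices are assumptions, since the table's own matrices are not reproduced in the text) and applies each one to the four vertices of the unit square; the names `apply`, `maps` and `images` are illustrative.

```python
import math


def apply(A, v):
    """Apply the 2-by-2 matrix A to the column vector v = (x, y)."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])


theta = math.pi / 6  # 30 degrees

maps = {
    "horizontal shear, m = 1.25": [[1, 1.25], [0, 1]],
    "reflection through the vertical axis": [[-1, 0], [0, 1]],
    "squeeze mapping, r = 3/2": [[1.5, 0], [0, 1 / 1.5]],
    "scaling by 3/2": [[1.5, 0], [0, 1.5]],
    "rotation by pi/6": [[math.cos(theta), -math.sin(theta)],
                         [math.sin(theta), math.cos(theta)]],
}

# Vertices of the unit square, listed counterclockwise.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]

# Image of the unit square under each map.
images = {name: [apply(A, v) for v in unit_square] for name, A in maps.items()}

for name, vertices in images.items():
    print(name, vertices)
```

In each case the images of (1, 0) and (0, 1) are simply the first and second columns of the matrix, which matches the description of the transformation matrix given earlier.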
Under the 1-to-1 correspondence between matrices and linear maps, matrix multiplication
corresponds to composition of maps:[25] if a k-by-m matrix B represents another linear map $g: \mathbb{R}^m \to \mathbb{R}^k$,
then the composition g ∘ f is represented by BA since
$$(g \circ f)(\mathbf{x}) = g(f(\mathbf{x})) = g(A\mathbf{x}) = B(A\mathbf{x}) = (BA)\mathbf{x}.$$
The last equality follows from the above-mentioned associativity of matrix multiplication.
The rank of a matrix A is the maximum number of linearly independent row vectors of the
matrix, which is the same as the maximum number of linearly independent column vectors.[26]
Equivalently it is the dimension of the image of the linear map represented by A.[27] The rank–
nullity theorem states that the dimension of the kernel of a matrix plus the rank equals the
number of columns of the matrix.[28]
Square matrix
Main article: Square matrix
A square matrix is a matrix with the same number of rows and columns. An n-by-n matrix is
known as a square matrix of order n. Any two square matrices of the same order can be added
and multiplied. The entries $a_{ii}$ form the main diagonal of a square matrix. They lie on the
imaginary line that runs from the top left corner to the bottom right corner of the matrix.
Main types
Diagonal matrix
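Identity matrix
The identity matrix $I_n$ of size n is the n-by-n matrix in which all the elements on the main
diagonal are equal to 1 and all other elements are equal to 0, for example,
$$I_1 = \begin{bmatrix} 1 \end{bmatrix}, \quad I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \ldots, \quad I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}.$$
It is a square matrix of order n, and also a special kind of diagonal matrix. It is called an identity
matrix because multiplication with it leaves a matrix unchanged:
$$A I_n = I_m A = A \quad \text{for any m-by-n matrix } A.$$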
A nonzero scalar multiple of an identity matrix is called a scalar matrix. If the matrix entries
come from a field, the scalar matrices form a group, under matrix multiplication, that is
isomorphic to the multiplicative group of nonzero elements of the field.
A square matrix A that is equal to its transpose, that is, $A = A^{\mathsf{T}}$, is a symmetric matrix. If instead
A is equal to the negative of its transpose, that is, $A = -A^{\mathsf{T}}$, then A is a skew-symmetric matrix.
In complex matrices, symmetry is often replaced by the concept of Hermitian matrices, which
satisfy $A^* = A$, where the star or asterisk denotes the conjugate transpose of the matrix, that is,
the transpose of the complex conjugate of A.
By the spectral theorem, real symmetric matrices and complex Hermitian matrices have an
eigenbasis; that is, every vector is expressible as a linear combination of eigenvectors. In both
cases, all eigenvalues are real.[29] This theorem can be generalized to infinite-dimensional
situations related to matrices with infinitely many rows and columns, see below.
A square matrix A is called invertible or non-singular if there exists a matrix B such that
$AB = BA = I_n$,[30][31]
where $I_n$ is the n×n identity matrix with 1s on the main diagonal and 0s elsewhere. If B exists, it
is unique and is called the inverse matrix of A, denoted $A^{-1}$.
Definite matrix
A symmetric real matrix A is called positive-definite if the associated quadratic form
$f(\mathbf{x}) = \mathbf{x}^{\mathsf{T}} A \mathbf{x}$
has a positive value for every nonzero vector x in $\mathbb{R}^n$. If f(x) yields only negative values,
then A is negative-definite; if f produces both negative and positive values, then A is
indefinite.[32] If the quadratic form f yields only non-negative values (positive or zero), the
symmetric matrix is called positive-semidefinite (or, if only non-positive values, negative-semidefinite);
hence the matrix is indefinite precisely when it is neither positive-semidefinite nor
negative-semidefinite.
A symmetric matrix is positive-definite if and only if all its eigenvalues are positive, that is, the
matrix is positive-semidefinite and it is invertible.[33] The table at the right shows two possibilities
for 2-by-2 matrices.
Allowing as input two different vectors instead yields the bilinear form associated to A:
$B_A(\mathbf{x}, \mathbf{y}) = \mathbf{x}^{\mathsf{T}} A \mathbf{y}$.[34]
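The eigenvalue criterion for definiteness can be illustrated for symmetric 2-by-2 matrices, whose eigenvalues have a simple closed form. The Python sketch below is a minimal illustration under that restriction; the function names and the sample matrices are assumptions made for this example, and no floating-point tolerance is applied when testing for zero.

```python
import math


def quadratic_form(A, x):
    """Evaluate f(x) = x^T A x for a square matrix A and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def classify_symmetric_2x2(A):
    """Classify a real symmetric 2-by-2 matrix by the signs of its eigenvalues."""
    (a, b), (c, d) = A
    if b != c:
        raise ValueError("matrix must be symmetric")
    # Eigenvalues of [[a, b], [b, d]] are mean - radius and mean + radius.
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    lo, hi = mean - radius, mean + radius
    if lo > 0:
        return "positive-definite"
    if hi < 0:
        return "negative-definite"
    if lo >= 0:
        return "positive-semidefinite"
    if hi <= 0:
        return "negative-semidefinite"
    return "indefinite"


P = [[0.25, 0], [0, 1]]        # both eigenvalues positive
Q = [[0.25, 0], [0, -0.25]]    # one positive, one negative eigenvalue

kind_P = classify_symmetric_2x2(P)
kind_Q = classify_symmetric_2x2(Q)

# The quadratic form of Q takes both signs, as expected for an indefinite matrix.
values_Q = (quadratic_form(Q, (1, 0)), quadratic_form(Q, (0, 1)))
```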
Orthogonal matrix
An orthogonal matrix is a square matrix with real entries whose columns and rows are
orthogonal unit vectors (that is, orthonormal vectors). Equivalently, a matrix A is orthogonal if
its transpose is equal to its inverse:
$$A^{\mathsf{T}} = A^{-1},$$
which entails
$$A^{\mathsf{T}} A = A A^{\mathsf{T}} = I_n,$$
where $I_n$ is the identity matrix of size n.
Main operations
Trace
The trace, tr(A), of a square matrix A is the sum of its diagonal entries. While matrix
multiplication is not commutative as mentioned above, the trace of the product of two matrices is
independent of the order of the factors:
tr(AB) = tr(BA).
It follows that the trace of the product of more than two matrices is independent of cyclic
permutations of the matrices; however, this does not in general apply to arbitrary permutations
(for example, tr(ABC) ≠ tr(BAC), in general). Also, the trace of a matrix is equal to that of its
transpose, that is,
$\operatorname{tr}(A) = \operatorname{tr}(A^{\mathsf{T}})$.
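These trace identities are easy to check on small examples. The following Python sketch uses three 2-by-2 matrices chosen for illustration (they are assumptions, not taken from the text) to show that the trace is unchanged under cyclic permutations of a product but can change under a non-cyclic one.

```python
def matmul(A, B):
    """Product of two square matrices of the same order (lists of rows)."""
    n = len(A)
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]


def trace(A):
    """Sum of the entries on the main diagonal."""
    return sum(A[i][i] for i in range(len(A)))


A = [[1, 2], [3, 4]]
B = [[0, 1], [0, 0]]
C = [[1, 0], [1, 1]]

t_ab = trace(matmul(A, B))
t_ba = trace(matmul(B, A))              # equals t_ab although AB != BA

t_abc = trace(matmul(matmul(A, B), C))
t_bca = trace(matmul(matmul(B, C), A))  # cyclic permutation of ABC: same trace
t_bac = trace(matmul(matmul(B, A), C))  # non-cyclic permutation: may differ
```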
Determinant
A linear transformation on R2 given by the indicated matrix. The determinant of this matrix is
−1, as the area of the green parallelogram at the right is 1, but the map reverses the orientation,
since it turns the counterclockwise orientation of the vectors to a clockwise one.
The determinant det(A) or |A| of a square matrix A is a number encoding certain properties of
the matrix. A matrix is invertible if and only if its determinant is nonzero. Its absolute value
equals the area (in $\mathbb{R}^2$) or volume (in $\mathbb{R}^3$) of the image of the unit square (or cube), while its sign
corresponds to the orientation of the corresponding linear map: the determinant is positive if and
only if the orientation is preserved.
The determinant of 2-by-2 matrices is given by
$$\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc.$$
The determinant of 3-by-3 matrices involves 6 terms (rule of Sarrus). The more lengthy Leibniz
formula generalises these two formulae to all dimensions.[35]
The determinant of a product of square matrices equals the product of their determinants:
$$\det(AB) = \det(A) \cdot \det(B).$$
Adding a multiple of any row to another row, or a multiple of any column to another column,
does not change the determinant. Interchanging two rows or two columns affects the determinant
by multiplying it by −1.[37] Using these operations, any matrix can be transformed to a lower (or
upper) triangular matrix, and for such matrices the determinant equals the product of the entries
on the main diagonal; this provides a method to calculate the determinant of any matrix. Finally,
the Laplace expansion expresses the determinant in terms of minors, that is, determinants of
smaller matrices.[38] This expansion can be used for a recursive definition of determinants (taking
as starting case the determinant of a 1-by-1 matrix, which is its unique entry, or even the
determinant of a 0-by-0 matrix, which is 1), that can be seen to be equivalent to the Leibniz
formula. Determinants can be used to solve linear systems using Cramer's rule, where the
division of the determinants of two related square matrices equates to the value of each of the
system's variables.[39]
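The elimination method described above (reduce to triangular form, track row swaps, multiply the diagonal) can be sketched in Python as follows. This is a minimal illustration using exact fractions; the function name `determinant` and the test matrices are assumptions made for this example.

```python
from fractions import Fraction


def determinant(A):
    """Determinant of a square matrix via reduction to upper triangular form."""
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)      # no pivot available: the determinant is zero
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign            # interchanging two rows multiplies det by -1
        for r in range(col + 1, n):
            # Adding a multiple of one row to another leaves det unchanged.
            factor = M[r][col] / M[col][col]
            M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    # For a triangular matrix, det is the product of the diagonal entries.
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]
    return result


d2 = determinant([[1, 2], [3, 4]])                       # ad - bc = 4 - 6
d3 = determinant([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])
d0 = determinant([])                                     # 0-by-0 matrix: empty product
```

With an empty list as input, the loops do nothing and the function returns the empty product, which agrees with the convention that the determinant of a 0-by-0 matrix is 1.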
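A number λ and a nonzero vector v satisfying
Av = λv
are called an eigenvalue and an eigenvector of A, respectively.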
[42]
Computational aspects
Matrix calculations can often be performed with different techniques. Many problems can be
solved by both direct algorithms and iterative approaches. For example, the eigenvectors of a
square matrix can be obtained by finding a sequence of vectors $x_n$ converging to an eigenvector
when n tends to infinity.[44]
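One well-known iterative approach of this kind is power iteration, sketched below in Python. The text does not name a specific method, so this choice is an assumption, as are the function name, the starting vector and the fixed number of steps. The sketch also assumes that the matrix has a single eigenvalue of largest absolute value and that the starting vector has a nonzero component along the corresponding eigenvector.

```python
import math


def power_iteration(A, steps=100):
    """Approximate the dominant eigenvalue and eigenvector of a square matrix."""
    n = len(A)
    # Starting vector (an assumption of this sketch).
    x = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        # Multiply by A, then rescale to unit length to get the next vector.
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Since x has unit length, x^T A x (the Rayleigh quotient) estimates the eigenvalue.
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(xi * axi for xi, axi in zip(x, Ax))
    return eigenvalue, x


# Symmetric example with eigenvalues 3 and 1; the dominant eigenvector is along (1, 1).
value, vector = power_iteration([[2, 1], [1, 2]])
```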
To choose the most appropriate algorithm for each specific problem, it is important to determine
both the effectiveness and precision of all the available algorithms. The domain studying these
matters is called numerical linear algebra.[45] As with other numerical situations, two main aspects
are the complexity of algorithms and their numerical stability.
Determining the complexity of an algorithm means finding upper bounds or estimates of how
many elementary operations, such as additions and multiplications of scalars, are necessary to
perform it, for example, to multiply matrices. Calculating the matrix product of two n-by-n
matrices using the definition given above needs $n^3$ multiplications, since for each of the $n^2$
entries of the product, n multiplications are necessary. The Strassen algorithm outperforms this
"naive" algorithm; it needs only on the order of $n^{2.807}$ multiplications.[46] A refined
approach also incorporates specific features of the computing devices.
In many practical situations additional information about the matrices involved is known. An
important case is that of sparse matrices, that is, matrices most of whose entries are zero. There are
specifically adapted algorithms for, say, solving linear systems Ax = b for sparse matrices A,
such as the conjugate gradient method.[47]
An algorithm is, roughly speaking, numerically stable if small deviations in the input values do
not lead to big deviations in the result. For example, calculating the inverse of a matrix via
Laplace expansion (Adj(A) denotes the adjugate matrix of A)
$$A^{-1} = \operatorname{Adj}(A) / \det(A)$$
may lead to significant rounding errors if the determinant of the matrix is very small. The norm
of a matrix can be used to capture the conditioning of linear algebraic problems, such as
computing a matrix's inverse.[48]
Although most computer languages are not designed with commands or libraries for matrices, as
early as the 1970s, some engineering desktop computers such as the HP 9830 had ROM
cartridges to add BASIC commands for matrices. Some computer languages such as APL were
designed to manipulate matrices, and various mathematical programs can be used to aid
computing with matrices.[49]
Decomposition
Main articles: Matrix decomposition, Matrix diagonalization, Gaussian elimination, and
Montante's method
There are several methods to render matrices into a more easily accessible form. They are
generally referred to as matrix decomposition or matrix factorization techniques. The interest of
all these techniques is that they preserve certain properties of the matrices in question, such as
determinant, rank or inverse, so that these quantities can be calculated after applying the
transformation, or that certain matrix operations are algorithmically easier to carry out for some
types of matrices.
The LU decomposition factors a matrix as a product of a lower (L) and an upper (U) triangular
matrix.[50] Once this decomposition is calculated, linear systems can be solved more
efficiently, by a simple technique called forward and back substitution. Likewise, inverses of
triangular matrices are algorithmically easier to calculate. Gaussian elimination is a similar
algorithm; it transforms any matrix to row echelon form.[51] Both methods proceed by multiplying
the matrix by suitable elementary matrices, which correspond to permuting rows or columns and
adding multiples of one row to another row. Singular value decomposition expresses any matrix
A as a product $UDV^*$, where U and V are unitary matrices and D is a diagonal matrix.
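To make the LU idea concrete, here is a minimal Python sketch of one common variant (the Doolittle form, in which L has 1s on its diagonal) followed by forward and back substitution. It performs no row exchanges, so it fails on a zero pivot; that simplification, the function names and the sample system are all assumptions made for this example.

```python
from fractions import Fraction


def lu_decompose(A):
    """Factor a square matrix as A = L U without pivoting (Doolittle form)."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(v) for v in row] for row in A]
    for col in range(n):
        if U[col][col] == 0:
            raise ValueError("zero pivot: this sketch does no row exchanges")
        for r in range(col + 1, n):
            # The multiplier used to clear U[r][col] is stored in L.
            L[r][col] = U[r][col] / U[col][col]
            U[r] = [ur - L[r][col] * uc for ur, uc in zip(U[r], U[col])]
    return L, U


def lu_solve(L, U, b):
    """Solve L U x = b by forward substitution, then back substitution."""
    n = len(b)
    # Forward substitution: solve L y = b (L has a unit diagonal).
    y = []
    for i in range(n):
        y.append(Fraction(b[i]) - sum(L[i][k] * y[k] for k in range(i)))
    # Back substitution: solve U x = y.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


A = [[4, 3], [6, 3]]
L, U = lu_decompose(A)
x = lu_solve(L, U, [10, 12])   # solves 4x + 3y = 10, 6x + 3y = 12
```

Once L and U are known, further right-hand sides can be solved by calling `lu_solve` again without repeating the factorization, which is where the efficiency gain mentioned above comes from.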
An example of a matrix in Jordan normal form. The grey blocks are called Jordan blocks.
For a diagonalizable matrix $A = VDV^{-1}$, with D diagonal, powers can be computed as
$A^n = VD^nV^{-1}$, and the power of a diagonal matrix can be calculated by taking the corresponding powers of the
diagonal entries, which is much easier than doing the exponentiation for A instead. This can be
used to compute the matrix exponential $e^A$, a need frequently arising in solving linear differential
equations, matrix logarithms and square roots of matrices.[54] To avoid numerically
ill-conditioned situations, further algorithms such as the Schur decomposition can be employed.[55]
This article focuses on matrices whose entries are real or complex numbers. However, matrices
can be considered with much more general types of entries than real or complex numbers. As a
first step of generalization, any field, that is, a set where addition, subtraction, multiplication and
division operations are defined and well-behaved, may be used instead of R or C, for example
rational numbers or finite fields. For example, coding theory makes use of matrices over finite
fields. Wherever eigenvalues are considered, as these are roots of a polynomial they may exist
only in a larger field than that of the entries of the matrix; for instance they may be complex in
case of a matrix with real entries. The possibility to reinterpret the entries of a matrix as elements
of a larger field (for example, to view a real matrix as a complex matrix whose entries happen to
be all real) then allows considering each square matrix to possess a full set of eigenvalues.
Alternatively one can consider only matrices with entries in an algebraically closed field, such as
C, from the outset.
More generally, matrices with entries in a ring R are widely used in mathematics.[57] Rings are a
more general notion than fields in that a division operation need not exist. The very same
addition and multiplication operations of matrices extend to this setting, too. The set M(n, R) of
all square n-by-n matrices over R is a ring called the matrix ring, isomorphic to the endomorphism
ring of the left R-module $R^n$.[58] If the ring R is commutative, that is, its multiplication is
commutative, then M(n, R) is a unitary noncommutative (unless n = 1) associative algebra over
R. The determinant of square matrices over a commutative ring R can still be defined using the
Leibniz formula; such a matrix is invertible if and only if its determinant is invertible in R,
generalising the situation over a field F, where every nonzero element is invertible.[59] Matrices
over superrings are called supermatrices.[60]
Matrices do not always have all their entries in the same ring – or even in any ring at all. One
special but common case is block matrices, which may be considered as matrices whose entries
themselves are matrices. The entries need not be square matrices, and thus need not be members
of any ring; but their sizes must fulfil certain compatibility conditions.
Linear maps $\mathbb{R}^n \to \mathbb{R}^m$ are equivalent to m-by-n matrices, as described above. More generally,
any linear map f: V → W between finite-dimensional vector spaces can be described by a matrix
$A = (a_{ij})$, after choosing bases $v_1, \ldots, v_n$ of V and $w_1, \ldots, w_m$ of W (so n is the dimension of V and
m is the dimension of W), which is such that
$$f(v_j) = \sum_{i=1}^{m} a_{i,j} w_i \quad \text{for } j = 1, \ldots, n.$$
In other words, column j of A expresses the image of $v_j$ in terms of the basis vectors $w_i$ of W; thus
this relation uniquely determines the entries of the matrix A. The matrix depends on the choice
of the bases: different choices of bases give rise to different, but equivalent matrices.[61] Many of
the above concrete notions can be reinterpreted in this light, for example, the transpose matrix $A^{\mathsf{T}}$
describes the transpose of the linear map given by A, with respect to the dual bases.[62]
These properties can be restated in a more natural way: the category of all matrices with entries
in a field with multiplication as composition is equivalent to the category of finite dimensional
vector spaces and linear maps over this field.
More generally, the set of m×n matrices can be used to represent the R-linear maps between the
free modules $R^m$ and $R^n$ for an arbitrary ring R with unity. When n = m, composition of these maps
is possible, and this gives rise to the matrix ring of n×n matrices representing the endomorphism
ring of $R^n$.
Matrix groups
Any property of matrices that is preserved under matrix products and inverses can be used to
define further matrix groups. For example, matrices with a given size and with a determinant of 1
form a subgroup of (that is, a smaller group contained in) their general linear group, called a
special linear group.[66] Orthogonal matrices, determined by the condition
$M^{\mathsf{T}} M = I$,
form the orthogonal group.[67] Every orthogonal matrix has determinant 1 or −1. Orthogonal
matrices with determinant 1 form a subgroup called the special orthogonal group.
Every finite group is isomorphic to a matrix group, as one can see by considering the regular
representation of the symmetric group.[68] General groups can be studied using matrix groups,
which are comparatively well understood, by means of representation theory.[69]
Infinite matrices
It is also possible to consider matrices with infinitely many rows and/or columns[70] even if, being
infinite objects, one cannot write down such matrices explicitly. All that matters is that for every
element in the set indexing rows, and every element in the set indexing columns, there is a well-
defined entry (these index sets need not even be subsets of the natural numbers). The basic
operations of addition, subtraction, scalar multiplication, and transposition can still be defined
without problem; however matrix multiplication may involve infinite summations to define the
resulting entries, and these are not defined in general.
If R is any ring with unity, then the ring of endomorphisms of $M = \bigoplus_{i \in I} R$ as a right R module is isomorphic
to the ring of column finite matrices whose entries are indexed by $I \times I$, and whose columns each
contain only finitely many nonzero entries. The endomorphisms of M considered as a left R
module result in an analogous object, the row finite matrices whose rows each only have
finitely many nonzero entries.
If infinite matrices are used to describe linear maps, then only those matrices can be used all of
whose columns have but a finite number of nonzero entries, for the following reason. For a
matrix A to describe a linear map f: V→W, bases for both spaces must have been chosen; recall
that by definition this means that every vector in the space can be written uniquely as a (finite)
linear combination of basis vectors, so that written as a (column) vector v of coefficients, only
finitely many entries vi are nonzero. Now the columns of A describe the images by f of individual
basis vectors of V in the basis of W, which is only meaningful if these columns have only finitely
many nonzero entries. There is no restriction on the rows of A however: in the product A·v there
are only finitely many nonzero coefficients of v involved, so every one of its entries, even if it is
given as an infinite sum of products, involves only finitely many nonzero terms and is therefore
well defined. Moreover, this amounts to forming a linear combination of the columns of A that
effectively involves only finitely many of them, whence the result has only finitely many
nonzero entries, because each of those columns does. The product of two matrices of the given type
is well defined (provided that the column-index and row-index sets match), is of the same type,
and corresponds to the composition of linear maps.
If R is a normed ring, then the condition of row or column finiteness can be relaxed. With the
norm in place, absolutely convergent series can be used instead of finite sums. For example, the
matrices whose column sums are absolutely convergent sequences form a ring. Analogously, the
matrices whose row sums are absolutely convergent series also form a ring.
Infinite matrices can also be used to describe operators on Hilbert spaces, where convergence
and continuity questions arise, which again results in certain constraints that must be imposed.
However, the explicit point of view of matrices tends to obfuscate the matter,[71] and the abstract
and more powerful tools of functional analysis can be used instead.
Empty matrices
An empty matrix is a matrix in which the number of rows or columns (or both) is zero.[72][73]
Empty matrices help in dealing with maps involving the zero vector space. For example, if A is a 3-by-0
matrix and B is a 0-by-3 matrix, then AB is the 3-by-3 zero matrix corresponding to the null
map from a 3-dimensional space V to itself, while BA is a 0-by-0 matrix. There is no common
notation for empty matrices, but most computer algebra systems allow creating and computing
with them. The determinant of the 0-by-0 matrix is 1 as follows from regarding the empty
product occurring in the Leibniz formula for the determinant as 1. This value is also consistent
with the fact that the identity map from any finite dimensional space to itself has determinant 1, a
fact that is often used as a part of the characterization of determinants.
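These conventions can be checked with a short sketch in plain Python. It assumes matrices are stored as lists of rows and that shapes are passed explicitly, since an empty list of rows cannot record its number of columns; the helper names `matmul`, `sign` and `det_leibniz` are illustrative only.

```python
from itertools import permutations

def matmul(a, b, m, n, p):
    """Product of an m-by-n matrix a and an n-by-p matrix b (lists of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

def sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_leibniz(a, n):
    """Leibniz formula: signed sum over all permutations of entry products."""
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total

A = [[], [], []]   # 3-by-0 matrix: three rows, no columns
B = []             # 0-by-3 matrix: no rows

print(matmul(A, B, 3, 0, 3))   # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(matmul(B, A, 0, 3, 0))   # []
print(det_leibniz([], 0))      # 1
```

For n = 0 there is exactly one permutation (the empty one), and its product of entries is the empty product, which is why the sum evaluates to 1.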
Applications
There are numerous applications of matrices, both in mathematics and other sciences. Some of
them merely take advantage of the compact representation of a set of numbers in a matrix. For
example, in game theory and economics, the payoff matrix encodes the payoff for two players,
depending on which out of a given (finite) set of alternatives the players choose.[74] Text mining
and automated thesaurus compilation make use of document-term matrices such as tf-idf to
track frequencies of certain words in several documents.[75]
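Complex numbers can be represented by particular real 2-by-2 matrices via
$$a + ib \leftrightarrow \begin{bmatrix} a & -b \\ b & a \end{bmatrix},$$
under which addition and multiplication of complex numbers and matrices correspond to each
other. For example, 2-by-2 rotation matrices represent the multiplication by some complex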
number of absolute value 1, as above. A similar interpretation is possible for quaternions[76] and
Clifford algebras in general.
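The correspondence between a complex number a + ib and the real matrix with rows (a, −b) and (b, a) can be checked numerically. The following is a minimal sketch using Python's built-in complex type; the helper names are illustrative.

```python
def as_matrix(z):
    """Real 2-by-2 matrix [[a, -b], [b, a]] standing for z = a + ib."""
    return [[z.real, -z.imag], [z.imag, z.real]]

def matadd2(p, q):
    """Entrywise sum of two 2-by-2 matrices."""
    return [[p[i][j] + q[i][j] for j in range(2)] for i in range(2)]

def matmul2(p, q):
    """Product of two 2-by-2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

z, w = complex(1, 2), complex(3, -1)

# Adding or multiplying the matrices agrees with adding or multiplying the numbers.
print(matadd2(as_matrix(z), as_matrix(w)) == as_matrix(z + w))   # True
print(matmul2(as_matrix(z), as_matrix(w)) == as_matrix(z * w))   # True
```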
Early encryption techniques such as the Hill cipher also used matrices. However, due to the
linear nature of matrices, these codes are comparatively easy to break.[77] Computer graphics uses
matrices both to represent objects and to calculate transformations of objects using affine
rotation matrices to accomplish tasks such as projecting a three-dimensional object onto a two-
dimensional screen, corresponding to a theoretical camera observation.[78] Matrices over a
polynomial ring are important in the study of control theory.
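As an illustration of the Hill cipher mentioned above, the following sketch encrypts pairs of letters by multiplying them with a 2-by-2 key matrix modulo 26. The key, the restriction to upper-case text of even length, and the function names are assumptions made for this example only.

```python
def hill_apply(key, text):
    """Multiply consecutive letter pairs (A=0, ..., Z=25) by a 2-by-2 key mod 26.
    Assumes text consists of an even number of upper-case letters."""
    nums = [ord(c) - ord("A") for c in text]
    out = []
    for k in range(0, len(nums), 2):
        x, y = nums[k], nums[k + 1]
        out.append((key[0][0] * x + key[0][1] * y) % 26)
        out.append((key[1][0] * x + key[1][1] * y) % 26)
    return "".join(chr(n + ord("A")) for n in out)

def inverse_key(key):
    """Inverse of a 2-by-2 key modulo 26 (the determinant must be coprime to 26)."""
    (a, b), (c, d) = key
    det_inv = pow(a * d - b * c, -1, 26)   # modular inverse, Python 3.8+
    return [[(d * det_inv) % 26, (-b * det_inv) % 26],
            [(-c * det_inv) % 26, (a * det_inv) % 26]]

KEY = [[3, 3], [2, 5]]                      # example key with determinant 9
cipher = hill_apply(KEY, "HELP")
print(cipher)                               # HIAT
print(hill_apply(inverse_key(KEY), cipher)) # HELP
```

Because encryption is a linear map, a few known plaintext and ciphertext pairs suffice to solve for the key, which is the weakness noted above.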
Chemistry makes use of matrices in various ways, particularly since the use of quantum theory to
discuss molecular bonding and spectroscopy. Examples are the overlap matrix and the Fock
matrix used in solving the Roothaan equations to obtain the molecular orbitals of the Hartree–
Fock method.
Graph theory
[Figure: At the saddle point (x = 0, y = 0) (marked in red) of the function f(x, y) = x² − y², the Hessian matrix is indefinite.]
The Hessian matrix of a function ƒ: Rⁿ → R, which collects the second partial derivatives of ƒ, encodes
information about the local growth behaviour of the function: given a critical point
x = (x₁, ..., xₙ), that is, a point where the first partial derivatives of ƒ vanish, the function has a
local minimum if the Hessian matrix is positive definite. Quadratic programming can be used to
find global minima or maxima of quadratic functions closely related to the ones attached to
matrices (see above).[82]
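For a function of two variables, the definiteness of the Hessian matrix at a critical point can be read off its leading principal minors. The sketch below assumes the Hessian has already been computed and is supplied as a symmetric 2-by-2 list of rows; `classify_2x2` is a hypothetical helper name.

```python
def classify_2x2(h):
    """Classify a symmetric 2-by-2 matrix [[a, b], [b, c]] by its
    leading principal minors a and a*c - b*b."""
    a, b, c = h[0][0], h[0][1], h[1][1]
    det = a * c - b * b
    if det > 0:
        # det > 0 forces a != 0, so the sign of a decides the case.
        return "positive definite" if a > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    return "degenerate (second-derivative test inconclusive)"

# Hessian of x^2 + y^2 at the origin: local minimum.
print(classify_2x2([[2, 0], [0, 2]]))    # positive definite
# Hessian of x^2 - y^2 at the origin: saddle point.
print(classify_2x2([[2, 0], [0, -2]]))   # indefinite
```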
Another matrix frequently used in geometrical situations is the Jacobi matrix of a differentiable
map f: Rⁿ → Rᵐ. If f₁, ..., fₘ denote the components of f, then the Jacobi matrix is defined as[83]
$$J(f) = \left[ \frac{\partial f_i}{\partial x_j} \right]_{1 \le i \le m,\ 1 \le j \le n}.$$
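If n > m, and if the rank of the Jacobi matrix attains its maximal value m at a point, then by the
implicit function theorem the level set of f through that point can locally be described by
expressing m of the variables as functions of the remaining n − m.[84]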
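The Jacobi matrix of a map can be approximated numerically by finite differences. The following sketch uses central differences with an arbitrarily chosen step size; the function `jacobian` and the example map are assumptions for illustration, not a standard library interface.

```python
def jacobian(f, x, h=1e-6):
    """Approximate the m-by-n Jacobi matrix of f at the point x,
    with entry (i, j) approximating the partial derivative of f_i by x_j."""
    n = len(x)
    m = len(f(x))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(x)
        backward = list(x)
        forward[j] += h
        backward[j] -= h
        f_plus, f_minus = f(forward), f(backward)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac

def example_map(p):
    """Hypothetical map from R^3 to R^2: (x, y, z) -> (x*y, y + z)."""
    x, y, z = p
    return [x * y, y + z]

# The exact Jacobi matrix at (1, 2, 3) is [[2, 1, 0], [0, 1, 1]].
for row in jacobian(example_map, [1.0, 2.0, 3.0]):
    print([round(entry, 4) for entry in row])
```

Here n = 3 and m = 2, and the two rows are linearly independent, so the matrix has the maximal rank 2 at this point.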
Partial differential equations can be classified by considering the matrix of coefficients of the
highest-order differential operators of the equation. For elliptic partial differential equations this
matrix is positive definite, which has decisive influence on the set of possible solutions of the
equation in question.[85]
The finite element method is an important numerical method to solve partial differential
equations, widely applied in simulating complex physical systems. It attempts to approximate the
solution to some equation by piecewise linear functions, where the pieces are chosen with
respect to a sufficiently fine grid, which in turn can be recast as a matrix equation.[86]
Probability theory and statistics
[Figure: Two different Markov chains. The chart depicts the number of particles (of a total of 1000) in
state "2". Both limiting values can be determined from the transition matrices of the two chains
(plotted in red and black, respectively).]
Stochastic matrices are square matrices whose rows are probability vectors, that is, whose entries
are non-negative and sum up to one. Stochastic matrices are used to define Markov chains with
finitely many states.[87] A row of the stochastic matrix gives the probability distribution for the
next position of some particle currently in the state that corresponds to the row. Properties of the
Markov chain like absorbing states, that is, states that any particle attains eventually, can be read
off the eigenvectors of the transition matrices.[88]
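The role of the rows as probability distributions can be made concrete by iterating a small chain. The sketch below uses a made-up two-state transition matrix whose rows sum to one and follows a distribution over many steps; the matrix entries and the name `step` are illustrative assumptions.

```python
def step(dist, P):
    """One step of the chain: new_j = sum over i of dist_i * P[i][j].
    Row i of P is the distribution of the next state given state i."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0.9, 0.1],     # from state 0: stay with 0.9, move to state 1 with 0.1
     [0.5, 0.5]]     # from state 1: move to state 0 with 0.5, stay with 0.5

dist = [1.0, 0.0]    # all particles start in state 0
for _ in range(100):
    dist = step(dist, P)

print(dist)          # approximately [0.8333, 0.1667]
```

The limit (5/6, 1/6) is the left eigenvector of P for the eigenvalue 1, normalized so that its entries sum to one.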
Statistics also makes use of matrices in many different forms.[89] Descriptive statistics is
concerned with describing data sets, which can often be represented as data matrices, which may
then be subjected to dimensionality reduction techniques. The covariance matrix encodes the
mutual variance of several random variables.[90] Another technique using matrices is linear least
squares, a method that approximates a finite set of pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ by a linear
function
$$y_i \approx a x_i + b, \quad i = 1, \ldots, N,$$
which can be formulated in terms of matrices, related to the singular value decomposition of
matrices.[91]
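One way to see the matrix formulation is through the normal equations (XᵀX)β = Xᵀy, where row i of X is (xᵢ, 1) and β = (a, b). The sketch below solves this 2-by-2 system directly; it is a simple stand-in rather than the approach based on the singular value decomposition, which is numerically more robust. The function name `fit_line` is hypothetical.

```python
def fit_line(points):
    """Least-squares line y ≈ a*x + b for a list of (x, y) pairs,
    obtained by solving the 2-by-2 normal equations."""
    n = len(points)
    sxx = sum(x * x for x, _ in points)
    sx = sum(x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    sy = sum(y for _, y in points)
    # X^T X = [[sxx, sx], [sx, n]] and X^T y = [sxy, sy].
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("all x values coincide; the line is not determined")
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

print(fit_line([(0, 1), (1, 3), (2, 5)]))   # (2.0, 1.0)
```

The three sample points lie exactly on y = 2x + 1, so the fit recovers that line; for noisy data the same formulas return the line minimizing the sum of squared residuals.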
Random matrices are matrices whose entries are random numbers, subject to suitable probability
distributions, such as the matrix normal distribution. Beyond probability theory, they are applied in
domains ranging from number theory to physics.[92][93]
The first model of quantum mechanics (Heisenberg, 1925) represented the theory's operators by
infinite-dimensional matrices acting on quantum states.[96] This is also referred to as matrix
mechanics. One particular example is the density matrix that characterizes the "mixed" state of a
quantum system as a linear combination of elementary, "pure" eigenstates.[97]
Another matrix serves as a key tool for describing the scattering experiments that form the
cornerstone of experimental particle physics: collision reactions such as those that occur in particle
accelerators, where non-interacting particles head towards each other and collide in a small
interaction zone, with a new set of non-interacting particles as the result, can be described as the
scalar product of outgoing particle states and a linear combination of ingoing particle states. The
linear combination is given by a matrix known as the S-matrix, which encodes all information
about the possible interactions between particles.[98]
Normal modes
Geometrical optics
Geometrical optics provides further matrix applications. In this approximative theory, the wave
nature of light is neglected. The result is a model in which light rays are indeed geometrical rays.
If the deflection of light rays by optical elements is small, the action of a lens or reflective
element on a given light ray can be expressed as multiplication of a two-component vector with a
two-by-two matrix called ray transfer matrix: the vector's components are the light ray's slope
and its distance from the optical axis, while the matrix encodes the properties of the optical
element. Actually, there are two kinds of matrices, viz. a refraction matrix describing the
refraction at a lens surface, and a translation matrix, describing the translation of the plane of
reference to the next refracting surface, where another refraction matrix applies. The optical
system, consisting of a combination of lenses and/or reflective elements, is simply described by
the matrix resulting from the product of the components' matrices.[101]
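The composition of optical elements by matrix multiplication can be sketched as follows. The example assumes the common convention in which a ray is the vector (distance from the axis, slope), a translation over distance d has the matrix with rows (1, d) and (0, 1), and a thin lens of focal length f, standing in here for a pair of refraction matrices, has the matrix with rows (1, 0) and (−1/f, 1). The function names are illustrative.

```python
def matmul2(p, q):
    """Product of two 2-by-2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply2(m, v):
    """Apply a 2-by-2 matrix to a 2-component vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def translation(d):
    """Free propagation over a distance d along the optical axis."""
    return [[1.0, d], [0.0, 1.0]]

def thin_lens(f):
    """Thin lens of focal length f (paraxial approximation)."""
    return [[1.0, 0.0], [-1.0 / f, 1.0]]

# Lens first, then propagation: the rightmost factor acts first.
system = matmul2(translation(2.0), thin_lens(2.0))

ray_in = [1.0, 0.0]            # height 1, parallel to the axis
print(apply2(system, ray_in))  # [0.0, -0.5]
```

A ray entering parallel to the axis crosses the axis after one focal length, which is what the zero height in the result expresses.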
Electronics
Traditional mesh analysis and nodal analysis in electronics lead to a system of linear equations
that can be described with a matrix.
The behaviour of many electronic components can be described using matrices. Let A be a
2-dimensional vector with the component's input voltage v₁ and input current i₁ as its elements, and
let B be a 2-dimensional vector with the component's output voltage v₂ and output current i₂ as its
elements. Then the behaviour of the electronic component can be described by B = H · A, where
H is a 2 × 2 matrix containing one impedance element (h₁₂), one admittance element (h₂₁) and two
dimensionless elements (h₁₁ and h₂₂). Calculating a circuit now reduces to multiplying matrices.
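As a sketch of how cascaded components reduce to a matrix product, consider a resistive voltage divider built from a series resistor followed by a resistor to ground. The sign conventions, the component matrices and the resistor values below are assumptions chosen for this example; they follow the form B = H · A with input (v₁, i₁) and output (v₂, i₂).

```python
def matmul2(p, q):
    """Product of two 2-by-2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply2(m, v):
    """Apply a 2-by-2 matrix to a 2-component vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def series_impedance(z):
    """Series element: v2 = v1 - z*i1, i2 = i1 (h12 is an impedance)."""
    return [[1.0, -z], [0.0, 1.0]]

def shunt_admittance(y):
    """Element to ground: v2 = v1, i2 = i1 - y*v1 (h21 is an admittance)."""
    return [[1.0, 0.0], [-y, 1.0]]

R1, R2 = 1000.0, 2000.0
# The series resistor acts first, so its matrix is the rightmost factor.
H = matmul2(shunt_admittance(1.0 / R2), series_impedance(R1))

v1 = 9.0
i1 = -H[1][0] * v1 / H[1][1]     # input current for an unloaded output (i2 = 0)
v2, i2 = apply2(H, [v1, i1])
print(round(v2, 6))              # 6.0
```

The result agrees with the familiar divider formula v₂ = v₁ · R2 / (R1 + R2), and the computed i₂ is zero up to rounding.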
History
Matrices have a long history of application in solving linear equations but they were known as
arrays until the 1800s. The Chinese text The Nine Chapters on the Mathematical Art, written between the
10th and 2nd centuries BCE, is the first example of the use of array methods to solve simultaneous
equations,[102] including the concept of determinants. In 1545 Italian mathematician Gerolamo
Cardano brought the method to Europe when he published Ars Magna.[103] The Japanese
mathematician Seki used the same array methods to solve simultaneous equations in 1683.[104]
The Dutch mathematician Jan de Witt represented transformations using arrays in his 1659 book
Elements of Curves.[105] Between 1700 and 1710 Gottfried Wilhelm Leibniz publicized the
use of arrays for recording information or solutions and experimented with over 50 different
systems of arrays.[103] Cramer presented his rule in 1750.
The term "matrix" (Latin for "womb", derived from mater—mother[106]) was coined by James
Joseph Sylvester in 1850,[107] who understood a matrix as an object giving rise to a number of
determinants today called minors, that is to say, determinants of smaller matrices that derive
from the original one by removing columns and rows. In an 1851 paper, Sylvester explains:
I have in previous papers defined a "Matrix" as a rectangular array of terms, out of which
different systems of determinants may be engendered as from the womb of a common
parent.[108]
Arthur Cayley published a treatise on geometric transformations using matrices that were not
rotated versions of the coefficients being investigated as had previously been done. Instead he
defined operations such as addition, subtraction, multiplication, and division as transformations
of those matrices and showed the associative and distributive properties held true. Cayley
investigated and demonstrated the non-commutative property of matrix multiplication as well as
the commutative property of matrix addition.[103] Early matrix theory had limited the use of arrays
almost exclusively to determinants and Arthur Cayley's abstract matrix operations were
revolutionary. He was instrumental in proposing a matrix concept independent of equation
systems. In 1858 Cayley published his A memoir on the theory of matrices[109][110] in which he
proposed and demonstrated the Cayley–Hamilton theorem.[103]
An English mathematician named Cullis was the first to use modern bracket notation for
matrices in 1913 and he simultaneously demonstrated the first significant use of the notation
A = [aᵢ,ⱼ] to represent a matrix where aᵢ,ⱼ refers to the entry in the ith row and the jth column.[103]
The modern study of determinants sprang from several sources.[111] Number-theoretical problems
led Gauss to relate coefficients of quadratic forms, that is, expressions such as x² + xy − 2y², and
linear maps in three dimensions to matrices. Eisenstein further developed these notions,
including the remark that, in modern parlance, matrix products are non-commutative. Cauchy
was the first to prove general statements about determinants, using as definition of the
determinant of a matrix A = [aᵢ,ⱼ] the following: replace the powers $a_j^k$ by $a_{jk}$ in the polynomial
$$a_1 a_2 \cdots a_n \prod_{i<j} (a_j - a_i),$$
where Π denotes the product of the indicated terms. He also showed, in 1829, that the
eigenvalues of symmetric matrices are real.[112] Jacobi studied "functional determinants"—later
called Jacobi determinants by Sylvester—which can be used to describe geometric
transformations at a local (or infinitesimal) level, see above; Kronecker's Vorlesungen über die
Theorie der Determinanten[113] and Weierstrass' Zur Determinantentheorie,[114] both published in
1903, first treated determinants axiomatically, as opposed to previous more concrete approaches
such as the mentioned formula of Cauchy. At that point, determinants were firmly established.
Many theorems were first established for small matrices only; for example, the Cayley–Hamilton
theorem was proved for 2×2 matrices by Cayley in the aforementioned memoir, and by Hamilton
for 4×4 matrices. Frobenius, working on bilinear forms, generalized the theorem to all
dimensions (1898). Also at the end of the 19th century the Gauss–Jordan elimination
(generalizing a special case now known as Gauss elimination) was established by Jordan. In the
early 20th century, matrices attained a central role in linear algebra,[115] partially due to their use
in classification of the hypercomplex number systems of the previous century.
The inception of matrix mechanics by Heisenberg, Born and Jordan led to studying matrices with
infinitely many rows and columns.[116] Later, von Neumann carried out the mathematical
formulation of quantum mechanics, by further developing functional analytic notions such as
linear operators on Hilbert spaces, which, very roughly speaking, correspond to Euclidean space,
but with an infinity of independent directions.
The word has been used in unusual ways by at least two authors of historical importance.
Bertrand Russell and Alfred North Whitehead in their Principia Mathematica (1910–1913) use
the word "matrix" in the context of their axiom of reducibility. They proposed this axiom as a
means to reduce any function to one of lower type, successively, so that at the "bottom" (0 order)
the function is identical to its extension:
"Let us give the name of matrix to any function, of however many variables, that does not
involve any apparent variables. Then, any possible function other than a matrix derives
from a matrix by means of generalization, that is, by considering the proposition that the
function in question is true with all possible values or with some value of one of the
arguments, the other argument or arguments remaining undetermined".[117]
For example, a function Φ(x, y) of two variables x and y can be reduced to a collection of
functions of a single variable, for example, y, by "considering" the function for all possible
values of "individuals" aᵢ substituted in place of variable x. And then the resulting collection of
functions of the single variable y, that is, ∀aᵢ: Φ(aᵢ, y), can be reduced to a "matrix" of values by
"considering" the function for all possible values of "individuals" bⱼ substituted in place of
variable y:
∀bⱼ ∀aᵢ: Φ(aᵢ, bⱼ).
Alfred Tarski in his 1946 Introduction to Logic used the word "matrix" synonymously with the
notion of truth table as used in mathematical logic.[118]