TENSOR PRODUCTS
KEITH CONRAD
1. Introduction
Let R be a commutative ring and M and N be R-modules. (We always work with rings
having a multiplicative identity and modules are assumed to be unital: 1 · m = m for all
m ∈ M .) The direct sum M ⊕ N is an addition operation on modules. We introduce here a
product operation M ⊗R N , called the tensor product. We will start off by describing what
a tensor product of modules is supposed to look like. Rigorous definitions are in Section 3.
Tensor products first arose for vector spaces, and this is the only setting where they
occur in physics and engineering, so we’ll describe tensor products of vector spaces first.
Let V and W be vector spaces over a field K, and choose bases {ei } for V and {fj } for
W . The tensor product V ⊗K W is defined to be the K-vector space with a basis of formal
symbols ei ⊗ fj (we declare these new symbols to be linearly independent by definition).
Thus V ⊗K W is the set of formal sums Σi,j cij ei ⊗ fj with cij ∈ K. Elements of V ⊗K W are
called tensors. For v ∈ V and w ∈ W , define v ⊗ w to be the element of V ⊗K W obtained
by writing v and w in terms of the original bases of V and W and then expanding out v ⊗ w
as if ⊗ were a noncommutative product (allowing scalars to be pulled out).
For example, let V = W = R2 = Re1 + Re2 , where {e1 , e2 } is the standard basis. (We
use the same basis for both copies of R2 .) Then R2 ⊗R R2 is a 4-dimensional space with
basis e1 ⊗ e1 , e1 ⊗ e2 , e2 ⊗ e1 , and e2 ⊗ e2 . If v = e1 − e2 and w = e1 + 2e2 , then
(1.1) v ⊗ w = (e1 − e2 ) ⊗ (e1 + 2e2 ) := e1 ⊗ e1 + 2e1 ⊗ e2 − e2 ⊗ e1 − 2e2 ⊗ e2 .
Does v ⊗ w depend on the choice of a basis of R2 ? As a test, pick another basis, say
e′1 = e1 + e2 and e′2 = 2e1 − e2. Then v and w can be written as v = −(1/3)e′1 + (2/3)e′2 and
w = (5/3)e′1 − (1/3)e′2. By a formal calculation,
v ⊗ w = (−(1/3)e′1 + (2/3)e′2) ⊗ ((5/3)e′1 − (1/3)e′2) = −(5/9)e′1 ⊗ e′1 + (1/9)e′1 ⊗ e′2 + (10/9)e′2 ⊗ e′1 − (2/9)e′2 ⊗ e′2,
and if you substitute into this last linear combination the definitions of e′1 and e′2 in terms
of e1 and e2 , expand everything out, and collect like terms, you’ll return to the sum on the
right side of (1.1). This suggests that v ⊗ w has a meaning in R2 ⊗R R2 that is independent
of the choice of a basis, although proving that might look daunting.
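As a check on the formal calculation above, after substituting e′1 = e1 + e2 and e′2 = 2e1 − e2 the coefficient of e1 ⊗ e1 is −5/9 + 2/9 + 20/9 − 8/9 = 1, and the coefficients of e1 ⊗ e2, e2 ⊗ e1, and e2 ⊗ e2 come out to 18/9 = 2, −9/9 = −1, and −18/9 = −2, exactly matching (1.1).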
In the setting of modules, a tensor product can be described like the case of vector spaces,
but the properties that ⊗ is supposed to satisfy have to be laid out in general, not just on a
basis (which may not even exist): for R-modules M and N , their tensor product M ⊗R N
(read as “M tensor N ” or “M tensor N over R”) is an R-module spanned – not as a basis,
but just as a spanning set1 – by all symbols m ⊗ n, with m ∈ M and n ∈ N, and these
symbols satisfy the rules
(1.2) (m + m′) ⊗ n = m ⊗ n + m′ ⊗ n, m ⊗ (n + n′) = m ⊗ n + m ⊗ n′,
(1.3) r(m ⊗ n) = (rm) ⊗ n = m ⊗ (rn)
for r ∈ R, so every element of M ⊗R N is a sum
(1.4) m1 ⊗ n1 + · · · + mk ⊗ nk.
1Recall a spanning set for an R-module is a subset whose finite R-linear combinations fill up the module.
They always exist, since the entire module is a spanning set.
2. Bilinear Maps
We already described the elements of M ⊗R N as sums (1.4) subject to the rules (1.2)
and (1.3). The intention is that M ⊗R N is the “freest” object satisfying (1.2) and (1.3).
The essence of (1.2) and (1.3) is bilinearity. What does that mean?
A function B : M × N → P , where M , N , and P are R-modules, is called bilinear if it is
linear (that is, R-linear) in each argument when the other one is fixed:
(2.1) B(m1 + m2, n) = B(m1, n) + B(m2, n), B(rm, n) = rB(m, n),
(2.2) B(m, n1 + n2) = B(m, n1) + B(m, n2), B(m, rn) = rB(m, n).
Example 2.1. Let B(v, w) = vw⊤, a bilinear map from Rⁿ × Rⁿ to n × n matrices over R. Each value vw⊤, as a linear map Rⁿ → Rⁿ, has image inside Rv, while B(e1, e1) + B(e2, e2) = e1e1⊤ + e2e2⊤
has a 2-dimensional image, so B(e1, e1) + B(e2, e2) ≠ B(v, w) for all v and w in Rⁿ.
(Similarly, Σni=1 B(ei, ei) is the n × n identity matrix, which is not of the form B(v, w).)
If B : M × N → P is bilinear and L : P → Q is linear, then the composite L ◦ B : M × N → Q is again bilinear.
We will construct the tensor product of M and N as a solution to a universal mapping
problem: find an R-module T and bilinear map b : M × N → T such that every bilinear
map on M × N is the composite of the bilinear map b and a unique linear map out of T .
M × N −b→ T −∃ linear?→ P, with the given bilinear map M × N → P as composite.
This is analogous to the universal mapping property of the abelianization G/[G, G] of a
group G: homomorphisms G −→ A with abelian A are “the same” as homomorphisms
G/[G, G] −→ A because every homomorphism f : G → A is the composite of the canonical
homomorphism π : G → G/[G, G] with a unique homomorphism f̃ : G/[G, G] → A:
G −π→ G/[G, G] −f̃→ A, with f = f̃ ◦ π.
Definition 3.1. The tensor product M ⊗R N is an R-module equipped with a bilinear map
⊗ : M × N → M ⊗R N such that for each bilinear map B : M × N → P there is a unique
linear map L : M ⊗R N → P making the following diagram commute:
M × N −⊗→ M ⊗R N −L→ P, with B = L ◦ ⊗.
While the functions in the universal mapping property for G/[G, G] are all group ho-
momorphisms (out of G and G/[G, G]), functions in the universal mapping property for
M ⊗R N are not all of the same type: those out of M × N are bilinear and those out of
M ⊗R N are linear: bilinear maps out of M × N turn into linear maps out of M ⊗R N .
The definition of the tensor product involves not just a new module M ⊗R N , but also a
special bilinear map to it, ⊗ : M × N −→ M ⊗R N . This is similar to the universal mapping
property for the abelianization G/[G, G], which requires not just G/[G, G] but also the
homomorphism π : G −→ G/[G, G] through which all homomorphisms from G to abelian
groups factor. The universal mapping property requires fixing this extra information.
Before building a tensor product, let’s show any two tensor products are essentially the
same. Let R-modules T and T′, and bilinear maps b : M × N → T and b′ : M × N → T′, satisfy
the universal mapping property of the tensor product. From universality of b : M × N → T,
the map b′ : M × N → T′ factors uniquely through T: a unique linear map f : T → T′ makes
(3.1) M × N −b→ T −f→ T′, with b′ = f ◦ b
commute. From universality of b′ : M × N → T′, the map b : M × N → T factors uniquely
through T′: a unique linear map f′ : T′ → T makes
(3.2) M × N −b′→ T′ −f′→ T, with b = f′ ◦ b′
commute. Stacking (3.1) on top of (3.2) gives the commutative diagram
(3.3) M × N −b→ T −f′◦f→ T, with (f′ ◦ f) ◦ b = b.
From universality of (T, b), a unique linear map T → T fits in (3.3). The identity map
works, so f′ ◦ f = idT. Similarly, f ◦ f′ = idT′ by stacking (3.1) and (3.2) together in the
other order. Thus T and T′ are isomorphic R-modules by f and also f ◦ b = b′, which means
f identifies b with b′. So two tensor products of M and N can be identified with each other
in a unique way compatible with the distinguished bilinear maps to them from M × N.
Theorem 3.2. A tensor product of M and N exists.
Proof. Consider M × N simply as a set. We form the free R-module on this set:
FR(M × N) = ⊕(m,n)∈M×N Rδ(m,n).
Let D be the submodule of FR(M × N) generated by all elements
δ(m+m′,n) − δ(m,n) − δ(m′,n), δ(m,n+n′) − δ(m,n) − δ(m,n′), δ(rm,n) − rδ(m,n), δ(m,rn) − rδ(m,n),
set M ⊗R N := FR(M × N)/D, and write m ⊗ n for the coset δ(m,n) + D. For a bilinear map
B : M × N → P, the universal mapping property of free modules gives a linear map
ℓ : FR(M × N) → P with ℓ(δ(m,n)) = B(m, n), so the diagram
M × N −(m,n)↦δ(m,n)→ FR(M × N) −ℓ→ P, with B as composite,
commutes. We want to show ℓ makes sense as a function on M ⊗R N, which means showing
ker ℓ contains D. From the bilinearity of B,
B(m + m′, n) = B(m, n) + B(m′, n), B(m, n + n′) = B(m, n) + B(m, n′),
rB(m, n) = B(rm, n) = B(m, rn),
so
ℓ(δ(m+m′,n)) = ℓ(δ(m,n)) + ℓ(δ(m′,n)), ℓ(δ(m,n+n′)) = ℓ(δ(m,n)) + ℓ(δ(m,n′)),
rℓ(δ(m,n)) = ℓ(δ(rm,n)) = ℓ(δ(m,rn)).
Since ℓ is linear, these conditions are the same as
ℓ(δ(m+m′,n)) = ℓ(δ(m,n) + δ(m′,n)), ℓ(δ(m,n+n′)) = ℓ(δ(m,n) + δ(m,n′)),
ℓ(rδ(m,n)) = ℓ(δ(rm,n)) = ℓ(δ(m,rn)).
Therefore the kernel of ℓ contains all the generators of the submodule D, so ℓ induces a
linear map L : FR(M × N)/D → P where L(δ(m,n) + D) = ℓ(δ(m,n)) = B(m, n), which
means the diagram
M × N −(m,n)↦δ(m,n)+D→ FR(M × N)/D −L→ P, with B as composite,
commutes. Since FR (M × N )/D = M ⊗R N and δ(m,n) + D = m ⊗ n, the above diagram is
(3.4) M × N −⊗→ M ⊗R N −L→ P, with B = L ◦ ⊗,
and that shows every bilinear map B out of M × N comes from a linear map L out of
M ⊗R N such that L(m ⊗ n) = B(m, n) for all m ∈ M and n ∈ N .
It remains to show the linear map L : M ⊗R N → P in (3.4) is the only one that makes
(3.4) commute. We go back to the definition of M ⊗R N as a quotient of the free module
Having shown a tensor product of M and N exists,7 its essential uniqueness lets us
call M ⊗R N “the” tensor product rather than “a” tensor product. Don’t forget that the
construction involves not only the module M ⊗R N but also the distinguished bilinear map
⊗ : M × N → M ⊗R N given by (m, n) ↦ m ⊗ n, through which all bilinear maps out of
M × N factor. We call this distinguished map the canonical bilinear map from M × N to
the tensor product. Elements of M ⊗R N are called tensors, and will be denoted by the
letter t. Tensors in M ⊗R N that have the form m ⊗ n are called elementary tensors. (Other
names for elementary tensors are simple tensors, decomposable tensors, pure tensors, and
monomial tensors.) Just as elements of the free R-module FR (A) on a set A are usually not
of the form δa but are linear combinations of these, elements of M ⊗R N are usually
not elementary tensors8 but are linear combinations of elementary tensors. In fact each
tensor is a sum of elementary tensors since r(m ⊗ n) = (rm) ⊗ n. This shows all elements
of M ⊗R N have the form (1.4).
That every tensor is a sum of elementary tensors, but need not be an elementary tensor
itself, is a feature that confuses people who are learning about tensor products. One source
of the confusion is that in the direct sum M ⊕ N every element is a pair (m, n), so why
shouldn’t every element of M ⊗R N have the form m ⊗ n? Here are two related ideas to
keep in mind, so it seems less strange that not all tensors are elementary.
• The R-module R[X, Y ] is a tensor product of R[X] and R[Y ] (see Example 4.12)
and, as Eisenbud and Harris note in their book on schemes [5, p. 39], the study
of polynomials in two variables is more than the study of polynomials of the form
f (X)g(Y ). That is, most polynomials in R[X, Y ] are not f (X)g(Y ), but they are
all a sum of such products (and in fact they are sums of monomials aijXⁱYʲ); a concrete check that X + Y is not such a product appears after these bullets.
7What happens if R is a noncommutative ring? If M and N are left R-modules and B is bilinear on
M × N then for all m ∈ M , n ∈ N , and r and s in R, rsB(m, n) = rB(m, sn) = B(rm, sn) = sB(rm, n) =
srB(m, n). Usually rs ≠ sr, so asking that rsB(m, n) = srB(m, n) for all m and n puts us in a delicate
situation! The correct tensor product M ⊗R N for noncommutative R uses a right R-module M , a left R-
module N , and a “middle-linear” map B where B(mr, n) = B(m, rn). In fact M ⊗R N is not an R-module
but just an abelian group! While we won’t deal with tensor products over a noncommutative ring, they are
important. They appear in the construction of induced representations of groups.
8For R ≠ 0, an explicit example of a nonelementary tensor in R² ⊗R R² will be provided in Example 4.11.
We essentially already met one in Example 2.1 when we saw e1e1⊤ + e2e2⊤ ≠ vw⊤ for all v and w in Rⁿ.
• The role of elementary tensors among all tensors is like that of separable solutions
f (x)g(y) to a 2-variable PDE among all solutions.9 Solutions to a PDE may not be
separable. First we determine the separable solutions and then the general solution
as a sum (perhaps an infinite sum) of separable solutions.
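To make the first point concrete: in R[X, Y] with R ≠ 0, the polynomial X + Y is not f(X)g(Y). Writing f(X) = Σi aiXⁱ and g(Y) = Σj bjYʲ, the equation f(X)g(Y) = X + Y would force a1b0 = 1, a0b1 = 1, and a1b1 = 0, and then 1 = (a1b0)(a0b1) = (a0b0)(a1b1) = 0 in R, a contradiction. Of course X + Y = X · 1 + 1 · Y is a sum of two such products.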
From now on forget the explicit construction of M ⊗R N as the quotient of an enormous
free module FR (M × N ). It will confuse you more than it’s worth to try to think about
M ⊗R N in terms of its construction. What is more important to remember is the universal
mapping property of the tensor product, which we will start using systematically in the
next section. To get used to the bilinearity of ⊗, let’s prove two simple results.
Theorem 3.3. Let M and N be R-modules with respective spanning sets {xi }i∈I and
{yj }j∈J . The tensor product M ⊗R N is spanned linearly by the elementary tensors xi ⊗ yj .
Proof. An elementary tensor in M ⊗R N has the form m ⊗ n. Write m = Σi aixi and
n = Σj bjyj, where the ai's and bj's are 0 for all but finitely many i and j. From the
bilinearity of ⊗,
m ⊗ n = (Σi aixi) ⊗ (Σj bjyj) = Σi,j aibj (xi ⊗ yj)
is a linear combination of the tensors xi ⊗ yj . So every elementary tensor is a linear
combination of the particular elementary tensors xi ⊗ yj . Since every tensor is a sum of
elementary tensors, the xi ⊗ yj ’s span M ⊗R N as an R-module.
Example 3.4. Let e1 , . . . , ek be the standard basis of Rk . The R-module Rk ⊗R Rk is
linearly spanned by the k² elementary tensors ei ⊗ ej. We will see later (Theorem 4.9) that
these elementary tensors are a basis of Rk ⊗R Rk , which for R a field is consistent with the
physicist’s “definition” of tensor products of vector spaces from Section 1 using bases.
Theorem 3.5. In M ⊗R N , m ⊗ 0 = 0 and 0 ⊗ n = 0.
Proof. This is just like the proof that a · 0 = 0 in a ring: since m ⊗ n is additive in n with m
fixed, m ⊗ 0 = m ⊗ (0 + 0) = m ⊗ 0 + m ⊗ 0. Subtracting m ⊗ 0 from both sides, m ⊗ 0 = 0.
That 0 ⊗ n = 0 follows by a similar argument.
Example 3.6. If A is a finite abelian group, Q ⊗Z A = 0 since every elementary tensor is
0: for a ∈ A we have na = 0 for some positive integer n, and then in Q ⊗Z A, r ⊗ a = n(r/n) ⊗ a =
(r/n) ⊗ na = (r/n) ⊗ 0 = 0. Every tensor is a sum of elementary tensors, and every elementary
tensor is 0, so all tensors are 0. (For instance, (1/3) ⊗ (5 mod 7) = 0 in Q ⊗Z Z/7Z. Thus
we can have m ⊗ n = 0 without m or n being 0.)
To show Q ⊗Z A = 0, we don’t need A to be finite, but rather that each element of A has
finite order. The group Q/Z has that property, so Q ⊗Z (Q/Z) = 0. By a similar argument,
Q/Z ⊗Z Q/Z = 0.
Since M ⊗R N is spanned additively by elementary tensors, each linear (or just additive)
function out of M ⊗R N is determined on all tensors from its values on elementary tensors.
This is why linear maps on tensor products are in practice described only by their values
on elementary tensors. It is similar to describing a linear map between finite free modules
9In Brad Osgood’s notes on the Fourier transform [19, pp. 343-344], he writes about functions of the
form f1 (x1 )f2 (x2 ) · · · fn (xn ) “If you really want to impress your friends and confound your enemies, you can
invoke tensor products in this context. [ . . . ] People run in terror from the ⊗ symbol. Cool.”
using a matrix. The matrix directly tells you only the values of the map on a particular
basis, but this information is enough to determine the linear map everywhere.
However, there is a key difference between basis vectors and elementary tensors: ele-
mentary tensors have lots of linear relations. A linear map out of R2 is determined by its
values on (1, 0), (2, 3), (8, 4), and (−1, 5), but those values are not independent: they have
to satisfy every linear relation the four vectors satisfy because a linear map preserves linear
relations. Similarly, a random function on elementary tensors generally does not extend
to a linear map on the tensor product: elementary tensors span the tensor product of two
modules, but they are not linearly independent.
Functions of elementary tensors can't be created out of a random function of two variables.
For instance, the “functions” M ⊗R M → M where m ⊗ m′ ↦ m + m′ and m ⊗ m′ ↦ m
make no sense since m ⊗ m′ = (−m) ⊗ (−m′) but m + m′ is usually not −m − m′ and m is
usually not −m. The only good way to create linear maps out of M ⊗R N is by the universal
mapping property of M ⊗R N (it creates linear maps from bilinear maps), since all linear
relations among elementary tensors – from the obvious to the obscure – are built into the
universal mapping property. A lot of practice with this is in Section 4. Understanding how
the universal mapping property of M ⊗R N can be used to compute examples and to prove
properties of tensor products is the best way to get used to tensor products; if you can’t
construct functions out of M ⊗R N , then you don’t understand M ⊗R N .
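As a first illustration, suppose we want a linear map M ⊗R N → N ⊗R M sending each m ⊗ n to n ⊗ m. The right move is to start with the function M × N → N ⊗R M where (m, n) ↦ n ⊗ m; it is bilinear since the canonical map to N ⊗R M is, so the universal mapping property produces the desired linear map. This is exactly how the commutativity isomorphism M ⊗R N ≅ N ⊗R M is constructed in Section 5.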
The tensor product can be extended to allow more than two factors. Given k modules
M1 , . . . , Mk , there is a module M1 ⊗R · · · ⊗R Mk that is universal for k-multilinear maps: it
admits a k-multilinear map ⊗ : M1 × · · · × Mk → M1 ⊗R · · · ⊗R Mk and every k-multilinear
map out of M1 × · · · × Mk factors through this by composition with a unique linear map
out of M1 ⊗R · · · ⊗R Mk:
M1 × · · · × Mk −⊗→ M1 ⊗R · · · ⊗R Mk −∃ unique linear→ P, with the given multilinear map to P as composite.
The image of (m1 , . . . , mk ) in M1 ⊗R · · · ⊗R Mk is written m1 ⊗ · · · ⊗ mk . This k-fold tensor
product can be constructed as a quotient of the free module FR (M1 × · · · × Mk ). It can also
be constructed using tensor products of modules two at a time:
(· · · ((M1 ⊗R M2 ) ⊗R M3 ) ⊗R · · · ) ⊗R Mk .
The canonical k-multilinear map to this R-module from M1 × · · · × Mk is (m1 , . . . , mk ) 7→
(· · · ((m1 ⊗ m2 ) ⊗ m3 ) · · · ) ⊗ mk . This is not the same construction of the k-fold tensor
product using FR (M1 × · · · × Mk ), but it satisfies the same universal mapping property and
thus can serve the same purpose (all constructions of a tensor product of M1 , . . . , Mk are
isomorphic to each other in a unique way compatible with the distinguished k-multilinear
maps to them from M1 × · · · × Mk ).
The module M1 ⊗R · · · ⊗R Mk is spanned additively by all m1 ⊗ · · · ⊗ mk . Important
examples of the k-fold tensor product are tensor powers M ⊗k of a single R-module M :
M ⊗0 = R, M ⊗1 = M, M ⊗2 = M ⊗R M, M ⊗3 = M ⊗R M ⊗R M,
and so on. (The formula M ⊗0 = R is a convention, like a0 = 1.)
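For example, for F = Rⁿ with its standard basis, Theorem 4.14 below shows the tensor power F⊗k is a free R-module of rank nᵏ, with basis the elementary tensors ei1 ⊗ · · · ⊗ eik.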
M × N −⊗→ M ⊗R N −L→ P, with B = L ◦ ⊗,
of M × N, Σki=1 B(mi, ni) = Σℓj=1 B(m′j, n′j). The justification is along the lines
of the previous two answers and is left to the reader. For example, the condition
Σki=1 mi ⊗ ni = 0 means Σki=1 B(mi, ni) = 0 for all bilinear maps B on M × N.
(5) For a bilinear map B : M × N → P , its bilinearity is (2.1) and (2.2), which say
B(m1 + m2 , n) = B(m1 , n) + B(m2 , n), B(rm, n) = rB(m, n),
B(m, n1 + n2 ) = B(m, n1 ) + B(m, n2 ), B(m, rn) = rB(m, n).
For the associated linear map L : M ⊗R N → P , the bilinearity of B is the same as
L((m1 + m2 ) ⊗ n) = L(m1 ⊗ n) + L(m2 ⊗ n), L((rm) ⊗ n) = rL(m ⊗ n),
L(m ⊗ (n1 + n2 )) = L(m ⊗ n1 ) + L(m ⊗ n2 ), L(m ⊗ (rn)) = rL(m ⊗ n).
Since (m1 + m2 ) ⊗ n = m1 ⊗ n + m2 ⊗ n, m ⊗ (n1 + n2 ) = m ⊗ n1 + m ⊗ n2 ,
(rm) ⊗ n = r(m ⊗ n), and m ⊗ (rn) = r(m ⊗ n), the four conditions on L above are
special cases of L(t + t′) = L(t) + L(t′) and L(rt) = rL(t), which is exactly what it
means for L to be linear.
(6) Tensors are used in physics and engineering (stress, elasticity, electromagnetism,
metrics, diffusion MRI), where they transform in a multilinear way under a change
in coordinates. The treatment of tensors in physics is discussed in Section 7.
(7) There isn’t a simple picture of a tensor (even an elementary tensor) analogous to
how a vector is an arrow. Some physical manifestations of tensors are in the previous
answer, but they won’t help you understand tensor products of modules.
Nobody is comfortable with tensor products at first. Two quotes by Cathy O’Neil and
Johan de Jong10 nicely capture the phenomenon of learning about them:
• O’Neil: After a few months, though, I realized something. I hadn’t gotten any better
at understanding tensor products, but I was getting used to not understanding them.
It was pretty amazing. I no longer felt anguished when tensor products came up; I
was instead almost amused by their cunning ways.
• de Jong: It is the things you can prove that tell you how to think about tensor
products. In other words, you let elementary lemmas and examples shape your
intuition of the mathematical object in question. There’s nothing else, no magical
intuition will magically appear to help you “understand” it.
Remark 3.7. Hassler Whitney, who first defined tensor products beyond the setting of
vector spaces, called abelian groups A and B a group pair relative to the abelian group C
if there is a Z-bilinear map A × B → C and wrote [27, p. 499] that “any such group pair
may be defined by choosing a homomorphism” A ⊗Z B → C. So the idea that ⊗Z solves a
universal mapping problem is essentially due to Whitney.
10See http://mathbabe.org/2011/07/20/what-tensor-products-taught-me-about-living-my-life/.
Z/aZ × Z/bZ −⊗→ Z/aZ ⊗Z Z/bZ −f→ Z/dZ, with B = f ◦ ⊗,
commute, so f (x⊗y) = xy. In particular, f (x⊗1) = x, so f is onto. Therefore Z/aZ⊗Z Z/bZ
has size at least d, so the size is d and we’re done.
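For instance, Z/4Z ⊗Z Z/6Z ≅ Z/2Z, since here d = gcd(4, 6) = 2: every tensor is an integer multiple of 1 ⊗ 1 and the isomorphism is x ⊗ y ↦ xy mod 2.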
Example 4.2. The abelian group Z/3Z ⊗Z Z/5Z is 0. Such collapsing in a tensor product
often bothers people when they first see it, but it means something concrete: each Z-bilinear
map B : Z/3Z × Z/5Z → A to an abelian group A is identically 0. That's easy to
show directly: 3B(a, b) = B(3a, b) = B(0, b) = 0 and 5B(a, b) = B(a, 5b) = B(a, 0) = 0, so
B(a, b) is killed by 3Z + 5Z = Z. Thus B(a, b) is killed by 1, which means B(a, b) = 0.
In Z/aZ ⊗Z Z/bZ all tensors are elementary tensors: x ⊗ y = xy(1 ⊗ 1) and a sum of
multiples of 1 ⊗ 1 is again a multiple, so Z/aZ ⊗Z Z/bZ = Z(1 ⊗ 1) = {x ⊗ 1 : x ∈ Z}.
Note how the map f : Z/aZ ⊗Z Z/bZ → Z/dZ in the proof of Theorem 4.1 was created
from the bilinear map B : Z/aZ × Z/bZ → Z/dZ and the universal mapping property of
tensor products. To define a linear map out of M ⊗R N sending all elementary tensors
m ⊗ n to specific places, always back up and start by defining a bilinear map out of M × N
sending (m, n) to the place you want m ⊗ n to go. Make sure you show that map is bilinear!
Then the universal mapping property of the tensor product gives you a linear map out of
M ⊗R N sending m ⊗ n to the place where (m, n) goes. As an anonymous student once
wrote, “If you don’t know what to do on a tensor products problem, build a well-chosen
bilinear map out of M × N because there’s basically nothing else you can do.”
Theorem 4.3. For ideals I and J in R, there is a unique R-module isomorphism
R/I ⊗R R/J ≅ R/(I + J)
where x ⊗ y ↦ xy. In particular, taking I = J = 0, R ⊗R R ≅ R by x ⊗ y ↦ xy.
For R = Z and nonzero I and J, this is Theorem 4.1.
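For another example, take R = K[X, Y] for a field K, I = (X), and J = (Y): the theorem gives K[X, Y]/(X) ⊗R K[X, Y]/(Y) ≅ K[X, Y]/(X, Y) ≅ K.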
Proof. Start with the function R/I × R/J → R/(I + J) given by (x mod I, y mod J) ↦
xy mod (I + J). This is well-defined and bilinear, so from the universal mapping property of
the tensor product we get a linear map f : R/I ⊗R R/J → R/(I + J) making the diagram
R/I × R/J −⊗→ R/I ⊗R R/J −f→ R/(I + J), with the bilinear map above as composite.
(R/I) × M −⊗→ (R/I) ⊗R M −f→ M/IM, with (r mod I, m) ↦ rm as composite.
To create an inverse map, start with the function M → (R/I) ⊗R M given by m ↦ 1 ⊗ m.
This is linear in m (check!) and kills IM (generators for IM are products im for i ∈ I
Proof. The result is clear if F or F′ is 0, so let them both be nonzero modules (hence R ≠ 0
and I and J are nonempty). By Theorem 3.3, {ei ⊗ e′j} spans F ⊗R F′ as an R-module.
To show this spanning set is linearly independent, suppose Σi,j cij ei ⊗ e′j = 0, where
all but finitely many cij are 0. We want to show every cij is 0. Pick two basis vectors
ei0 and e′j0 in F and F′. To show the coefficient ci0j0 is 0, consider the bilinear function
F × F′ → R by (v, w) ↦ vi0wj0, where v = Σi viei and w = Σj wje′j. (Here vi and wj are
coordinates in R.) By the universal mapping property of tensor products there is a linear
map f0 : F ⊗R F′ → R such that f0(v ⊗ w) = vi0wj0 on each elementary tensor v ⊗ w:
F × F′ −⊗→ F ⊗R F′ −f0→ R, with (v, w) ↦ vi0wj0 as composite.
In particular, f0(ei0 ⊗ e′j0) = 1 and f0(ei ⊗ e′j) = 0 for (i, j) ≠ (i0, j0). Applying f0 to the
equation Σi,j cij ei ⊗ e′j = 0 in F ⊗R F′ tells us ci0j0 = 0 in R. Since i0 and j0 are arbitrary,
all the coefficients are 0.
Theorem 4.9 can be interpreted in terms of bilinear maps out of F × F′. It says that all
bilinear maps out of F × F′ are determined by their values on the pairs (ei, e′j), and that
each assignment of values to these pairs extends in a unique way to a bilinear map out of
F × F′. (The uniqueness of the extension is connected to the linear independence of the
elementary tensors ei ⊗ e′j.) This is the bilinear analogue of the existence and uniqueness
of a linear extension of a function from a basis of a free module to the whole module.
Example 4.10. Let K be a field and V and W be nonzero vector spaces over K with finite
dimension. There are bases for V and W, say {e1, . . . , em} for V and {f1, . . . , fn} for W.
Every element of V ⊗K W can be written in the form Σi,j cij ei ⊗ fj for unique cij ∈ K.
In fact, this holds even for infinite-dimensional vector spaces, since Theorem 4.9 had no
assumption that bases were finite. This justifies the description on the first page of tensor
products of vector spaces using bases.
Example 4.11. For R ≠ 0, let F be a finite free R-module of rank n ≥ 2 with basis
{e1, . . . , en}. In F ⊗R F, e1 ⊗ e1 + e2 ⊗ e2 is an example of a tensor that is provably
not an elementary tensor. An elementary tensor in F ⊗R F has the form
(4.1) (Σni=1 aiei) ⊗ (Σnj=1 bjej) = Σni,j=1 aibj (ei ⊗ ej).
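Indeed, by Theorem 4.9 the tensors ei ⊗ ej are a basis of F ⊗R F, so an equation e1 ⊗ e1 + e2 ⊗ e2 = (Σni=1 aiei) ⊗ (Σnj=1 bjej) would force, comparing coefficients, a1b1 = 1, a2b2 = 1, a1b2 = 0, and a2b1 = 0. Then 1 = (a1b1)(a2b2) = (a1b2)(a2b1) = 0 in R, a contradiction since R ≠ 0. So e1 ⊗ e1 + e2 ⊗ e2 is not elementary.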
Theorem 4.14. Let F be a free R-module with basis {ei }i∈I . For k ≥ 1, the kth tensor
power F ⊗k is free with basis {ei1 ⊗ · · · ⊗ eik }(i1 ,...,ik )∈I k .
Theorem 4.15. If M is an R-module and F is a free R-module with basis {ei}i∈I, then
every element of M ⊗R F has a unique representation in the form Σi∈I mi ⊗ ei, where all
but finitely many mi equal 0.
These sums have finitely many terms (ri = 0 for all but finitely many i), from the definition
of direct sums. Thus f (g(t)) = t for all t ∈ M ⊗R F .
For the composition in the other order,
g(f((mi)i∈I)) = g(Σi∈I mi ⊗ ei) = Σi∈I g(mi ⊗ ei) = Σi∈I (. . . , 0, mi, 0, . . . ) = (mi)i∈I.
Remark 4.17. When f and g are additive functions you can check f(g(t)) = t for all
tensors t by only checking it on elementary tensors, but it would be wrong to think you
have proved injectivity of a linear map f : M ⊗R N → P by only looking at elementary
tensors. That is, if f(m ⊗ n) = f(m′ ⊗ n′) ⇒ m ⊗ n = m′ ⊗ n′, it is not always true
that f(t) = f(t′) ⇒ t = t′ for all t and t′ in M ⊗R N, since injectivity of a linear map is
not an additive property. This is the main reason that proving that a linear map out of
a tensor product is injective can require technique. As a special case, if you want to prove
a linear map out of a tensor product is an isomorphism, it might be easier to construct an
inverse map and check the composite in both orders is the identity than to show the original
map is injective and surjective.
Theorem 4.18. If M is a nonzero finitely generated R-module then M⊗k ≠ 0 for all k.
Proof. Necessarily R ≠ 0. Write M = Rx1 + · · · + Rxd for minimal d ≥ 1. Set N =
Rx1 + · · · + Rxd−1 (N = 0 if d = 1), so M = N + Rxd and xd ∉ N. Set I = {r ∈ R : rxd ∈ N},
so I is an ideal in R and 1 ∉ I, so I is a proper ideal. When we write an element m of M
in the form n + rxd with n ∈ N and r ∈ R, n and r may not be well-defined from m but
the value of r mod I is well-defined: if n + rxd = n′ + r′xd then (r − r′)xd = n′ − n ∈ N, so
r ≡ r′ mod I. Therefore the function Mᵏ → R/I given by
(n1 + r1xd, . . . , nk + rkxd) ↦ r1 · · · rk mod I
is well-defined and multilinear (check!), so there is an R-linear map M⊗k → R/I sending
the k-fold tensor xd ⊗ · · · ⊗ xd to 1 mod I ≠ 0. That shows M⊗k ≠ 0.
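For example, over R = Z the module M = Z/6Z is nonzero and finitely generated, and by Theorem 4.1, (Z/6Z) ⊗Z (Z/6Z) ≅ Z/6Z, so inductively (Z/6Z)⊗k ≅ Z/6Z ≠ 0 for all k ≥ 1, consistent with Theorem 4.18.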
Theorem 4.21. Let R be a domain with fraction field K and V be a K-vector space. There
is an R-module isomorphism K ⊗R V ≅ V, where x ⊗ v ↦ xv.
By Theorem 4.5, K ⊗K V ≅ V by x ⊗ v ↦ xv, but Theorem 4.21 is different because the
scalars in the tensor product are from R.
Proof. Multiplication is a function K × V → V . It is R-bilinear, so the universal mapping
property of tensor products says there is an R-linear function f : K ⊗R V → V where
f(x ⊗ v) = xv on elementary tensors. That says the diagram
K × V −⊗→ K ⊗R V −f→ V
commutes, where the composite is scalar multiplication K × V → V. Since f(1 ⊗ v) = v, f is
onto.
To show f is one-to-one, first we show every tensor in K ⊗R V is elementary with 1 in
the first component. For an elementary tensor x ⊗ v, write x = a/b with a and b in R, and
b ≠ 0. Then
x ⊗ v = (a/b) ⊗ v = (1/b) ⊗ av = (1/b) ⊗ (ab/b)v = b((1/b) ⊗ (a/b)v) = 1 ⊗ (a/b)v = 1 ⊗ xv.
Notice how we moved x ∈ K across ⊗ even though x need not be in R: we used K-scaling
in V to create b and 1/b on the right side of ⊗ and bring b across ⊗ from right to left, which
cancels 1/b on the left side of ⊗. This has the effect of moving 1/b from left to right.
Thus all elementary tensors in K ⊗R V have the form 1 ⊗ v for some v ∈ V , so by adding,
every tensor is 1 ⊗ v for some v. Now we can show f has trivial kernel: if f (t) = 0 then,
writing t = 1 ⊗ v, we get v = 0, so t = 1 ⊗ 0 = 0.
Example 4.22. For V = K, K ⊗R K ≅ K as R-modules by x ⊗ y ↦ xy on elementary
tensors. For example, Q ⊗Z Q ≅ Q. If the field K is inside a field L then we can view L as a
K-vector space and K ⊗R L ≅ L as R-modules, e.g., Q ⊗Z R ≅ R as Z-modules.
Theorem 4.23. Let R be a domain with fraction field K and V be a K-vector space. For
each nonzero R-module M inside K, M ⊗R V ≅ V as R-modules by m ⊗ v ↦ mv. In
particular, I ⊗R K ≅ K as R-modules for every nonzero ideal I in R.
Proof. The proof is largely like that for the previous theorem.15 Multiplication gives a
function M × V → V that is R-bilinear, so we get an R-linear map f : M ⊗R V → V where
f (m ⊗ v) = mv. To show f is onto, we can’t look at f (1 ⊗ v) as in the previous proof, since
1 is usually not in M . Instead we can just pick a nonzero m ∈ M . Then for all v ∈ V ,
f (m ⊗ (1/m)v) = v.
To show f is injective, first we show all tensors in M ⊗R V are elementary. This sounds
like our previous proof that all tensors in K ⊗R V are elementary, but M need not be K,
so our manipulations need to be more careful than before. (We can’t write (a/b) ⊗ v as
15Theorem 4.21 is just a special case of Theorem 4.23, but we worked it out separately first since the
technicalities are simpler.
(1/b) ⊗ av, since 1/b usually won’t be in M .) Given a finite set of nonzero elementary
tensors mi ⊗ vi , each mi is nonzero. Write mi = ai /bi with nonzero ai and bi in R. Let
a ∈ R be the product of the ai ’s and ci = a/ai ∈ R, so a = ai ci = bi ci mi ∈ M . In V we can
write vi = bi ci wi for some wi ∈ V , so
mi ⊗ vi = mi ⊗ bi ci wi = bi ci mi ⊗ wi = a ⊗ wi .
The sum of these elementary tensors is a ⊗ Σi wi, which is elementary.
Now suppose t ∈ M ⊗R V is in the kernel of f . All tensors in M ⊗R V are elementary,
so we can write t = m ⊗ v. Then f (t) = 0 ⇒ mv = 0 in V , so m = 0 or v = 0, and thus
t = m ⊗ v = 0.
Example 4.24. Let R = Z[√10] and K = Q(√10). The ideal I = (2, √10) in R is not
principal, so I ≇ R as R-modules. However, I ⊗R K ≅ R ⊗R K as R-modules since both
are isomorphic to K.
Theorem 4.25. Let R be a domain and F and F′ be free R-modules. If x and x′ are
nonzero in F and F′, then x ⊗ x′ ≠ 0 in F ⊗R F′.
Proof. If we were working with vector spaces this would be trivial, since x and x′ are each
part of a basis of F and F′, so x ⊗ x′ is part of a basis of F ⊗R F′ (Theorem 4.9). In a
free module over a commutative ring, a nonzero element need not be part of a basis, so our
proof needs to be a little more careful. We'll still use bases, just not ones that necessarily
include x or x′.
Pick a basis {ei} for F and {e′j} for F′. Write x = Σi aiei and x′ = Σj a′je′j. Then
x ⊗ x′ = Σi,j aia′j (ei ⊗ e′j) in F ⊗R F′. Since x and x′ are nonzero, they each have some
nonzero coefficient, say ai0 and a′j0. Then ai0a′j0 ≠ 0 since R is a domain, so x ⊗ x′ has a
nonzero coordinate in the basis {ei ⊗ e′j} of F ⊗R F′. Thus x ⊗ x′ ≠ 0.
Remark 4.26. There is always a counterexample for Theorem 4.25 when R is not a domain.
Let F = F′ = R and say ab = 0 with a and b nonzero in R. In R ⊗R R we have a ⊗ b =
ab(1 ⊗ 1) = 0.
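Concretely, in R = Z/6Z take a = 2 and b = 3: both are nonzero in R, but 2 ⊗ 3 = 2 · 3(1 ⊗ 1) = 6(1 ⊗ 1) = 0 in R ⊗R R.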
Theorem 4.27. Let R be a domain with fraction field K and V be a K-vector space.
(1) For all R-modules M, there is an R-module isomorphism V ⊗R M ≅ V ⊗R (M/Mtor),
where Mtor is the torsion submodule of M.
(2) For R-modules M, if M is torsion then V ⊗R M = 0, and if M is not torsion and
V is nonzero then V ⊗R M ≠ 0.
(3) If M is an R-module and N is a submodule such that M/N is a torsion R-module
then V ⊗R N ≅ V ⊗R M as R-modules by v ⊗ n ↦ v ⊗ n.
Proof. (1) The map V × M → V ⊗R (M/Mtor) given by (v, m) ↦ v ⊗ m̄ (overline denoting
the coset in M/Mtor) is R-bilinear, so there is an R-linear map f : V ⊗R M → V ⊗R (M/Mtor)
where f(v ⊗ m) = v ⊗ m̄.
To go the other way, the canonical R-bilinear map ⊗ : V × M → V ⊗R M vanishes at (v, m)
for m ∈ Mtor: if rm = 0 for r ≠ 0 then v ⊗ m = r((v/r) ⊗ m) = (v/r) ⊗ rm = (v/r) ⊗ 0 = 0.
Thus we get an induced R-bilinear map V × (M/Mtor) → V ⊗R M given by (v, m̄) ↦ v ⊗ m.
(The point is that an elementary tensor v ⊗ m in V ⊗R M only depends on m through its
coset mod Mtor.) The universal mapping property of the tensor product now gives us an
R-linear map g : V ⊗R (M/Mtor) → V ⊗R M where g(v ⊗ m̄) = v ⊗ m.
The composites g ◦ f and f ◦ g are both R-linear and are the identity on elementary
tensors, so they are the identity on all tensors and thus f and g are inverse isomorphisms.
M × N −⊗→ M ⊗R N −f→ N ⊗R M, with (m, n) ↦ n ⊗ m as composite,
commutes.
Running through the above argument with the roles of M and N interchanged, there is a
unique linear map g : N ⊗R M → M ⊗R N where g(n ⊗ m) = m ⊗ n on elementary tensors.
We will show f and g are inverses of each other.
To show f (g(t)) = t for all t ∈ N ⊗R M , it suffices to check this when t is an elementary
tensor, since both sides are R-linear (or even just additive) in t and N ⊗R M is spanned
by its elementary tensors: f (g(n ⊗ m)) = f (m ⊗ n) = n ⊗ m. Therefore f (g(t)) = t for all
t ∈ N ⊗R M . The proof that g(f (t)) = t for all t ∈ M ⊗R N is similar. We have shown f
and g are inverses of each other, so f is an R-module isomorphism.
(5.1) M × (N ⊕ P) −b→ (M ⊗R N) ⊕ (M ⊗R P) −L→ Q, with B = L ◦ b,
commute. Being linear, L would be determined by its values on the direct summands, and
these values would be determined by the values of L on all pairs (m ⊗ n, 0) and (0, m ⊗ p)
by additivity. These values are forced by commutativity of (5.1) to be
L(m⊗n, 0) = L(b(m,(n, 0))) = B(m,(n, 0)) and L(0, m⊗p) = L(b(m,(0, p))) = B(m,(0, p)).
To construct L, the above formulas suggest the maps M × N → Q and M × P → Q
given by (m, n) ↦ B(m, (n, 0)) and (m, p) ↦ B(m, (0, p)). Both are bilinear, so there are
R-linear maps L1 : M ⊗R N → Q and L2 : M ⊗R P → Q where
L1(m ⊗ n) = B(m, (n, 0)) and L2(m ⊗ p) = B(m, (0, p)).
Define L on (M ⊗R N ) ⊕ (M ⊗R P ) by L(t1 , t2 ) = L1 (t1 ) + L2 (t2 ). (Notice we are defining
L not just on ordered pairs of elementary tensors, but on all pairs of tensors. We need L1
and L2 to be defined on the whole tensor product modules M ⊗R N and M ⊗R P .) The
map L is linear since L1 and L2 are linear, and (5.1) commutes:
L(b(m, (n, p))) = L(b(m, (n, 0) + (0, p)))
= L(b(m, (n, 0)) + b(m, (0, p)))
= L((m ⊗ n, 0) + (0, m ⊗ p)) by the definition of b
= L(m ⊗ n, m ⊗ p)
= L1 (m ⊗ n) + L2 (m ⊗ p) by the definition of L
= B(m, (n, 0)) + B(m, (0, p))
= B(m, (n, 0) + (0, p))
= B(m, (n, p)).
Now that we’ve shown (M ⊗R N ) ⊕ (M ⊗R P ) and the bilinear map b have the universal
mapping property of M ⊗R (N ⊕ P ) and the canonical bilinear map ⊗, there is a unique
R-module isomorphism f : M ⊗R (N ⊕ P) → (M ⊗R N) ⊕ (M ⊗R P) with f ◦ ⊗ = b, that is,
f(m ⊗ (n, p)) = (m ⊗ n, m ⊗ p).
Theorem 5.4. For an R-module M and a family of R-modules {Ni}i∈I, there is an R-module
isomorphism
M ⊗R (⊕i∈I Ni) ≅ ⊕i∈I (M ⊗R Ni)
where m ⊗ (ni)i∈I ↦ (m ⊗ ni)i∈I.
Proof. The case I = ∅ is vacuously true, so let I ≠ ∅. We modify the proof when |I| = 2 in
Theorem 5.3. The map b : M × (⊕i∈I Ni) → ⊕i∈I (M ⊗R Ni) by b((m, (ni)i∈I)) = (m ⊗ ni)i∈I
is bilinear. We will show ⊕i∈I (M ⊗R Ni) and b have the universal mapping property of
M ⊗R (⊕i∈I Ni) and ⊗.
Let B : M × (⊕i∈I Ni) → Q be bilinear. For each i ∈ I the function M × Ni → Q where
(m, ni) ↦ B(m, (. . . , 0, ni, 0, . . . )) is bilinear, so there is a linear map Li : M ⊗R Ni → Q
where Li(m ⊗ ni) = B(m, (. . . , 0, ni, 0, . . . )). Define L : ⊕i∈I (M ⊗R Ni) → Q by L((ti)i∈I) =
Σi∈I Li(ti). All but finitely many ti equal 0, so the sum here makes sense, and L is linear.
It is left to the reader to check the diagram
M × (⊕i∈I Ni) −b→ ⊕i∈I (M ⊗R Ni) −L→ Q, with B = L ◦ b,
commutes. A map L making this diagram commute has its value on (. . . , 0, m ⊗ ni, 0, . . . ) =
b(m, (. . . , 0, ni, 0, . . . )) determined by B, so L is unique. Thus ⊕i∈I (M ⊗R Ni) and the
bilinear map b to it have the universal mapping property of M ⊗R (⊕i∈I Ni) and the canonical
bilinear map ⊗ to it, so there is a unique R-linear map f making the diagram
M × (⊕i∈I Ni) −b→ ⊕i∈I (M ⊗R Ni) −f→ M ⊗R (⊕i∈I Ni), with ⊗ = f ◦ b,
commute. Sending (m, (ni)i∈I) around the diagram both ways, f((m ⊗ ni)i∈I) = m ⊗ (ni)i∈I,
so the inverse of f is an isomorphism with the effect m ⊗ (ni)i∈I ↦ (m ⊗ ni)i∈I.
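For example, over R = Z with M = Q, Theorem 5.4 gives Q ⊗Z (Z ⊕ Z/5Z) ≅ (Q ⊗Z Z) ⊕ (Q ⊗Z Z/5Z) ≅ Q ⊕ 0 = Q, the second summand vanishing by Example 3.6.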
Remark 5.5. The analogue of Theorem 5.4 for direct products of R-modules has counterexamples. While there is a natural R-linear map
(5.2) M ⊗R ∏i∈I Ni → ∏i∈I (M ⊗R Ni),
is a function that is bilinear in Mk−1 and Mk when other coordinates are fixed. There is a
unique function
Φ : M1 × · · · × Mk−2 × (Mk−1 ⊗R Mk) → N
that is linear in Mk−1 ⊗R Mk when the other coordinates are fixed and satisfies
(5.3) Φ(m1, . . . , mk−2, mk−1 ⊗ mk) = ϕ(m1, . . . , mk−2, mk−1, mk).
Proof. Assuming a function Φ exists satisfying (5.3) and is linear in the last coordinate
when other coordinates are fixed, its value everywhere is determined by additivity in the
last coordinate: write each tensor t ∈ Mk−1 ⊗R Mk in the form t = Σpi=1 xi ⊗ yi, and then
Φ(m1, . . . , mk−2, t) = Φ(m1, . . . , mk−2, Σpi=1 xi ⊗ yi)
= Σpi=1 Φ(m1, . . . , mk−2, xi ⊗ yi)
= Σpi=1 ϕ(m1, . . . , mk−2, xi, yi).
It remains to show Φ exists with the desired properties.
Fix mi ∈ Mi for i = 1, . . . , k − 2. Define ϕm1 ,...,mk−2 : Mk−1 × Mk → N by
ϕm1 ,...,mk−2 (x, y) = ϕ(m1 , . . . , mk−2 , x, y).
By hypothesis ϕm1 ,...,mk−2 is bilinear in x and y, so from the universal mapping property of
the tensor product there is a linear map Φm1 ,...,mk−2 : Mk−1 ⊗R Mk → N such that
Φm1 ,...,mk−2 (x ⊗ y) = ϕm1 ,...,mk−2 (x, y) = ϕ(m1 , . . . , mk−2 , x, y).
Define Φ : M1 × · · · × Mk−2 × (Mk−1 ⊗R Mk ) → N by
Φ(m1 , . . . , mk−2 , t) = Φm1 ,...,mk−2 (t).
Since Φm1 ,...,mk−2 is a linear function on Mk−1 ⊗R Mk , Φ(m1 , . . . , mk−2 , t) is linear in t when
m1 , . . . , mk−2 are fixed.
If ϕ is multilinear in M1 , . . . , Mk we want to show Φ is multilinear in M1 , . . . , Mk−2 ,
Mk−1 ⊗R Mk . We already know Φ is linear in Mk−1 ⊗R Mk when the other coordinates are
fixed. To show Φ is linear in each of the other coordinates (fixing the rest), we carry out
the computation for M1 (the argument is similar for the other Mi's): is
Φ(x + x′, m2, . . . , mk−2, t) =? Φ(x, m2, . . . , mk−2, t) + Φ(x′, m2, . . . , mk−2, t),
Φ(rx, m2, . . . , mk−2, t) =? rΦ(x, m2, . . . , mk−2, t)
when m2, . . . , mk−2, t are fixed in M2, . . . , Mk−2, Mk−1 ⊗R Mk? In these two equations, both
sides are additive in t so it suffices to verify these two equations when t is an elementary
tensor mk−1 ⊗ mk . Then from (5.3), these two equations are true since we’re assuming ϕ is
linear in M1 (fixing the other coordinates).
Theorem 5.6 is not specific to functions that are bilinear in the last two coordinates: any
two coordinates can be used when the function is bilinear in those two coordinates. For
instance, let’s revisit the proof of associativity of the tensor product in Theorem 5.2 to see
why the construction of the functions fp in the proof of Theorem 5.3 is a special case of
Theorem 5.6. Define
ϕ : M × N × P → M ⊗R (N ⊗R P )
by ϕ(m, n, p) = m ⊗ (n ⊗ p). This function is trilinear, so Theorem 5.6 says we can replace
M × N with its tensor product: there is a bilinear function
Φ : (M ⊗R N ) × P → M ⊗R (N ⊗R P )
such that Φ(m ⊗ n, p) = m ⊗ (n ⊗ p). Since Φ is bilinear, there is a linear function
f : (M ⊗R N ) ⊗R P → M ⊗R (N ⊗R P )
Example 5.11. Finite-dimensional K-vector spaces V and W have finite bases, so Theorem
5.9 says V∨ ⊗K W ≅ HomK(V, W) by sending each elementary tensor ϕ ⊗ w to the linear map
V → W given by the rule (ϕ ⊗ w)(v) = ϕ(v)w for all v ∈ V, and W ⊗K V∨ ≅ HomK(V, W) by
sending each elementary tensor w ⊗ ϕ to the linear map V → W where (w ⊗ ϕ)(v) = ϕ(v)w
for all v ∈ V. This is one of the most basic ways tensor products occur in linear algebra.
What is the isomorphism W ⊗K V∨ → HomK(V, W) really saying? For each w ∈ W and
ϕ ∈ V∨, we get a linear map V → W by v ↦ ϕ(v)w, whose image as v varies is the scalar
multiples of w (unless ϕ = 0). Since the expression ϕ(v)w is bilinear in ϕ and w, we can
regard the linear map V → W where v ↦ ϕ(v)w as defining an effect of w ⊗ ϕ on V, with
values in W, and all linear maps V → W are sums of such maps. This corresponds to the
fact that every matrix is a sum of matrices with at most one nonzero entry.
For instance, when V = W = K² with basis e1 = (1, 0)⊤ and e2 = (0, 1)⊤, let ϕ ∈ (K²)∨ by
When M and N are finite free R-modules, the isomorphisms in Corollary 5.8 and Theorem
5.9 lead to a basis-free description of M ⊗R N making no mention of universal mapping
properties. Identify M with M ∨∨ by double duality, so Theorem 5.9 with M ∨ in place of
M assumes the form
M ⊗R N ≅ HomR(M∨, N),
where m ⊗ n acts as a linear map M∨ → N by the rule (m ⊗ n)(ϕ) = ϕ(m)n. Since
N ≅ N∨∨ by double duality, HomR(M∨, N) ≅ HomR(M∨, (N∨)∨) ≅ BilR(M∨, N∨; R) by
Corollary 5.8. Therefore
(5.4) M ⊗R N ≅ BilR(M∨, N∨; R),
where m ⊗ n acts as a bilinear map M ∨ × N ∨ → R by the rule (m ⊗ n)(ϕ, ψ) = ϕ(m)ψ(n).
Similarly, M ⊗k is isomorphic to the module of k-multilinear maps (M ∨ )k → R, with the
elementary tensor m1 ⊗· · ·⊗mk defining the map sending (ϕ1 , . . . , ϕk ) to ϕ1 (m1 ) · · · ϕk (mk ).
The definition of the tensor product of finite-dimensional vector spaces in [1, p. 65] and
[18, p. 35] is essentially (5.4).18 It is a good exercise to check these interpretations of
m ⊗ n as a member of HomR (M ∨ , N ) and BilR (M ∨ , N ∨ ; R) are identified with each other
by Corollary 5.8 and double duality.
Watch out! The isomorphism (5.4) is false for general modules M and N (where double
duality doesn’t hold). There is always a linear map M ⊗R N → BilR (M ∨ , N ∨ ; R) given on
elementary tensors by m ⊗ n ↦ [(ϕ, ψ) ↦ ϕ(m)ψ(n)], but it need not be an isomorphism.
Example 5.12. Let p be prime, R = Z/p2 Z, and M = R/pR. The R-modules M ⊗R M and
BilR (M ∨ , M ∨ ; R) are isomorphic to each other (and to M ), but the natural map M ⊗R M →
BilR (M ∨ , M ∨ ; R) is identically 0.
Example 5.13. Let R = Z and M = N = Q. Since Q ⊗Z Q ≅ Q as Z-modules (Example
4.22) and Q∨ = HomZ(Q, Z) = 0, the left side of (5.4) is nonzero and the right side is 0.
18Using the first isomorphism in Corollary 5.8 and double duality, M ⊗R N ≅ BilR(M, N; R)∨ for finite
free M and N, where m ⊗ n in M ⊗R N corresponds to the function B ↦ B(m, n) in BilR(M, N; R)∨.
This is how tensor products of finite-dimensional vector spaces are defined in [10, p. 40], namely V ⊗K W
is defined to be the dual space to BilK(V, W; K).
that act on column vectors by multiplication from the left. Then for v ∈ R³ and w ∈ R⁴,
v ⊗ w : R³ → R⁴ by x ↦ w(v⊤x) = (wv⊤)x. Since e′ie⊤j has 1 in the (i, j) component and
0 elsewhere, the tensor t above corresponds under R³ ⊗R R⁴ ≅ Hom(R³, R⁴) to the matrix
A =
[ 1 2 1 ]
[ 4 5 7 ]
[ 2 3 3 ]
[ 3 4 5 ].
Let the columns of A be w1, w2, and w3. The row reduced form of A is
[ 1 0  3 ]
[ 0 1 −1 ]
[ 0 0  0 ]
[ 0 0  0 ],
which tells us w1 and w2 are linearly independent and w3 = 3w1 − w2 . Thus A has rank 2,
so t has rank 2: it is a sum of two elementary tensors. What could those two tensors be?!?
To find two elementary tensors with sum t, we use the proof of Theorem 5.14. Let
c1, c2 : R³ → R be coefficient functions for A(v) = c1(v)w1 + c2(v)w2 as v runs over R³.
Using the basis vectors e1, e2, e3 of R³ in the role of v, we have
A(e1) = w1, A(e2) = w2, A(e3) = 3w1 − w2,
so c1 = e∨1 + 3e∨3 = (e1 + 3e3)∨ and c2 = e∨2 − e∨3 = (e2 − e3)∨. Thus in R³ ⊗R R⁴,
t = (e1 + 3e3) ⊗ w1 + (e2 − e3) ⊗ w2,
which is a sum of two elementary tensors. In more explicit form,
(5.7) t = (e1 + 3e3) ⊗ (e′1 + 4e′2 + 2e′3 + 3e′4) + (e2 − e3) ⊗ (2e′1 + 5e′2 + 3e′3 + 4e′4).
You can expand the right side of (5.7) to check you get (5.6), or in terms of matrices check
A = w1(e1 + 3e3)⊤ + w2(e2 − e3)⊤ = (1, 4, 2, 3)⊤(1 0 3) + (2, 5, 3, 4)⊤(0 1 −1).
Example 5.16. If V is an n-dimensional vector space over K with a basis e1, . . . , en, then
Σni=1 ei ⊗ ei has tensor rank n: it is not a sum of fewer than n elementary tensors in V ⊗K V.
To see why, use the isomorphism V ⊗K V → HomK(V, V) where v ⊗ v′ ↦ [w ↦ (v·w)v′] on
elementary tensors, with v·w being the dot product with respect to the chosen basis of V: for
v = Σni=1 aiei and w = Σnj=1 bjej, v·w := Σni=1 aibi. Then ei·w = bi and w = Σni=1 (ei·w)ei,
21See https://mathoverflow.net/questions/102559.
6. Base Extension
In algebra, there are many times a module over one ring is replaced by a related module
over another ring. For instance, in linear algebra it is useful to enlarge Rn to Cn , creating
in this way a complex vector space by letting the real coordinates be extended to complex
coordinates. In ring theory, irreducibility tests in Z[X] involve viewing a polynomial in
Z[X] as a polynomial in Q[X] or reducing the coefficients mod p to view it in (Z/pZ)[X].
We will see that all these passages to modules with new coefficients (Rⁿ ⇝ Cⁿ, Z[X] ⇝
Q[X], Z[X] ⇝ (Z/pZ)[X]) can be described in a uniform way using tensor products.
Let f : R → S be a homomorphism of commutative rings. We use f to consider an S-module
N as an R-module by rn := f(r)n. In particular, S itself is an R-module by
rs := f (r)s. Passing from N as an S-module to N as an R-module in this way is called
restriction of scalars.
Example 6.1. If R ⊂ S, f can be the inclusion map (e.g., R ↪ C or Q ↪ C). This is
how a C-vector space is thought of as an R-vector space or a Q-vector space.
Example 6.2. If S = R/I, f can be reduction modulo I: each R/I-module is also an
R-module by letting r act in the way that r mod I acts.
Here is a simple illustration of restriction of scalars.
Theorem 6.3. Let N and N 0 be S-modules. An S-linear map N → N 0 is also an R-linear
map when we treat N and N 0 as R-modules.
Proof. Let ϕ : N → N 0 be S-linear, so ϕ(sn) = sϕ(n) for all s ∈ S and n ∈ N . For r ∈ R,
ϕ(rn) = ϕ(f (r)n) = f (r)ϕ(n) = rϕ(n),
so ϕ is R-linear.
As a notational convention, since we will be going back and forth between R-modules
and S-modules a lot, we will write M (or M 0 , and so on) for R-modules and N (or N 0 , and
so on) for S-modules. Since N is also an R-module by restriction of scalars, we can form
the tensor product R-module M ⊗R N , where
r(m ⊗ n) = (rm) ⊗ n = m ⊗ rn,
with the third expression really being m ⊗ f (r)n since rn := f (r)n.
The idea of base extension is to reverse the process of restriction of scalars. For an R-
module M we want to create an S-module of products sm that matches the old meaning
of rm if s = f (r). This new S-module is called an extension of scalars or base extension. It
will be the R-module S ⊗R M equipped with a specific structure of an S-module.
Since S is a ring, not just an R-module, let’s try making S ⊗R M into an S-module by
(6.1) s0 (s ⊗ m) := s0 s ⊗ m.
Is this S-scaling on elementary tensors well-defined and does it extend to S-scaling on all
tensors?
Theorem 6.4. The additive group S ⊗R M has a unique S-module structure satisfying
(6.1), and this is compatible with the R-module structure in the sense that rt = f (r)t for all
r ∈ R and t ∈ S ⊗R M .
Proof. Suppose the additive group S ⊗R M has an S-module structure satisfying (6.1). We
will show the S-scaling on all tensors in S ⊗R M is determined by this. Each t ∈ S ⊗R M
is a finite sum of elementary tensors, say
t = s1 ⊗ m1 + · · · + sk ⊗ mk .
For s ∈ S,
st = s(s1 ⊗ m1 + · · · + sk ⊗ mk )
= s(s1 ⊗ m1 ) + · · · + s(sk ⊗ mk )
= ss1 ⊗ m1 + · · · + ssk ⊗ mk by (6.1),
so st is determined, although this formula for it is not obviously well-defined. (Does a
different expression for t as a sum of elementary tensors change st?)
Now we show there really is an S-module structure on S ⊗R M satisfying (6.1). Describing
the S-scaling on S ⊗R M means creating a function S × (S ⊗R M ) → S ⊗R M satisfying
the relevant scaling axioms:
(6.2) 1 · t = t, s(t1 + t2 ) = st1 + st2 , (s1 + s2 )t = s1 t + s2 t, s1 (s2 t) = (s1 s2 )t.
For each s′ ∈ S we consider the function S × M → S ⊗R M given by (s, m) ↦ (s′s) ⊗ m.
This is R-bilinear, so by the universal mapping property of tensor products there is an
R-linear map µs′ : S ⊗R M → S ⊗R M where µs′(s ⊗ m) = (s′s) ⊗ m on elementary tensors.
Define a multiplication S × (S ⊗R M) → S ⊗R M by s′t := µs′(t). This will be the scaling
of S on S ⊗R M . We check the conditions in (6.2):
(1) To show 1t = t means showing µ1 (t) = t. On elementary tensors, µ1 (s ⊗ m) =
(1 · s) ⊗ m = s ⊗ m, so µ1 fixes elementary tensors. Therefore µ1 fixes all tensors by
additivity.
(2) s(t1 + t2 ) = st1 + st2 since µs is additive.
(3) Showing (s1 + s2 )t = s1 t + s2 t means showing µs1 +s2 = µs1 + µs2 as functions on
S ⊗R M . Both sides are additive functions, so it suffices to check they agree on
elementary tensors s ⊗ m, where both sides have common value (s1 + s2 )s ⊗ m.
(4) To show s1 (s2 t) = (s1 s2 )t means µs1 ◦ µs2 = µs1 s2 as functions on S ⊗R M . Both
sides are additive functions of t, so it suffices to check they agree on elementary
tensors s ⊗ m, where both sides have common value (s1 s2 s) ⊗ m.
Let’s check the S-module structure on S ⊗R M is compatible with its original R-module
structure. For r ∈ R, if we treat r as f (r) ∈ S then scaling by f (r) on an elementary tensor
s ⊗ m has the effect f (r)(s ⊗ m) = f (r)s ⊗ m. Since f (r)s is the definition of rs (this is how
we make S into an R-module), f (r)s⊗m = rs⊗m = r(s⊗m). Thus f (r)(s⊗m) = r(s⊗m),
so f (r)t = rt for all t in S ⊗R M by additivity.
By exactly the same kind of argument, M ⊗R S with S on the right has a unique S-module
structure where s′(m ⊗ s) = m ⊗ s′s. So whenever we meet S ⊗R M or M ⊗R S, we
know they are S-modules in a specific way. Moreover, these two S-modules are naturally
isomorphic: by Theorem 5.1, there is an isomorphism ϕ : S ⊗R M → M ⊗R S of R-modules
where ϕ(s ⊗ m) = m ⊗ s. To show ϕ is in fact an isomorphism of S-modules, all we need to do
is check S-linearity, since ϕ is known to be additive and a bijection. To show ϕ(s′t) = s′ϕ(t)
for all s′ and t, additivity of both sides in t means we may focus on the case t = s ⊗ m:
ϕ(s′(s ⊗ m)) = ϕ((s′s) ⊗ m) = m ⊗ s′s = s′(m ⊗ s) = s′ϕ(s ⊗ m).
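For example, take R to be the real numbers, S = C with f the inclusion, and M = Rⁿ: the base extension C ⊗R Rⁿ is the usual complexification of Rⁿ, a C-vector space isomorphic to Cⁿ (combine C ⊗R R ≅ C with Theorem 6.11 below), in which z′(z ⊗ v) = z′z ⊗ v.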
22We saw S ⊗R R[X] and S[X] are isomorphic as R-modules in Example 4.16 when S ⊃ R, and it holds
now for all f : R → S.
where ϕ(s ⊗ (mi )i∈I ) = (s ⊗ mi )i∈I . To show ϕ is an S-module isomorphism, we just have
to check ϕ is S-linear, since we already know ϕ is additive and a bijection. It is obvious that
ϕ(st) = sϕ(t) when t is an elementary tensor, and since both ϕ(st) and sϕ(t) are additive
in t the case of general tensors follows.
The analogue of Theorem 6.11 for direct products of R-modules is false. The natural
S-linear map S ⊗R ∏i∈I Mi → ∏i∈I (S ⊗R Mi) need not be an isomorphism. Here are two
examples.
• Q ⊗Z ∏i≥1 Z/pⁱZ is nonzero (Remark 5.5) but ∏i≥1 (Q ⊗Z Z/pⁱZ) is 0.
By Corollary 4.28, this implies Σdi=1 aiyi ∈ Mtor, so Σdi=1 raiyi = 0 in M for some nonzero
r ∈ R. By linear independence of the yi's over R, every rai is 0, so every ai is 0 (R is a
domain). Thus every ci = ai/b is 0.
It remains to prove M has a linearly independent subset of size dimK(K ⊗R M). Let
{e1, . . . , ed} be a linearly independent subset of M, where d is maximal. (Since d ≤
While M has at most dimK (K ⊗R M ) linearly independent elements and this upper bound
is achieved, each spanning set has at least dimK (K ⊗R M ) elements but this lower bound is
not necessarily reached. For example, if R is not a field and M is a torsion module (e.g., R/I
for I a nonzero proper ideal) then K ⊗R M = 0 and M certainly doesn’t have a spanning
set of size 0 if M ≠ 0. It is also not true that finiteness of dimK(K ⊗R M) implies M is
finitely generated as an R-module. Take R = Z and M = Q, so Q ⊗Z M = Q ⊗Z Q ≅ Q
(Example 4.22), which is finite-dimensional over Q but M is not finitely generated over Z.
The maximal number of linearly independent elements in an R-module M , for R a do-
main, is called the rank of M .23 This use of the word “rank” is consistent with its usage for
finite free modules as the size of a basis: if M is free with an R-basis of size n then K ⊗R M
has a K-basis of size n by Theorem 6.7.
Example 6.13. A nonzero ideal I in a domain R has rank 1. We can see this in two ways.
First, any two nonzero elements in I are linearly dependent over R, so the maximal number
of R-linearly independent elements in I is 1. Second, K ⊗R I ≅ K as K-vector spaces (in
Theorem 4.23 we showed they are isomorphic as R-modules, but that isomorphism is also
K-linear; check!), so dimK(K ⊗R I) = 1.
Example 6.14. For a domain R with fraction field K, a finitely generated R-module M
has rank 0 if and only if it is a torsion module, since K ⊗R M = 0 if and only if M is torsion.
Since K ⊗R M ≅ K ⊗R (M/Mtor) as K-vector spaces (the isomorphism between them
as R-modules in Theorem 4.27 is easily checked to be K-linear – check!), M and M/Mtor
have the same rank.
We return to general R, no longer a domain, and see how to make the tensor product of
an R-module and S-module into an S-module.
Theorem 6.15. Let M be an R-module and N be an S-module.
(1) The additive group M ⊗R N has a unique structure of S-module such that s(m⊗n) =
m ⊗ sn for s ∈ S. This is compatible with the R-module structure in the sense that
rt = f (r)t for r ∈ R and t ∈ M ⊗R N .
(2) The S-module M ⊗R N is isomorphic to (S ⊗R M ) ⊗S N by sending m ⊗R n to
(1 ⊗R m) ⊗S n.
23When R is not a domain, this concept of rank for R-modules is not quite the right one.
The point of part 2 is that it shows how the S-module structure on M ⊗R N can be
described as an ordinary S-module tensor product by base extending M to an S-module
S ⊗R M . Part 2 has both R-module and S-module tensor products, and it is the first
time that we must decorate the tensor product sign explicitly. Up to now it was actually
unnecessary, as all the tensor products were over R.
Writing S ⊗R M as M ⊗R S makes the isomorphism in part 2 notationally obvious, since
it becomes (M ⊗R S) ⊗S N ∼ = M ⊗R N ; this is similar to the “proof” of the chain rule in
differential calculus, dy/dx = (dy/du)(du/dx), by cancellation of du in the notation. This
kind of notational trick will be proved in greater generality in Theorem 6.25(3).
Proof. (1) This is similar to the proof of Theorem 6.4 (which is the special case N = S).
We just sketch the idea.
Since every tensor is a sum of elementary tensors, declaring how s ∈ S scales elementary
tensors in M ⊗R N determines its scaling on all tensors. To show the rule s(m⊗n) = m⊗sn
really corresponds to an S-module structure, for each s ∈ S we consider the function
M × N → M ⊗R N given by (m, n) ↦ m ⊗ sn. This is R-bilinear in m and n, so there is
an R-linear map µs : M ⊗R N → M ⊗R N such that µs (m ⊗ n) = m ⊗ sn on elementary
tensors. Define a multiplication S × (M ⊗R N ) → M ⊗R N by st := µs (t). It is left to the
reader to check that the maps µs on M ⊗R N , as s varies, satisfy the scaling axioms that
make M ⊗R N an S-module.
To check rt = f (r)t for r ∈ R and t ∈ M ⊗R N , both sides are additive in t so it suffices
to check equality when t = m ⊗n is an elementary tensor. In that case r(m ⊗n) = m⊗rn =
m ⊗ f (r)n = f (r)(m ⊗ n).
(2) Let M × N → (S ⊗R M) ⊗S N by (m, n) ↦ (1 ⊗R m) ⊗S n. We want to check this is
R-bilinear. Biadditivity is clear. For R-scaling, we have
(1 ⊗R rm) ⊗S n = (r(1 ⊗R m)) ⊗S n = (f (r)(1 ⊗R m)) ⊗S n = f (r)((1 ⊗R m) ⊗S n)
and
(1 ⊗R m) ⊗S rn = (1 ⊗R m) ⊗S f (r)n = f (r)((1 ⊗R m) ⊗S n).
Now the universal mapping property of tensor products gives an R-linear map ϕ : M ⊗R N →
(S ⊗R M ) ⊗S N where ϕ(m ⊗R n) = (1 ⊗R m) ⊗S n. This is exactly the map we were looking
for, but we only know it is R-linear so far. It is also S-linear: ϕ(st) = sϕ(t). To check this,
it suffices by additivity of ϕ to focus on the case of an elementary tensor:
ϕ(s(m ⊗R n)) = ϕ(m ⊗R sn) = (1 ⊗R m) ⊗S sn = s((1 ⊗R m) ⊗S n) = sϕ(m ⊗R n).
To show ϕ is an isomorphism, we create an inverse map (S ⊗R M ) ⊗S N → M ⊗R N . The
function S × M × N → M ⊗R N given by (s, m, n) ↦ m ⊗ sn is R-trilinear, so by Theorem
5.6 there is an R-bilinear map B : (S ⊗R M ) × N → M ⊗R N where B(s ⊗ m, n) = m ⊗ sn.
This function is in fact S-bilinear:
B(st, n) = sB(t, n), B(t, sn) = sB(t, n).
To check these equations, the additivity of both sides of the equations in t reduces us to
the case when t is an elementary tensor. Writing t = s′ ⊗ m,
B(s(s′ ⊗ m), n) = B(ss′ ⊗ m, n) = m ⊗ ss′n = m ⊗ s(s′n) = s(m ⊗ s′n) = sB(s′ ⊗ m, n)
and
B(s′ ⊗ m, sn) = m ⊗ s′(sn) = m ⊗ s(s′n) = s(m ⊗ s′n) = sB(s′ ⊗ m, n).
Now the universal mapping property of the tensor product for S-modules tells us there is
an S-linear map ψ : (S ⊗R M ) ⊗S N → M ⊗R N such that ψ(t ⊗ n) = B(t, n).
It is left to the reader to check ϕ ◦ ψ and ψ ◦ ϕ are identity functions, so ϕ is an S-module
isomorphism.
In addition to M ⊗R N being an S-module because N is, the tensor product N ⊗R M
in the other order has a unique S-module structure where s(n ⊗ m) = sn ⊗ m, and this is
proved in a similar way.
Example 6.16. For an S-module N, let’s show Rk ⊗R N ∼= N k as S-modules. By Theorem 5.4, Rk ⊗R N ∼= (R ⊗R N)k ∼= N k as R-modules, an explicit isomorphism ϕ : Rk ⊗R N → N k being ϕ((r1, . . . , rk) ⊗ n) = (r1n, . . . , rkn). Let’s check ϕ is S-linear: ϕ(st) = sϕ(t). Both sides are additive in t, so we only need to check when t is an elementary tensor:
ϕ(s((r1, . . . , rk) ⊗ n)) = ϕ((r1, . . . , rk) ⊗ sn) = (r1sn, . . . , rksn) = sϕ((r1, . . . , rk) ⊗ n).
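To see this map concretely, here is a minimal numerical sketch, assuming R = S is the field of real numbers and N = R^m, so tensors can be modeled with numpy arrays; the helper name phi is invented for this illustration.

```python
import numpy as np

# Model R^k (x)_R N for N = R^m: an elementary tensor (r1,...,rk) (x) n is
# modeled by np.kron(r, n), and phi sends it to (r1*n, ..., rk*n) in N^k.
def phi(r, n):
    return np.kron(r, n).reshape(len(r), len(n))  # row i is r[i]*n

r = np.array([2.0, -1.0, 3.0])   # an element of R^3
n = np.array([1.0, 4.0])         # an element of N = R^2
assert np.allclose(phi(r, n), np.outer(r, n))    # matches (r1*n, ..., rk*n)
```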
To reinforce the S-module isomorphism
(6.3) M ⊗R N ∼= (S ⊗R M ) ⊗S N
from Theorem 6.15(2), let’s write out the isomorphism in both directions on appropriate
tensors:
m ⊗R n ↦ (1 ⊗R m) ⊗S n,   (s ⊗R m) ⊗S n ↦ m ⊗R sn.
Corollary 6.17. If M and M′ are isomorphic R-modules, and N is an S-module, then M ⊗R N and M′ ⊗R N are isomorphic S-modules, as are N ⊗R M and N ⊗R M′.
Proof. We will show M ⊗R N ∼= M′ ⊗R N as S-modules. The other one is similar.
Let ϕ : M → M′ be an R-module isomorphism. To write down an S-module isomorphism M ⊗R N → M′ ⊗R N, we will write down an R-module isomorphism that is also S-linear.
Let M × N → M′ ⊗R N by (m, n) ↦ ϕ(m) ⊗ n. This is R-bilinear (check!), so we get an R-linear map Φ : M ⊗R N → M′ ⊗R N such that Φ(m ⊗ n) = ϕ(m) ⊗ n. This is also S-linear: Φ(st) = sΦ(t). Since Φ is additive, it suffices to check this when t = m ⊗ n:
Φ(s(m ⊗ n)) = Φ(m ⊗ sn) = ϕ(m) ⊗ sn = s(ϕ(m) ⊗ n) = sΦ(m ⊗ n).
Using the inverse map to ϕ we get an R-linear map Ψ : M′ ⊗R N → M ⊗R N that is also S-linear, and a computation on elementary tensors shows Φ and Ψ are inverses of each other.
Example 6.18. We can use tensor products to prove the well-definedness of ranks of finite free R-modules when R ≠ 0. Suppose Rm ∼= Rn as R-modules. Pick a maximal ideal m in R (Zorn’s lemma), and R/m ⊗R Rm ∼= R/m ⊗R Rn as R/m-vector spaces by Corollary 6.17.
[commutative diagram: the canonical map V × W → V ⊗R W, the bilinear map B : V × W → U, and the induced linear map L : V ⊗R W → U]
commute, and on account of the fact that U and V ⊗R W are already K-vector spaces you
can check that L is in fact K-linear (and is the only K-linear map that can fit into the
above commutative diagram). Two solutions of a universal mapping property are uniquely
isomorphic to each other, so V ⊗R W ∼= V ⊗K W. More specifically, using for B the canonical K-bilinear map V × W → V ⊗K W implies that the diagram
[commutative diagram: ⊗R : V × W → V ⊗R W, ⊗K : V × W → V ⊗K W, and the map V ⊗R W → V ⊗K W determined by v ⊗R w ↦ v ⊗K w]
commutes.
∼= N ⊗S ((S ⊗R M) ⊕ (S ⊗R M′)) by Theorem 6.11
∼= (N ⊗S (S ⊗R M)) ⊕ (N ⊗S (S ⊗R M′)) by Theorem 5.4
∼= (N ⊗R M) ⊕ (N ⊗R M′) by part 1 and (6.3).
Of course one needs to trace through these isomorphisms to check the overall result has the
effect intended on elementary tensors, and it does (exercise).
The last part of Theorem 6.25 extends to arbitrary direct sums: the natural R-module isomorphism N ⊗R (⊕i∈I Mi) ∼= ⊕i∈I (N ⊗R Mi) is also an S-module isomorphism.
Theorem 6.26 could also be proved by showing the S-module S ⊗R (M ⊗R M 0 ) has the
universal mapping property of (S ⊗R M ) ⊗S (S ⊗R M 0 ) as a tensor product of S-modules.
That is left as an exercise.
Corollary 6.27. For R-modules M1, . . . , Mk,
S ⊗R (M1 ⊗R · · · ⊗R Mk) ∼= (S ⊗R M1) ⊗S · · · ⊗S (S ⊗R Mk)
as S-modules, where s ⊗R (m1 ⊗R · · · ⊗R mk) ↦ s((1 ⊗R m1) ⊗S · · · ⊗S (1 ⊗R mk)). In particular, taking all Mi equal to M, S ⊗R (M⊗Rk) ∼= (S ⊗R M)⊗Sk as S-modules, where the superscripts denote k-fold tensor powers over R and over S.
Proof. Induct on k.
Example 6.28. For a real vector space V, C ⊗R (V ⊗R V) ∼= (C ⊗R V) ⊗C (C ⊗R V). The middle tensor product sign on the right is over C, not R. Note that C ⊗R (V ⊗R V) is not isomorphic to (C ⊗R V) ⊗R (C ⊗R V) when V ≠ 0, as the two sides have different dimensions over R (what are they?).
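If you'd rather have a machine check the dimension count, here is a hedged sketch in plain Python (the two formulas in the comments do record the answer to the parenthetical question, so skip this if you want to work it out yourself):

```python
# dim_R(C (x)_R W) = 2*dim_R(W) and dim_R(A (x)_R B) = dim_R(A)*dim_R(B).
for n in range(1, 10):                 # n = dim_R(V)
    lhs = 2 * n * n                    # dim_R of C (x)_R (V (x)_R V)
    rhs = (2 * n) * (2 * n)            # dim_R of (C (x)_R V) (x)_R (C (x)_R V)
    assert lhs != rhs                  # never isomorphic as real vector spaces
```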
The base extension M ↦ S ⊗R M turns R-modules into S-modules in a systematic way. So does M ↦ M ⊗R S, and this is essentially the same construction. This suggests there
should be a universal mapping problem about R-modules and S-modules that is solved by
base extension, and there is: it is the universal device for turning each R-linear map from
M to an S-module into an S-linear map of S-modules.
Theorem 6.29. Let M be an R-module. For every S-module N and R-linear map ϕ : M →
N , there is a unique S-linear map ϕS : S ⊗R M → N such that the diagram
[commutative diagram: M → S ⊗R M by m ↦ 1 ⊗ m, together with ϕ : M → N and ϕS : S ⊗R M → N]
commutes.
This says the single R-linear map M → S ⊗R M accounts for all other R-linear maps from M to S-modules: every such map is the composite of it with an S-linear map out of S ⊗R M.
Proof. Assume there is such an S-linear map ϕS . We will derive a formula for it on elemen-
tary tensors:
ϕS (s ⊗ m) = ϕS (s(1 ⊗ m)) = sϕS (1 ⊗ m) = sϕ(m).
This shows ϕS is unique if it exists.
To prove existence, consider the function S × M → N given by (s, m) ↦ sϕ(m). This is R-bilinear (check!), so there is an R-linear map ϕS : S ⊗R M → N such that ϕS(s ⊗ m) = sϕ(m).
Using the S-module structure on S ⊗R M , ϕS is S-linear.
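Here is a small hedged numerical illustration of the extension (the choices R = the real numbers, S = the complex numbers, M = R², N = C², and the names phi, phi_S are assumptions of this sketch, not notation from the proof): an R-linear map ϕ : R² → C² given by a complex matrix extends to the C-linear map ϕS on C ⊗R R² ∼= C² with the same matrix.

```python
import numpy as np

A = np.array([[1 + 2j, 0 - 1j],
              [3 + 0j, 2 + 2j]])   # matrix of an R-linear map phi : R^2 -> C^2

def phi(m):        # m is a real vector in M = R^2
    return A @ m

def phi_S(t):      # t is a complex vector, viewing C (x)_R R^2 as C^2
    return A @ t   # the unique C-linear extension, with phi_S(1 (x) m) = phi(m)

s, m = 2 - 1j, np.array([1.0, -3.0])
# The defining property phi_S(s (x) m) = s*phi(m), with s (x) m modeled as s*m:
assert np.allclose(phi_S(s * m), s * phi(m))
```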
For ϕ in HomR (M, N ), ϕS is in HomS (S ⊗R M, N ). Because ϕS (1 ⊗ m) = ϕ(m), we can
recover ϕ from ϕS . But even more is true.
Theorem 6.30. Let M be an R-module and N be an S-module. The function ϕ ↦ ϕS is an S-module isomorphism HomR(M, N) → HomS(S ⊗R M, N).
How is HomR(M, N) an S-module? Values of these functions are in N, which is an S-module, so each s ∈ S scales a function M → N to a new function M → N by scaling the values: (sϕ)(m) = s(ϕ(m)).
Proof. For ϕ and ϕ′ in HomR(M, N), (ϕ + ϕ′)S = ϕS + ϕ′S and (sϕ)S = sϕS by checking both sides are equal on all elementary tensors in S ⊗R M. Therefore ϕ ↦ ϕS is S-linear.
Its injectivity is discussed above (ϕS determines ϕ).
For surjectivity, let h : S ⊗R M → N be S-linear. Set ϕ : M → N by ϕ(m) = h(1 ⊗ m).
Then ϕ is R-linear and ϕS (s ⊗ m) = sϕ(m) = sh(1 ⊗ m) = h(s(1 ⊗ m)) = h(s ⊗ m), so
h = ϕS since both are additive and are equal at all elementary tensors.
The S-module isomorphism
(6.4) HomR(M, N) ∼= HomS(S ⊗R M, N)
should be thought of as analogous to the R-module isomorphism
(6.5) HomR(M, HomR(N, P)) ∼= HomR(M ⊗R N, P)
from Theorem 5.7, where − ⊗R N is left adjoint to HomR (N, −). (In (6.5), N and P are
R-modules, not S-modules! We’re using the same notation as in Theorem 5.7.) If we look
at (6.4), we see S ⊗R − is applied to M on the right but nothing special is applied to N
on the left. Yet there is something different about N on the two sides of (6.4). It is an
S-module on the right side of (6.4), but on the left side it is being treated as an R-module
(restriction of scalars). That changes N , but we have introduced no notation to reflect this.
We still just write it as N . Let’s now write ResS/R (N ) to denote N as an R-module. It is
the same underlying additive group as N , but the scalars are now taken from R with the
rule rn = f(r)n. The appearance of (6.3) and (6.4) now looks like this:
M ⊗R ResS/R(N) ∼= (S ⊗R M) ⊗S N,   HomR(M, ResS/R(N)) ∼= HomS(S ⊗R M, N).
Theorem 6.31. Let M be an R-module and N and P be S-modules. There is an S-module
isomorphism
HomS(M ⊗R N, P) ∼= HomR(M, ResS/R(HomS(N, P))).
Example 6.32. Taking N = S, so M ⊗R N = M ⊗R S ∼= S ⊗R M, Theorem 6.31 becomes
HomS(S ⊗R M, P) ∼= HomR(M, ResS/R(P)),
and taking S = R, so that restriction of scalars does nothing, Theorem 6.31 becomes
HomR(M ⊗R N, P) ∼= HomR(M, HomR(N, P)).
These two consequences of Theorem 6.31 are results we have already seen, and in fact we
are going to use them in the proof, so they are together equivalent to Theorem 6.31.
Proof. Since M ⊗R N ∼= (S ⊗R M) ⊗S N as S-modules,
HomS(M ⊗R N, P) ∼= HomS((S ⊗R M) ⊗S N, P).
By the tensor-Hom adjunction over S (the analogue of (6.5) with S in place of R),
HomS((S ⊗R M) ⊗S N, P) ∼= HomS(S ⊗R M, HomS(N, P)).
By (6.4), the right side is isomorphic to HomR(M, ResS/R(HomS(N, P))), so
HomS(M ⊗R N, P) ∼= HomR(M, ResS/R(HomS(N, P))).
7. Tensors in Physics
In physics and engineering, tensors are often defined not in terms of multilinearity, but
by the way tensors look in different coordinate systems. Here is a definition of a tensor
that can be found (more or less) in many physics textbooks. Let V be a vector space25
with dimension n ≥ 1. A tensor of rank 0 on V is a scalar. For k ≥ 1, a contravariant tensor of rank k26 (on V) is an object T with n^k components in every coordinate system of V such that if {T^{i1,...,ik}}_{1≤i1,...,ik≤n} and {T̃^{i1,...,ik}}_{1≤i1,...,ik≤n} are the components of T in two coordinate systems of V, then
(7.1)   T̃^{i1,...,ik} = Σ_{1≤j1,...,jk≤n} T^{j1,...,jk} a_{i1 j1} · · · a_{ik jk},
where (a_{ij}) is the matrix expressing the first coordinate system of V in terms of the second. In short, a contravariant tensor of rank k is a “quantity that transforms by the rule (7.1).”
What is being described here, with components, is just an element of V⊗k. To see this, note that a coordinate system means a choice of a basis of V. For each basis27 {e1, . . . , en} of V, in which T has components {T^{i1,...,ik}}_{1≤i1,...,ik≤n}, make these numbers the coefficients of the basis {e_{i1} ⊗ · · · ⊗ e_{ik}} of V⊗k:
Σ_{1≤i1,...,ik≤n} T^{i1,...,ik} e_{i1} ⊗ · · · ⊗ e_{ik}.
This belongs to V⊗k. Let’s express this sum in terms of a second basis (“coordinate system”) {f1, . . . , fn} of V. Writing ej = Σ_{i=1}^{n} a_{ij} f_i, the above sum equals, after a notational change,
Σ_{1≤j1,...,jk≤n} T^{j1,...,jk} e_{j1} ⊗ · · · ⊗ e_{jk}
= Σ_{1≤j1,...,jk≤n} T^{j1,...,jk} (Σ_{i1=1}^{n} a_{i1 j1} f_{i1}) ⊗ · · · ⊗ (Σ_{ik=1}^{n} a_{ik jk} f_{ik})
= Σ_{1≤i1,...,ik≤n} (Σ_{1≤j1,...,jk≤n} T^{j1,...,jk} a_{i1 j1} · · · a_{ik jk}) f_{i1} ⊗ · · · ⊗ f_{ik}
= Σ_{1≤i1,...,ik≤n} T̃^{i1,...,ik} f_{i1} ⊗ · · · ⊗ f_{ik}   by (7.1).
So in physics, the components of a contravariant rank k tensor on V are the coefficients of
an element of V ⊗k in some basis28 of V ⊗k . In physics, dim V is usually 3 or 4.
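As a hedged numerical check of this computation in the k = 2 case (numpy and the random test data are assumptions of the illustration), one can expand Σ T^{j1 j2} e_{j1} ⊗ e_{j2} directly in f-coordinates and compare with the components predicted by (7.1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.normal(size=(n, n))   # a_{ij}: e_j = sum_i a_{ij} f_i, so column j of A is e_j in f-coordinates
T = rng.normal(size=(n, n))   # components T^{j1 j2} in the e-basis

# Expand sum_{j1,j2} T^{j1 j2} e_{j1} (x) e_{j2}, modeling u (x) v as np.kron(u, v),
# with every vector written in f-coordinates (f_i = i-th standard basis vector).
vec = sum(T[j, k] * np.kron(A[:, j], A[:, k]) for j in range(n) for k in range(n))

# Rule (7.1) with k = 2 says the f-basis components are Ttilde = A T A^T.
Ttilde = A @ T @ A.T
assert np.allclose(vec, Ttilde.reshape(-1))   # kron(f_i, f_l) is a standard basis vector
```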
Switching from tensor powers of V to tensor powers of its dual space V∨, we now want to compare the representations of an element of (V∨)⊗ℓ in coordinate systems built from the two dual bases e∨1, . . . , e∨n and f∨1, . . . , f∨n of V∨. The formula we find will be similar to (7.1), but with a crucial change.
25Physicists are interested only in real or complex vector spaces.
26This meaning of the term “rank of a tensor” as the number of indices is unrelated to the meaning of
“rank of a tensor” near the end of Section 5 in terms of a sum of elementary tensors.
27We really should speak of an ordered basis of V, since e1 ⊗ e2 ≠ e2 ⊗ e1.
28Strictly speaking we aren’t using every possible basis of V ⊗k but only bases of V ⊗k built as k-fold
elementary tensors from a basis of V .
To align calculations with how they’re done in physics and differential geometry, from now on write the dual bases of {e1, . . . , en} and {f1, . . . , fn} as {e^1, . . . , e^n} and {f^1, . . . , f^n}, not {e∨1, . . . , e∨n} and {f∨1, . . . , f∨n}. So e^i(ej) = f^i(fj) = δij for all i and j.
For a basis {e1, . . . , en} of V and its dual basis {e^1, . . . , e^n} in V∨, general elements of V and V∨ are written as Σ_{i=1}^{n} c^i e_i and Σ_{i=1}^{n} c_i e^i, respectively. A basis of V always has lower indices and its coefficients have upper indices, while a basis of V∨ always has upper indices and its coefficients have lower indices. See the table below.
Space | Basis | Coeff. | Element
V     | e_i   | c^i    | Σ c^i e_i
V∨    | e^i   | c_i    | Σ c_i e^i
The convention of lower indices for a basis of V and for coefficients of a basis in V∨, and upper indices for a basis of V∨ and for coefficients of a basis in V, is consistent since the coefficients a^i of the vector Σ_{i=1}^{n} a^i e_i in V are the values of e^1, . . . , e^n on this vector: coefficients of a basis are coordinate functions, and coordinate functions of a basis of V lie in V∨ while, by duality, coordinate functions of a basis of V∨ lie in (V∨)∨ ∼= V.
Pick a mathematician’s tensor T ∈ (V∨)⊗ℓ and write it in the basis {e^{i1} ⊗ · · · ⊗ e^{iℓ}} as
(7.2)   T = Σ_{1≤i1,...,iℓ≤n} T_{i1,...,iℓ} e^{i1} ⊗ · · · ⊗ e^{iℓ},
where the coefficients have lower indices, not upper indices, to be consistent with the idea that this is a dual object (it lies in a tensor power of V∨). To express T in terms of the second basis {f^{i1} ⊗ · · · ⊗ f^{iℓ}} of (V∨)⊗ℓ, we want to express the e^j’s in terms of the f^i’s in V∨. We already wrote ej = Σ_{i=1}^{n} a_{ij} f_i in V for all j, and it turns out that
(7.3)   ej = Σ_{i=1}^{n} a_{ij} f_i for all j =⇒ f^j = Σ_{i=1}^{n} a_{ji} e^i for all j.
Indeed, the coefficient of e^i in f^j is f^j(ei) = f^j(Σ_k a_{ki} f_k) = a_{ji}. We see transposed matrix entries (a_{ji}) on the right side of (7.3) in an essential way: j in (7.3) is the second index of a_{ij} and the first index of a_{ji}. It is a fact of life that passing to the dual space involves a transpose.
Alas, (7.3) gives a change of basis formula in V∨ for the f^j’s in terms of the e^i’s, which is not the direction we need to transform the right side of (7.2) into a sum involving the f^i’s: we want the e^j’s in terms of the f^i’s, not the f^j’s in terms of the e^i’s. So we need an inverse on top of the transposing.
Writing the inverse of the matrix (a_{ij}) as (a^{ij}), the following table summarizes how a change of basis matrix changes to describe a related change of basis.
Space | Start            | End              | Matrix
V     | f1, . . . , fn   | e1, . . . , en   | (a_{ij}) – definition
V     | e1, . . . , en   | f1, . . . , fn   | (a^{ij}) = (a_{ij})^{−1}
V∨    | e^1, . . . , e^n | f^1, . . . , f^n | (a_{ji}) = (a_{ij})^⊤
V∨    | f^1, . . . , f^n | e^1, . . . , e^n | (a^{ji}) = (a^{ij})^⊤
The third row of the table is (7.3) and the second and fourth rows of the table say
(7.4)   ej = Σ_{i=1}^{n} a_{ij} f_i for all j =⇒ f_j = Σ_{i=1}^{n} a^{ij} e_i, and e^j = Σ_{i=1}^{n} a^{ji} f^i for all j.
Example 7.1. Let dim(V) = 2 with bases {e1, e2} and {f1, f2}. Suppose that
e1 = f1 + 2f2 and e2 = 3f2.
It follows by simple algebra that
f1 = e1 − (2/3)e2 and f2 = (1/3)e2.
Write these as ej = Σ_{i=1}^{2} a_{ij} f_i and f_j = Σ_{i=1}^{2} a^{ij} e_i for
(7.5)   (a_{ij}) = ( 1 0 ; 2 3 ) and (a^{ij}) = ( 1 0 ; −2/3 1/3 ),
which are inverses (matrices written row by row, rows separated by semicolons).
In V∨, to write {f^1, f^2} in terms of {e^1, e^2}, set f^1 = ae^1 + be^2. Evaluating both sides at f1 and f2 we have 1 = ae^1(f1) + be^2(f1) = a + b(−2/3) and 0 = ae^1(f2) + be^2(f2) = b(1/3). Thus b = 0 and a = 1, so f^1 = e^1. Similarly, f^2 = 2e^1 + 3e^2, and we can solve for e^1 and e^2:
f^1 = e^1 and f^2 = 2e^1 + 3e^2 =⇒ e^1 = f^1 and e^2 = −(2/3)f^1 + (1/3)f^2.
Then f^j = Σ_{i=1}^{2} a_{ji} e^i and e^j = Σ_{i=1}^{2} a^{ji} f^i, with coefficients forming the transposed matrices to (7.5).
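For readers who like to double-check such computations by machine, here is a hedged numpy version of this example (the array names are invented for the illustration):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [2.0, 3.0]])             # (a_{ij}) from (7.5): e_j = sum_i a_{ij} f_i
Ainv = np.linalg.inv(A)                # (a^{ij}): f_j = sum_i a^{ij} e_i
assert np.allclose(Ainv, [[1, 0], [-2/3, 1/3]])

# Dual bases change by the transposes: column j of A.T gives f^j in the e^i's,
# and column j of Ainv.T gives e^j in the f^i's, as computed in the example.
F_dual = A.T                           # f^1 = e^1, f^2 = 2e^1 + 3e^2
E_dual = Ainv.T                        # e^1 = f^1, e^2 = -(2/3)f^1 + (1/3)f^2
assert np.allclose(F_dual @ E_dual, np.eye(2))   # the two changes invert each other
```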
The change of basis we need for (7.2) is the e^j’s in terms of the f^i’s, so isolate that part of (7.4):
(7.6)   ej = Σ_{i=1}^{n} a_{ij} f_i for all j =⇒ e^j = Σ_{i=1}^{n} a^{ji} f^i for all j.
As a reminder, (a^{ji}) is the transpose of the inverse of the matrix (a_{ij}).
Returning to (7.2),
T = Σ_{1≤j1,...,jℓ≤n} T_{j1,...,jℓ} e^{j1} ⊗ · · · ⊗ e^{jℓ}
  = Σ_{1≤j1,...,jℓ≤n} T_{j1,...,jℓ} (Σ_{i1=1}^{n} a^{j1 i1} f^{i1}) ⊗ · · · ⊗ (Σ_{iℓ=1}^{n} a^{jℓ iℓ} f^{iℓ})   by (7.6)
  = Σ_{1≤i1,...,iℓ≤n} (Σ_{1≤j1,...,jℓ≤n} T_{j1,...,jℓ} a^{j1 i1} · · · a^{jℓ iℓ}) f^{i1} ⊗ · · · ⊗ f^{iℓ}.
An indexed quantity whose components transform by the rule
(7.7)   T̃_{i1,...,iℓ} = Σ_{1≤j1,...,jℓ≤n} T_{j1,...,jℓ} a^{j1 i1} · · · a^{jℓ iℓ}
when passing from the basis {e^1, . . . , e^n} to the basis {f^1, . . . , f^n} of V∨ is called a covariant tensor of rank ℓ. This is just an element of (V∨)⊗ℓ, and (7.7) explains operationally how different coordinate representations of this tensor are related to one another.
The rules (7.1) for components in V⊗k and (7.7) for components in (V∨)⊗ℓ are different, and not just on account of the convention about indices being upper on tensor components in (7.1) and lower on tensor components in (7.7). If we place (7.1) and (7.7) side by side and, to avoid being distracted by tensor index notational conventions, we temporarily make all tensor-component indices lower and give the tensor components the same number of indices (ℓ = k, so we are in V⊗k and (V∨)⊗k), we obtain this:
T̃_{i1,...,ik} = Σ_{1≤j1,...,jk≤n} T_{j1,...,jk} a_{i1 j1} · · · a_{ik jk},   T̃_{i1,...,ik} = Σ_{1≤j1,...,jk≤n} T_{j1,...,jk} a^{j1 i1} · · · a^{jk ik}.
We did not lower the indices of a^{ji} in the second sum because its indices reflect something serious: (a_{ij}) is the matrix expressing a change of coordinates in V and (a^{ji}) is the matrix expressing the dual change of coordinates in V∨ in the same direction (see (7.6)). The use of a_{ij} or a^{ji} is the difference between the transformation rules in tensor powers of V and tensor powers of V∨. Both of the transformation rules involve a multilinear change of coordinates (as evidenced by the multiple products in the sums), but in the first rule the summation indices appear in the multipliers a_{ir jr} as the second index, while in the second rule the summation indices appear in the multipliers a^{jr ir} as the first index. This swap happens because physicists always start a change of basis in V, and passing to the effect in V∨ necessitates a transpose (and inverse). The reason for systematically using upper indices on tensor components satisfying (7.1) and lower indices on tensor components satisfying (7.7) is to know at a glance (with experience) what type of transformation rule the tensor components will satisfy under a change in coordinates.
Here is some terminology about tensors that is used by physicists.
• A contravariant tensor of rank k, which is an indexed quantity T^{i1...ik} that transforms by (7.1), is also called a tensor of rank k with upper indices (easier to remember!).
• A covariant tensor of rank ℓ, which is an indexed quantity T_{j1...jℓ} that transforms by (7.7), is also called a tensor of rank ℓ with lower indices.
• An indexed quantity T^{i1...ik}_{j1...jℓ} that transforms by the rule
(7.8)   T̃^{i1...ik}_{j1...jℓ} = Σ_{1≤p1,...,pk≤n, 1≤q1,...,qℓ≤n} T^{p1...pk}_{q1...qℓ} a_{i1 p1} · · · a_{ik pk} a^{q1 j1} · · · a^{qℓ jℓ}
is called a tensor of type (k, ℓ) and rank k + ℓ. This “quantity” is just an element of V⊗k ⊗ (V∨)⊗ℓ written in terms of elementary tensor product bases produced from a basis of V (check!). For instance, elements of V ⊗ V, V ⊗ V∨, and V∨ ⊗ V∨ are rank 2 tensors. An element of V⊗2 ⊗ V∨ has rank 3 and its components are T^{i1 i2}_{j}.
If we permute the order of the spaces in the tensor product from the conventional “first every V, then every V∨,” then the indexing rule on tensors needs to be adapted: V ⊗ V∨ ⊗ V is not the same space as V ⊗ V ⊗ V∨, so we shouldn’t write its tensor components as T^{i1 i2}_{j}. Write them as T^{i1}{}_{j}{}^{i2}, so that as we read indices from left to right we see each index in the order its corresponding space appears in V ⊗ V∨ ⊗ V: upper indices for V and lower indices for V∨.
Example 7.2. To compare transformation rules for rank 2 tensors in V⊗2 (type (2,0)), (V∨)⊗2 (type (0,2)), and V ⊗ V∨ (type (1,1)), let bases {e1, . . . , en} and {f1, . . . , fn} of V be related by numbers a_{ij} as in (7.3) and (7.4): ej = Σ_i a_{ij} f_i for all j.
Case 1: (2,0)-tensors. By (7.1), in V⊗2 we have Σ_{j1,j2} T^{j1 j2} e_{j1} ⊗ e_{j2} = Σ_{i1,i2} T̃^{i1 i2} f_{i1} ⊗ f_{i2} where
(7.9)   T̃^{i1 i2} = Σ_{j1,j2} T^{j1 j2} a_{i1 j1} a_{i2 j2}.
Case 2: (0,2)-tensors. By (7.7), in (V∨)⊗2 we have Σ_{j1,j2} T_{j1 j2} e^{j1} ⊗ e^{j2} = Σ_{i1,i2} T̃_{i1 i2} f^{i1} ⊗ f^{i2} where
(7.10)   T̃_{i1 i2} = Σ_{j1,j2} T_{j1 j2} a^{j1 i1} a^{j2 i2},
with the matrix (a^{ij}) being the inverse of (a_{ij}), so e^j = Σ_i a^{ji} f^i for all j by (7.4).
Case 3: (1,1)-tensors. In V ⊗ V∨, Σ_{j1,j2} T^{j1}_{j2} e_{j1} ⊗ e^{j2} = Σ_{i1,i2} T̃^{i1}_{i2} f_{i1} ⊗ f^{i2} where
(7.11)   T̃^{i1}_{i2} = Σ_{j1,j2} T^{j1}_{j2} a_{i1 j1} a^{j2 i2}.
The n² components of such tensors relative to the basis {e1, . . . , en} can be put into an n × n matrix (T^{ij}), (T_{ij}), or (T^i_j). We can rewrite (7.9), (7.10), and (7.11) so the sums on the right look like formulas from multiplying 3 matrices:
T̃^{i1 i2} = Σ_{j1,j2} a_{i1 j1} T^{j1 j2} a_{i2 j2},   T̃_{i1 i2} = Σ_{j1,j2} a^{j1 i1} T_{j1 j2} a^{j2 i2},   T̃^{i1}_{i2} = Σ_{j1,j2} a_{i1 j1} T^{j1}_{j2} a^{j2 i2}.
By how indices in these sums repeat, the matrices of components of a tensor of rank 2 transform as indicated in the table below.
Type  | Transformation Rule
(2,0) | (T̃^{ij}) = (a_{ij})(T^{ij})(a_{ij})^⊤
(0,2) | (T̃_{ij}) = (a^{ij})^⊤(T_{ij})(a^{ij})
(1,1) | (T̃^i_j) = (a_{ij})(T^i_j)(a_{ij})^{−1}
The (0,2) case is how the matrix representation of a bilinear form changes after a change
of basis and the (1,1) case is how the matrix representation of a linear map of a vector
space to itself changes after a change of basis. This is why a bilinear form (such as an inner
product or a spacetime metric in relativity) is a (0,2)-tensor and a linear map of a vector
space to itself is a (1,1)-tensor. We saw the interpretation of (1,1)-tensors as linear maps
before, without coordinates: from Example 5.11, V ⊗ V∨ ∼= Hom(V, V). Warning: the
“rank” of a linear map V → V (the dimension of its image) has nothing to do with its
“rank” in the above sense as a tensor in V ⊗ V ∨ , which is always 2.
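The three matrix transformation rules can be verified numerically. The hedged numpy sketch below (random data and names are assumptions of the illustration) tests each rule against the invariant it should preserve: vector coordinates change by c ↦ Ac, a bilinear form's values are basis-independent, and a linear map commutes with the change of coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))       # (a_{ij}): e_j = sum_i a_{ij} f_i (invertible w.h.p.)
B = np.linalg.inv(A)              # (a^{ij})
c, d = rng.normal(size=n), rng.normal(size=n)   # e-coordinates of vectors v, w
ct, dt = A @ c, A @ d                            # f-coordinates of the same vectors

# (2,0): v (x) w has component matrix outer(c, d); rule Ttilde = A T A^T.
assert np.allclose(A @ np.outer(c, d) @ A.T, np.outer(ct, dt))

# (0,2): a bilinear form's matrix transforms by B^T T B, keeping values unchanged.
T02 = rng.normal(size=(n, n))
assert np.isclose(c @ T02 @ d, ct @ (B.T @ T02 @ B) @ dt)

# (1,1): a linear map's matrix transforms by A T A^{-1}, consistent with c -> Ac.
T11 = rng.normal(size=(n, n))
assert np.allclose(A @ (T11 @ c), (A @ T11 @ B) @ ct)
```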
While V and V ∨ are not literally the same, they are isomorphic. If we fix an isomorphism
between them and use it everywhere to replace V ∨ with V then the different spaces of rank
2 tensors can all be made to look like V ⊗2 , a process called “raising indices” since it turns
Tij and Tji into T ij . This is done very often in geometry and physics since Rn is treated as
isomorphic to its dual space using the standard dot product to identify (Rn )∨ with Rn .
Let’s compare how the mathematician and physicist think about a tensor:
• (Mathematician) Tensors belong to a tensor space, which is a module – or more often in geometry a vector space – defined by a multilinear universal mapping property.
• (Physicist) “Tensors are systems of components organized by one or more indices
that transform according to specific rules under a set of transformations.”29
In a tensor product of vector spaces, mathematicians and physicists can check two tensors
t and t0 are equal in the same way: check t and t0 have the same components in one
coordinate system. (Physicists don’t deal with modules that aren’t vector spaces, so they
always have bases available.) The reason mathematicians and physicists consider this to be
29G. B. Arfken and H. J. Weber, Mathematical Methods for Physicists, 6th ed., p. 136.
a sufficient test of equality is not the same. The mathematician thinks about the condition
t = t0 in a coordinate-free way and knows that to check t = t0 it suffices to check t and t0
have the same coordinates in one basis. The physicist considers the condition t = t0 to mean
(by definition!) that the components of t and t0 match in all coordinate systems, and the
multilinear transformation rule (7.7), or (7.8), on tensors implies that if the components of
t and t0 are equal in one coordinate system then they are equal in every coordinate system.
That’s why the physicist is content to look in just one coordinate system.
Consider Einstein’s description of tensors in a paper on general relativity [4, p. 157]:30
Let certain things (“tensors”) be defined with respect to any system of co-
ordinates by a number of functions of the co-ordinates, called the “compo-
nents” of the tensor. There are then certain rules by which these components
can be calculated for a new system of co-ordinates, if they are known for the
original system of co-ordinates, and if the transformation connecting the two
systems is known. The things hereafter called tensors are further character-
ized by the fact that the equations of transformation for their components
are linear and homogeneous. Accordingly, all the components in the new
system vanish, if they all vanish in the original system.
Einstein’s “linear and homogeneous” equations are what we call “multilinear” equations.
An operation on tensors (like the flip v ⊗ w 7→ w ⊗ v on V ⊗2 ) is checked to be well-
defined by the mathematician and physicist in different ways. The mathematician checks
the operation respects the universal mapping property that defines tensor products, while
the physicist checks the explicit formula for the operation on elementary tensors changes
in different coordinate systems by the tensor transformation rule (like (7.1)). The physicist
would say an operation on tensors makes sense because it transforms “tensorially” (like
a tensor), which in more expansive terms means that the formulas for the operation in
two different coordinate systems are related by a multilinear change of variables. However,
textbooks on classical mechanics and quantum mechanics that treat tensors don’t seem
to use the word “multilinear,” even though that word describes exactly what is going on.
Instead, these textbooks nearly always say that a tensor’s components transform by a
“definite rule” or a “specific rule,” which doesn’t seem to have an actual meaning; isn’t
every computational rule a specific rule? Graduate textbooks on general relativity are an
exception to this habit: [3], [16], and [26] all define tensors in terms of multilinearity.31
Mathematicians and students in mathematics may be baffled about how physicists can think about tensors just in terms of components. Conceptual definitions in mathematics are very nice, but the ugly component viewpoint of tensors is not only crucial to understanding how tensors show up in physics; it is also how tensors were handled in mathematics through the first part of the 20th century32. Hassler Whitney, whom we mentioned at the end of Section 3 as the first person to extend tensor products from vector spaces to abelian groups, expressed his frustration as follows [28, p. 114]: “I had to handle tensors; but how
could I when I was not permitted to see them, being only allowed to learn about their
changing costumes under changes of coordinates? I had somehow to grab the rascals, and
look straight at them.”
30See https://einsteinpapers.press.princeton.edu/vol6-trans/169.
31I thank Don Marolf for bringing this point to my attention.
32The earliest reference I know that describes tensors using multilinearity instead of components with a transformation rule is [21, p. 179], from 1923, which confuses V with its dual space.
The physical meaning of a vector is not just displacement, but linear displacement. For
instance, forces at a point combine in the same way that vectors add (this is an experi-
mental observation), so force is treated as a vector. The physical meaning of a tensor is
multilinear displacement33. That means a quantity (mathematical or physical) whose de-
scription transforms under a change of coordinates in the same way as the components of
a tensor can be mathematically described as that type of tensor. Some examples of this in
physics, including the physical reasoning behind the effect of a change of coordinates, are
in [6, Chap. 31].
Example 7.3. The most basic example of a rank-2 tensor in mechanics is the stress tensor.
When a force is applied to a body the stress it imparts at a point may not be in the
direction of the force but in some other direction (compressing a piece of clay, say, can push
it out orthogonally to the direction of the force), and this effect is linear in the input, so
stress at a point is described by a linear transformation, and thus is a rank-2 tensor since
End(V ) ∼= V ∨ ⊗ V (Example 5.11). Since stress from an applied force can act in different
directions at different points, the stress tensor is not really a single tensor but rather is a
varying family of tensors at different points: stress is a tensor field, which is a generalization
of a vector field. A tensor field on a manifold M can be defined locally or globally:
• (Locally) A (k, ℓ)-tensor field on M is a choice of element in each tensor product Tp(M)⊗k ⊗R Tp(M)∨⊗ℓ of tangent and cotangent spaces at the points p on M, varying smoothly with p.
• (Globally) A (k, ℓ)-tensor field on M is an element of X(M)⊗k ⊗C∞(M) X(M)∨⊗ℓ, where X(M) is the set of all vector fields on M viewed as a module over the ring C∞(M) of smooth functions on M [14, Sect. 7.2, 7.3].
Tensors in many parts of physics (classical mechanics, electromagnetism, and relativity) are always part of a tensor field, and in physics the word “tensor” often means “tensor field”. A change of variables between local coordinate systems x = {x^i} and y = {y^i} in a region of R^n involves partial derivatives ∂y^i/∂x^j or (in the reverse direction) ∂x^i/∂y^j: by the chain rule, ∂/∂x^j = Σ_i (∂y^i/∂x^j) ∂/∂y^i and ∂/∂y^j = Σ_i (∂x^i/∂y^j) ∂/∂x^i. Tensor transformation rules when working with tensor fields occur with ∂y^i/∂x^j and ∂x^i/∂y^j, which vary from point to point, in the role of a_{ij} and a^{ij}. For example, a tensor of rank 2 with upper indices is a doubly-indexed quantity T^{ij}(x) in each coordinate system x at a point, such that in a second coordinate system y at the same point its components are
(7.12)   T̃^{ij}(y) = Σ_{k,ℓ=1}^{n} T^{kℓ}(x) (∂y^i/∂x^k)(x) (∂y^j/∂x^ℓ)(x),
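In the rank-2 case, (7.12) is pointwise the rule (7.9) with the Jacobian matrix ∂y^i/∂x^k in the role of (a_{ij}). Here is a hedged sympy sketch of that observation (the coordinate change and tensor components are invented for the illustration):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
y = [x1 + x2**2, 3*x2]                     # an illustrative coordinate change y(x)
x = [x1, x2]
J = sp.Matrix(2, 2, lambda i, k: sp.diff(y[i], x[k]))   # J[i,k] = dy^i/dx^k

T = sp.Matrix([[x1, 0], [x2, x1*x2]])      # illustrative components T^{kl}(x)

# (7.12) written out as a double sum ...
Tt = sp.Matrix(2, 2, lambda i, j: sum(T[k, l]*J[i, k]*J[j, l]
                                      for k in range(2) for l in range(2)))
# ... is the pointwise matrix rule J T J^T from the rank-2 table.
assert sp.simplify(Tt - J*T*J.T) == sp.zeros(2, 2)
```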
3 (tensors first appear 42 minutes in, although some notation is introduced earlier) and
lecture 4. In lecture 5 tensor calculus (covariant differentiation of tensor fields) is introduced.
Physicists and engineers who think of tensors in terms of their components will say that
a physical law described with tensors is independent of coordinates because such a law in
one coordinate system has a similar description in any other coordinate system. What
often goes unmentioned is that these physical laws are multilinear relations among tensor
components, which is the reason for independence of coordinates since the components of
a tensor transform multilinearly when coordinates change. I have seen some references for physicists or engineers go so far as to assert the converse, that a physical law which is independent of coordinates must be expressible in terms of tensors; but that’s wrong: tensor fields are not
the only concept in geometry that is independent of coordinates. For example, connections
and spinor fields are geometric structures on (Riemannian) manifolds that are not tensors.
Tensors play an essential role in quantum mechanics, but for rather different reasons
than we’ve already mentioned in physics. In classical mechanics, the states of a system
are modeled by the points on a finite-dimensional manifold, and when we combine two
systems the corresponding manifold is the direct product of the manifolds for the original
two systems. The states of a quantum system, on the other hand, are represented by
the nonzero vectors (really, the 1-dimensional subspaces) in a complex Hilbert space, such
as L2 (R6 ). (A point in R6 has three position and three momentum coordinates, which
is the classical description of a particle.) When we combine two quantum systems, its
corresponding Hilbert space is the tensor product of the original Hilbert spaces, essentially
because L2(R6 × R6) = L2(R6) ⊗C L2(R6), which is the analytic34 analogue of R[X, Y] ∼= R[X] ⊗R R[Y]. Thus quantum states are related to tensors in a single tensor product of
Hilbert spaces, not to tensor fields. A video of a physicist introducing tensor products of
Hilbert spaces on YouTube is Frederic Schuller’s lecture 14 on quantum mechanics, where
he writes an elementary tensor as v w rather than v ⊗ w to avoid confusion with the use
of ⊗ in the notation of the vector space H1 ⊗C H2 .
The difference between a direct product of manifolds M ×N and a tensor product of vector
spaces H1 ⊗C H2 reflects mathematically some of the non-intuitive features of quantum
mechanics. Every point in M × N is a pair (x, y) where x ∈ M and y ∈ N , so we get
a direct link from a point in M × N to something in M and something in N . On the
other hand, most tensors in H1 ⊗C H2 are not elementary, and a non-elementary tensor
in H1 ⊗C H2 has no simple-minded description in terms of a pair of elements of H1 and
H2 . Quantum states in H1 ⊗C H2 that correspond to non-elementary tensors are called
entangled states, and they reflect the difficulty of trying to describe quantum phenomena
for a combined system (e.g., the two-slit experiment) in terms of quantum states of the
two original systems individually. I’ve been told that physics students who get used to
computing with tensors in relativity by learning to work with the “transform by a definite
rule” description of tensors find the role of tensors in quantum mechanics to be difficult to
learn, because the conceptual role of tensors there is so different. And probably it doesn’t
help students that physicists use “tensor” to mean both tensor fields and tensors.
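For readers who want to experiment, here is a hedged numerical sketch (finite-dimensional stand-ins for the Hilbert spaces, and numpy, are assumptions of the illustration): a tensor in C² ⊗C C² is elementary exactly when its 2 × 2 coefficient matrix has rank at most 1, so a rank-2 coefficient matrix certifies an entangled state.

```python
import numpy as np

# Model C^2 (x)_C C^2 by 2x2 coefficient matrices t[i, j] (the tensor is
# sum_{i,j} t[i,j] e_i (x) e_j). Elementary tensors v (x) w have coefficient
# matrix np.outer(v, w), which always has rank <= 1.
def is_elementary(t, tol=1e-10):
    return np.linalg.matrix_rank(t, tol=tol) <= 1

v, w = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert is_elementary(np.outer(v, w))        # v (x) w is elementary

bell = np.eye(2) / np.sqrt(2)               # e1 (x) e1 + e2 (x) e2, normalized
assert not is_elementary(bell)              # entangled: rank 2, not elementary
```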
Whether you want to think of tensors as objects in a space having a universal mapping
property, indexed quantities that satisfy a transformation rule, or a physical interpretation,
34This tensor product should be completed, having infinite sums of products f(x)g(y). There are more subtleties. See http://www-users.math.umn.edu/~garrett/m/v/nonexistence_tensors.pdf and https://math.stackexchange.com/questions/2951879.
the advice of user mathwonk on the website PhysicsForums35 is worth remembering: It does
not matter what a “tensor” is, what matters is knowing what you are doing.
We’ll end this discussion of tensors in physics with a story. I was the math consultant
for the 4th edition of the American Heritage Dictionary of the English Language (2000).
The editors sent me all the words in the 3rd edition with mathematical definitions, and I
had to find and correct the errors. Early on I came across a word I had never heard of
before: dyad. It was defined in the 3rd edition as “an operator represented as a pair of
vectors juxtaposed without multiplication.” That’s a ridiculous definition, as it conveys no
meaning at all. I obviously had to fix this definition, but first I had to know what the word
meant! In a physics book36 a dyad is defined as “a pair of vectors, written in a definite
order ab.” This is just as useless, but the physics book also does something with dyads,
which gives a clue about what they really are. The product of a dyad ab with a vector c is
a(b · c), where b · c is the usual dot product (a, b, and c are all vectors in Rn ). This reveals
what a dyad is. Do you see it? Dotting with b is an element of the dual space (Rn )∨ , so
the effect of ab on c is reminiscent of the way V ⊗ V∨ acts on V by (v ⊗ ϕ)(w) = ϕ(w)v.
A dyad is the same thing as an elementary tensor v ⊗ ϕ in Rn ⊗ (Rn )∨ . In the 4th edition
of the dictionary, I included two definitions for a dyad. For the general reader, a dyad is
“a function that draws a correspondence37 from any vector u to the vector (v · u)w and
is denoted vw, where v and w are a fixed pair of vectors and v · u is the scalar product
of v and u. For example, if v = (2, 3, 1), w = (0, −1, 4), and u = (a, b, c), then the dyad
vw draws a correspondence from u to (2a + 3b + c)w.” The more concise second definition
was: a dyad is “a tensor formed from a vector in a vector space and a linear functional
on that vector space.” Unfortunately, the definition of “tensor” in the dictionary is “A set
of quantities that obey certain transformation laws relating the bases in one generalized
coordinate system to those of another and involving partial derivative sums. Vectors are
simple tensors.” That is really the definition of a tensor field, and that sense of the word
tensor is incompatible with my concise definition of a dyad in terms of tensors.
More general than a dyad is a dyadic, which is a sum of dyads: ab + cd + · · ·. So a dyadic is a general tensor in Rn ⊗R (Rn)∨ ∼= HomR(Rn, Rn). In other words, a dyadic is an n × n
real matrix. The terminology of dyads and dyadics goes back to Gibbs [7, Chap. 3], who
championed the development of linear and multilinear algebra, including his indeterminate
product (that is, the tensor product), under the name “multiple algebra.”
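As a hedged numpy sketch of this identification (the array names are invented for the illustration): modeling the dyad ab as the outer-product matrix makes its action on c literally a(b · c), and sums of such matrices realize dyadics as n × n matrices.

```python
import numpy as np

a = np.array([1.0, 0.0, 2.0])
b = np.array([0.0, 3.0, -1.0])
c = np.array([4.0, 1.0, 1.0])

dyad_ab = np.outer(a, b)                        # the dyad "ab" as a 3x3 matrix
assert np.allclose(dyad_ab @ c, a * (b @ c))    # (ab)c = a(b . c)

# A dyadic is a sum of dyads, i.e., an arbitrary n x n matrix acting on vectors.
d, e = np.array([2.0, 2.0, 0.0]), np.array([1.0, -1.0, 5.0])
dyadic = np.outer(a, b) + np.outer(d, e)
assert np.allclose(dyadic @ c, a * (b @ c) + d * (e @ c))
```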
References
[1] D. V. Alekseevskij, V. V. Lychagin, A. M. Vinogradov, “Geometry I,” Springer-Verlag, Berlin, 1991.
[2] N. Bourbaki, Livre II Algèbre Chapitre III (état 4) Algèbre Multilinéaire, http://archives-bourbaki.ahp-numerique.fr/files/original/97a9fed708bdde4dc55547ab5a8ff943.pdf.
[3] S. Carroll, “Spacetime and Geometry: An Introduction to General Relativity,” Benjamin Cummings,
2003.
[4] A. Einstein, The Foundation of the General Theory of Relativity (English translation), pp. 146–199 in
“The Collected Papers of Albert Einstein, Vol. 6: The Berlin Years: Writings, 1914-1917,” Princeton
Univ. Press, Princeton, 1997. URL https://einsteinpapers.press.princeton.edu/vol6-trans/158.
[5] D. Eisenbud and J. Harris, “The Geometry of Schemes”, Springer-Verlag, New York, 2000.
[6] R. P. Feynman, Chap. 31 of “The Feynman Lectures on Physics, Vol. II,” Millennium edition, http://www.feynmanlectures.caltech.edu/II_31.html.
35See https://www.physicsforums.com/threads/christoffel-symbol-as-tensor.40177/.
36H. Goldstein, Classical Mechanics, 2nd ed., p. 194
37Yes, this terminology sucks. Blame the unknown editor at the dictionary for that one.
[7] J. W. Gibbs, “Elements of Vector Analysis Arranged for the Use of Students in Physics,” Tuttle, More-
house & Taylor, New Haven, 1884. URL https://archive.org/details/elementsvectora00gibb.
[8] J. W. Gibbs, On Multiple Algebra, Proceedings of the American Association for the Advancement of
Science, 35 (1886). URL http://archive.org/details/onmultiplealgeb00gibbgoog.
[9] R. Grone, Decomposable Tensors as a Quadratic Variety, Proc. Amer. Math. Soc. 64 (1977), 227–230.
[10] P. Halmos, “Finite-Dimensional Vector Spaces,” Springer-Verlag, New York, 1974.
[11] J. Håstad, Tensor rank is NP-complete, J. Algorithms 11 (1990), 644–654.
[12] C. J. Hillar and L.-H. Lim, Most tensor problems are NP-hard, J. ACM 60 (2013), Art. 45, 39 pp.
[13] J. Kun, How to Conquer Tensorphobia, https://jeremykun.com/2014/01/17/how-to-conquer-tensorphobia/.
[14] J. M. Lee, “Manifolds and Differential Geometry,” Amer. Math. Soc., Providence, 2009.
[15] R. Hermann, “Ricci and Levi-Civita’s Tensor Analysis Paper” Math Sci Press, Brookline, 1975.
[16] C. W. Misner, K. S. Thorne, and J. A. Wheeler, “Gravitation,” W. H. Freeman and Co., San Francisco,
1973.
[17] F. J. Murray and J. von Neumann, On Rings of Operators, Annals Math. 37 (1936), 116–229.
[18] B. O’Neill, “Semi-Riemannian Geometry,” Academic Press, New York, 1983.
[19] B. Osgood, Chapter 8 of The Fourier Transform and its Applications, http://see.stanford.edu/materials/lsoftaee261/chap8.pdf.
[20] H-J Petsche, “Hermann Grassmann – Biography,” Birkhäuser, Basel, 2009.
[21] G. Y. Rainich, Tensor analysis without coordinates, Proc. Natl. Acad. Sci. USA 9 (1923), 179–183.
[22] G. Ricci, T. Levi-Civita, Méthodes de Calcul Différentiel Absolu et Leurs Applications, Math. Annalen
54 (1901), 125–201.
[23] Y. Shitov, How hard is the tensor rank?, https://arxiv.org/pdf/1611.01559.pdf.
[24] F. J. Temple, “Cartesian Tensors,” Wiley, New York, 1960.
[25] W. Voigt, Die fundamentalen physikalischen Eigenschaften der Krystalle in elementarer Darstellung,
Verlag von Veit & Comp., Leipzig, 1898.
[26] R. Wald, “General Relativity,” Univ. Chicago Press, Chicago, 1984.
[27] H. Whitney, Tensor Products of Abelian Groups, Duke Math. Journal 4 (1938), 495–528.
[28] H. Whitney, Moscow 1935: topology moving toward America, pp. 97–117 in “A Century of Mathematics
in America, Part I,” Amer. Math. Soc., Providence, 1988.